Extra Exercise 10
-----------------

The numbers throughout this solution correspond to one run with the applets. Other runs may give different results; this is the nature of simulation.

Law of Large Numbers: Dice Rolling Example applet:
--------------------------------------------------

(a) Assuming all outcomes to be equally likely (as is done in the applet), we would expect the proportion for each outcome to be somewhere close to 1/6 = 0.167. With the default seed for the random numbers (123), the proportion for 2 spots is quite a bit higher (0.22) than expected, and the proportions for 1 and 4 spots (both 0.14) are a bit lower.

From the graph it appears that the first four averages are 4, 4.5, 5 and 4. That would correspond to getting the outcomes 4, 5, 6, 1 in the first four rolls.

(b) The graph shows that with more rolls (up to 50 for a start), the average moves closer to 3.5. That is what we would expect from the Law of Large Numbers (LLN), because the mean of the distribution is 3.5. If X denotes the outcome of a single die, we have

    EX = 1*(1/6) + 2*(1/6) + 3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6) = 21/6 = 3.5

After 10,000 rolls, the sample proportions for the 6 outcomes are all very close to 1/6, and the average is close to 3.5. No numerical value is given, so it is difficult to say exactly how close it is. The graph shows major fluctuations around 3.5 only for the first few hundred rolls; thereafter the average gets ever closer to 3.5. That is the expected behaviour from the LLN.

Sampling distribution applet:
-----------------------------

(i) The top graph shows a black rectangle representing the density curve of the Uniform distribution on the interval (0,50). The density curve is flat, and the mean and median of the distribution both equal 25.

For a sample size of 2 and "1 time", the middle graph shows the two observations drawn, at their values. The mean and the median are the same, as they always are for two values.
The bottom graph shows the single average obtained, at the same value as the mean listed in the middle display. For a single value, the mean and median are also the same.

After resetting and applying "5 times", the top display is unchanged, the middle display shows the two values obtained in the last run, and the bottom display shows the 5 averages from the 5 trials in a layout similar to a histogram.

(ii) When repeating the experiment a large number of times (well beyond 1000), the histogram at the bottom approaches the triangular distribution shown on slide 5L-5 for n=2.

(iii) With a sample size of 12, say, and many repetitions, the histogram appears bell-shaped, corresponding to a normal distribution. It would be helpful to overlay the normal distribution curve, but the applet does not seem to allow that.

(iv) For a bell-shaped (i.e., normal) distribution, the histogram at the bottom should be normal regardless of the sample size. An average of normally distributed (i.i.d.) variables is always exactly normal.

For a right-skewed distribution, the average of two observations is still very right-skewed, and it takes a pretty large sample size to eliminate the skewness. For n=20 the histogram looks quite good, but the mean remains a bit larger than the median, corresponding to a slight right-skewness.

With a binary distribution, the number of bins in the histogram will be the sample size + 1 (for low sample sizes), because the average of n binary values can only take the values 0, 1/n, 2/n, ..., 1. This limits how closely the distribution can be approximated by a normal distribution. Visually, the approximation appears quite good

 - for n=10 for the binary distribution with p=0.5
 - for n=15 for the binary distribution with p=0.7
 - for n=30 for the binary distribution with p=0.9

The findings above show that skewness in the distribution of each component of the sum (average) strongly affects the quality of the approximation.
For the binary distribution with p=0.9, it takes a large number of observations to smooth out the contribution from the large peak at 1.
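
The effect of skewness in the binary case can also be checked numerically, outside the applet. The sketch below is an illustration only and does not reproduce the applet: it uses the standard result that the average of n i.i.d. binary (0/1) variables with P(1) = p has skewness (1 - 2p) / sqrt(n p (1 - p)), and it simulates averages for the three (p, n) pairs mentioned above. The function names, the seed and the number of repetitions are assumptions made for this example.

```python
import math
import random


def skewness_of_mean(p, n):
    """Exact skewness of the average of n i.i.d. binary (0/1) variables with P(1) = p."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


def simulated_means(p, n, repetitions, seed=123):
    """Draw `repetitions` samples of size n and return the list of sample averages."""
    rng = random.Random(seed)  # seed is arbitrary; it is not tied to the applet's seed
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]


# The three (p, n) pairs for which the approximation looked quite good visually.
cases = [(0.5, 10), (0.7, 15), (0.9, 30)]
for p, n in cases:
    means = simulated_means(p, n, repetitions=10_000)
    distinct = len(set(means))  # can never exceed n + 1 possible values
    print(f"p={p}, n={n}: skewness of average = {skewness_of_mean(p, n):+.3f}, "
          f"distinct averages observed = {distinct} (max {n + 1})")
```

The skewness is zero for p=0.5 and shrinks only at the rate 1/sqrt(n) for the other two cases, which is consistent with the larger sample sizes needed for p=0.7 and p=0.9.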