Supplementary Exercise 7.64 of IPS7e
------------------------------------
(same dataset as Supplementary Exercise 6.95)

Data: 12 readings of home radon detectors when exposed to 105 picocuries per liter of radon; the purpose is to examine the accuracy of the detectors.

Model: a simple random sample (i.i.d. sample) from a distribution with unknown mean, median and standard deviation. Note that for this analysis we do not assume a known value of the standard deviation.

(a) Minitab commands and output:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_064.mtw".
Retrieving worksheet from file: ‘H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_064.mtw’
Worksheet was saved on 02/10/2014
MTB > Stem-and-Leaf 'radon';
SUBC>   Trim.

Stem-and-Leaf Display: radon

Stem-and-leaf of radon  N = 12
Leaf Unit = 1.0

 1    9  1
 5    9  5679
(3)  10  134
 4   10  5
 3   11  1
 2   11  9
 1   12  2

MTB > Describe 'radon';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing.

Descriptive Statistics: radon

Variable   N  N*    Mean  SE Mean  StDev  Minimum     Q1  Median      Q3
radon     12   0  104.13     2.71   9.40    91.90  96.90  102.75  109.90

Variable  Maximum  Skewness  Kurtosis
radon      122.30      0.85     -0.01

MTB > PPlot 'radon';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1.

[Graph: Probability Plot of radon (normal probability plot), not reproduced]

The P-value of the Anderson-Darling test of normality is 0.311.

Comments:
---------
Estimation: sample mean = 104.1, sample median = 102.8, sample standard deviation = 9.40.

The stemplot shows (as would a dotplot) that the distribution is somewhat right-skewed (skewness = 0.85), but the normality test is far from significant (P = 0.311). There are no obvious outliers. The statement in the problem text that the skewness is "not strong enough to forbid use of the t procedures" may be a bit surprising when compared with the textbook guidelines (PSLS 3e p. 426; IPS 7e p. 417-18).
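As a cross-check on the Describe output, the summary statistics can be recomputed outside Minitab. This is a minimal sketch under two assumptions. First, the twelve readings are not printed in the session above; the list below is our transcription of the textbook data set, assumed to be the contents of ex07_064.mtw (it reproduces the stemplot and every value in the Describe table). Second, the skewness and kurtosis use the adjusted formulas we believe Minitab applies; `describe` is a hypothetical helper, not a Minitab or library function.

```python
import math
import statistics

# ASSUMPTION: the 12 radon readings (pCi/L) from the textbook data set;
# they are not shown in the Minitab session output.
radon = [91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
         103.8, 99.6, 96.6, 119.3, 104.8, 101.7]


def describe(x):
    """Hypothetical helper mimicking the statistics of Minitab's Describe."""
    n = len(x)
    mean = statistics.mean(x)
    sd = statistics.stdev(x)            # sample SD, divisor n - 1
    se_mean = sd / math.sqrt(n)         # standard error of the mean
    # method="exclusive" interpolates at positions (n + 1)p, as Minitab does
    q1, median, q3 = statistics.quantiles(x, n=4, method="exclusive")
    z = [(xi - mean) / sd for xi in x]
    # Adjusted skewness and excess kurtosis (assumed to match Minitab)
    skew = n / ((n - 1) * (n - 2)) * sum(zi ** 3 for zi in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(zi ** 4 for zi in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"N": n, "Mean": mean, "SE Mean": se_mean, "StDev": sd,
            "Minimum": min(x), "Q1": q1, "Median": median, "Q3": q3,
            "Maximum": max(x), "Skewness": skew, "Kurtosis": kurt}


for name, value in describe(radon).items():
    shown = f"{value:.2f}" if isinstance(value, float) else str(value)
    print(f"{name:>8}: {shown}")
```

Rounded to two decimals, the printed values should agree with the Describe table; the positive skewness comes mainly from the two largest readings (119.3 and 122.3).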
Because the sample size is small (n < 15), we should consider whether the data are close to normal; with the fairly large skewness, that seems somewhat questionable. It is worth recalling that a nonsignificant normality test is no proof that the data are normally distributed; it only tells us that there is not enough evidence to conclude that they are not. On the other hand, the guidelines state that we should not use t procedures if the data are clearly non-normal or if outliers are present, and neither applies here. Note also that the normality assumption matters substantially more for t procedures than for z procedures (where the standard deviation is assumed known). This is because the estimate of the standard deviation from the data can be substantially affected by non-normality, so that the t distribution may no longer be an appropriate reference distribution. We will carry out an analysis using non-parametric methods in a later exercise.

(b) Minitab commands and output:

MTB > OneT 'radon';
SUBC>   Test 105;
SUBC>   Confidence 95.0;
SUBC>   Alternative 0.

One-Sample T: radon

Test of mu = 105 vs not = 105

Variable   N    Mean  StDev  SE Mean      95% CI           T      P
radon     12  104.13   9.40     2.71  (98.16, 110.10)  -0.32  0.755

Comments:
---------
The t-test is clearly non-significant (t = -0.32, P = 0.755). There is no evidence that the detector readings differ systematically from the true value of 105 picocuries per liter. That is good, but it does not by itself mean that the detectors are satisfactory. One would certainly also want to consider the standard deviation of the readings, which seems quite large: the individual values scatter considerably around the true value of 105 (from 91.9 to 122.3). So we might be in a "true mean, large scatter" situation (compare the figure on slide 5L-2). Even the 95% confidence interval (half-width about 6) is not very narrow around 105, and the individual measurements are much more variable than the mean (standard deviation 9.40 versus standard error 2.71).
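The one-sample t analysis in part (b) needs only n, the sample mean and the sample standard deviation. The sketch below recomputes it from the rounded values in the Minitab output using only the Python standard library. The t(11) distribution function is obtained by Simpson's-rule integration of the t density and the critical value by bisection; these are illustrative choices, not Minitab's own algorithms, and the function names are hypothetical. Because the inputs are rounded, the last digit may differ slightly from Minitab's.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Inverse of t_cdf by bisection, valid for 0.5 < p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(n, mean, sd, mu0, conf=0.95):
    """Two-sided one-sample t test and t confidence interval from summary statistics."""
    se = sd / math.sqrt(n)              # standard error of the mean
    t = (mean - mu0) / se
    df = n - 1
    p = 2 * (1 - t_cdf(abs(t), df))     # two-sided P-value
    t_star = t_quantile(1 - (1 - conf) / 2, df)
    return t, p, (mean - t_star * se, mean + t_star * se)


# Summary statistics copied (rounded) from the Minitab output
t, p, (lo, hi) = one_sample_t(n=12, mean=104.13, sd=9.40, mu0=105)
print(f"T = {t:.2f}, P = {p:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Dividing the standard deviation by √12 ≈ 3.46 to obtain the standard error is what makes the interval for the mean so much narrower than the spread of the individual readings.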
It is also interesting to compare the results with those of Exercise 6.95, where the standard deviation was assumed to be known (and equal to 9). Generally speaking, the analysis with unknown standard deviation is weaker and should lead to wider confidence intervals and higher P-values (because the t distribution has heavier tails than the standard normal). The 95% CI is indeed about 2 units wider; part of this is also due to the sample standard deviation (9.40) being a bit above the assumed value of 9. The P-values are quite similar, and the test statistics are far from the critical values. This is because the normal and t(11) distributions differ mainly in the tails, whereas the observed values (around -0.3) lie in the centre of the distributions.
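The comparison with Exercise 6.95 can be made concrete with a short calculation. This sketch assumes σ = 9 (the known value used in Exercise 6.95), takes the tabulated t(11) critical value 2.201, and uses `statistics.NormalDist` for the standard normal. It computes the z statistic and its two-sided P-value, the widths of the two 95% intervals, and the ratio of the t(11) and standard normal densities at a few points.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


n, xbar, s, mu0 = 12, 104.13, 9.40, 105   # from the Minitab output
sigma = 9.0                               # known SD assumed in Exercise 6.95
T11_975 = 2.201                           # t(11) 0.975 critical value, from tables

std_normal = NormalDist()
z_star = std_normal.inv_cdf(0.975)        # about 1.960
z = (xbar - mu0) / (sigma / math.sqrt(n))
p_z = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided z-test P-value

width_z = 2 * z_star * sigma / math.sqrt(n)   # z interval, sigma known
width_t = 2 * T11_975 * s / math.sqrt(n)      # t interval, sigma estimated by s

print(f"z = {z:.2f}, two-sided P = {p_z:.3f}")
print(f"CI width: z {width_z:.2f}, t {width_t:.2f}, difference {width_t - width_z:.2f}")
# Ratio of t(11) to N(0,1) density: close to 1 near the centre, large in the tails
for x in (0.3, 2.0, 3.0):
    ratio = t_pdf(x, 11) / std_normal.pdf(x)
    print(f"x = {x}: t(11)/normal density ratio = {ratio:.2f}")
```

The density ratio is slightly below 1 near the centre but exceeds 2 by x = 3, which is why the two P-values stay close (about 0.74 for z versus 0.755 for t) while the critical values (1.96 versus 2.201) differ.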