Supplementary Exercise 7.64 of IPS7e
------------------------------------
(same dataset as Supplementary Exercise 6.95)


Data: 12 readings of home radon detectors exposed to 105 picocuries per liter
of radon. The purpose is to examine the accuracy of the detectors.

Model: a simple random sample (i.i.d. sample) from a distribution with 
unknown mean, median and standard deviation. Note that for this analysis
we do not assume a known value of the standard deviation.

(a)
Minitab commands and output:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_064.mtw".
Retrieving worksheet from file: ‘H:\VHM\VHM801\Datasets\Minitab\Chapter
7\ex07_064.mtw’
Worksheet was saved on 02/10/2014

MTB > Stem-and-Leaf 'radon';
SUBC>   Trim.
Stem-and-Leaf Display: radon 

Stem-and-leaf of radon  N  = 12
Leaf Unit = 1.0

 1   9   1
 5   9   5679
(3)  10  134
 4   10  5
 3   11  1
 2   11  9
 1   12  2

MTB > Describe 'radon';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N;
SUBC>   NMissing.
Descriptive Statistics: radon 

Variable   N  N*    Mean  SE Mean  StDev  Minimum     Q1  Median      Q3
radon     12   0  104.13     2.71   9.40    91.90  96.90  102.75  109.90

Variable  Maximum  Skewness  Kurtosis
radon      122.30      0.85     -0.01

MTB > PPlot 'radon';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1.
Probability Plot of radon 
The P-value of the Anderson-Darling test of normality is 0.311.

Comments:
---------
Estimation:
sample mean = 104.1, sample median = 102.8, sample standard deviation = 9.40

The stemplot shows (as would a dotplot) that the distribution is somewhat right-skewed
(skewness=0.85), but the normality test is far from significant (P=0.311). There are
no obvious outliers.

The statement in the problem text that the skewness is "not strong enough to forbid use
of the t procedures" is perhaps a bit surprising when compared with textbook guidelines
(PSLS 3e p. 426; IPS 7e p. 417-18). Because the sample size is small (<15), we should
consider whether the data are close to normal; that seems a bit questionable given the
fairly large skewness. It is worth recalling that a nonsignificant normality test is
no proof that the data are truly normally distributed; it only tells us that there is
not enough evidence to conclude that they are not. On the other hand, the
guidelines state that we should not use t procedures if the data are clearly non-normal
or if outliers are present, and neither of these two cases applies here.

Note also that the assumption of normality matters substantially more for t procedures
than for z procedures (where the standard deviation is assumed known). This is
because the estimate of the standard deviation can itself be substantially
affected by non-normality, in which case the t distribution no longer applies exactly
as a reference distribution.

We will carry out an analysis using non-parametric methods in a later exercise.

(b)
Minitab commands and output:

MTB > OneT 'radon';
SUBC>   Test 105;
SUBC>   Confidence 95.0;
SUBC>   Alternative 0.
One-Sample T: radon 

Test of mu = 105 vs not = 105

Variable   N    Mean  StDev  SE Mean       95% CI          T      P
radon     12  104.13   9.40     2.71  (98.16, 110.10)  -0.32  0.755


Comments:
---------
The t-test is clearly non-significant, with t=-0.32 and a P-value of 0.76.
There is no evidence that the readings of the detectors differ
systematically from the true value of 105 picocuries per liter. That is
good but does not by itself mean that the results obtained are satisfactory.
One would certainly also want to consider the standard deviation of the
readings, which seems quite large (the individual values scatter
considerably around the true value of 105). So we might be in a "true
mean, large scatter" situation (compare the figure on slide 5L-2). Even
the 95% confidence interval is not very narrow around 105, and the
measurements themselves are much more variable than the mean.
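
As a cross-check, the numbers in the One-Sample T output can be reproduced from
the summary statistics alone (N=12, mean=104.13, StDev=9.40). The following is a
minimal Python sketch and not part of the original Minitab session. The helper
functions t_pdf, t_cdf and t_quantile are hypothetical names introduced here; they
evaluate the t(11) distribution by simple numerical integration and bisection so
that only the standard library is needed.

```python
import math

# Summary statistics copied from the Minitab output (not the raw data).
n, mean, sd = 12, 104.13, 9.40
mu0 = 105.0   # true radon level the detectors were exposed to
df = n - 1

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

se = sd / math.sqrt(n)                      # standard error of the mean
t_stat = (mean - mu0) / se                  # one-sample t statistic
p_value = 2 * (1 - t_cdf(abs(t_stat), df))  # two-sided P-value
t_star = t_quantile(0.975, df)              # critical value for a 95% CI
ci = (mean - t_star * se, mean + t_star * se)

print(f"SE = {se:.2f}, t = {t_stat:.2f}, P = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the inputs are the rounded summary statistics, the results should agree
with the Minitab output only up to rounding.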

It is also interesting to compare the results with those of Exercise 6.95
where the standard deviation was assumed to be known (and equal to 9).
Generally speaking, the analysis with unknown standard deviation is weaker
and should lead to wider confidence intervals and higher P-values (because
the t distribution has heavier tails than the standard normal). The 95% CI is indeed
about 2 units wider, which is partly due to the sample standard deviation being
a bit above the assumed value of 9. The P-values are quite similar and far
from any conventional significance level; this is because the difference between the
normal and t(11) distributions shows mainly in the tails, whereas the observed test
statistics (around -0.3) are in the centre of the distributions.
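
The comparison with the known-sigma analysis can be made concrete with a short
Python sketch (again not part of the Minitab session). It recomputes the z
interval and P-value with sigma=9 from the summary statistics and splits the
extra width of the t interval into a part due to the larger standard deviation
and a part due to the larger critical value. The value t_star=2.201 is an
assumption taken from a standard t table for 11 degrees of freedom; the variable
names are illustrative only.

```python
import math
from statistics import NormalDist

# Summary statistics from the Minitab output; sigma is the value
# assumed known in Exercise 6.95.
n, mean, s = 12, 104.13, 9.40
sigma = 9.0
mu0 = 105.0
t_star = 2.201   # assumption: 0.975 quantile of t(11) from a t table

std_normal = NormalDist()
z_star = std_normal.inv_cdf(0.975)   # about 1.96

se_z = sigma / math.sqrt(n)   # standard error with known sigma
se_t = s / math.sqrt(n)       # standard error with estimated sd

width_z = 2 * z_star * se_z       # z interval, sigma = 9
width_mixed = 2 * z_star * se_t   # z critical value, but sd = 9.40
width_t = 2 * t_star * se_t       # t interval, sd = 9.40

z_stat = (mean - mu0) / se_z
p_z = 2 * std_normal.cdf(-abs(z_stat))   # two-sided z-test P-value

print(f"z interval width: {width_z:.2f}")
print(f"t interval width: {width_t:.2f}")
print(f"extra width from larger sd:        {width_mixed - width_z:.2f}")
print(f"extra width from t critical value: {width_t - width_mixed:.2f}")
print(f"z = {z_stat:.2f}, two-sided P = {p_z:.3f}")
```

In this decomposition most of the additional width comes from replacing the
normal critical value by the t(11) critical value, and the smaller remainder from
the sample standard deviation exceeding 9.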