Supplementary Exercises 7.102, 7.103 and 7.104 of IPS7e
-------------------------------------------------------

Data: two samples of changes (improvements, differences after-before) in
spatial-temporal reasoning test scores for 34 children attending six
months of piano lessons and 44 children in a control group. Note that we
already analyzed the piano group in Exercises 7.58 and 7.59.

Model: the 2 samples are independent and each a simple random sample 
(i.i.d. sample) from a distribution with unknown mean and standard
deviation (mu1 and sigma1 for the piano lesson group, mu2 and sigma2 for
the control group).

(a) 
Minitab commands:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 7\ex07_102.mtw".
Retrieving worksheet from file: ‘H:\VHM\VHM801\Datasets\Minitab\Chapter
7\ex07_102.mtw’
Worksheet was saved on 02/11/2014

MTB > name c4 'change'
MTB > Stem-and-Leaf 'change';
SUBC>   By 'g'.
Stem-and-Leaf Display: change 

Stem-and-leaf of change  g = 0    N  = 34
Leaf Unit = 0.10

 1   -3  0
 3   -2  00
 4   -1  0
 5   -0  0
 6   0   0
 7   1   0
 10  2   000
 15  3   00000
(7)  4   0000000
 12  5   00
 10  6   000
 7   7   00000
 2   8
 2   9   00

Stem-and-leaf of change  g = 1    N  = 44
Leaf Unit = 0.10

 1   -6  0
 1   -5
 2   -4  0
 5   -3  000
 7   -2  00
 14  -1  0000000
 19  -0  00000
(6)  0   000000
 19  1   000000
 13  2   0000000
 6   3   0
 5   4   000
 2   5   0
 1   6
 1   7   0

MTB > Describe 'change';
SUBC>   By 'group';
SUBC>   Mean;
SUBC>   SEMean;
SUBC>   StDeviation;
SUBC>   QOne;
SUBC>   Median;
SUBC>   QThree;
SUBC>   Minimum;
SUBC>   Maximum;
SUBC>   Skewness;
SUBC>   Kurtosis;
SUBC>   N.
Descriptive Statistics: change 

Variable  group     N   Mean  SE Mean  StDev  Minimum      Q1  Median     Q3  Maximum
change    control  44  0.386    0.365  2.423   -6.000  -1.000   0.000  2.000    7.000
          piano    34  3.618    0.524  3.055   -3.000   2.000   4.000  6.000    9.000

Variable  group    Skewness  Kurtosis
change    control      0.12      1.04
          piano       -0.36     -0.28

MTB > Dotplot ( 'change' ) * 'group'.
Dotplot of change vs group 

MTB > PPlot 'change';
SUBC>   Normal;
SUBC>   Symbol;
SUBC>   FitD;
SUBC>   Grid 2;
SUBC>   Grid 1;
SUBC>   MGrid 1;
SUBC>   Panel 'group'.
Probability Plot of change 
The P-value of the Anderson-Darling test of normality is 0.066 (group=control)
The P-value of the Anderson-Darling test of normality is 0.227 (group=piano)

MTB > GSummary  'change';
SUBC>   By 'group'.
Results for group = control 
Summary Report for change (group = control) 
Results for group = piano 
Summary Report for change (group = piano) 


Comments for 7.102 (a) and (b)
------------------------------
The distributions are displayed by stemplots and dotplots. Note that the stemplot
in Minitab artificially divides the observations with a value of zero into 2 groups
(this is clearly not desirable). The table of descriptive statistics contains the 
mean, standard deviation and standard error of the mean, as requested.
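
As a cross-check on the table, the descriptive statistics can be recomputed
outside Minitab. The sketch below (Python, standard library only) rebuilds the
two samples from the stemplots above, which is possible because every leaf is 0
and so every change is a whole number. The names piano_counts, control_counts,
expand and summarize are ours and do not come from the worksheet.

```python
import math
import statistics

# Value -> frequency, read off the stemplots (all leaves are 0, so every
# change is a whole number; the -0 and 0 stems are merged into one zero).
piano_counts = {-3: 1, -2: 2, -1: 1, 0: 2, 1: 1, 2: 3, 3: 5,
                4: 7, 5: 2, 6: 3, 7: 5, 9: 2}
control_counts = {-6: 1, -4: 1, -3: 3, -2: 2, -1: 7, 0: 11, 1: 6,
                  2: 7, 3: 1, 4: 3, 5: 1, 7: 1}


def expand(counts):
    """Turn a value -> frequency table into a sorted list of observations."""
    return sorted(v for v, k in counts.items() for _ in range(k))


def summarize(sample):
    """Return N, mean, sample standard deviation and standard error of the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # divisor n - 1
    return n, mean, sd, sd / math.sqrt(n)


for label, counts in [("control", control_counts), ("piano", piano_counts)]:
    n, mean, sd, se = summarize(expand(counts))
    print(f"{label:8s} N={n:2d}  Mean={mean:.3f}  SE Mean={se:.3f}  StDev={sd:.3f}")
```

The printed values should agree with the N, Mean, SE Mean and StDev columns of
the Minitab table; note that SE Mean is simply StDev divided by the square root
of N.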

Both distributions look reasonably symmetric and bell-shaped. The normal
plots and normality tests show no reason to reject a normal distribution
for the piano group. The distribution for the control group is somewhat
too peaked for a normal distribution (kurtosis=1.04), and the P-value
for the normality test is as low as 0.066. Among the different normality tests,
the A-D test is the only one showing something near significance for the control
group. It seems reasonable to maintain the normal distribution assumption
even in view of a possible mild violation.
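
The skewness and kurtosis values quoted above can be checked in the same way.
The sketch below assumes that Minitab uses the usual small-sample-adjusted
formulas (this is our assumption, although the results do agree with the table
to two decimals); the data are again rebuilt from the stemplots, and all
function and variable names are ours.

```python
import math

# Value -> frequency, read off the stemplots (whole-number changes).
piano_counts = {-3: 1, -2: 2, -1: 1, 0: 2, 1: 1, 2: 3, 3: 5,
                4: 7, 5: 2, 6: 3, 7: 5, 9: 2}
control_counts = {-6: 1, -4: 1, -3: 3, -2: 2, -1: 7, 0: 11, 1: 6,
                  2: 7, 3: 1, 4: 3, 5: 1, 7: 1}


def expand(counts):
    """Turn a value -> frequency table into a list of observations."""
    return [v for v, k in counts.items() for _ in range(k)]


def z_scores(x):
    """Standardize with the sample mean and the (n - 1) standard deviation."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return [(v - m) / s for v in x]


def skewness(x):
    """Adjusted sample skewness (0 for a symmetric distribution)."""
    n = len(x)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_scores(x))


def kurtosis(x):
    """Adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(x)
    z4 = sum(z ** 4 for z in z_scores(x))
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


for label, counts in [("control", control_counts), ("piano", piano_counts)]:
    x = expand(counts)
    print(f"{label:8s} Skewness={skewness(x):.2f}  Kurtosis={kurtosis(x):.2f}")
```

A positive kurtosis, as in the control group, indicates a distribution that is
more peaked (with heavier tails) than the normal distribution.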


Comments for 7.102 (c) and 7.103
--------------------------------
The interest is in comparing the changes in score between the piano and
control groups. Even if the primary interest is in an improvement of the
scores of the piano group over the control group, there seems to be no
a priori reason to focus only on an improvement in the score. Therefore,
our hypotheses are two-sided:
  H0: mu1=mu2
  Ha: mu1<>mu2
Since both distributions look reasonably normal, we may assume normal
distributions and base the inference (confidence interval and test) on
the two-sample t procedures.

MTB > TwoT 'change' 'group';
SUBC>   Confidence 95.0;
SUBC>   Test 0.0;
SUBC>   Alternative .
Two-Sample T-Test and CI: change, group 

Two-sample T for change

group     N  Mean  StDev  SE Mean
control  44  0.39   2.42     0.37
piano    34  3.62   3.06     0.52

Difference = mu (control) - mu (piano)
Estimate for difference:  -3.231
95% CI for difference:  (-4.508, -1.954)
T-Test of difference = 0 (vs not =): T-Value = -5.06  P-Value = 0.000  DF = 61
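
The T-Value and DF in this output can be reproduced from the summary statistics
alone. The sketch below (Python, standard library only) uses the unpooled
standard error and the Welch-Satterthwaite approximation for the degrees of
freedom; the inputs are the rounded values from the Describe output, and the
function name welch is ours. Minitab appears to truncate the approximate DF to
an integer, which we assume here.

```python
import math

# Summary statistics copied from the Describe output (rounded values).
n_c, mean_c, sd_c = 44, 0.386, 2.423   # control
n_p, mean_p, sd_p = 34, 3.618, 3.055   # piano


def welch(n1, m1, s1, n2, m2, s2):
    """Unpooled two-sample t statistic, its standard error and approximate DF."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared SEs of the two means
    se = math.sqrt(v1 + v2)                      # SE of the difference
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df


# Same order as Minitab: control minus piano.
t, se, df = welch(n_c, mean_c, sd_c, n_p, mean_p, sd_p)
print(f"SE of difference = {se:.3f}")
print(f"T-Value = {t:.2f}  DF = {df:.2f} (truncated: {math.floor(df)})")
```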

Comments:
---------
The t-test (without assuming equal standard deviations in the two groups)
gives t = -5.06 (5.06 in absolute value) with approximate DF=61, which is
highly significant (the reported P-value of 0.000 means P < 0.0005).
Similar results are obtained with other variants of the test: conservative
DF, or assuming equal variances (the two sample standard deviations, 2.42
and 3.06, are not too far apart). There is clear evidence of a difference
in scores between the two groups: the changes in score were higher in the
piano lesson group than in the control group. We also note that the evidence
against H0 is so strong that any deviations from the normal distribution are
without practical importance.
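
To see how little the choice of variant matters, the three versions of the test
can be compared side by side. The sketch below (Python, standard library only)
uses the rounded summary statistics from the Describe output and obtains the
two-sided P-value by numerical integration of the t density, which is our own
shortcut rather than Minitab's method; all names are ours.

```python
import math

# Summary statistics copied from the Describe output (rounded values).
n_c, mean_c, sd_c = 44, 0.386, 2.423   # control
n_p, mean_p, sd_p = 34, 3.618, 3.055   # piano
diff = mean_c - mean_p                 # control minus piano, as in Minitab


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=20000):
    """P(|T| >= |t|), via Simpson's rule on the density over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps                                  # steps must be even
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    inside = 2 * total * h / 3                     # P(-|t| < T < |t|)
    return max(0.0, 1.0 - inside)


# Unpooled SE (shared by the Welch and conservative-DF variants).
se_unpooled = math.sqrt(sd_c ** 2 / n_c + sd_p ** 2 / n_p)
# Pooled SE (equal-variance variant).
sp2 = ((n_c - 1) * sd_c ** 2 + (n_p - 1) * sd_p ** 2) / (n_c + n_p - 2)
se_pooled = math.sqrt(sp2 * (1 / n_c + 1 / n_p))

variants = {
    "approximate DF": (diff / se_unpooled, 61),
    "conservative DF": (diff / se_unpooled, min(n_c, n_p) - 1),
    "equal variances": (diff / se_pooled, n_c + n_p - 2),
}
for label, (t, df) in variants.items():
    print(f"{label:16s} t = {t:.2f}  DF = {df}  P = {two_sided_p(t, df):.1e}")
```

In all three variants the P-value should come out far below 0.0005, consistent
with the 0.000 printed by Minitab.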

The 95% confidence interval puts the mean improvement of the piano group
over the control group at about 2 to 4.5 units (of test scores).

Minitab technical note: The difference between the two group means is
computed as control minus piano, and therefore shows as negative. If we are
interested in the difference piano minus control, we can use all the
above results and simply switch the signs (for the confidence interval,
the two endpoints also trade places). Alternatively, we can make
Minitab do the difference in the preferred way by changing the labels
for the groups so that the piano group becomes the first one
(alphabetically, it's the second one). The variable g in the Minitab
worksheet would do this (g=0 for piano, g=1 for control). We could also
unstack the two columns and then enter the columns in the desired order.
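
The sign switch described in this note can be written out as a small sketch
(Python; the helper name flip_difference is ours). It takes the estimate,
confidence interval and T-Value reported for control minus piano and returns
the corresponding results for piano minus control.

```python
def flip_difference(estimate, ci, t_value):
    """Re-express results for (A - B) as results for (B - A)."""
    low, high = ci
    # Negating reverses the order, so the old upper limit becomes the new lower one.
    return -estimate, (-high, -low), -t_value


# Values copied from the Minitab output (control minus piano).
est, ci, t = flip_difference(-3.231, (-4.508, -1.954), -5.06)
print(f"Estimate for difference (piano - control): {est}")
print(f"95% CI for difference: ({ci[0]}, {ci[1]})")
print(f"T-Value = {t}")
```

The P-value and DF are unaffected by the order of the groups, since the test is
two-sided.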


Comments for Exercise 7.104
---------------------------
The advantage of including a control group is that any improvement in
the scores by aging (or perhaps other types of confounding) is taken
into account. The control group data show that such an improvement is
at most minor.

The primary advantage of carrying out a significance test over using
a confidence interval is that it gives a P-value, which is a more 
informative measure of the evidence against the null hypothesis 
than mere significance at the 5% level. On the other hand, the confidence
interval is useful, really indispensable, to quantify the likely 
magnitude of the effect; recall that statistical significance is not 
the same as biological significance.