. * do-file for lecture 11 of VHM 802, Winter 2021
. version 16 /* works also with versions 14-15 */

. set more off

. cd "r:\"
r:\

. 
. * Manly Example 4.1
. import delimited r:\sparrow.csv, clear
(6 vars, 49 obs)

. encode survivorship, gen(surv)

. foreach var of varlist total_length-l_keel_sternum {
  2.   ttest `var', by(surv) unequal /* unequal variances are preferable, though not the version used in Manly */
  3.   robvar `var', by(surv) /* Levene's test W50 for equal variances */
  4.   }

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      NS |      28    158.4286    .7336013    3.881853    156.9233    159.9338
       S |      21     157.381    .7253117    3.323796     155.868    158.8939
---------+--------------------------------------------------------------------
combined |      49    157.9796    .5220396    3.654277      156.93    159.0292
---------+--------------------------------------------------------------------
    diff |            1.047619    1.031624               -1.028801     3.12404
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   1.0155
Ho: diff = 0                     Satterthwaite's degrees of freedom =  46.1076

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8424         Pr(|T| > |t|) = 0.3152          Pr(T > t) = 0.1576

Survivorshi |       Summary of Total_length
          p |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NS |   158.42857    3.881853          28
          S |   157.38095   3.3237959          21
------------+------------------------------------
      Total |   157.97959   3.6542772          49

W0  =  1.8734718   df(1, 47)     Pr > F = 0.17758478

W50 =  1.4470443   df(1, 47)     Pr > F = 0.23502763

W10 =  1.9971389   df(1, 47)     Pr > F = 0.1641853

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      NS |      28    241.5714    1.078197    5.705284    239.3592    243.7837
       S |      21         241    .9128709      4.1833    239.0958    242.9042
---------+--------------------------------------------------------------------
combined |      49    241.3265    .7239746    5.067822    239.8709    242.7822
---------+--------------------------------------------------------------------
    diff |            .5714286    1.412743               -2.270663    3.413521
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   0.4045
Ho: diff = 0                     Satterthwaite's degrees of freedom =  46.9877

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.6562         Pr(|T| > |t|) = 0.6877          Pr(T > t) = 0.3438

Survivorshi |       Summary of Alar_extent
          p |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NS |   241.57143   5.7052839          28
          S |         241   4.1833001          21
------------+------------------------------------
      Total |   241.32653   5.0678223          49

W0  =  1.4064059   df(1, 47)     Pr > F = 0.24161486

W50 =  1.4029851   df(1, 47)     Pr > F = 0.24217978

W10 =  1.4446477   df(1, 47)     Pr > F = 0.23540988

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      NS |      28    31.47857    .1612908    .8534709    31.14763    31.80951
       S |      21    31.43333    .1590647    .7289262    31.10153    31.76514
---------+--------------------------------------------------------------------
combined |      49    31.45918    .1135362    .7947532     31.2309    31.68746
---------+--------------------------------------------------------------------
    diff |            .0452381    .2265311               -.4107081    .5011843
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   0.1997
Ho: diff = 0                     Satterthwaite's degrees of freedom =  46.1395

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.5787         Pr(|T| > |t|) = 0.8426          Pr(T > t) = 0.4213

Survivorshi |       Summary of L_beak_head
          p |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NS |   31.478571   .85347093          28
          S |   31.433333   .72892623          21
------------+------------------------------------
      Total |   31.459184   .79475321          49

W0  =  0.67188596   df(1, 47)     Pr > F = 0.41653233

W50 =  0.66377265   df(1, 47)     Pr > F = 0.4193404

W10 =  0.77479963   df(1, 47)     Pr > F = 0.38321515

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      NS |      28    18.44643    .1245608    .6591139    18.19085    18.70201
       S |      21        18.5    .0915475    .4195234    18.30904    18.69096
---------+--------------------------------------------------------------------
combined |      49    18.46939    .0806122    .5642856    18.30731    18.63147
---------+--------------------------------------------------------------------
    diff |           -.0535715    .1545844               -.3647433    .2576003
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =  -0.3466
Ho: diff = 0                     Satterthwaite's degrees of freedom =   45.948

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3653         Pr(|T| > |t|) = 0.7305          Pr(T > t) = 0.6347

Survivorshi |        Summary of L_humerous
          p |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NS |   18.446429   .65911387          28
          S |        18.5   .41952343          21
------------+------------------------------------
      Total |   18.469388   .56428562          49

W0  =  3.9250944   df(1, 47)     Pr > F = 0.0534381

W50 =  3.6558491   df(1, 47)     Pr > F = 0.06197919

W10 =  3.8783962   df(1, 47)     Pr > F = 0.05482183

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      NS |      28    20.83929    .2172057    1.149344    20.39362    21.28495
       S |      21    20.80952    .1654582    .7582247    20.46438    21.15466
---------+--------------------------------------------------------------------
combined |      49    20.82653    .1416249    .9913744    20.54177    21.11129
---------+--------------------------------------------------------------------
    diff |            .0297618    .2730471                 -.51974    .5792636
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   0.1090
Ho: diff = 0                     Satterthwaite's degrees of freedom =  46.3548

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.5432         Pr(|T| > |t|) = 0.9137          Pr(T > t) = 0.4568

Survivorshi |      Summary of L_keel_sternum
          p |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NS |   20.839286   1.1493443          28
          S |   20.809524    .7582247          21
------------+------------------------------------
      Total |   20.826531   .99137445          49

W0  =  2.3376228   df(1, 47)     Pr > F = 0.13298511

W50 =  1.9840519   df(1, 47)     Pr > F = 0.16554548

W10 =  2.3216450   df(1, 47)     Pr > F = 0.13428598

. hotelling total_length-l_keel_sternum, by(surv)

---------------------------------------------------------------------------------------------------------------------------------------
-> surv = NS

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
total_length |         28    158.4286    3.881853        152        165
 alar_extent |         28    241.5714    5.705284        230        252
 l_beak_head |         28    31.47857    .8534709       30.1       33.4
  l_humerous |         28    18.44643    .6591139       17.2       19.8
l_keel_ste~m |         28    20.83929    1.149344       18.6       23.1

---------------------------------------------------------------------------------------------------------------------------------------
-> surv = S

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
total_length |         21     157.381    3.323796        153        164
 alar_extent |         21         241      4.1833        235        248
 l_beak_head |         21    31.43333    .7289262       30.3       32.8
  l_humerous |         21        18.5    .4195234       17.7       19.3
l_keel_ste~m |         21    20.80952    .7582247       19.6         22


2-group Hotelling's T-squared = 2.823701
F test statistic: ((49-5-1)/(49-2)(5)) x 2.823701 = .5166772

H0: Vectors of means are equal for the two groups
              F(5,43) =    0.5167
       Prob > F(5,43) =    0.7622

. manova total_length-l_keel_sternum = surv /* same result as hotelling's T2 */

                       Number of obs =         49

                       W = Wilks' lambda      L = Lawley-Hotelling trace
                       P = Pillai's trace     R = Roy's largest root

              Source | Statistic        df    F(df1,     df2) =   F   Prob>F
          -----------+-------------------------------------------------------
                surv |W   0.9433         1      5.0     43.0     0.52 0.7622 e
                     |P   0.0567                5.0     43.0     0.52 0.7622 e
                     |L   0.0601                5.0     43.0     0.52 0.7622 e
                     |R   0.0601                5.0     43.0     0.52 0.7622 e
                     |-------------------------------------------------------
            Residual |                  47
          -----------+-------------------------------------------------------
               Total |                  48
          -------------------------------------------------------------------
                       e = exact, a = approximate, u = upper bound on F

. mvreg

Equation             Obs   Parms        RMSE    "R-sq"          F        P
--------------------------------------------------------------------------
total_length          49       2    3.654812    0.0205    .985957   0.3258
alar_extent           49       2    5.113306    0.0032   .1498655   0.7004
l_beak_head           49       2    .8028382    0.0008   .0381008   0.8461
l_humerous            49       2    .5696142    0.0023   .1061421   0.7460
l_keel_ste~m          49       2    1.001753    0.0002    .010592   0.9185

--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
total_length   |
          surv |
            S  |  -1.047619   1.055053    -0.99   0.326    -3.170113    1.074874
         _cons |   158.4286   .6906945   229.38   0.000     157.0391    159.8181
---------------+----------------------------------------------------------------
alar_extent    |
          surv |
            S  |  -.5714286   1.476084    -0.39   0.700    -3.540927     2.39807
         _cons |   241.5714    .966324   249.99   0.000     239.6274    243.5154
---------------+----------------------------------------------------------------
l_beak_head    |
          surv |
            S  |  -.0452381   .2317594    -0.20   0.846    -.5114779    .4210017
         _cons |   31.47857   .1517222   207.48   0.000     31.17335     31.7838
---------------+----------------------------------------------------------------
l_humerous     |
          surv |
            S  |   .0535715   .1644335     0.33   0.746    -.2772259     .384369
         _cons |   18.44643    .107647   171.36   0.000     18.22987    18.66299
---------------+----------------------------------------------------------------
l_keel_sternum |
          surv |
            S  |  -.0297618   .2891811    -0.10   0.918    -.6115191    .5519954
         _cons |   20.83929   .1893134   110.08   0.000     20.45844    21.22014
--------------------------------------------------------------------------------

. matrix errorsscp=e(E)

. matrix corr=corr(errorsscp)

. matrix list corr

symmetric corr[5,5]
              total_length   alar_extent   l_beak_head    l_humerous  l_keel_ste~m
total_length             1
 alar_extent     .73563757             1
 l_beak_head     .66486471     .67348012             1
  l_humerous     .65963604     .77328517     .76571386             1
l_keel_ste~m     .60933337     .52906854     .52611511     .60811557             1

. mvtest covariances total_length-l_keel_sternum, by(surv) /* Box's M-test */

Test of equality of covariance matrices across 2 samples

      Modified LR chi2 =  11.78584
      Box F(15,7429.4) =      0.69      Prob > F =  0.7948
          Box chi2(15) =     10.41   Prob > chi2 =  0.7933

. 
. * univariate two-sample permutation test
. ttest total_length, by(surv) unequal /* to see the actual analysis */

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      NS |      28    158.4286    .7336013    3.881853    156.9233    159.9338
       S |      21     157.381    .7253117    3.323796     155.868    158.8939
---------+--------------------------------------------------------------------
combined |      49    157.9796    .5220396    3.654277      156.93    159.0292
---------+--------------------------------------------------------------------
    diff |            1.047619    1.031624               -1.028801     3.12404
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   1.0155
Ho: diff = 0                     Satterthwaite's degrees of freedom =  46.1076

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8424         Pr(|T| > |t|) = 0.3152          Pr(T > t) = 0.1576

. permute surv meandiff=(r(mu_1)-r(mu_2)), saving(permdist, replace) reps(1000) seed(210327) nodrop nowarn: ttest total_length, by(surv
> )
(running ttest on estimation sample)

Permutation replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500
..................................................   550
..................................................   600
..................................................   650
..................................................   700
..................................................   750
..................................................   800
..................................................   850
..................................................   900
..................................................   950
..................................................  1000

Monte Carlo permutation results           Number of observations  =          49
                                          Number of permutations  =       1,000

      command:  ttest total_length, by(surv)
     meandiff:  r(mu_1)-r(mu_2)
  permute var:  surv

-------------------------------------------------------------------------------
             |                                               Monte Carlo error
             |                                              -------------------
           T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
-------------+-----------------------------------------------------------------
    meandiff |  1.047619      lower     846    1000  .8460  .0114  .8221  .8678
             |                upper     172    1000  .1720  .0119  .1491  .1968
             |            two-sided                  .3440  .0150  .3146  .3734
-------------------------------------------------------------------------------
Note: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
Note: For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
Note: For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.

. preserve

. use permdist, clear
(permute surv : ttest)

. histogram meandiff
(bin=29, start=-3.5357144, width=.23275862)

. sum meandiff, d

                       r(mu_1)-r(mu_2)
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.577381      -3.535714
 5%    -1.785714      -3.285714
10%    -1.410714      -3.285714       Obs               1,000
25%    -.7440476      -2.952381       Sum of Wgt.       1,000

50%      .047619                      Mean          -.0129643
                        Largest       Std. Dev.      1.078863
75%     .7142857       2.880952
90%     1.380952       2.964286       Variance       1.163945
95%     1.714286       3.214286       Skewness      -.0416425
99%     2.422619       3.214286       Kurtosis        2.92957

. restore

. 
. * Manly Example 5.2
. import delimited r:\skull.csv, clear
(5 vars, 150 obs)

. encode period, gen(Period)

. discrim lda maximumbreadth-nasalheight, group(Period) notable

. estat grdistances /* Mahalanobis distance matrix, with different group ordering than in Manly's Table 5.4 */

Mahalanobis squared distances between groups

            | Period                
    --------+-----------------------
     group1 | 12th and 13th Dynasty 
     group2 | Early predynastic     
     group3 | Late predynastic      
     group4 | Ptolemaic period      
     group5 | Roman period          

                 |    group1     group2     group3     group4     group5
    -------------+------------------------------------------------------
          group1 |         0                                            
          group2 |  .9030738          0                                 
          group3 |  .7289377   .0910342          0                      
          group4 |   .443113   1.881126   1.594014          0           
          group5 |  .9108716   2.696817   2.175689   .2192851          0

. 
. * univariate one-way ANOVA permutation test
. anova basibregmaticheight Period /* to see the actual analysis */

                         Number of obs =        150    R-squared     =  0.0632
                         Root MSE      =    4.84609    Adj R-squared =  0.0374

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  229.90667          4   57.476667      2.45  0.0490
                         |
                  Period |  229.90667          4   57.476667      2.45  0.0490
                         |
                Residual |  3405.2667        145   23.484598  
              -----------+----------------------------------------------------
                   Total |  3635.1733        149   24.397136  

. permute Period Fperiod=(e(F_1)), saving(permdist, replace) reps(1000) seed(210327) right nodrop nowarn: anova basibregmaticheight Per
> iod
(running anova on estimation sample)

Permutation replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500
..................................................   550
..................................................   600
..................................................   650
..................................................   700
..................................................   750
..................................................   800
..................................................   850
..................................................   900
..................................................   950
..................................................  1000

Monte Carlo permutation results           Number of observations  =         150
                                          Number of permutations  =       1,000

      command:  anova basibregmaticheight Period
      Fperiod:  e(F_1)
  permute var:  Period

-------------------------------------------------------------------------------
             |                                               Monte Carlo error
             |                                              -------------------
           T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
-------------+-----------------------------------------------------------------
     Fperiod |   2.44742      upper      50    1000  .0500  .0069  .0373  .0654
-------------------------------------------------------------------------------
Note: For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.

.