. * do-file for lecture 11 of VHM 802, Winter 2021 . version 16 /* works also with versions 14-15 */ . set more off . cd "r:\" r:\ . . * Manly Example 4.1 . import delimited r:\sparrow.csv, clear (6 vars, 49 obs) . encode survivorship, gen(surv) . foreach var of varlist total_length-l_keel_sternum { 2. ttest `var', by(surv) unequal /* unequal variances are preferable, though not the version used in Manly */ 3. robvar `var', by(surv) /* Levene's test W50 for equal variances */ 4. } Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- NS | 28 158.4286 .7336013 3.881853 156.9233 159.9338 S | 21 157.381 .7253117 3.323796 155.868 158.8939 ---------+-------------------------------------------------------------------- combined | 49 157.9796 .5220396 3.654277 156.93 159.0292 ---------+-------------------------------------------------------------------- diff | 1.047619 1.031624 -1.028801 3.12404 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 1.0155 Ho: diff = 0 Satterthwaite's degrees of freedom = 46.1076 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.8424 Pr(|T| > |t|) = 0.3152 Pr(T > t) = 0.1576 Survivorshi | Summary of Total_length p | Mean Std. Dev. Freq. ------------+------------------------------------ NS | 158.42857 3.881853 28 S | 157.38095 3.3237959 21 ------------+------------------------------------ Total | 157.97959 3.6542772 49 W0 = 1.8734718 df(1, 47) Pr > F = 0.17758478 W50 = 1.4470443 df(1, 47) Pr > F = 0.23502763 W10 = 1.9971389 df(1, 47) Pr > F = 0.1641853 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- NS | 28 241.5714 1.078197 5.705284 239.3592 243.7837 S | 21 241 .9128709 4.1833 239.0958 242.9042 ---------+-------------------------------------------------------------------- combined | 49 241.3265 .7239746 5.067822 239.8709 242.7822 ---------+-------------------------------------------------------------------- diff | .5714286 1.412743 -2.270663 3.413521 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 0.4045 Ho: diff = 0 Satterthwaite's degrees of freedom = 46.9877 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.6562 Pr(|T| > |t|) = 0.6877 Pr(T > t) = 0.3438 Survivorshi | Summary of Alar_extent p | Mean Std. Dev. Freq. ------------+------------------------------------ NS | 241.57143 5.7052839 28 S | 241 4.1833001 21 ------------+------------------------------------ Total | 241.32653 5.0678223 49 W0 = 1.4064059 df(1, 47) Pr > F = 0.24161486 W50 = 1.4029851 df(1, 47) Pr > F = 0.24217978 W10 = 1.4446477 df(1, 47) Pr > F = 0.23540988 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- NS | 28 31.47857 .1612908 .8534709 31.14763 31.80951 S | 21 31.43333 .1590647 .7289262 31.10153 31.76514 ---------+-------------------------------------------------------------------- combined | 49 31.45918 .1135362 .7947532 31.2309 31.68746 ---------+-------------------------------------------------------------------- diff | .0452381 .2265311 -.4107081 .5011843 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 0.1997 Ho: diff = 0 Satterthwaite's degrees of freedom = 46.1395 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.5787 Pr(|T| > |t|) = 0.8426 Pr(T > t) = 0.4213 Survivorshi | Summary of L_beak_head p | Mean Std. Dev. Freq. ------------+------------------------------------ NS | 31.478571 .85347093 28 S | 31.433333 .72892623 21 ------------+------------------------------------ Total | 31.459184 .79475321 49 W0 = 0.67188596 df(1, 47) Pr > F = 0.41653233 W50 = 0.66377265 df(1, 47) Pr > F = 0.4193404 W10 = 0.77479963 df(1, 47) Pr > F = 0.38321515 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- NS | 28 18.44643 .1245608 .6591139 18.19085 18.70201 S | 21 18.5 .0915475 .4195234 18.30904 18.69096 ---------+-------------------------------------------------------------------- combined | 49 18.46939 .0806122 .5642856 18.30731 18.63147 ---------+-------------------------------------------------------------------- diff | -.0535715 .1545844 -.3647433 .2576003 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = -0.3466 Ho: diff = 0 Satterthwaite's degrees of freedom = 45.948 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.3653 Pr(|T| > |t|) = 0.7305 Pr(T > t) = 0.6347 Survivorshi | Summary of L_humerous p | Mean Std. Dev. Freq. ------------+------------------------------------ NS | 18.446429 .65911387 28 S | 18.5 .41952343 21 ------------+------------------------------------ Total | 18.469388 .56428562 49 W0 = 3.9250944 df(1, 47) Pr > F = 0.0534381 W50 = 3.6558491 df(1, 47) Pr > F = 0.06197919 W10 = 3.8783962 df(1, 47) Pr > F = 0.05482183 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- NS | 28 20.83929 .2172057 1.149344 20.39362 21.28495 S | 21 20.80952 .1654582 .7582247 20.46438 21.15466 ---------+-------------------------------------------------------------------- combined | 49 20.82653 .1416249 .9913744 20.54177 21.11129 ---------+-------------------------------------------------------------------- diff | .0297618 .2730471 -.51974 .5792636 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 0.1090 Ho: diff = 0 Satterthwaite's degrees of freedom = 46.3548 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.5432 Pr(|T| > |t|) = 0.9137 Pr(T > t) = 0.4568 Survivorshi | Summary of L_keel_sternum p | Mean Std. Dev. Freq. ------------+------------------------------------ NS | 20.839286 1.1493443 28 S | 20.809524 .7582247 21 ------------+------------------------------------ Total | 20.826531 .99137445 49 W0 = 2.3376228 df(1, 47) Pr > F = 0.13298511 W50 = 1.9840519 df(1, 47) Pr > F = 0.16554548 W10 = 2.3216450 df(1, 47) Pr > F = 0.13428598 . hotelling total_length-l_keel_sternum, by(surv) --------------------------------------------------------------------------------------------------------------------------------------- -> surv = NS Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- total_length | 28 158.4286 3.881853 152 165 alar_extent | 28 241.5714 5.705284 230 252 l_beak_head | 28 31.47857 .8534709 30.1 33.4 l_humerous | 28 18.44643 .6591139 17.2 19.8 l_keel_ste~m | 28 20.83929 1.149344 18.6 23.1 --------------------------------------------------------------------------------------------------------------------------------------- -> surv = S Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- total_length | 21 157.381 3.323796 153 164 alar_extent | 21 241 4.1833 235 248 l_beak_head | 21 31.43333 .7289262 30.3 32.8 l_humerous | 21 18.5 .4195234 17.7 19.3 l_keel_ste~m | 21 20.80952 .7582247 19.6 22 2-group Hotelling's T-squared = 2.823701 F test statistic: ((49-5-1)/(49-2)(5)) x 2.823701 = .5166772 H0: Vectors of means are equal for the two groups F(5,43) = 0.5167 Prob > F(5,43) = 0.7622 . manova total_length-l_keel_sternum = surv /* same result as hotelling's T2 */ Number of obs = 49 W = Wilks' lambda L = Lawley-Hotelling trace P = Pillai's trace R = Roy's largest root Source | Statistic df F(df1, df2) = F Prob>F -----------+------------------------------------------------------- surv |W 0.9433 1 5.0 43.0 0.52 0.7622 e |P 0.0567 5.0 43.0 0.52 0.7622 e |L 0.0601 5.0 43.0 0.52 0.7622 e |R 0.0601 5.0 43.0 0.52 0.7622 e |------------------------------------------------------- Residual | 47 -----------+------------------------------------------------------- Total | 48 ------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . mvreg Equation Obs Parms RMSE "R-sq" F P -------------------------------------------------------------------------- total_length 49 2 3.654812 0.0205 .985957 0.3258 alar_extent 49 2 5.113306 0.0032 .1498655 0.7004 l_beak_head 49 2 .8028382 0.0008 .0381008 0.8461 l_humerous 49 2 .5696142 0.0023 .1061421 0.7460 l_keel_ste~m 49 2 1.001753 0.0002 .010592 0.9185 -------------------------------------------------------------------------------- | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------------+---------------------------------------------------------------- total_length | surv | S | -1.047619 1.055053 -0.99 0.326 -3.170113 1.074874 _cons | 158.4286 .6906945 229.38 0.000 157.0391 159.8181 ---------------+---------------------------------------------------------------- alar_extent | surv | S | -.5714286 1.476084 -0.39 0.700 -3.540927 2.39807 _cons | 241.5714 .966324 249.99 0.000 239.6274 243.5154 ---------------+---------------------------------------------------------------- l_beak_head | surv | S | -.0452381 .2317594 -0.20 0.846 -.5114779 .4210017 _cons | 31.47857 .1517222 207.48 0.000 31.17335 31.7838 ---------------+---------------------------------------------------------------- l_humerous | surv | S | .0535715 .1644335 0.33 0.746 -.2772259 .384369 _cons | 18.44643 .107647 171.36 0.000 18.22987 18.66299 ---------------+---------------------------------------------------------------- l_keel_sternum | surv | S | -.0297618 .2891811 -0.10 0.918 -.6115191 .5519954 _cons | 20.83929 .1893134 110.08 0.000 20.45844 21.22014 -------------------------------------------------------------------------------- . matrix errorsscp=e(E) . matrix corr=corr(errorsscp) . matrix list corr symmetric corr[5,5] total_length alar_extent l_beak_head l_humerous l_keel_ste~m total_length 1 alar_extent .73563757 1 l_beak_head .66486471 .67348012 1 l_humerous .65963604 .77328517 .76571386 1 l_keel_ste~m .60933337 .52906854 .52611511 .60811557 1 . mvtest covariances total_length-l_keel_sternum, by(surv) /* Box's M-test */ Test of equality of covariance matrices across 2 samples Modified LR chi2 = 11.78584 Box F(15,7429.4) = 0.69 Prob > F = 0.7948 Box chi2(15) = 10.41 Prob > chi2 = 0.7933 . . * univariate two-sample permutation test . ttest total_length, by(surv) unequal /* to see the actual analysis */ Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- NS | 28 158.4286 .7336013 3.881853 156.9233 159.9338 S | 21 157.381 .7253117 3.323796 155.868 158.8939 ---------+-------------------------------------------------------------------- combined | 49 157.9796 .5220396 3.654277 156.93 159.0292 ---------+-------------------------------------------------------------------- diff | 1.047619 1.031624 -1.028801 3.12404 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 1.0155 Ho: diff = 0 Satterthwaite's degrees of freedom = 46.1076 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.8424 Pr(|T| > |t|) = 0.3152 Pr(T > t) = 0.1576 . permute surv meandiff=(r(mu_1)-r(mu_2)), saving(permdist, replace) reps(1000) seed(210327) nodrop nowarn: ttest total_length, by(surv > ) (running ttest on estimation sample) Permutation replications (1000) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200 .................................................. 250 .................................................. 300 .................................................. 350 .................................................. 400 .................................................. 450 .................................................. 500 .................................................. 550 .................................................. 600 .................................................. 650 .................................................. 700 .................................................. 750 .................................................. 800 .................................................. 850 .................................................. 900 .................................................. 950 .................................................. 1000 Monte Carlo permutation results Number of observations = 49 Number of permutations = 1,000 command: ttest total_length, by(surv) meandiff: r(mu_1)-r(mu_2) permute var: surv ------------------------------------------------------------------------------- | Monte Carlo error | ------------------- T | T(obs) Test c n p SE(p) [95% CI(p)] -------------+----------------------------------------------------------------- meandiff | 1.047619 lower 846 1000 .8460 .0114 .8221 .8678 | upper 172 1000 .1720 .0119 .1491 .1968 | two-sided .3440 .0150 .3146 .3734 ------------------------------------------------------------------------------- Note: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n. Note: For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n. Note: For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate. . preserve . use permdist, clear (permute surv : ttest) . histogram meandiff (bin=29, start=-3.5357144, width=.23275862) . sum meandiff, d r(mu_1)-r(mu_2) ------------------------------------------------------------- Percentiles Smallest 1% -2.577381 -3.535714 5% -1.785714 -3.285714 10% -1.410714 -3.285714 Obs 1,000 25% -.7440476 -2.952381 Sum of Wgt. 1,000 50% .047619 Mean -.0129643 Largest Std. Dev. 1.078863 75% .7142857 2.880952 90% 1.380952 2.964286 Variance 1.163945 95% 1.714286 3.214286 Skewness -.0416425 99% 2.422619 3.214286 Kurtosis 2.92957 . restore . . * Manly Example 5.2 . import delimited r:\skull.csv, clear (5 vars, 150 obs) . encode period, gen(Period) . discrim lda maximumbreadth-nasalheight, group(Period) notable . estat grdistances /* Mahalanobis distance matrix, with different group ordering than in Manly's Table 5.4 */ Mahalanobis squared distances between groups | Period --------+----------------------- group1 | 12th and 13th Dynasty group2 | Early predynastic group3 | Late predynastic group4 | Ptolemaic period group5 | Roman period | group1 group2 group3 group4 group5 -------------+------------------------------------------------------ group1 | 0 group2 | .9030738 0 group3 | .7289377 .0910342 0 group4 | .443113 1.881126 1.594014 0 group5 | .9108716 2.696817 2.175689 .2192851 0 . . * univariate one-way ANOVA permutation test . anova basibregmaticheight Period /* to see the actual analysis */ Number of obs = 150 R-squared = 0.0632 Root MSE = 4.84609 Adj R-squared = 0.0374 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 229.90667 4 57.476667 2.45 0.0490 | Period | 229.90667 4 57.476667 2.45 0.0490 | Residual | 3405.2667 145 23.484598 -----------+---------------------------------------------------- Total | 3635.1733 149 24.397136 . permute Period Fperiod=(e(F_1)), saving(permdist, replace) reps(1000) seed(210327) right nodrop nowarn: anova basibregmaticheight Per > iod (running anova on estimation sample) Permutation replications (1000) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200 .................................................. 250 .................................................. 300 .................................................. 350 .................................................. 400 .................................................. 450 .................................................. 500 .................................................. 550 .................................................. 600 .................................................. 650 .................................................. 700 .................................................. 750 .................................................. 800 .................................................. 850 .................................................. 900 .................................................. 950 .................................................. 1000 Monte Carlo permutation results Number of observations = 150 Number of permutations = 1,000 command: anova basibregmaticheight Period Fperiod: e(F_1) permute var: Period ------------------------------------------------------------------------------- | Monte Carlo error | ------------------- T | T(obs) Test c n p SE(p) [95% CI(p)] -------------+----------------------------------------------------------------- Fperiod | 2.44742 upper 50 1000 .0500 .0069 .0373 .0654 ------------------------------------------------------------------------------- Note: For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n. .