. * VHm 812 - Winter 2015 . * 2al - Linear regression: predictors, confounding and interaction . . * change working directory and open a log file . cd h:\vhm\vhm802\data_stata h:\vhm\vhm802\data_stata . . * open the DAISY dataset . use daisy2red.dta, clear . . *meaningless use of a nominal variable . egen farm=group(herd) . label var farm farm . table farm, c(mean milk120) ------------------------- farm | mean(milk120) ----------+-------------- 1 | 3357.8 2 | 3279.3 3 | 2778.6 4 | 3032.5 5 | 2838.2 6 | 3730.6 7 | 3435.8 ------------------------- . twoway (scatter milk120 farm) (lfit milk120 farm) , name(pos, replace) . regress milk120 farm Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 1, 1534) = 33.10 Model | 15802185.7 1 15802185.7 Prob > F = 0.0000 Residual | 732338007 1534 477404.177 R-squared = 0.0211 -------------+------------------------------ Adj R-squared = 0.0205 Total | 748140192 1535 487387.748 Root MSE = 690.94 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- farm | 50.90872 8.848643 5.75 0.000 33.552 68.26543 _cons | 3026.143 37.27521 81.18 0.000 2953.028 3099.259 ------------------------------------------------------------------------------ . . gen farm1=abs(farm-7)+1 . preserve . collapse milk120 farm, by(farm1) . sort farm . list farm milk120 farm1 +------------------------+ | farm milk120 farm1 | |------------------------| 1. | 1 3357.8 7 | 2. | 2 3279.3 6 | 3. | 3 2778.6 5 | 4. | 4 3032.5 4 | 5. | 5 2838.2 3 | |------------------------| 6. | 6 3730.6 2 | 7. | 7 3435.8 1 | +------------------------+ . restore . twoway (scatter milk120 farm1) (lfit milk120 farm1) , name(pos, replace) . regress milk120 farm1 Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 1, 1534) = 33.10 Model | 15802185.7 1 15802185.7 Prob > F = 0.0000 Residual | 732338007 1534 477404.177 R-squared = 0.0211 -------------+------------------------------ Adj R-squared = 0.0205 Total | 748140192 1535 487387.748 Root MSE = 690.94 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- farm1 | -50.90872 8.848643 -5.75 0.000 -68.26543 -33.552 _cons | 3433.413 41.84204 82.06 0.000 3351.339 3515.487 ------------------------------------------------------------------------------ . . * create a 4 level categorical variable for age . egen parity_c4=cut(parity), at(1 2 3 4 99) icodes . sort parity . br parity parity_c4 . tab parity parity_c4 Lactation | parity_c4 number | 0 1 2 3 | Total -----------+--------------------------------------------+---------- 1 | 417 0 0 0 | 417 2 | 0 374 0 0 | 374 3 | 0 0 319 0 | 319 4 | 0 0 0 222 | 222 5 | 0 0 0 169 | 169 6 | 0 0 0 69 | 69 7 | 0 0 0 4 | 4 -----------+--------------------------------------------+---------- Total | 417 374 319 464 | 1,574 . . *INDICATOR VARIABLES . * create indicator variables "manually" (you never actually do this) . gen parity_c4_0=(parity_c4==0) if !missing(parity) . gen parity_c4_1=(parity_c4==1) if !missing(parity) . gen parity_c4_2=(parity_c4==2) if !missing(parity) . gen parity_c4_3=(parity_c4==3) if !missing(parity) . . br parity parity_c4 parity_c4_0 parity_c4_1 parity_c4_2 parity_c4_3 . . * determine the average milk120 in each age group . format milk120 %4.2f . tab parity_c4, sum(milk120) | Summary of Milk volume (l) in first | 120 days of lactation parity_c4 | Mean Std. Dev. Freq. ------------+------------------------------------ 0 | 2639.65 486.40 403 1 | 3347.86 626.47 369 2 | 3429.49 655.11 308 3 | 3471.42 650.91 456 ------------+------------------------------------ Total | 3215.10 698.13 1536 . . * regress -milk120- on the age groups . reg milk120 parity_c4_1 parity_c4_2 parity_c4_3 Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 3, 1532) = 166.65 Model | 184071923 3 61357307.8 Prob > F = 0.0000 Residual | 564068269 1532 368190.776 R-squared = 0.2460 -------------+------------------------------ Adj R-squared = 0.2446 Total | 748140192 1535 487387.748 Root MSE = 606.79 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_c4_1 | 708.2134 43.71992 16.20 0.000 622.4561 793.9706 parity_c4_2 | 789.8435 45.92439 17.20 0.000 699.7622 879.9248 parity_c4_3 | 831.7748 41.48567 20.05 0.000 750.4001 913.1495 _cons | 2639.645 30.22623 87.33 0.000 2580.356 2698.934 ------------------------------------------------------------------------------ . * or . reg milk120 i.parity_c4 Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 3, 1532) = 166.65 Model | 184071923 3 61357307.8 Prob > F = 0.0000 Residual | 564068269 1532 368190.776 R-squared = 0.2460 -------------+------------------------------ Adj R-squared = 0.2446 Total | 748140192 1535 487387.748 Root MSE = 606.79 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_c4 | 1 | 708.2134 43.71992 16.20 0.000 622.4561 793.9706 2 | 789.8435 45.92439 17.20 0.000 699.7622 879.9248 3 | 831.7748 41.48567 20.05 0.000 750.4001 913.1495 | _cons | 2639.645 30.22623 87.33 0.000 2580.356 2698.934 ------------------------------------------------------------------------------ . * testing a group of categorical predictors . testparm i.parity_c4 ( 1) 1.parity_c4 = 0 ( 2) 2.parity_c4 = 0 ( 3) 3.parity_c4 = 0 F( 3, 1532) = 166.65 Prob > F = 0.0000 . . * HIERARCHICAL DUMMY VARIABLES . * create hierarchical dummy variables "manually" . gen parity_h0=1 if !missing(parity) . gen parity_h1=(parity_c4>=1) if !missing(parity) . gen parity_h2=(parity_c4>=2) if !missing(parity) . gen parity_h3=(parity_c4>=3) if !missing(parity) . . br parity parity_c4 parity_h0 parity_h1 parity_h2 parity_h3 . . * regress -milk120 on hierarchical dummy variables for parity . reg milk120 parity_h1 parity_h2 parity_h3 Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 3, 1532) = 166.65 Model | 184071923 3 61357307.8 Prob > F = 0.0000 Residual | 564068269 1532 368190.776 R-squared = 0.2460 -------------+------------------------------ Adj R-squared = 0.2446 Total | 748140192 1535 487387.748 Root MSE = 606.79 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_h1 | 708.2134 43.71992 16.20 0.000 622.4561 793.9706 parity_h2 | 81.6301 46.83195 1.74 0.082 -10.23141 173.4916 parity_h3 | 41.93131 44.75333 0.94 0.349 -45.85295 129.7156 _cons | 2639.645 30.22623 87.33 0.000 2580.356 2698.934 ------------------------------------------------------------------------------ . . * SCALING X VARIABLES . * substract min value . reg milk120 parity Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 1, 1534) = 262.27 Model | 109234227 1 109234227 Prob > F = 0.0000 Residual | 638905966 1534 416496.718 R-squared = 0.1460 -------------+------------------------------ Adj R-squared = 0.1455 Total | 748140192 1535 487387.748 Root MSE = 645.37 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | 178.347 11.01266 16.19 0.000 156.7455 199.9484 _cons | 2727.08 34.33991 79.41 0.000 2659.722 2794.438 ------------------------------------------------------------------------------ . gen parity_1=parity-1 . reg milk120 parity_1 Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 1, 1534) = 262.27 Model | 109234227 1 109234227 Prob > F = 0.0000 Residual | 638905966 1534 416496.718 R-squared = 0.1460 -------------+------------------------------ Adj R-squared = 0.1455 Total | 748140192 1535 487387.748 Root MSE = 645.37 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_1 | 178.347 11.01266 16.19 0.000 156.7455 199.9484 _cons | 2905.427 25.23474 115.14 0.000 2855.928 2954.925 ------------------------------------------------------------------------------ . . * centring . reg milk120 c.herd_size##c.herd_size, vsquish Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 2, 1533) = 111.15 Model | 94747175.1 2 47373587.5 Prob > F = 0.0000 Residual | 653393017 1533 426218.537 R-squared = 0.1266 -------------+------------------------------ Adj R-squared = 0.1255 Total | 748140192 1535 487387.748 Root MSE = 652.85 ----------------------------------------------------------------------------------------- milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] ------------------------+---------------------------------------------------------------- herd_size | 28.73126 1.993023 14.42 0.000 24.82192 32.6406 c.herd_size#c.herd_size | -.0608255 .0041101 -14.80 0.000 -.0688875 -.0527634 _cons | 66.06488 231.8877 0.28 0.776 -388.7858 520.9155 ----------------------------------------------------------------------------------------- . estat vce, corr Correlation matrix of coefficients of regress model | c.herd~e# e(V) | herd_s~e c.herd~e _cons -------------+------------------------------ herd_size | 1.0000 c.herd_size#| c.herd_size | -0.9907 1.0000 _cons | -0.9844 0.9535 1.0000 . . summ herd_size Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- herd_size | 1574 251.0076 62.01692 125 333 . gen hrdsz_ctr=herd_size - 251 . reg milk120 c.hrdsz_ctr##c.hrdsz_ctr, vsquish Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 2, 1533) = 111.15 Model | 94747175.1 2 47373587.5 Prob > F = 0.0000 Residual | 653393017 1533 426218.537 R-squared = 0.1266 -------------+------------------------------ Adj R-squared = 0.1255 Total | 748140192 1535 487387.748 Root MSE = 652.85 ----------------------------------------------------------------------------------------- milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] ------------------------+---------------------------------------------------------------- hrdsz_ctr | -1.803116 .2847073 -6.33 0.000 -2.361573 -1.244659 c.hrdsz_ctr#c.hrdsz_ctr | -.0608255 .0041101 -14.80 0.000 -.0688875 -.0527634 _cons | 3445.547 22.80494 151.09 0.000 3400.815 3490.279 ----------------------------------------------------------------------------------------- . estat vce, corr Correlation matrix of coefficients of regress model | c.hrds~r# e(V) | hrdsz_~r c.hrds~r _cons -------------+------------------------------ hrdsz_ctr | 1.0000 c.hrdsz_ctr#| c.hrdsz_ctr | 0.3115 1.0000 _cons | -0.2118 -0.6830 1.0000 . . * scale of measurement . reg milk120 herd_size Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 1, 1534) = 2.88 Model | 1400932.73 1 1400932.73 Prob > F = 0.0900 Residual | 746739260 1534 486792.216 R-squared = 0.0019 -------------+------------------------------ Adj R-squared = 0.0012 Total | 748140192 1535 487387.748 Root MSE = 697.7 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- herd_size | -.49048 .2891242 -1.70 0.090 -1.057601 .0766405 _cons | 3338.164 74.69789 44.69 0.000 3191.643 3484.685 ------------------------------------------------------------------------------ . capture drop herdsz_100 . gen herdsz_100=herd_size/100 /*rescale herd_size so coef are larger*/ . reg milk120 herdsz_100 Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 1, 1534) = 2.88 Model | 1400930.12 1 1400930.12 Prob > F = 0.0900 Residual | 746739262 1534 486792.218 R-squared = 0.0019 -------------+------------------------------ Adj R-squared = 0.0012 Total | 748140192 1535 487387.748 Root MSE = 697.7 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- herdsz_100 | -49.04796 28.91242 -1.70 0.090 -105.76 7.664098 _cons | 3338.164 74.69789 44.69 0.000 3191.643 3484.685 ------------------------------------------------------------------------------ . . * DETECTING CONFOUNDING . reg wpc vag_disch /* vag_disch adds ~12 days, P=0.04 */ Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 1, 1572) = 4.21 Model | 11186.2576 1 11186.2576 Prob > F = 0.0404 Residual | 4176904.3 1572 2657.0638 R-squared = 0.0027 -------------+------------------------------ Adj R-squared = 0.0020 Total | 4188090.56 1573 2662.48605 Root MSE = 51.547 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | 11.99647 5.846716 2.05 0.040 .5282858 23.46465 _cons | 68.17426 1.334494 51.09 0.000 65.55669 70.79184 ------------------------------------------------------------------------------ . reg wpc rp /* rp adds 12 days, P=0.008 */ Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 1, 1572) = 7.00 Model | 18574.5377 1 18574.5377 Prob > F = 0.0082 Residual | 4169516.02 1572 2652.36388 R-squared = 0.0044 -------------+------------------------------ Adj R-squared = 0.0038 Total | 4188090.56 1573 2662.48605 Root MSE = 51.501 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- rp | 11.7344 4.434231 2.65 0.008 3.036767 20.43203 _cons | 67.68842 1.364298 49.61 0.000 65.01239 70.36446 ------------------------------------------------------------------------------ . reg wpc rp vag_disch /* vag_disch no sig. , rp is a confounder */ Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 2, 1571) = 4.65 Model | 24663.1482 2 12331.5741 Prob > F = 0.0097 Residual | 4163427.41 1571 2650.17658 R-squared = 0.0059 -------------+------------------------------ Adj R-squared = 0.0046 Total | 4188090.56 1573 2662.48605 Root MSE = 51.48 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- rp | 10.2397 4.540774 2.26 0.024 1.333087 19.14632 vag_disch | 9.066942 5.9819 1.52 0.130 -2.666406 20.80029 _cons | 67.35756 1.381095 48.77 0.000 64.64857 70.06654 ------------------------------------------------------------------------------ . . tab vag_disch rp, chi2 Vaginal | Retained placenta at discharge | calving observed | no yes | Total -----------+----------------------+---------- no | 1,373 119 | 1,492 yes | 52 30 | 82 -----------+----------------------+---------- Total | 1,425 149 | 1,574 Pearson chi2(1) = 74.2346 Pr = 0.000 . cs vag_disch rp, or | Retained placenta at | | calving | | Exposed Unexposed | Total -----------------+------------------------+------------ Cases | 30 52 | 82 Noncases | 119 1373 | 1492 -----------------+------------------------+------------ Total | 149 1425 | 1574 | | Risk | .2013423 .0364912 | .0520966 | | | Point estimate | [95% Conf. Interval] |------------------------+------------------------ Risk difference | .1648511 | .0997315 .2299706 Risk ratio | 5.517553 | 3.638118 8.367895 Attr. frac. ex. | .8187602 | .7251326 .8804956 Attr. frac. pop | .2995464 | Odds ratio | 6.656432 | 4.104597 10.79865 (Cornfield) +------------------------------------------------- chi2(1) = 74.23 Pr>chi2 = 0.0000 . . * CONFOUNDING AND COLLINEARITY . use bw5k.dta, clear . corr cig_2 cig_3 /*considering only cig_2 for the example*/ (obs=5000) | cig_2 cig_3 -------------+------------------ cig_2 | 1.0000 cig_3 | 0.9451 1.0000 . . * Model for cig_3 as exposure of interest . * 1) relationship Z->E . corr cig_3 cig_2 (obs=5000) | cig_3 cig_2 -------------+------------------ cig_3 | 1.0000 cig_2 | 0.9451 1.0000 . * 2) relationship Z-Y in E- . reg bwt cig_2 if cig_3==0 /*borderline sig* but not the best model*/ Source | SS df MS Number of obs = 4656 -------------+------------------------------ F( 1, 4654) = 2.77 Model | 879610.363 1 879610.363 Prob > F = 0.0960 Residual | 1.4771e+09 4654 317380.896 R-squared = 0.0006 -------------+------------------------------ Adj R-squared = 0.0004 Total | 1.4780e+09 4655 317501.676 Root MSE = 563.37 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig_2 | -22.32463 13.41002 -1.66 0.096 -48.61462 3.965367 _cons | 3310.075 8.263501 400.57 0.000 3293.874 3326.275 ------------------------------------------------------------------------------ . * 3) not an int. variable . * 4) change in coef. . reg bwt cig_3 Source | SS df MS Number of obs = 5000 -------------+------------------------------ F( 1, 4998) = 22.55 Model | 7189497.98 1 7189497.98 Prob > F = 0.0000 Residual | 1.5935e+09 4998 318835.965 R-squared = 0.0045 -------------+------------------------------ Adj R-squared = 0.0043 Total | 1.6007e+09 4999 320210.372 Root MSE = 564.66 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig_3 | -12.48752 2.629726 -4.75 0.000 -17.64293 -7.332101 _cons | 3303.448 8.177729 403.96 0.000 3287.416 3319.48 ------------------------------------------------------------------------------ . reg bwt cig_3 cig_2 Source | SS df MS Number of obs = 5000 -------------+------------------------------ F( 2, 4997) = 12.61 Model | 8037757.94 2 4018878.97 Prob > F = 0.0000 Residual | 1.5927e+09 4997 318730.017 R-squared = 0.0050 -------------+------------------------------ Adj R-squared = 0.0046 Total | 1.6007e+09 4999 320210.372 Root MSE = 564.56 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig_3 | -.0859205 8.043799 -0.01 0.991 -15.8553 15.68346 cig_2 | -12.11372 7.425483 -1.63 0.103 -26.67093 2.44348 _cons | 3304.134 8.187191 403.57 0.000 3288.084 3320.185 ------------------------------------------------------------------------------ . estat vce, corr /*high correlation*/ Correlation matrix of coefficients of regress model e(V) | cig_3 cig_2 _cons -------------+------------------------------ cig_3 | 1.0000 cig_2 | -0.9451 1.0000 _cons | -0.0218 -0.0514 1.0000 . * try centring . summ cig_2 Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- cig_2 | 5000 .743 3.28979 0 45 . gen cig2_ctr=cig_2-r(mean) . summ cig_3 Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- cig_3 | 5000 .6704 3.036908 0 45 . gen cig3_ctr=cig_3-r(mean) . reg bwt cig3_ctr cig2_ctr Source | SS df MS Number of obs = 5000 -------------+------------------------------ F( 2, 4997) = 12.61 Model | 8037757.91 2 4018878.95 Prob > F = 0.0000 Residual | 1.5927e+09 4997 318730.017 R-squared = 0.0050 -------------+------------------------------ Adj R-squared = 0.0046 Total | 1.6007e+09 4999 320210.372 Root MSE = 564.56 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig3_ctr | -.0859208 8.043799 -0.01 0.991 -15.8553 15.68346 cig2_ctr | -12.11372 7.425483 -1.63 0.103 -26.67093 2.44348 _cons | 3295.076 7.984109 412.70 0.000 3279.424 3310.728 ------------------------------------------------------------------------------ . estat vce, corr /*still high correlation since centring will reduce correlation > on derived variables*/ Correlation matrix of coefficients of regress model e(V) | cig3_ctr cig2_ctr _cons -------------+------------------------------ cig3_ctr | 1.0000 cig2_ctr | -0.9451 1.0000 _cons | 0.0000 -0.0000 1.0000 . . * DETECTING INTERACTION . * 2 dichotomous variables . use daisy2red, clear . gen parity_1=parity-1 . . reg wpc vag_disch Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 1, 1572) = 4.21 Model | 11186.2576 1 11186.2576 Prob > F = 0.0404 Residual | 4176904.3 1572 2657.0638 R-squared = 0.0027 -------------+------------------------------ Adj R-squared = 0.0020 Total | 4188090.56 1573 2662.48605 Root MSE = 51.547 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | 11.99647 5.846716 2.05 0.040 .5282858 23.46465 _cons | 68.17426 1.334494 51.09 0.000 65.55669 70.79184 ------------------------------------------------------------------------------ . reg wpc i.vag_disch i.rp vag_disch#rp Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 3, 1570) = 4.53 Model | 35915.9774 3 11971.9925 Prob > F = 0.0036 Residual | 4152174.58 1570 2644.69719 R-squared = 0.0086 -------------+------------------------------ Adj R-squared = 0.0067 Total | 4188090.56 1573 2662.48605 Root MSE = 51.427 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | yes | .5429296 7.265382 0.07 0.940 -13.70794 14.7938 | rp | yes | 6.339794 4.914322 1.29 0.197 -3.299531 15.97912 | vag_disch#rp | yes#yes | 26.34867 12.77367 2.06 0.039 1.293414 51.40392 | _cons | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 ------------------------------------------------------------------------------ . reg wpc vag_disch##rp //same as above Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 3, 1570) = 4.53 Model | 35915.9774 3 11971.9925 Prob > F = 0.0036 Residual | 4152174.58 1570 2644.69719 R-squared = 0.0086 -------------+------------------------------ Adj R-squared = 0.0067 Total | 4188090.56 1573 2662.48605 Root MSE = 51.427 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | yes | .5429296 7.265382 0.07 0.940 -13.70794 14.7938 | rp | yes | 6.339794 4.914322 1.29 0.197 -3.299531 15.97912 | vag_disch#rp | yes#yes | 26.34867 12.77367 2.06 0.039 1.293414 51.40392 | _cons | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 ------------------------------------------------------------------------------ . table vag_disch rp, c(mean wpc) // display wpc means by vag_dich and rp ------------------------------ Vaginal | Retained placenta discharge | at calving observed | no yes ----------+------------------- no | 67.66861 74.0084 yes | 68.21154 100.9 ------------------------------ . . *Use of margins command as a tool to create plots - this will be explained later . reg wpc vag_disch#rp //total effects Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 3, 1570) = 4.53 Model | 35915.9774 3 11971.9925 Prob > F = 0.0036 Residual | 4152174.58 1570 2644.69719 R-squared = 0.0086 -------------+------------------------------ Adj R-squared = 0.0067 Total | 4188090.56 1573 2662.48605 Root MSE = 51.427 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch#rp | no#yes | 6.339794 4.914322 1.29 0.197 -3.299531 15.97912 yes#no | .5429296 7.265382 0.07 0.940 -13.70794 14.7938 yes#yes | 33.23139 9.491195 3.50 0.000 14.61464 51.84814 | _cons | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 ------------------------------------------------------------------------------ . margins vag_disch#rp //same as table command Adjusted predictions Number of obs = 1574 Model VCE : OLS Expression : Linear prediction, predict() ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch#rp | no#no | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 no#yes | 74.0084 4.71427 15.70 0.000 64.76147 83.25533 yes#no | 68.21154 7.131589 9.56 0.000 54.2231 82.19998 yes#yes | 100.9 9.389173 10.75 0.000 82.48336 119.3166 ------------------------------------------------------------------------------ . marginsplot, legend(title("Retained Placenta")) noci ytitle("Predicted WPC") title("") Variables that uniquely identify margins: vag_disch rp . marginsplot, x(rp) legend(title("Vaginal Discharge")) noci ytitle("Predicted WPC") title("") Variables that uniquely identify margins: vag_disch rp . . * dichotomous and continuous variable . gen milk120k=milk120/1000 (38 missing values generated) . summ milk120k Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- milk120k | 1536 3.215096 .6981316 1.1102 5.6303 . gen milk120k_ctr=milk120k-r(mean) (38 missing values generated) . reg wpc i.dyst##c.milk120k_ctr, vsquish Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 3, 1532) = 3.83 Model | 30572.8752 3 10190.9584 Prob > F = 0.0095 Residual | 4073791.78 1532 2659.13302 R-squared = 0.0074 -------------+------------------------------ Adj R-squared = 0.0055 Total | 4104364.66 1535 2673.8532 Root MSE = 51.567 ------------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------------+---------------------------------------------------------------- dyst | yes | 8.20714 5.718528 1.44 0.151 -3.00983 19.42411 milk120k_ctr | -3.446531 1.928535 -1.79 0.074 -7.229379 .3363161 dyst#c.milk120k_ctr | yes | 29.14238 9.468101 3.08 0.002 10.57057 47.71419 _cons | 68.75682 1.357147 50.66 0.000 66.09475 71.41888 ------------------------------------------------------------------------------------- . margins dyst, at(milk120k_ctr=(-2(1)3)) Adjusted predictions Number of obs = 1536 Model VCE : OLS Expression : Linear prediction, predict() 1._at : milk120k_ctr = -2 2._at : milk120k_ctr = -1 3._at : milk120k_ctr = 0 4._at : milk120k_ctr = 1 5._at : milk120k_ctr = 2 6._at : milk120k_ctr = 3 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at#dyst | 1#no | 75.64988 4.106318 18.42 0.000 67.59528 83.70448 1#yes | 25.57227 17.96398 1.42 0.155 -9.664321 60.80885 2#no | 72.20335 2.37331 30.42 0.000 67.54807 76.85863 2#yes | 51.26811 9.531863 5.38 0.000 32.57123 69.96499 3#no | 68.75682 1.357147 50.66 0.000 66.09475 71.41888 3#yes | 76.96396 5.555152 13.85 0.000 66.06745 87.86046 4#no | 65.31028 2.342987 27.87 0.000 60.71448 69.90609 4#yes | 102.6598 11.94631 8.59 0.000 79.22694 126.0927 5#no | 61.86375 4.071342 15.19 0.000 53.87776 69.84975 5#yes | 128.3556 20.64995 6.22 0.000 87.85048 168.8608 6#no | 58.41722 5.924572 9.86 0.000 46.79609 70.03835 6#yes | 154.0515 29.69811 5.19 0.000 95.79823 212.3047 ------------------------------------------------------------------------------ . marginsplot, noci title("") legend(order(1 "dyst=0" 2 "dyst=1")) scheme(s2mono) Variables that uniquely identify margins: milk120k_ctr dyst . . * two continuous variables . reg wpc c.parity_1##c.milk120k_ctr Source | SS df MS Number of obs = 1536 -------------+------------------------------ F( 3, 1532) = 2.26 Model | 18084.1899 3 6028.06329 Prob > F = 0.0797 Residual | 4086280.47 1532 2667.2849 R-squared = 0.0044 -------------+------------------------------ Adj R-squared = 0.0025 Total | 4104364.66 1535 2673.8532 Root MSE = 51.646 ------------------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------------------+---------------------------------------------------------------- parity_1 | 2.072164 .9549532 2.17 0.030 .1990103 3.945318 milk120k_ctr | -2.542834 3.091716 -0.82 0.411 -8.607278 3.521609 | c.parity_1#c.milk120k_ctr | -.8764358 1.363504 -0.64 0.520 -3.550968 1.798096 | _cons | 65.73655 2.207989 29.77 0.000 61.40555 70.06755 ------------------------------------------------------------------------------------------- . margins, at(milk120k_ctr=(-2(1)3) parity=(0(1)3)) Adjusted predictions Number of obs = 1536 Model VCE : OLS Expression : Linear prediction, predict() 1._at : parity_1 = 0 milk120k_ctr = -2 2._at : parity_1 = 0 milk120k_ctr = -1 3._at : parity_1 = 0 milk120k_ctr = 0 4._at : parity_1 = 0 milk120k_ctr = 1 5._at : parity_1 = 0 milk120k_ctr = 2 6._at : parity_1 = 0 milk120k_ctr = 3 7._at : parity_1 = 1 milk120k_ctr = -2 8._at : parity_1 = 1 milk120k_ctr = -1 9._at : parity_1 = 1 milk120k_ctr = 0 10._at : parity_1 = 1 milk120k_ctr = 1 11._at : parity_1 = 1 milk120k_ctr = 2 12._at : parity_1 = 1 milk120k_ctr = 3 13._at : parity_1 = 2 milk120k_ctr = -2 14._at : parity_1 = 2 milk120k_ctr = -1 15._at : parity_1 = 2 milk120k_ctr = 0 16._at : parity_1 = 2 milk120k_ctr = 1 17._at : parity_1 = 2 milk120k_ctr = 2 18._at : parity_1 = 2 milk120k_ctr = 3 19._at : parity_1 = 3 milk120k_ctr = -2 20._at : parity_1 = 3 milk120k_ctr = -1 21._at : parity_1 = 3 milk120k_ctr = 0 22._at : parity_1 = 3 milk120k_ctr = 1 23._at : parity_1 = 3 milk120k_ctr = 2 24._at : parity_1 = 3 milk120k_ctr = 3 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at | 1 | 70.82222 5.665012 12.50 0.000 59.71022 81.93422 2 | 68.27938 2.987487 22.86 0.000 62.41939 74.13938 3 | 65.73655 2.207989 29.77 0.000 61.40555 70.06755 4 | 63.19371 4.465733 14.15 0.000 54.43412 71.95331 5 | 60.65088 7.357156 8.24 0.000 46.21972 75.08204 6 | 58.10805 10.36485 5.61 0.000 37.77725 78.43884 7 | 74.64725 4.302751 17.35 0.000 66.20735 83.08716 8 | 71.22798 2.33446 30.51 0.000 66.64891 75.80706 9 | 67.80871 1.601972 42.33 0.000 64.66642 70.951 10 | 64.38944 3.140622 20.50 0.000 58.22907 70.54982 11 | 60.97017 5.228888 11.66 0.000 50.71364 71.22671 12 | 57.5509 7.416173 7.76 0.000 43.00398 72.09783 13 | 78.47229 4.592419 17.09 0.000 69.4642 87.48038 14 | 74.17658 2.691774 27.56 0.000 68.89663 79.45653 15 | 69.88088 1.442666 48.44 0.000 67.05107 72.71069 16 | 65.58517 2.365847 27.72 0.000 60.94453 70.22581 17 | 61.28947 4.218233 14.53 0.000 53.01534 69.56359 18 | 56.99376 6.218525 9.17 0.000 44.79604 69.19148 19 | 82.29732 6.310492 13.04 0.000 69.91921 94.67544 20 | 77.12518 3.783241 20.39 0.000 69.7043 84.54606 21 | 71.95304 1.849358 38.91 0.000 68.3255 75.58058 22 | 66.7809 2.672818 24.99 0.000 61.53813 72.02367 23 | 61.60876 5.048207 12.20 0.000 51.70663 71.51089 24 | 56.43662 7.644701 7.38 0.000 41.44143 71.4318 ------------------------------------------------------------------------------ . marginsplot, noci title("") scheme(s2mono) Variables that uniquely identify margins: milk120k_ctr parity_1 . . * interaction between 2 cont. variables assumes both effect are linear . * is this true for -milk120k- ? - generate a smoothed scatterplot . twoway (scatter wpc milk120k) (lowess wpc milk120k) . twoway (scatter wpc parity) (lowess wpc parity) . . * CAUSAL INTERPRETATION . use daisy2red.dta, clear . . * model to evaluate effect of - rp and vag_disch . gen calv_mth=month(calv_dt) . tab calv_mth, summ(wpc) | Summary of Interval from wait | period to conception calv_mth | Mean Std. Dev. Freq. ------------+------------------------------------ 1 | 77.00885 50.850533 113 2 | 60.009615 45.765017 104 3 | 69.402516 54.240687 159 4 | 60.385827 53.046258 127 5 | 63 49.882911 106 6 | 64.350365 51.665154 137 7 | 63.581818 45.047453 110 8 | 69.514019 54.277478 107 9 | 74.664773 52.827873 176 10 | 72.477011 49.332856 174 11 | 72.026846 54.231479 149 12 | 73.508929 53.222801 112 ------------+------------------------------------ Total | 68.799238 51.599283 1574 . gen aut_calv=(calv_mth>=2 & calv_mth<=7) if !missing(calv_mth) . tab aut_calv, summ(wpc) | Summary of Interval from wait | period to conception aut_calv | Mean Std. Dev. Freq. ------------+------------------------------------ 0 | 73.233454 52.236588 831 1 | 63.839838 50.451973 743 ------------+------------------------------------ Total | 68.799238 51.599283 1574 . gen hs100=(herd_size/100) . gen parity_1=parity-1 . . reg wpc c.hs100##c.hs100 parity_1 i.aut_calv twin dyst i.rp##vag_disch , vsquish Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 9, 1564) = 13.22 Model | 296062.681 9 32895.8535 Prob > F = 0.0000 Residual | 3892027.88 1564 2488.50887 R-squared = 0.0707 -------------+------------------------------ Adj R-squared = 0.0653 Total | 4188090.56 1573 2662.48605 Root MSE = 49.885 --------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs100 | -36.05705 15.05032 -2.40 0.017 -65.57798 -6.53612 c.hs100#c.hs100 | 11.13827 3.111145 3.58 0.000 5.035818 17.24073 parity_1 | 1.13721 .8583103 1.32 0.185 -.54635 2.82077 1.aut_calv | -8.263839 2.537751 -3.26 0.001 -13.24159 -3.286086 twin | 20.68314 9.845165 2.10 0.036 1.37203 39.99425 dyst | 11.70041 5.462576 2.14 0.032 .9856659 22.41516 rp | yes | 5.98687 4.811976 1.24 0.214 -3.451734 15.42547 vag_disch | yes | 1.228195 7.161395 0.17 0.864 -12.81875 15.27514 rp#vag_disch | yes#yes | 22.85194 12.51605 1.83 0.068 -1.698056 47.40194 _cons | 84.66125 17.61671 4.81 0.000 50.10639 119.2161 --------------------------------------------------------------------------------- . estat vce, corr Correlation matrix of coefficients of regress model | c.hs100# 1. 1. 1. e(V) | hs100 c.hs100 parity_1 aut_calv twin dyst rp vag_di~h -------------+-------------------------------------------------------------------------------- hs100 | 1.0000 c.hs100#| c.hs100 | -0.9907 1.0000 parity_1 | 0.0426 -0.0468 1.0000 1.aut_calv | 0.0349 -0.0265 -0.0500 1.0000 twin | -0.0435 0.0464 -0.0666 -0.0487 1.0000 dyst | -0.0579 0.0717 0.1303 0.0023 -0.0214 1.0000 1.rp | -0.0527 0.0540 0.0318 0.0438 -0.0807 -0.0656 1.0000 1.vag_disch | -0.0680 0.0698 0.0649 0.0387 -0.0383 -0.1188 0.0742 1.0000 1.rp#| 1.vag_disch | 0.0566 -0.0571 -0.0644 -0.0024 -0.0429 0.0861 -0.3873 -0.5752 _cons | -0.9772 0.9458 -0.1201 -0.1090 0.0417 0.0132 0.0235 0.0427 | 1.rp# e(V) | 1.vag_~h _cons -------------+-------------------- 1.rp#| 1.vag_disch | 1.0000 _cons | -0.0410 1.0000 . *centre hs100 . summ hs100 Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- hs100 | 1574 2.510076 .6201691 1.25 3.33 . gen hs100_ctr=hs100-r(mean) . . reg wpc c.hs100_ctr##c.hs100_ctr parity_1 i.aut_calv i.twin i.dyst i.rp##vag_disch , vsquish Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 9, 1564) = 13.22 Model | 296062.682 9 32895.8535 Prob > F = 0.0000 Residual | 3892027.88 1564 2488.50887 R-squared = 0.0707 -------------+------------------------------ Adj R-squared = 0.0653 Total | 4188090.56 1573 2662.48605 Root MSE = 49.885 ----------------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] ------------------------+---------------------------------------------------------------- hs100_ctr | 19.85878 2.163552 9.18 0.000 15.61501 24.10255 c.hs100_ctr#c.hs100_ctr | 11.13827 3.111145 3.58 0.000 5.035818 17.24073 parity_1 | 1.13721 .8583103 1.32 0.185 -.54635 2.82077 1.aut_calv | -8.263839 2.537751 -3.26 0.001 -13.24159 -3.286086 twin | yes | 20.68314 9.845165 2.10 0.036 1.37203 39.99425 dyst | yes | 11.70041 5.462576 2.14 0.032 .9856659 22.41516 rp | yes | 5.98687 4.811976 1.24 0.214 -3.451734 15.42547 vag_disch | yes | 1.228195 7.161395 0.17 0.864 -12.81875 15.27514 rp#vag_disch | yes#yes | 22.85194 12.51605 1.83 0.068 -1.698056 47.40194 _cons | 64.3318 2.634086 24.42 0.000 59.16509 69.49851 ----------------------------------------------------------------------------------------- . estat vce, corr Correlation matrix of coefficients of regress model | c.hs10~r# 1. 1. 1. 1. 1. e(V) | hs100_~r c.hs10~r parity_1 aut_calv twin dyst rp vag_di~h -------------+-------------------------------------------------------------------------------- hs100_ctr | 1.0000 c.hs100_ctr#| c.hs100_ctr | 0.3271 1.0000 parity_1 | -0.0415 -0.0468 1.0000 1.aut_calv | 0.0521 -0.0265 -0.0500 1.0000 1.twin | 0.0326 0.0464 -0.0666 -0.0487 1.0000 1.dyst | 0.1146 0.0717 0.1303 0.0023 -0.0214 1.0000 1.rp | 0.0230 0.0540 0.0318 0.0438 -0.0807 -0.0656 1.0000 1.vag_disch | 0.0305 0.0698 0.0649 0.0387 -0.0383 -0.1188 0.0742 1.0000 1.rp#| 1.vag_disch | -0.0184 -0.0571 -0.0644 -0.0024 -0.0429 0.0861 -0.3873 -0.5752 _cons | -0.1716 -0.4415 -0.5404 -0.4246 0.0004 -0.2091 -0.1974 -0.1711 | 1.rp# e(V) | 1.vag_~h _cons -------------+-------------------- 1.rp#| 1.vag_disch | 1.0000 _cons | 0.1133 1.0000 . . *Plot to show herd size effect . margins i.rp, at(hs100=(-1.26 -0.66 -0.5 -0.16 0.12 0.43 0.82) vag_disch=0 twin=0 aut_calv=1) Predictive margins Number of obs = 1574 Model VCE : OLS Expression : Linear prediction, predict() 1._at : hs100_ctr = -1.26 aut_calv = 1 twin = 0 vag_disch = 0 2._at : hs100_ctr = -.66 aut_calv = 1 twin = 0 vag_disch = 0 3._at : hs100_ctr = -.5 aut_calv = 1 twin = 0 vag_disch = 0 4._at : hs100_ctr = -.16 aut_calv = 1 twin = 0 vag_disch = 0 5._at : hs100_ctr = .12 aut_calv = 1 twin = 0 vag_disch = 0 6._at : hs100_ctr = .43 aut_calv = 1 twin = 0 vag_disch = 0 7._at : hs100_ctr = .82 aut_calv = 1 twin = 0 vag_disch = 0 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at#rp | 1#no | 51.39514 4.179809 12.30 0.000 43.19652 59.59376 1#yes | 57.38201 6.237152 9.20 0.000 45.14794 69.61607 2#no | 50.47911 2.305193 21.90 0.000 45.95752 55.00071 2#yes | 56.46598 5.017018 11.25 0.000 46.62519 66.30677 3#no | 51.58925 2.272508 22.70 0.000 47.13177 56.04674 3#yes | 57.57612 4.979707 11.56 0.000 47.80852 67.34373 4#no | 55.84181 2.307135 24.20 0.000 51.31641 60.36721 4#yes | 61.82868 4.975561 12.43 0.000 52.06921 71.58815 5#no | 61.27752 2.227896 27.50 0.000 56.90754 65.6475 5#yes | 67.26439 4.951049 13.59 0.000 57.553 76.97578 6#no | 69.33282 2.148793 32.27 0.000 65.118 73.54764 6#yes | 75.31969 4.95891 15.19 0.000 65.59287 85.0465 7#no | 82.50765 2.927494 28.18 0.000 76.76542 88.24987 7#yes | 88.49452 5.433098 16.29 0.000 77.83759 99.15144 ------------------------------------------------------------------------------ . marginsplot, noci xlabel(-1.26 "125" -0.66 "185" -0.5 "201" -0.16 "235" 0.12 "263" 0.43 "294" 0.82 "333") /// > xtitle("Herd Size") ytitle("Days") title("Preditive WPC by RP") Variables that uniquely identify margins: hs100_ctr rp . . * dyst conf for rp . reg wpc c.hs100_ctr##c.hs100_ctr parity_1 i.aut_calv i.twin i.rp Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 6, 1567) = 18.07 Model | 270961.954 6 45160.3256 Prob > F = 0.0000 Residual | 3917128.61 1567 2499.76299 R-squared = 0.0647 -------------+------------------------------ Adj R-squared = 0.0611 Total | 4188090.56 1573 2662.48605 Root MSE = 49.998 ----------------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] ------------------------+---------------------------------------------------------------- hs100_ctr | 19.30525 2.151998 8.97 0.000 15.08415 23.52635 | c.hs100_ctr#c.hs100_ctr | 10.71722 3.099622 3.46 0.001 4.637372 16.79706 | parity_1 | .9394592 .8495005 1.11 0.269 -.7268182 2.605737 1.aut_calv | -8.459954 2.540778 -3.33 0.001 -13.44364 -3.476272 | twin | yes | 23.09719 9.826032 2.35 0.019 3.823631 42.37074 | rp | yes | 11.18504 4.35328 2.57 0.010 2.646174 19.72391 _cons | 65.59313 2.527825 25.95 0.000 60.63485 70.5514 ----------------------------------------------------------------------------------------- . reg wpc c.hs100_ctr##c.hs100_ctr parity_1 i.aut_calv i.twin i.rp dyst Source | SS df MS Number of obs = 1574 -------------+------------------------------ F( 7, 1566) = 16.16 Model | 282216.669 7 40316.667 Prob > F = 0.0000 Residual | 3905873.89 1566 2494.17234 R-squared = 0.0674 -------------+------------------------------ Adj R-squared = 0.0632 Total | 4188090.56 1573 2662.48605 Root MSE = 49.942 ----------------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] ------------------------+---------------------------------------------------------------- hs100_ctr | 19.85311 2.165007 9.17 0.000 15.60649 24.09973 | c.hs100_ctr#c.hs100_ctr | 11.25323 3.10642 3.62 0.000 5.160047 17.34641 | parity_1 | 1.194524 .8570034 1.39 0.164 -.4864711 2.875519 1.aut_calv | -8.425575 2.537986 -3.32 0.001 -13.40378 -3.447366 | twin | yes | 22.58687 9.817977 2.30 0.022 3.3291 41.84463 | rp | yes | 10.69412 4.354546 2.46 0.014 2.152766 19.23548 dyst | 11.53206 5.428789 2.12 0.034 .8836013 22.18052 _cons | 64.29614 2.597768 24.75 0.000 59.20067 69.39161 -----------------------------------------------------------------------------------------