------------------------------------------------------------------------------------------------------------------------------------------------------------ name: log: C:\vhm812-data\l2b_linear_reg_diag.txt log type: text opened on: 14 Jan 2019, 16:34:35 . set more off . . * open the DAISY dataset . use daisy2red.dta, clear . * create variables . gen parity1=parity-1 . gen calv_mth=month(calv_dt) . gen aut_calv=(calv_mth>=2 & calv_mth<=7) if !missing(calv_mth) . label value aut_calv noyes . gen hs100_ctr=(herd_size-251)/100 . gen hs100_ctr_sq=hs100_ctr^2 . * save new dataset . save daisy2red_01, replace file daisy2red_01.dta saved . . *final model . reg wpc hs100_ctr hs100_ctr_sq parity1 i.aut_calv i.twin i.dyst i.rp##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 13.22 Model | 296062.694 9 32895.8549 Prob > F = 0.0000 Residual | 3892027.86 1,564 2488.50886 R-squared = 0.0707 -------------+---------------------------------- Adj R-squared = 0.0653 Total | 4188090.56 1,573 2662.48605 Root MSE = 49.885 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | 19.85708 2.163397 9.18 0.000 15.61361 24.10054 hs100_ctr_sq | 11.13827 3.111145 3.58 0.000 5.035817 17.24073 parity1 | 1.13721 .8583103 1.32 0.185 -.5463501 2.82077 | aut_calv | yes | -8.263839 2.537751 -3.26 0.001 -13.24159 -3.286086 | twin | yes | 20.68314 9.845165 2.10 0.036 1.37203 39.99425 | dyst | yes | 11.70041 5.462576 2.14 0.032 .985666 22.41516 | rp | yes | 5.98687 4.811976 1.24 0.214 -3.451734 15.42547 | vag_disch | yes | 1.228196 7.161395 0.17 0.864 -12.81875 15.27514 | rp#vag_disch | yes#yes | 22.85194 12.51605 1.83 0.068 -1.698056 47.40194 | _cons | 64.33029 2.634114 24.42 0.000 59.16352 69.49705 ------------------------------------------------------------------------------ . *initial diagnostic . capture drop fit stdres . predict fit, xb . predict stdres, rstandard . . *homoskedasticity . scatter stdres fit . egen fitcat=cut(fit), at(0 50 55 60 65 70 75 80 85 90 95 200) icodes . tabstat stdres, s(mean sd count) by(fitcat) Summary for variables: stdres by categories of: fitcat fitcat | mean sd N ---------+------------------------------ 0 | -.0570064 .7397317 92 1 | -.0073181 .899948 179 2 | .0578738 .9474714 210 3 | -.0595596 .8973734 233 4 | .0241511 1.009195 240 5 | .1411393 1.186747 129 6 | -.0596733 1.042643 135 7 | -.0978648 1.090551 119 8 | .1050465 1.206279 93 9 | -.0443726 1.063637 107 10 | -.0275786 .9893797 37 ---------+------------------------------ Total | .0000153 1.000497 1574 ---------------------------------------- . estat hettest Breusch-Pagan / Cook-Weisberg test for heteroskedasticity Ho: Constant variance Variables: fitted values of wpc chi2(1) = 20.58 Prob > chi2 = 0.0000 . estat imtest Cameron & Trivedi's decomposition of IM-test --------------------------------------------------- Source | chi2 df p ---------------------+----------------------------- Heteroskedasticity | 74.11 44 0.0030 Skewness | 143.84 9 0.0000 Kurtosis | 33.27 1 0.0000 ---------------------+----------------------------- Total | 251.22 54 0.0000 --------------------------------------------------- . . *normality . qnorm stdres . hist stdres, normal (bin=31, start=-1.5436347, width=.18891791) . summ stdres, d Standardized residuals ------------------------------------------------------------- Percentiles Smallest 1% -1.295559 -1.543635 5% -1.085226 -1.491268 10% -.9411177 -1.445124 Obs 1,574 25% -.7014489 -1.43959 Sum of Wgt. 1,574 50% -.3049904 Mean .0000153 Largest Std. Dev. 1.000497 75% .4302664 3.716394 90% 1.438834 3.910809 Variance 1.000993 95% 2.065976 4.020359 Skewness 1.371514 99% 3.24163 4.31282 Kurtosis 4.72451 . swilk stdres Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ stdres | 1,574 0.87871 115.660 11.977 0.00000 . . **transforming wpc . xi:boxcox wpc hs100_ctr hs100_ctr_sq parity1 i.aut_calv i.twin i.dyst i.rp*i.vag_disch i.aut_calv _Iaut_calv_0-1 (naturally coded; _Iaut_calv_0 omitted) i.twin _Itwin_0-1 (naturally coded; _Itwin_0 omitted) i.dyst _Idyst_0-1 (naturally coded; _Idyst_0 omitted) i.rp _Irp_0-1 (naturally coded; _Irp_0 omitted) i.vag_disch _Ivag_disch_0-1 (naturally coded; _Ivag_disch_0 omitted) i.rp*i.vag_di~h _IrpXvag_#_# (coded as above) Fitting comparison model Iteration 0: log likelihood = -8439.9903 Iteration 1: log likelihood = -8082.9461 Iteration 2: log likelihood = -8035.2171 Iteration 3: log likelihood = -8034.9512 Iteration 4: log likelihood = -8034.9511 Fitting full model Iteration 0: log likelihood = -8382.2918 Iteration 1: log likelihood = -8022.9541 Iteration 2: log likelihood = -7958.3552 Iteration 3: log likelihood = -7957.985 Iteration 4: log likelihood = -7957.9849 Number of obs = 1,574 LR chi2(9) = 153.93 Log likelihood = -7957.9849 Prob > chi2 = 0.000 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- /theta | .1097595 .0270646 4.06 0.000 .0567138 .1628052 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coef. -------------+-------------- Notrans | hs100_ctr | .5236879 hs100_ctr_sq | .316867 parity1 | .0207726 _Iaut_calv_1 | -.2119782 _Itwin_1 | .5994942 _Idyst_1 | .1803076 _Irp_1 | .1697644 _Ivag_disc~1 | -.0333896 _IrpXvag_1_1 | .6352271 _cons | 4.90172 -------------+-------------- /sigma | 1.119778 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic P-value H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- theta = -1 -9664.734 3413.50 0.000 theta = 0 -7966.6087 17.25 0.000 theta = 1 -8382.2918 848.61 0.000 --------------------------------------------------------- . gen wpc_ln=ln(wpc) . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 i.aut_calv i.twin i.dyst i.rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 18.05 Model | 86.9474063 9 9.66082293 Prob > F = 0.0000 Residual | 836.874538 1,564 .535086022 R-squared = 0.0941 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .7315 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | .343658 .0317233 10.83 0.000 .2814333 .4058827 hs100_ctr_sq | .2118728 .0456207 4.64 0.000 .1223885 .301357 parity1 | .0129844 .012586 1.03 0.302 -.0117028 .0376715 | aut_calv | yes | -.1371621 .0372127 -3.69 0.000 -.2101542 -.0641701 | twin | yes | .3927426 .1443661 2.72 0.007 .1095711 .6759141 | dyst | yes | .1109405 .0801013 1.39 0.166 -.0461768 .2680578 | rp | yes | .1123166 .0705612 1.59 0.112 -.0260878 .250721 | vag_disch | yes | -.0260135 .1050122 -0.25 0.804 -.231993 .1799661 | rp#vag_disch | yes#yes | .4136999 .183531 2.25 0.024 .0537072 .7736926 | _cons | 3.888587 .0386257 100.67 0.000 3.812823 3.96435 ------------------------------------------------------------------------------ . **model checking . capture drop fit stdres . predict fit, xb . predict stdres, rstandard . *homoskedasticity . estat hettest Breusch-Pagan / Cook-Weisberg test for heteroskedasticity Ho: Constant variance Variables: fitted values of wpc_ln chi2(1) = 19.98 Prob > chi2 = 0.0000 . estat imtest Cameron & Trivedi's decomposition of IM-test --------------------------------------------------- Source | chi2 df p ---------------------+----------------------------- Heteroskedasticity | 48.63 44 0.2919 Skewness | 8.75 9 0.4605 Kurtosis | 0.52 1 0.4698 ---------------------+----------------------------- Total | 57.91 54 0.3332 --------------------------------------------------- . scatter stdres fit . *normality . qnorm stdres . hist stdres, normal (bin=31, start=-5.3996067, width=.25507099) . summ stdres, d Standardized residuals ------------------------------------------------------------- Percentiles Smallest 1% -2.211829 -5.399607 5% -1.564966 -4.989504 10% -1.242075 -3.61681 Obs 1,574 25% -.7143057 -3.316 Sum of Wgt. 1,574 50% -.0133218 Mean .000018 Largest Std. Dev. 1.000267 75% .7199247 2.358784 90% 1.333008 2.373729 Variance 1.000533 95% 1.628151 2.452677 Skewness -.1738834 99% 2.15899 2.507594 Kurtosis 3.382854 . swilk stdres Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ stdres | 1,574 0.98974 9.787 5.751 0.00000 . * check for collinearity issues . estat vif Variable | VIF 1/VIF -------------+---------------------- hs100_ctr | 1.14 0.878856 hs100_ctr_sq | 1.14 0.879714 parity1 | 1.04 0.962306 1.aut_calv | 1.02 0.985045 1.twin | 1.03 0.967484 1.dyst | 1.06 0.943538 1.rp | 1.26 0.796702 1.vag_disch | 1.60 0.624261 rp#vag_disch | 1 1 | 1.85 0.539810 -------------+---------------------- Mean VIF | 1.24 . estat vce, corr Correlation matrix of coefficients of regress model | 1. 1. 1. 1. 1. 1.rp# e(V) | hs100_~r hs100_~q parity1 aut_calv twin dyst rp vag_di~h 1.vag_~h _cons -------------+---------------------------------------------------------------------------------------------------- hs100_ctr | 1.0000 hs100_ctr_sq | 0.3269 1.0000 parity1 | -0.0415 -0.0468 1.0000 1.aut_calv | 0.0521 -0.0265 -0.0500 1.0000 1.twin | 0.0326 0.0464 -0.0666 -0.0487 1.0000 1.dyst | 0.1146 0.0717 0.1303 0.0023 -0.0214 1.0000 1.rp | 0.0230 0.0540 0.0318 0.0438 -0.0807 -0.0656 1.0000 1.vag_disch | 0.0304 0.0698 0.0649 0.0387 -0.0383 -0.1188 0.0742 1.0000 1.rp#| 1.vag_disch | -0.0184 -0.0571 -0.0644 -0.0024 -0.0429 0.0861 -0.3873 -0.5752 1.0000 _cons | -0.1716 -0.4415 -0.5404 -0.4246 0.0004 -0.2091 -0.1974 -0.1711 0.1133 1.0000 . . **linearity . * check modelling of herd_size and parity . lowess stdres herd_size . lowess stdres parity . * looking at herd_size without herd_size in the model . reg wpc_ln parity1 i.aut_calv i.twin i.dyst i.rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 5.82 Model | 23.4240047 7 3.34628639 Prob > F = 0.0000 Residual | 900.39794 1,566 .574966756 R-squared = 0.0254 -------------+---------------------------------- Adj R-squared = 0.0210 Total | 923.821945 1,573 .587299393 Root MSE = .75827 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity1 | .0191623 .0130272 1.47 0.142 -.0063904 .0447149 | aut_calv | yes | -.156154 .0384813 -4.06 0.000 -.2316343 -.0806737 | twin | yes | .3353984 .1494624 2.24 0.025 .0422309 .6285659 | dyst | yes | .0081531 .0824313 0.10 0.921 -.1535343 .1698404 | rp | yes | .0906877 .0730356 1.24 0.215 -.0525701 .2339455 | vag_disch | yes | -.0684004 .1085865 -0.63 0.529 -.2813906 .1445898 | rp#vag_disch | yes#yes | .4618901 .1899367 2.43 0.015 .089333 .8344472 | _cons | 3.978786 .0359075 110.81 0.000 3.908354 4.049218 ------------------------------------------------------------------------------ . predict stdres1, rstandard . lowess stdres1 herd_size . * looking at parity without parity in the model . reg wpc_ln hs100_ctr hs100_ctr_sq i.aut_calv i.twin i.dyst i.rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 20.18 Model | 86.3779092 8 10.7972387 Prob > F = 0.0000 Residual | 837.444036 1,565 .53510801 R-squared = 0.0935 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .73151 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | .3450169 .0316966 10.88 0.000 .2828447 .4071892 hs100_ctr_sq | .2140753 .0455717 4.70 0.000 .1246873 .3034633 | aut_calv | yes | -.1352417 .0371669 -3.64 0.000 -.2081438 -.0623395 | twin | yes | .4026686 .1440481 2.80 0.005 .1201211 .6852162 | dyst | yes | .1001739 .0794202 1.26 0.207 -.0556073 .2559551 | rp | yes | .1099998 .0705269 1.56 0.119 -.0283373 .2483368 | vag_disch | yes | -.0330431 .1047931 -0.32 0.753 -.2385927 .1725065 | rp#vag_disch | yes#yes | .4258954 .1831536 2.33 0.020 .066643 .7851478 | _cons | 3.910122 .0324999 120.31 0.000 3.846374 3.97387 ------------------------------------------------------------------------------ . predict stdres2, rstandard . lowess stdres2 parity1 . . **estimates and CIs backtransformed from ln-scale . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 i.aut_calv i.twin i.dyst i.rp##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 18.05 Model | 86.9474063 9 9.66082293 Prob > F = 0.0000 Residual | 836.874538 1,564 .535086022 R-squared = 0.0941 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .7315 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | .343658 .0317233 10.83 0.000 .2814333 .4058827 hs100_ctr_sq | .2118728 .0456207 4.64 0.000 .1223885 .301357 parity1 | .0129844 .012586 1.03 0.302 -.0117028 .0376715 | aut_calv | yes | -.1371621 .0372127 -3.69 0.000 -.2101542 -.0641701 | twin | yes | .3927426 .1443661 2.72 0.007 .1095711 .6759141 | dyst | yes | .1109405 .0801013 1.39 0.166 -.0461768 .2680578 | rp | yes | .1123166 .0705612 1.59 0.112 -.0260878 .250721 | vag_disch | yes | -.0260135 .1050122 -0.25 0.804 -.231993 .1799661 | rp#vag_disch | yes#yes | .4136999 .183531 2.25 0.024 .0537072 .7736926 | _cons | 3.888587 .0386257 100.67 0.000 3.812823 3.96435 ------------------------------------------------------------------------------ . *prediction for cows from avg herd size herds with dyst, parity=3, . *calv in fall, single calving . margins , over(rp vag_disch) at(hs100_ctr=0 hs100_ctr_sq=0 dyst=1 parity1=2 aut_calv=1 twin=0) expression(exp(predict(xb))) Predictive margins Number of obs = 1,574 Model VCE : OLS Expression : exp(predict(xb)) over : rp vag_disch at : 0.rp#0.vag_disch hs100_ctr = 0 hs100_ctr_sq = 0 parity1 = 2 aut_calv = 1 twin = 0 dyst = 1 0.rp#1.vag_disch hs100_ctr = 0 hs100_ctr_sq = 0 parity1 = 2 aut_calv = 1 twin = 0 dyst = 1 1.rp#0.vag_disch hs100_ctr = 0 hs100_ctr_sq = 0 parity1 = 2 aut_calv = 1 twin = 0 dyst = 1 1.rp#1.vag_disch hs100_ctr = 0 hs100_ctr_sq = 0 parity1 = 2 aut_calv = 1 twin = 0 dyst = 1 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- rp#vag_disch | no#no | 48.82946 4.028419 12.12 0.000 40.9339 56.72502 no#yes | 47.57561 5.844969 8.14 0.000 36.11969 59.03154 yes#no | 54.63367 5.547506 9.85 0.000 43.76076 65.50658 yes#yes | 80.50641 12.65005 6.36 0.000 55.71278 105.3001 ------------------------------------------------------------------------------ . marginsplot, noci Variables that uniquely identify margins: rp vag_disch . . **residuals and diagnostics . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 aut_calv twin i.dyst rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 18.05 Model | 86.9474063 9 9.66082293 Prob > F = 0.0000 Residual | 836.874538 1,564 .535086022 R-squared = 0.0941 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .7315 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | .343658 .0317233 10.83 0.000 .2814333 .4058827 hs100_ctr_sq | .2118728 .0456207 4.64 0.000 .1223885 .301357 parity1 | .0129844 .012586 1.03 0.302 -.0117028 .0376715 aut_calv | -.1371621 .0372127 -3.69 0.000 -.2101542 -.0641701 twin | .3927426 .1443661 2.72 0.007 .1095711 .6759141 | dyst | yes | .1109405 .0801013 1.39 0.166 -.0461768 .2680578 | rp | yes | .1123166 .0705612 1.59 0.112 -.0260878 .250721 | vag_disch | yes | -.0260135 .1050122 -0.25 0.804 -.231993 .1799661 | rp#vag_disch | yes#yes | .4136999 .183531 2.25 0.024 .0537072 .7736926 | _cons | 3.888587 .0386257 100.67 0.000 3.812823 3.96435 ------------------------------------------------------------------------------ . capture drop fit stdres /* adjust if not defined */ . capture drop delres - dfb_int . predict fit, xb . predict stdres, rstandard . predict delres, rstudent . predict lev, lev . predict cook, cooksd . predict dfit, dfit . predict dfb_hs2, dfbeta(hs100_ctr_sq) . predict dfb_dyst, dfbeta(1.dyst) . predict dfb_rp, dfbeta(1.rp) . predict dfb_vd, dfbeta(1.vag_disch) . predict dfb_int, dfbeta(1.rp#1.vag_disch) . scalar nobs=1574 . scalar nparam=10 . . ** formatting several variables at once to show only 3 decimals - advanced command . foreach var in wpc_ln fit stdres lev cook dfit dfb_hs2 dfb_dyst dfb_rp dfb_vd dfb_int { 2. format `var' %5.3f 3. } . . * examine outliers . * number of standardized residuals outside -2/+2 and -3/+3 . * expect 5% (n=79) outside -2,+2 and 0.3% (n=14) outside -3,+3 . count if abs(stdres)>2 /* n=50 */ 50 . count if abs(stdres)>3 /* n=6, so one more very extreme residual values than expected */ 6 . sum delres, d Studentized residuals ------------------------------------------------------------- Percentiles Smallest 1% -2.214588 -5.448908 5% -1.565692 -5.028087 10% -1.242291 -3.630869 Obs 1,574 25% -.7141938 -3.326655 Sum of Wgt. 1,574 50% -.0133175 Mean -.0000384 Largest Std. Dev. 1.001035 75% .7198138 2.362236 90% 1.33334 2.377255 Variance 1.002071 95% 1.629012 2.456621 Skewness -.1785964 99% 2.161524 2.511847 Kurtosis 3.415061 . * outlier test based on deletion residuals . display 2*nobs*ttail(nobs-nparam-1, 5.02) /* P=0.0009 so S */ .0009064 . display invttail(nobs-nparam-1,.025/nobs) /* values >= abs(4.17) outliers*/ 4.1726361 . br if abs(delres) >=4 . **there are two observations with extreme residual values (outliers) . ** two cows with 1 days interval from the end of wp to conception . ** cows 2272 (herd 106) and cow 1032 (herd 4) . . * browse most extreme residuals . sort stdres . br wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in 1/10 /* most extreme negative residuals */ . br wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in -10/-1 /* most extreme positive residuals */ . . sort delres . br wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in 1/5 /* most extreme negative residuals */ . br wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in -5/-1 /* most extreme positive residuals */ . . . * leverage and influence diagnostics . * leverage . summ lev ,d Leverage ------------------------------------------------------------- Percentiles Smallest 1% .0016636 .0016636 5% .0018854 .0016636 10% .0020899 .0016636 Obs 1,574 25% .0023828 .0016636 Sum of Wgt. 1,574 50% .0031862 Mean .0063532 Largest Std. Dev. .0083435 75% .007048 .0633344 90% .0124777 .0636456 Variance .0000696 95% .0214789 .0637559 Skewness 3.50352 99% .0444552 .0642237 Kurtosis 17.16434 . di "leverage cutoff: " 2*nparam/nobs leverage cutoff: .01270648 . di "conservative leverage cutoff: " 3*nparam/nobs conservative leverage cutoff: .01905972 . * large leverage values . count if lev>=.01905972 /* many (n=108) high leverage values */ 108 . gsort -lev . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev in 1/10, clean noobs wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres lev 76 4.331 4.699 no 185 4 yes no yes yes -0.520 0.064 45 3.807 4.577 yes 201 4 yes no yes yes -1.089 0.064 137 4.920 4.994 no 294 2 yes no yes yes -0.105 0.064 94 4.543 4.865 no 263 3 yes no yes yes -0.454 0.063 53 3.970 4.227 yes 263 5 yes no no yes -0.362 0.059 110 4.700 4.162 yes 263 0 yes no no yes 0.758 0.058 32 3.466 4.262 yes 201 1 yes yes yes no -1.117 0.052 99 4.595 4.592 no 294 1 yes yes no no 0.004 0.051 23 3.135 4.121 yes 185 0 yes yes no no -1.381 0.050 40 3.689 4.407 yes 263 0 no yes yes yes -1.005 0.046 . table twin rp dyst ---------------------------------------- | Dystocia at calving and | Retained placenta at calving Twins | ---- no ---- ---- yes --- born | no yes no yes ----------+----------------------------- no | 1,332 124 76 15 yes | 15 9 2 1 ---------------------------------------- . . * large Cook's D values . summ cook,d Cook's D ------------------------------------------------------------- Percentiles Smallest 1% 8.65e-08 5.57e-10 5% 1.44e-06 5.86e-09 10% 6.87e-06 1.08e-08 Obs 1,574 25% .0000447 1.16e-08 Sum of Wgt. 1,574 50% .000199 Mean .0006351 Largest Std. Dev. .0013679 75% .0006061 .0120243 90% .0014951 .0133958 Variance 1.87e-06 95% .0029054 .0141347 Skewness 5.314337 99% .0068727 .0174304 Kurtosis 42.05991 . di "Cook's D cutoff: " 4/nobs Cook's D cutoff: .0025413 . count if cook>=.0025413 /* many (n=92) high Cook's D values */ 92 . gsort -cook /*gsort is similar to sort but with more options, here I sort the variable cook in descending order*/ . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook in 1/10, clean noobs /* most extreme Cook's D values */ wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres lev cook 14 2.639 4.121 yes 235 2 yes no no no -2.066 0.039 0.017 11 2.398 4.031 no 263 1 no yes no yes -2.263 0.027 0.014 205 5.323 3.766 no 125 0 no no no yes 2.159 0.028 0.013 11 2.398 3.881 yes 263 0 no yes no yes -2.056 0.028 0.012 38 3.638 4.758 no 333 4 yes no no no -1.565 0.043 0.011 25 3.219 4.446 no 263 1 no no yes yes -1.708 0.035 0.011 23 3.135 4.121 yes 185 0 yes yes no no -1.381 0.050 0.010 229 5.434 3.925 yes 294 1 no no no yes 2.084 0.021 0.009 213 5.361 4.254 no 185 0 no no yes yes 1.542 0.036 0.009 45 3.807 4.577 yes 201 4 yes no yes yes -1.089 0.064 0.008 . scatter cook lev, xline(0.019) yline(0.0025) . . * large DFIT values . summ dfit, d Dfits ------------------------------------------------------------- Percentiles Smallest 1% -.2314434 -.4179343 5% -.1251009 -.3764578 10% -.0824913 -.3471186 Obs 1,574 25% -.0435816 -.3314645 Sum of Wgt. 1,574 50% -.000781 Mean .0003931 Largest Std. Dev. .0797594 75% .0452949 .2795224 90% .0887097 .298769 Variance .0063616 95% .120787 .3084654 Skewness -.1148466 99% .2251135 .3664328 Kurtosis 5.645826 . di "DFITS cutoff: " 2*sqrt(nparam/nobs)*(nobs>=120)+1*(nobs<120) DFITS cutoff: .15941443 . count if abs(dfit)>=.15941443 /* many (n=92) high DFITS values */ 92 . sort dfit . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook dfit in 1/10 , clean/* most extreme negative DFITS values */ wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres lev cook dfit 1. 14 2.639 4.121 yes 235 2 yes no no no -2.066 0.039 0.017 -0.418 2. 11 2.398 4.031 no 263 1 no yes no yes -2.263 0.027 0.014 -0.376 3. 11 2.398 3.881 yes 263 0 no yes no yes -2.056 0.028 0.012 -0.347 4. 38 3.638 4.758 no 333 4 yes no no no -1.565 0.043 0.011 -0.331 5. 25 3.219 4.446 no 263 1 no no yes yes -1.708 0.035 0.011 -0.325 6. 23 3.135 4.121 yes 185 0 yes yes no no -1.381 0.050 0.010 -0.316 7. 45 3.807 4.577 yes 201 4 yes no yes yes -1.089 0.064 0.008 -0.284 8. 31 3.434 4.485 no 263 4 no no yes yes -1.463 0.036 0.008 -0.283 9. 14 2.639 3.839 no 185 0 no yes no yes -1.663 0.027 0.008 -0.277 10. 32 3.466 4.262 yes 201 1 yes yes yes no -1.117 0.052 0.007 -0.262 . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook dfit in -10/-1, clean /* most extreme positive DFITS values */ wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres lev cook dfit 1565. 218 5.384 3.728 yes 185 0 no yes no no 2.279 0.012 0.007 0.257 1566. 170 5.136 4.018 no 263 0 no yes no yes 1.549 0.027 0.007 0.257 1567. 205 5.323 4.050 no 294 0 no no no yes 1.759 0.021 0.007 0.258 1568. 253 5.533 3.920 no 201 3 no yes no no 2.221 0.013 0.007 0.259 1569. 149 5.004 3.920 yes 263 3 no yes no yes 1.505 0.030 0.007 0.263 1570. 240 5.481 3.767 yes 185 3 no yes no no 2.359 0.013 0.008 0.275 1571. 232 5.447 3.795 yes 201 4 no yes no no 2.274 0.015 0.008 0.280 1572. 213 5.361 4.254 no 185 0 no no yes yes 1.542 0.036 0.009 0.299 1573. 229 5.434 3.925 yes 294 1 no no no yes 2.084 0.021 0.009 0.308 1574. 205 5.323 3.766 no 125 0 no no no yes 2.159 0.028 0.013 0.366 . . * dfbeta diagnostics (for a subset of predictors) . di "DFBETA cutoff: " 2/sqrt(nobs)*(nobs>=120)+1*(nobs<120) DFBETA cutoff: .05041127 . count if abs(dfb_dyst)>.05041127 /* n=74 */ 74 . sum dfb_hs2 dfb_dyst dfb_rp dfb_vd dfb_int //only for hsize2 since hsize will be quite collinear and difficult to interpret// Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- dfb_hs2 | 1,574 9.10e-08 .0239995 -.1043 .1487626 dfb_dyst | 1,574 -.0000162 .0299013 -.2078438 .2472617 dfb_rp | 1,574 4.82e-07 .0254968 -.1650124 .1895015 dfb_vd | 1,574 2.40e-06 .029381 -.2821976 .3064255 dfb_int | 1,574 -7.50e-06 .0236208 -.2360168 .2135575 . sort dfb_dyst . br wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres lev cook dfit dfb_dyst in 1/10 /*, clean most extreme negative > dfb_dyst values */ . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres lev cook dfit dfb_dyst in -10/-1 /* most extreme positive dfb_ > dyst values */ +-------------------------------------------------------------------------------------------------------------------------------------------+ | wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres lev cook dfit dfb_dyst | |-------------------------------------------------------------------------------------------------------------------------------------------| 1565. | 184 5.215 3.950 no 235 0 no yes no no 1.740 1.741162 0.012 0.004 0.194 0.171 | 1566. | 169 5.130 3.946 yes 263 3 no yes no no 1.630 1.631035 0.014 0.004 0.192 0.174 | 1567. | 153 5.030 3.728 yes 185 0 no yes no no 1.792 1.793212 0.012 0.004 0.202 0.175 | 1568. | 178 5.182 3.933 no 201 4 no yes no no 1.721 1.721754 0.015 0.005 0.213 0.184 | 1569. | 178 5.182 3.839 yes 235 2 no yes no no 1.848 1.849122 0.013 0.004 0.209 0.191 | |-------------------------------------------------------------------------------------------------------------------------------------------| 1570. | 218 5.384 3.728 yes 185 0 no yes no no 2.279 2.282074 0.012 0.007 0.257 0.223 | 1571. | 214 5.366 3.741 yes 185 1 no yes no no 2.235 2.238212 0.012 0.006 0.249 0.224 | 1572. | 253 5.533 3.920 no 201 3 no yes no no 2.221 2.223887 0.013 0.007 0.259 0.232 | 1573. | 232 5.447 3.795 yes 201 4 no yes no no 2.274 2.277405 0.015 0.008 0.280 0.244 | 1574. | 240 5.481 3.767 yes 185 3 no yes no no 2.359 2.362236 0.013 0.008 0.275 0.247 | +-------------------------------------------------------------------------------------------------------------------------------------------+ . graph box dfb_hs2 dfb_dyst dfb_rp dfb_vd dfb_int, yline(-0.05 0.05) . . * same approach for other DFBETAs . * dfbeta diagnostic for hsize2 . sort dfb_hs2 . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres lev cook dfit dfb_hs2 in 1/10, clean /* most extreme negative dfb_dyst > values */ wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres lev cook dfit dfb_hs2 1. 12 2.485 3.792 no 125 0 no no no no -1.794 -1.795646 0.008 0.003 -0.165 -0.104 2. 12 2.485 3.792 no 125 0 no no no no -1.794 -1.795646 0.008 0.003 -0.165 -0.104 3. 12 2.485 3.668 yes 125 1 no no no no -1.623 -1.623839 0.007 0.002 -0.139 -0.091 4. 15 2.708 3.818 no 125 2 no no no no -1.523 -1.523598 0.007 0.002 -0.132 -0.086 5. 13 2.565 3.681 yes 125 2 no no no no -1.531 -1.531436 0.007 0.002 -0.129 -0.084 6. 13 2.565 3.707 yes 125 4 no no no no -1.567 -1.568122 0.008 0.002 -0.143 -0.084 7. 16 2.773 3.792 no 125 0 no no no no -1.399 -1.399855 0.008 0.002 -0.129 -0.081 8. 14 2.639 3.655 yes 125 0 no no no no -1.394 -1.394688 0.008 0.002 -0.127 -0.079 9. 17 2.833 3.805 no 125 1 no no no no -1.334 -1.333839 0.008 0.001 -0.117 -0.076 10. 22 3.091 3.955 no 125 4 no yes no no -1.193 -1.19337 0.021 0.003 -0.173 -0.075 . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres lev cook dfit dfb_hs2 in -10/-1, clean /* most extreme positive dfb_dy > st values */ wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres lev cook dfit dfb_hs2 1565. 130 4.868 3.681 yes 125 2 no no no no 1.628 1.629012 0.007 0.002 0.137 0.090 1566. 126 4.836 3.655 yes 125 0 no no no no 1.622 1.622661 0.008 0.002 0.147 0.092 1567. 127 4.844 3.655 yes 125 0 no no no no 1.633 1.633537 0.008 0.002 0.148 0.093 1568. 138 4.927 3.668 yes 125 1 no no no no 1.728 1.729221 0.007 0.002 0.148 0.097 1569. 3 1.099 3.741 yes 235 3 no no no no -3.617 -3.630869 0.003 0.003 -0.186 0.101 1570. 1 0.000 3.946 no 263 1 no no no no -5.400 -5.448908 0.002 0.006 -0.243 0.117 1571. 208 5.338 3.805 no 125 1 no no no no 2.103 2.105534 0.008 0.003 0.185 0.121 1572. 253 5.533 3.707 yes 125 4 no no no no 2.508 2.511847 0.008 0.005 0.230 0.134 1573. 234 5.455 3.668 yes 125 1 no no no no 2.453 2.456621 0.007 0.004 0.211 0.137 1574. 205 5.323 3.766 no 125 0 no no no yes 2.159 2.161524 0.028 0.013 0.366 0.149 . * dfbeta diagnostic for hsize . * ideally we would need to create two terms that are independent in order to assess dfbeta for hsize and hsize2 . * I am not going to do this here, instead I will fit a model with only hsize and look at the dfbeta for the linear term . regress wpc_ln aut_calv hs100_ctr parity1 twin dyst rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.39 Model | 75.4062567 8 9.42578209 Prob > F = 0.0000 Residual | 848.415688 1,565 .54211865 R-squared = 0.0816 -------------+---------------------------------- Adj R-squared = 0.0769 Total | 923.821945 1,573 .587299393 Root MSE = .73629 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- aut_calv | -.1325884 .0374433 -3.54 0.000 -.2060328 -.059144 hs100_ctr | .2955003 .0301771 9.79 0.000 .2363085 .3546921 parity1 | .0157198 .0126545 1.24 0.214 -.0091018 .0405414 twin | .3616024 .1451549 2.49 0.013 .0768838 .6463209 dyst | .0842735 .0804186 1.05 0.295 -.073466 .242013 | rp | yes | .094625 .0709198 1.33 0.182 -.0444827 .2337327 | vag_disch | yes | -.0600363 .1054425 -0.57 0.569 -.2668598 .1467872 | rp#vag_disch | yes#yes | .4623984 .1844314 2.51 0.012 .1006398 .824157 | _cons | 3.967781 .0348848 113.74 0.000 3.899356 4.036207 ------------------------------------------------------------------------------ . predict dfb_hs, dfbeta(hs100_ctr) . summ dfb_hs Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- dfb_hs | 1,574 -.0000113 .0239101 -.1423404 .0976687 . sort dfb_hs . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook dfit dfb_hs in 1/10, clean /* most extreme negative dfb_dyst values > */ wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres lev cook dfit dfb_hs 1. 253 5.533 3.707 yes 125 4 no no no no 2.508 0.008 0.005 0.230 -.1423404 2. 234 5.455 3.668 yes 125 1 no no no no 2.453 0.007 0.004 0.211 -.1360926 3. 205 5.323 3.766 no 125 0 no no no yes 2.159 0.028 0.013 0.366 -.1294927 4. 208 5.338 3.805 no 125 1 no no no no 2.103 0.008 0.003 0.185 -.1265206 5. 138 4.927 3.668 yes 125 1 no no no no 1.728 0.007 0.002 0.148 -.0996296 6. 139 4.934 3.805 no 125 1 no no no no 1.550 0.008 0.002 0.136 -.0969104 7. 130 4.868 3.681 yes 125 2 no no no no 1.628 0.007 0.002 0.137 -.0953079 8. 127 4.844 3.655 yes 125 0 no no no no 1.633 0.008 0.002 0.148 -.0941594 9. 126 4.836 3.655 yes 125 0 no no no no 1.622 0.008 0.002 0.147 -.0936193 10. 108 4.682 3.668 yes 125 1 no no no no 1.392 0.007 0.001 0.119 -.0827399 . browse wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook dfit dfb_hs in -10/-1 /* most extreme positive dfb_dyst values */ . . *analysis outliers observations . list wpc wpc_ln fit herd_size parity dyst rp vag_disch if wpc==1, clean compress noobs wpc wpc~n fit her~e par~y dyst rp vag~h 1 0.000 3.946 263 2 no no no 1 0.000 3.646 201 2 no no no . list wpc wpc_ln delres lev dfit cook dfb_dyst dfb_rp dfb_vd dfb_int dfb_hs if wpc==1, clean compress noobs wpc wpc~n delres lev dfit cook df~st dfb~p dfb~d dfb~nt dfb_hs 1 0.000 -5.448908 0.002 -0.243 0.006 0.044 0.053 0.042 -0.026 -.0160816 1 0.000 -5.028087 0.002 -0.243 0.006 0.051 0.037 0.028 -0.021 .0976687 . . *original final model . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 aut_calv twin dyst rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 18.05 Model | 86.9474063 9 9.66082293 Prob > F = 0.0000 Residual | 836.874538 1,564 .535086022 R-squared = 0.0941 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .7315 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | .343658 .0317233 10.83 0.000 .2814333 .4058827 hs100_ctr_sq | .2118728 .0456207 4.64 0.000 .1223885 .301357 parity1 | .0129844 .012586 1.03 0.302 -.0117028 .0376715 aut_calv | -.1371621 .0372127 -3.69 0.000 -.2101542 -.0641701 twin | .3927426 .1443661 2.72 0.007 .1095711 .6759141 dyst | .1109405 .0801013 1.39 0.166 -.0461768 .2680578 | rp | yes | .1123166 .0705612 1.59 0.112 -.0260878 .250721 | vag_disch | yes | -.0260135 .1050122 -0.25 0.804 -.231993 .1799661 | rp#vag_disch | yes#yes | .4136999 .183531 2.25 0.024 .0537072 .7736926 | _cons | 3.888587 .0386257 100.67 0.000 3.812823 3.96435 ------------------------------------------------------------------------------ . estimate store ln . . *re-fit without outliers/influential obs . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 aut_calv twin dyst rp##vag_disch if wpc~=1 Source | SS df MS Number of obs = 1,572 -------------+---------------------------------- F(9, 1562) = 18.15 Model | 84.5110938 9 9.39012153 Prob > F = 0.0000 Residual | 807.934865 1,562 .517243831 R-squared = 0.0947 -------------+---------------------------------- Adj R-squared = 0.0895 Total | 892.445959 1,571 .568075085 Root MSE = .7192 ------------------------------------------------------------------------------ wpc_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | .3391632 .031199 10.87 0.000 .2779669 .4003596 hs100_ctr_sq | .2026965 .0448706 4.52 0.000 .1146836 .2907094 parity1 | .0113333 .0123763 0.92 0.360 -.0129427 .0356093 aut_calv | -.1369515 .0366092 -3.74 0.000 -.2087598 -.0651432 twin | .3894441 .1419397 2.74 0.006 .1110316 .6678566 dyst | .1033889 .0787611 1.31 0.189 -.0510998 .2578777 | rp | yes | .1060341 .0693799 1.53 0.127 -.0300535 .2421216 | vag_disch | yes | -.0332815 .1032512 -0.32 0.747 -.2358071 .1692441 | rp#vag_disch | yes#yes | .4223009 .1804488 2.34 0.019 .0683535 .7762484 | _cons | 3.901024 .0380169 102.61 0.000 3.826455 3.975594 ------------------------------------------------------------------------------ . estimate store ln_noout . . estat hettest Breusch-Pagan / Cook-Weisberg test for heteroskedasticity Ho: Constant variance Variables: fitted values of wpc_ln chi2(1) = 16.22 Prob > chi2 = 0.0001 . estat imtest Cameron & Trivedi's decomposition of IM-test --------------------------------------------------- Source | chi2 df p ---------------------+----------------------------- Heteroskedasticity | 71.76 44 0.0051 Skewness | 14.82 9 0.0960 Kurtosis | 9.69 1 0.0019 ---------------------+----------------------------- Total | 96.27 54 0.0004 --------------------------------------------------- . capture drop fit_group . capture drop fit stdres . capture drop dfit cook delres . predict fit, xb . predict stdres, rstandard . predict delres, rstudent . predict dfit, dfit (2 missing values generated) . predict cook, cook . scalar nobs=1572 . scalar nparam=10 . . hist stdres (bin=31, start=-5.5060368, width=.26001653) . egen fit_group=cut(fit), group(10) . tabstat stdres, statistics( mean sd count ) by(fit_group) Summary for variables: stdres by categories of: fit_group fit_group | mean sd N ----------+------------------------------ 0 | -.1127984 1.068298 152 1 | -.0052168 1.078422 158 2 | .0429111 1.119622 160 3 | -.0532808 1.040071 125 4 | .0708433 1.016679 191 5 | .042046 1.083892 154 6 | -.0311166 1.100208 133 7 | -.0623551 .9994817 185 8 | .0161114 .8454157 138 9 | .0035924 .8149584 178 ----------+------------------------------ Total | -.0067148 1.017397 1574 ----------------------------------------- . sort delres . browse cow wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres cook dfit in 1/10 . * outlier test based on deletion residuals . display invttail(nobs-nparam-1,.025/nobs) /* values >= abs(4.17) outliers*/ 4.1723589 . count if abs(delres) >=4.17 2 . sort dfit . browse cow wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres cook dfit in 1/10 . gsort -cook . browse cow wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres cook dfit in 1/10 . *the two outliers don't have the highest infl. values . . estimate table ln ln_noout, //models are very similar since outliers did not have much influence ---------------------------------------- Variable | ln ln_noout -------------+-------------------------- hs100_ctr | .34365799 .33916323 hs100_ctr_sq | .21187276 .20269649 parity1 | .01298436 .0113333 aut_calv | -.13716214 -.13695147 twin | .39274261 .3894441 dyst | .1109405 .10338895 | rp | yes | .11231661 .10603408 | vag_disch | yes | -.02601346 -.03328151 | rp#vag_disch | yes#yes | .41369992 .42230092 | _cons | 3.8885867 3.9010244 ---------------------------------------- . estimate table ln ln_noout, se //models are very similar since outliers did not have much influence ---------------------------------------- Variable | ln ln_noout -------------+-------------------------- hs100_ctr | .34365799 .33916323 | .03172331 .031199 hs100_ctr_sq | .21187276 .20269649 | .04562075 .04487057 parity1 | .01298436 .0113333 | .01258597 .01237635 aut_calv | -.13716214 -.13695147 | .0372127 .03660917 twin | .39274261 .3894441 | .14436609 .14193974 dyst | .1109405 .10338895 | .08010133 .07876115 | rp | yes | .11231661 .10603408 | .07056115 .0693799 | vag_disch | yes | -.02601346 -.03328151 | .10501221 .10325122 | rp#vag_disch | yes#yes | .41369992 .42230092 | .18353098 .18044883 | _cons | 3.8885867 3.9010244 | .03862574 .03801689 ---------------------------------------- legend: b/se . end of do-file . log close name: log: C:\vhm812-data\l2b_linear_reg_diag.txt log type: text closed on: 14 Jan 2019, 16:34:56 ------------------------------------------------------------------------------------------------------------------------------------------------------------