-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- name: log: c:\vhm812-data\l3a_model_build_II.txt log type: text opened on: 18 Jan 2024, 14:29:48 . set more off . . * open the DAISY Red dataset . use daisy2red_01.dta, clear . gen milk120k=milk120/1000 (38 missing values generated) . gen wpc_sqrt=sqrt(wpc) . save daisy2red_02.dta, replace file daisy2red_02.dta saved . . * Specifying maximum model . * no analyses . . * Causal model . * no analyses . . **# Reducing the Number of Predictors . * descriptive statistics . codebook cf vag_disch -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cf Calving to first service interval -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Type: Numeric (float) Range: [21,240] Units: 1 Unique values: 129 Missing .: 14/1,574 Mean: 71.3115 Std. dev.: 21.9039 Percentiles: 10% 25% 50% 75% 90% 51 59 67 78 96 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- vag_disch Vaginal discharge observed -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Type: Numeric (float) Label: noyes Range: [0,1] Units: 1 Unique values: 2 Missing .: 0/1,574 Tabulation: Freq. Numeric Label 1,492 0 no 82 1 yes . codebook milk120k parity herd_size dyst rp vag_disch, compact Variable Obs Unique Mean Min Max Label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- milk120k 1536 1486 3.215096 1.1102 5.6303 parity 1574 7 2.729987 1 7 Lactation number herd_size 1574 7 251.0076 125 333 Herd size dyst 1574 2 .0597205 0 1 Dystocia at calving rp 1574 2 .0946633 0 1 Retained placenta at calving vag_disch 1574 2 .0520966 0 1 Vaginal discharge observed -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- . . sum milk120k parity herd_size dyst rp vag_disch, detail milk120k ------------------------------------------------------------- Percentiles Smallest 1% 1.6985 1.1102 5% 2.0772 1.3977 10% 2.2982 1.4615 Obs 1,536 25% 2.73195 1.467 Sum of wgt. 1,536 50% 3.21525 Mean 3.215096 Largest Std. dev. .6981316 75% 3.6821 5.1818 90% 4.0807 5.278 Variance .4873877 95% 4.4033 5.3999 Skewness .1101838 99% 4.9044 5.6303 Kurtosis 2.845637 Lactation number ------------------------------------------------------------- Percentiles Smallest 1% 1 1 5% 1 1 10% 1 1 Obs 1,574 25% 1 1 Sum of wgt. 1,574 50% 2 Mean 2.729987 Largest Std. dev. 1.493841 75% 4 7 90% 5 7 Variance 2.23156 95% 5 7 Skewness .5450922 99% 6 7 Kurtosis 2.315593 Herd size ------------------------------------------------------------- Percentiles Smallest 1% 125 125 5% 125 125 10% 185 125 Obs 1,574 25% 201 125 Sum of wgt. 1,574 50% 263 Mean 251.0076 Largest Std. dev. 62.01692 75% 294 333 90% 333 333 Variance 3846.098 95% 333 333 Skewness -.3550929 99% 333 333 Kurtosis 2.256969 Dystocia at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 0 Mean .0597205 Largest Std. dev. .2370435 75% 0 1 90% 0 1 Variance .0561896 95% 1 1 Skewness 3.715938 99% 1 1 Kurtosis 14.80819 Retained placenta at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 0 Mean .0946633 Largest Std. dev. .2928423 75% 0 1 90% 0 1 Variance .0857566 95% 1 1 Skewness 2.769173 99% 1 1 Kurtosis 8.66832 Vaginal discharge observed ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 0 Mean .0520966 Largest Std. dev. .2222924 75% 0 1 90% 0 1 Variance .0494139 95% 1 1 Skewness 4.031139 99% 1 1 Kurtosis 17.25008 . . * correlation . corr milk120k parity herd_size (obs=1,536) | milk120k parity herd_s~e -------------+--------------------------- milk120k | 1.0000 parity | 0.3821 1.0000 herd_size | -0.0433 0.0356 1.0000 . pwcorr milk120k parity herd_size, obs star(0.05) | milk120k parity herd_s~e -------------+--------------------------- milk120k | 1.0000 | 1536 | parity | 0.3821* 1.0000 | 1536 1574 | herd_size | -0.0433 0.0386 1.0000 | 1536 1574 1574 | . tab rp dyst,chi exact Retained | placenta | Dystocia at calving at calving | no yes | Total -----------+----------------------+---------- no | 1,347 78 | 1,425 yes | 133 16 | 149 -----------+----------------------+---------- Total | 1,480 94 | 1,574 Pearson chi2(1) = 6.6580 Pr = 0.010 Fisher's exact = 0.017 1-sided Fisher's exact = 0.012 . tab vag_disch dyst, chi exact Vaginal | discharge | Dystocia at calving observed | no yes | Total -----------+----------------------+---------- no | 1,412 80 | 1,492 yes | 68 14 | 82 -----------+----------------------+---------- Total | 1,480 94 | 1,574 Pearson chi2(1) = 18.9847 Pr = 0.000 Fisher's exact = 0.000 1-sided Fisher's exact = 0.000 . tab vag_disch rp, chi exact Vaginal | Retained placenta at discharge | calving observed | no yes | Total -----------+----------------------+---------- no | 1,373 119 | 1,492 yes | 52 30 | 82 -----------+----------------------+---------- Total | 1,425 149 | 1,574 Pearson chi2(1) = 74.2346 Pr = 0.000 Fisher's exact = 0.000 1-sided Fisher's exact = 0.000 . . * indices . * no analyses . . * unconditional associations . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . reg cf i.vag_disch Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 0.49 Model | 236.166881 1 236.166881 Prob > F = 0.4831 Residual | 747744.425 1,558 479.938656 R-squared = 0.0003 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.908 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- vag_disch | yes | 1.743523 2.485484 0.70 0.483 -3.131724 6.61877 _cons | 71.21989 .5698436 124.98 0.000 70.10215 72.33763 ------------------------------------------------------------------------------ . . * principal components / factor analysis / correspondence analysis . * not covered . . **# Functional Form of predictors . * residual plots . reg cf milk120k Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(1, 1523) = 0.72 Model | 329.282096 1 329.282096 Prob > F = 0.3977 Residual | 700869.964 1,523 460.19039 R-squared = 0.0005 -------------+---------------------------------- Adj R-squared = -0.0002 Total | 701199.246 1,524 460.104492 Root MSE = 21.452 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k | .6666736 .7881302 0.85 0.398 -.8792618 2.212609 _cons | 68.95612 2.595207 26.57 0.000 63.86556 74.04667 ------------------------------------------------------------------------------ . predict stdres, rsta (49 missing values generated) . . * lowess smoother . lowess stdres milk120k . * Detecting and Correcting for non-linearity (transformation of X) . * categorization of predictor . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . egen parity_c6=cut(parity), at(0 1 2 3 4 5 6 15) icodes . tab parity parity_c6 Lactation | parity_c6 number | 1 2 3 4 5 6 | Total -----------+------------------------------------------------------------------+---------- 1 | 417 0 0 0 0 0 | 417 2 | 0 374 0 0 0 0 | 374 3 | 0 0 319 0 0 0 | 319 4 | 0 0 0 222 0 0 | 222 5 | 0 0 0 0 169 0 | 169 6 | 0 0 0 0 0 69 | 69 7 | 0 0 0 0 0 4 | 4 -----------+------------------------------------------------------------------+---------- Total | 417 374 319 222 169 73 | 1,574 . reg cf i.parity_c6 Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(5, 1554) = 0.91 Model | 2174.44893 5 434.889785 Prob > F = 0.4761 Residual | 745806.143 1,554 479.926733 R-squared = 0.0029 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.907 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- parity_c6 | 2 | -.6468817 1.568168 -0.41 0.680 -3.722829 2.429066 3 | -3.181369 1.635853 -1.94 0.052 -6.390081 .027343 4 | -1.528531 1.831255 -0.83 0.404 -5.120523 2.063462 5 | -2.322236 2.004684 -1.16 0.247 -6.254406 1.609935 6 | -.9349232 2.781437 -0.34 0.737 -6.390688 4.520841 | _cons | 72.61985 1.077984 67.37 0.000 70.5054 74.73431 ------------------------------------------------------------------------------ . di "F-statistic = " ((747280.9-745806.1)/(1558-1554))/480 F-statistic = .768125 . di "P-value = " Ftail(4, 1554, ((747280.9-745806.1)/(1558-1554))/480) // R model is valid but term is NS P-value = .54594195 . . . *box cox did not work with raw need to use log(cf) . gen ln_cf=ln(cf) (14 missing values generated) . boxcox ln_cf milk120k, model(rhs) Fitting full model Iteration 0: Log likelihood = -183.07949 (not concave) Iteration 1: Log likelihood = -180.48708 Iteration 2: Log likelihood = -180.32784 Iteration 3: Log likelihood = -180.1167 Iteration 4: Log likelihood = -179.88238 Iteration 5: Log likelihood = -179.86958 Iteration 6: Log likelihood = -179.86958 (38 missing values generated) (38 missing values generated) (38 missing values generated) Number of obs = 1,525 LR chi2(2) = 8.21 Log likelihood = -179.86958 Prob > chi2 = 0.016 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- /lambda | -3.647194 1.174882 -3.10 0.002 -5.94992 -1.344468 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coefficient -------------+-------------- Notrans | _cons | 3.60132 -------------+-------------- Trans | milk120k | 2.329791 -------------+-------------- /sigma | .2722618 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- lambda = -1 -181.5923 3.45 0.063 lambda = 0 -182.42186 5.10 0.024 lambda = 1 -183.07949 6.42 0.011 --------------------------------------------------------- . . * quadratic function of X . reg ln_cf c.milk120k##c.milk120k Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133443 2 .354566722 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------------+---------------------------------------------------------------- milk120k | .2056932 .0697547 2.95 0.003 .0688677 .3425186 | c.milk120k#c.milk120k | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 3.883482 .1122207 34.61 0.000 3.663358 4.103605 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- milk120k | 48.58 0.020585 c.milk120k#| c.milk120k | 48.58 0.020585 -------------+---------------------- Mean VIF | 48.58 . vce, corr Correlation matrix of coefficients of regress model | c.m~120k# e(V) | milk120k c.m~120k _cons -------------+----------------------------- milk120k | 1.0000 c.milk120k#| c.milk120k | -0.9897 1.0000 _cons | -0.9872 0.9559 1.0000 . . * re-doing the analysis with milk120 centred . summ milk120k Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- milk120k | 1,536 3.215096 .6981316 1.1102 5.6303 . gen m120k_ct=(milk120k-r(mean)) /*r(mean) - memory variable created by summ command that store the mean of the variable*/ (38 missing values generated) . reg ln_cf c.m120k_ct##c.m120k_ct Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133424 2 .354566712 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------------+---------------------------------------------------------------- m120k_ct | .0159242 .0100484 1.58 0.113 -.0037859 .0356344 | c.m120k_ct#c.m120k_ct | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 4.239742 .0086679 489.13 0.000 4.22274 4.256745 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- m120k_ct | 1.01 0.992010 c.m120k_ct#| c.m120k_ct | 1.01 0.992010 -------------+---------------------- Mean VIF | 1.01 . vce, corr Correlation matrix of coefficients of regress model | c.m120~t# e(V) | m120k_ct c.m120~t _cons -------------+----------------------------- m120k_ct | 1.0000 c.m120k_ct#| c.m120k_ct | -0.0894 1.0000 _cons | 0.0494 -0.5936 1.0000 . . capture drop stdres . predict stdres, rsta (49 missing values generated) . lowess stdres m120k_ct . . * fractional polynomials . fp , scale center replace: reg ln_cf (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance milk120k | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 367.951 0.273 10.533 0.033 linear | 3 366.159 0.273 8.741 0.033 1 m = 1 | 2 361.396 0.273 3.978 0.138 -2 m = 2 | 0 357.418 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 1520). Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 5.27 Model | .782289941 2 .391144971 Prob > F = 0.0052 Residual | 112.870944 1,522 .074159622 R-squared = 0.0069 -------------+---------------------------------- Adj R-squared = 0.0056 Total | 113.653234 1,524 .074575613 Root MSE = .27232 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5301266 .1654544 -3.20 0.001 -.8546694 -.2055839 milk120k_2 | -.0008516 .0004271 -1.99 0.046 -.0016894 -.0000138 _cons | 4.238434 .008295 510.96 0.000 4.222163 4.254704 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . fp plot, r(rstand) ytitle(Predicted Ln(cf)) /*plot with standardized residuals*/ . . *more predictors - one predictor at a time . fp , scale center replace: reg ln_cf parity rp (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance milk120k | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 361.366 0.273 12.986 0.012 linear | 3 357.527 0.272 9.147 0.028 1 m = 1 | 2 351.183 0.272 2.803 0.248 -2 m = 2 | 0 348.380 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 1518). Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(4, 1520) = 4.91 Model | 1.44923978 4 .362309946 Prob > F = 0.0006 Residual | 112.203994 1,520 .073818417 R-squared = 0.0128 -------------+---------------------------------- Adj R-squared = 0.0102 Total | 113.653234 1,524 .074575613 Root MSE = .2717 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5748179 .1665152 -3.45 0.001 -.9014417 -.2481941 milk120k_2 | -.0007211 .0004312 -1.67 0.095 -.001567 .0001247 parity | -.0097322 .005001 -1.95 0.052 -.0195417 .0000773 rp | .053428 .0236505 2.26 0.024 .007037 .099819 _cons | 4.260094 .0162315 262.46 0.000 4.228255 4.291932 ------------------------------------------------------------------------------ . fp , scale center replace: reg ln_cf milk120k_1 milk120k_2 rp (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance parity | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 352.175 0.272 11.021 0.027 linear | 3 348.380 0.272 7.227 0.066 1 m = 1 | 2 344.414 0.271 3.261 0.198 -2 m = 2 | 0 341.154 0.271 0.000 -- 2 2 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 1517). Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(5, 1519) = 5.39 Model | 1.97968732 5 .395937465 Prob > F = 0.0001 Residual | 111.673547 1,519 .073517806 R-squared = 0.0174 -------------+---------------------------------- Adj R-squared = 0.0142 Total | 113.653234 1,524 .074575613 Root MSE = .27114 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | -.609147 .1666002 -3.66 0.000 -.9359378 -.2823563 milk120k_2 | -.0005881 .0004331 -1.36 0.175 -.0014377 .0002615 parity_1 | -.020564 .0064416 -3.19 0.001 -.0331993 -.0079287 parity_2 | .010903 .0035577 3.06 0.002 .0039244 .0178816 rp | .0572608 .0236445 2.42 0.016 .0108815 .1036401 _cons | 4.216664 .0107005 394.06 0.000 4.195674 4.237653 ------------------------------------------------------------------------------ . . * interactions . * 2-way . reg wpc_sqrt c.hs100_ctr##c.hs100_ctr i.aut_calv i.twin i.dyst i.rp##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 18.02 Model | 1122.44946 8 140.306183 Prob > F = 0.0000 Residual | 12183.8174 1,565 7.78518682 R-squared = 0.0844 -------------+---------------------------------- Adj R-squared = 0.0797 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7902 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- hs100_ctr | 1.23627 .1209 10.23 0.000 .9991268 1.473413 | c.hs100_ctr#c.hs100_ctr | .7184457 .1738237 4.13 0.000 .3774938 1.059398 | aut_calv | yes | -.5050294 .1417653 -3.56 0.000 -.7830992 -.2269595 | twin | yes | 1.430024 .5494412 2.60 0.009 .3523056 2.507743 | dyst | yes | .4939374 .3029317 1.63 0.103 -.1002574 1.088132 | rp | yes | .3790562 .2690099 1.41 0.159 -.1486016 .906714 | vag_disch | yes | -.0448493 .3997112 -0.11 0.911 -.8288753 .7391766 | rp#vag_disch | yes#yes | 1.545781 .6986012 2.21 0.027 .1754877 2.916074 | _cons | 7.613353 .123964 61.42 0.000 7.3702 7.856506 ----------------------------------------------------------------------------------------- . . **# Selection criteria . * non-statistical . * no analyses . . * statistical - nested models . * Wald test . reg wpc_sqrt c.hs100_ctr##c.hs100_ctr i.aut_calv i.twin i.dyst i.rp##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 18.02 Model | 1122.44946 8 140.306183 Prob > F = 0.0000 Residual | 12183.8174 1,565 7.78518682 R-squared = 0.0844 -------------+---------------------------------- Adj R-squared = 0.0797 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7902 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- hs100_ctr | 1.23627 .1209 10.23 0.000 .9991268 1.473413 | c.hs100_ctr#c.hs100_ctr | .7184457 .1738237 4.13 0.000 .3774938 1.059398 | aut_calv | yes | -.5050294 .1417653 -3.56 0.000 -.7830992 -.2269595 | twin | yes | 1.430024 .5494412 2.60 0.009 .3523056 2.507743 | dyst | yes | .4939374 .3029317 1.63 0.103 -.1002574 1.088132 | rp | yes | .3790562 .2690099 1.41 0.159 -.1486016 .906714 | vag_disch | yes | -.0448493 .3997112 -0.11 0.911 -.8288753 .7391766 | rp#vag_disch | yes#yes | 1.545781 .6986012 2.21 0.027 .1754877 2.916074 | _cons | 7.613353 .123964 61.42 0.000 7.3702 7.856506 ----------------------------------------------------------------------------------------- . testparm i.rp#i.vag_disch ( 1) 1.rp#1.vag_disch = 0 F( 1, 1565) = 4.90 Prob > F = 0.0271 . . * AIC and BIC . * statistical non-nested - this in logistic regression . . **# Selection strategy . use coleman1.dta, clear . * forward selection . stepwise, pe(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother Wald test, begin with empty model: p = 0.0000 < 0.1000, adding x3_ses p = 0.0566 < 0.1000, adding x4_test_teach Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * backward elimination . stepwise, pr(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses /// > x4_test_teach x5_edu_mother Wald test, begin with full model: p = 0.4267 >= 0.1000, removing x2_father_job p = 0.6863 >= 0.1000, removing x5_edu_mother p = 0.1616 >= 0.1000, removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * stepwise selection . * backward . stepwise, pe(0.1) pr(0.11): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother Wald test, begin with full model: p = 0.4267 >= 0.1100, removing x2_father_job p = 0.6863 >= 0.1100, removing x5_edu_mother p = 0.1616 >= 0.1100, removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * forward . stepwise, pe(0.1) pr(0.11) forward: reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother Wald test, begin with empty model: p = 0.0000 < 0.1000, adding x3_ses p = 0.0566 < 0.1000, adding x4_test_teach Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . . *Daisy dataset . use daisy2red_02.dta, clear . *stepwise backward . stepwise, pe(0.05) pr(0.051) : reg wpc_sqrt parity1 i.aut_calv c.hs100_ctr##c.hs100_ctr i.dyst##i.twin i.rp##i.vag_disch /*no in notes*/ note: 0b.aut_calv omitted because of estimability. note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0o.dyst#0o.twin omitted because of estimability. note: 0o.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0o.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0o.rp#0o.vag_disch omitted because of estimability. note: 0o.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0o.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.9161 >= 0.0510, removing 1.vag_disch p = 0.2407 >= 0.0510, removing parity1 p = 0.1522 >= 0.0510, removing 1.rp p = 0.1062 >= 0.0510, removing 1.dyst#1.twin p = 0.0855 >= 0.0510, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(5, 1568) = 27.80 Model | 1083.58379 5 216.716758 Prob > F = 0.0000 Residual | 12222.6831 1,568 7.79507848 R-squared = 0.0814 -------------+---------------------------------- Adj R-squared = 0.0785 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.792 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- rp#vag_disch | yes#yes | 1.843187 .5192482 3.55 0.000 .8246926 2.861681 | aut_calv | yes | -.5170909 .1415804 -3.65 0.000 -.7947978 -.2393839 hs100_ctr | 1.206309 .1198844 10.06 0.000 .971158 1.441459 | c.hs100_ctr#c.hs100_ctr | .6799173 .1725081 3.94 0.000 .3415464 1.018288 | twin | yes | 1.508587 .547721 2.75 0.006 .4342445 2.58293 _cons | 7.689887 .1171432 65.65 0.000 7.460113 7.91966 ----------------------------------------------------------------------------------------- . *stepwise backward but keeping first term even if NS and keep all the terms in quadratic . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 i.aut_calv (c.hs100_ctr##c.hs100_ctr) (i.dyst##i.twin) (i.rp##i.vag_disch) note: 0b.aut_calv omitted because of estimability. note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0b.dyst#0b.twin omitted because of estimability. note: 0b.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0b.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0b.rp#0b.vag_disch omitted because of estimability. note: 0b.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0b.vag_disch omitted because of estimability. Wald test, begin with full model: p < 0.0510 for all terms in model Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(10, 1563) = 14.84 Model | 1153.69711 10 115.369711 Prob > F = 0.0000 Residual | 12152.5697 1,563 7.77515658 R-squared = 0.0867 -------------+---------------------------------- Adj R-squared = 0.0809 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7884 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- parity1 | .0558659 .0480009 1.16 0.245 -.0382871 .150019 | aut_calv | yes | -.5128308 .1418525 -3.62 0.000 -.7910719 -.2345896 hs100_ctr | 1.232803 .1209377 10.19 0.000 .9955855 1.47002 | c.hs100_ctr#c.hs100_ctr | .7130448 .173925 4.10 0.000 .3718939 1.054196 | dyst | yes | .627718 .3100054 2.02 0.043 .0196477 1.235788 | twin | yes | 1.698238 .5842331 2.91 0.004 .552275 2.844201 | dyst#twin | yes#yes | -2.772795 1.739103 -1.59 0.111 -6.184016 .6384255 | rp | yes | .3904235 .2689735 1.45 0.147 -.1371634 .9180105 | vag_disch | yes | -.0422026 .4007079 -0.11 0.916 -.8281844 .7437792 | rp#vag_disch | yes#yes | 1.476058 .6996667 2.11 0.035 .1036735 2.848442 | _cons | 7.515274 .1472404 51.04 0.000 7.226465 7.804084 ----------------------------------------------------------------------------------------- . estimates store sw_1 . *now assessing variables from interactions . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 i.aut_calv (c.hs100_ctr##c.hs100_ctr) i.dyst##i.twin i.rp##i.vag_disch note: 0b.aut_calv omitted because of estimability. note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0o.dyst#0o.twin omitted because of estimability. note: 0o.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0o.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0o.rp#0o.vag_disch omitted because of estimability. note: 0o.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0o.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.9161 >= 0.0510, removing 1.vag_disch p = 0.1434 >= 0.0510, removing 1.rp p = 0.1136 >= 0.0510, removing 1.dyst#1.twin p = 0.0616 >= 0.0510, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(6, 1567) = 23.31 Model | 1090.25318 6 181.708864 Prob > F = 0.0000 Residual | 12216.0137 1,567 7.79579684 R-squared = 0.0819 -------------+---------------------------------- Adj R-squared = 0.0784 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7921 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- parity1 | .0438808 .0474418 0.92 0.355 -.0491753 .1369369 | aut_calv | yes | -.5244014 .1418074 -3.70 0.000 -.8025536 -.2462491 hs100_ctr | 1.19939 .1201231 9.98 0.000 .9637707 1.435009 | c.hs100_ctr#c.hs100_ctr | .6694138 .1728894 3.87 0.000 .3302949 1.008533 | rp#vag_disch | yes#yes | 1.831729 .5194199 3.53 0.000 .8128976 2.85056 | twin | yes | 1.478791 .5486927 2.70 0.007 .4025419 2.55504 _cons | 7.622191 .1381318 55.18 0.000 7.351249 7.893134 ----------------------------------------------------------------------------------------- . estimates store sw_2 . *since dyst##twin is NS, then we can ignore brackets. . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 i.aut_calv (c.hs100_ctr##c.hs100_ctr) i.dys##i.twin (i.rp##i.vag_disch) note: 0b.aut_calv omitted because of estimability. note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0o.dyst#0o.twin omitted because of estimability. note: 0o.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0o.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0b.rp#0b.vag_disch omitted because of estimability. note: 0b.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0b.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.1111 >= 0.0510, removing 1.dyst#1.twin p = 0.0761 >= 0.0510, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.79 Model | 1109.40798 8 138.675997 Prob > F = 0.0000 Residual | 12196.8589 1,565 7.79352004 R-squared = 0.0834 -------------+---------------------------------- Adj R-squared = 0.0787 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7917 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- parity1 | .0472026 .0476238 0.99 0.322 -.0462106 .1406157 | aut_calv | yes | -.5142217 .1420186 -3.62 0.000 -.7927884 -.235655 hs100_ctr | 1.205566 .1202722 10.02 0.000 .9696547 1.441478 | c.hs100_ctr#c.hs100_ctr | .6864161 .1736596 3.95 0.000 .3457861 1.027046 | rp | yes | .4207971 .2687102 1.57 0.118 -.1062728 .9478671 | vag_disch | yes | .0711392 .3979339 0.18 0.858 -.7094005 .8516789 | rp#vag_disch | yes#yes | 1.384039 .6978284 1.98 0.048 .0152618 2.752816 | twin | yes | 1.40634 .5508343 2.55 0.011 .3258891 2.486791 _cons | 7.571339 .144152 52.52 0.000 7.288588 7.854091 ----------------------------------------------------------------------------------------- . estimates store sw_3 . *compare the different model results . estimates table sw_1 sw_2 sw_3, star(0.10 0.05 0.001) -------------------------------------------------------------- Variable | sw_1 sw_2 sw_3 -------------+------------------------------------------------ parity1 | .05586593 .04388077 .04720258 | aut_calv | yes | -.51283075*** -.52440137*** -.51422168*** hs100_ctr | 1.2328027*** 1.1993896*** 1.2055664*** | c.hs100_ctr#| c.hs100_ctr | .71304483*** .66941384*** .68641606*** | 1.dyst | .62771805** | twin | yes | 1.6982381** 1.4787911** 1.4063402** | dyst#twin | 1#yes | -2.7727951 | rp | yes | .39042354 .42079714 | vag_disch | yes | -.04220257 .07113917 | rp#vag_disch | yes#yes | 1.4760578** 1.8317287*** 1.3840389** | _cons | 7.5152743*** 7.6221914*** 7.5713394*** -------------------------------------------------------------- Legend: * p<.1; ** p<.05; *** p<.001 . . **# Reliability . **leave-one-out . ***computing "n" regression models without obs "i" using residual and leverage values . ***and then calculating PRESS (predicted residual sum of squares) . reg wpc_sqrt parity1 i.aut_calv c.hs100_ctr##c.hs100_ctr i.twin i.dyst i.rp##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 16.19 Model | 1133.93223 9 125.99247 Prob > F = 0.0000 Residual | 12172.3346 1,564 7.78282264 R-squared = 0.0852 -------------+---------------------------------- Adj R-squared = 0.0800 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7898 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- parity1 | .058304 .0480002 1.21 0.225 -.0358476 .1524556 | aut_calv | yes | -.5136529 .1419214 -3.62 0.000 -.7920293 -.2352766 hs100_ctr | 1.230168 .120986 10.17 0.000 .992856 1.46748 | c.hs100_ctr#c.hs100_ctr | .7085557 .1739879 4.07 0.000 .3672814 1.04983 | twin | yes | 1.385453 .5505819 2.52 0.012 .3054964 2.465409 | dyst | yes | .5422828 .3054896 1.78 0.076 -.0569296 1.141495 | rp | yes | .3894596 .2691054 1.45 0.148 -.1383858 .917305 | vag_disch | yes | -.0132839 .4004945 -0.03 0.974 -.7988467 .7722788 | rp#vag_disch | yes#yes | 1.491019 .6999486 2.13 0.033 .1180826 2.863956 | _cons | 7.516653 .1473104 51.03 0.000 7.227706 7.805599 ----------------------------------------------------------------------------------------- . press /*command*/ PRESS = 12335.801 . *by hand . capture drop res lev eq1 . predict res, res . predict lev ,lev . gen eq1=(res/(1-lev))^2 . summ eq1 Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- eq1 | 1,574 7.83723 11.05655 6.06e-06 84.10674 . di "PRESS = " r(sum) //error sum square from predicted points PRESS = 12335.801 . scalar PRESS = r(sum) . . di in gr "R2(pred) = "1-(PRESS/ (e(mss) + e(rss))) R2(pred) = .07293301 . di in gr "R2 full model = " e(r2) R2 full model = .08521791 . . **# Presenting results . * standardize coefficients . reg wpc_sqrt parity1 i.aut_calv c.hs100_ctr##c.hs100_ctr i.twin i.dyst i.rp##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 16.19 Model | 1133.93223 9 125.99247 Prob > F = 0.0000 Residual | 12172.3346 1,564 7.78282264 R-squared = 0.0852 -------------+---------------------------------- Adj R-squared = 0.0800 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7898 ----------------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- parity1 | .058304 .0480002 1.21 0.225 -.0358476 .1524556 | aut_calv | yes | -.5136529 .1419214 -3.62 0.000 -.7920293 -.2352766 hs100_ctr | 1.230168 .120986 10.17 0.000 .992856 1.46748 | c.hs100_ctr#c.hs100_ctr | .7085557 .1739879 4.07 0.000 .3672814 1.04983 | twin | yes | 1.385453 .5505819 2.52 0.012 .3054964 2.465409 | dyst | yes | .5422828 .3054896 1.78 0.076 -.0569296 1.141495 | rp | yes | .3894596 .2691054 1.45 0.148 -.1383858 .917305 | vag_disch | yes | -.0132839 .4004945 -0.03 0.974 -.7988467 .7722788 | rp#vag_disch | yes#yes | 1.491019 .6999486 2.13 0.033 .1180826 2.863956 | _cons | 7.516653 .1473104 51.03 0.000 7.227706 7.805599 ----------------------------------------------------------------------------------------- . * rescale coefficient by SD of the predictor . matrix define coeff = e(b)' . capture drop coeff* . svmat coeff . summ hs100_ctr parity1 aut_calv twin dyst vag_disch Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- hs100_ctr | 1,574 .0000762 .6201692 -1.26 .82 parity1 | 1,574 1.729987 1.493841 0 6 aut_calv | 1,574 .4720457 .4993766 0 1 twin | 1,574 .0171537 .1298854 0 1 dyst | 1,574 .0597205 .2370435 0 1 -------------+--------------------------------------------------------- vag_disch | 1,574 .0520966 .2222924 0 1 . . * iqr . summ parity1,d /* note IQR = 3 - 0 = 3 */ parity1 ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 1 Mean 1.729987 Largest Std. dev. 1.493841 75% 3 6 90% 4 6 Variance 2.23156 95% 4 6 Skewness .5450922 99% 5 6 Kurtosis 2.315593 . /* coefficient for parity = 0.058 */ . . log close name: log: c:\vhm812-data\l3a_model_build_II.txt log type: text closed on: 18 Jan 2024, 14:30:00 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------