------------------------------------------------------------------------------------------------------------------------------------------------------------ name: log: c:\vhm812-data\l3b_model_build_II.txt log type: text opened on: 22 Jan 2019, 09:07:42 . set more off . . * open the DAISY Red dataset . use daisy2red.dta, clear . gen month=month(calv_dt) . gen aut_clv=(month>=2 & month<=7) if !missing(calv_dt) . gen hs_ct=herd_size-251 . gen hs_sq=herd_size^2 . gen parity1=parity-1 . gen milk120k=milk120/1000 (38 missing values generated) . gen wpc_sqrt=sqrt(wpc) . save daisy2red1.dta, replace file daisy2red1.dta saved . . * Specifying maximum model . * no analyses . . * Causal model . * no analyses . . * Reducing the Number of Predictors . * descriptive statistics . codebook cf vag_disch ------------------------------------------------------------------------------------------------------------------------------------------------------------ cf Calving to first service interval ------------------------------------------------------------------------------------------------------------------------------------------------------------ type: numeric (float) range: [21,240] units: 1 unique values: 129 missing .: 14/1,574 mean: 71.3115 std. dev: 21.9039 percentiles: 10% 25% 50% 75% 90% 51 59 67 78 96 ------------------------------------------------------------------------------------------------------------------------------------------------------------ vag_disch Vaginal discharge observed ------------------------------------------------------------------------------------------------------------------------------------------------------------ type: numeric (float) label: noyes range: [0,1] units: 1 unique values: 2 missing .: 0/1,574 tabulation: Freq. Numeric Label 1,492 0 no 82 1 yes . sum milk120 parity herd_size dyst rp vag_disch, detail Milk volume (l) in first 120 days of lactation ------------------------------------------------------------- Percentiles Smallest 1% 1698.5 1110.2 5% 2077.2 1397.7 10% 2298.2 1461.5 Obs 1,536 25% 2731.95 1467 Sum of Wgt. 1,536 50% 3215.25 Mean 3215.096 Largest Std. Dev. 698.1316 75% 3682.1 5181.8 90% 4080.7 5278 Variance 487387.7 95% 4403.3 5399.9 Skewness .1101838 99% 4904.4 5630.3 Kurtosis 2.845637 Lactation number ------------------------------------------------------------- Percentiles Smallest 1% 1 1 5% 1 1 10% 1 1 Obs 1,574 25% 1 1 Sum of Wgt. 1,574 50% 2 Mean 2.729987 Largest Std. Dev. 1.493841 75% 4 7 90% 5 7 Variance 2.23156 95% 5 7 Skewness .5450922 99% 6 7 Kurtosis 2.315593 Herd size ------------------------------------------------------------- Percentiles Smallest 1% 125 125 5% 125 125 10% 185 125 Obs 1,574 25% 201 125 Sum of Wgt. 1,574 50% 263 Mean 251.0076 Largest Std. Dev. 62.01692 75% 294 333 90% 333 333 Variance 3846.098 95% 333 333 Skewness -.3550929 99% 333 333 Kurtosis 2.256969 Dystocia at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0597205 Largest Std. Dev. .2370435 75% 0 1 90% 0 1 Variance .0561896 95% 1 1 Skewness 3.715938 99% 1 1 Kurtosis 14.80819 Retained placenta at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0946633 Largest Std. Dev. .2928423 75% 0 1 90% 0 1 Variance .0857566 95% 1 1 Skewness 2.769173 99% 1 1 Kurtosis 8.66832 Vaginal discharge observed ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0520966 Largest Std. Dev. .2222924 75% 0 1 90% 0 1 Variance .0494139 95% 1 1 Skewness 4.031139 99% 1 1 Kurtosis 17.25008 . . * correlation . corr milk120 parity herd_size (obs=1,536) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 parity | 0.3821 1.0000 herd_size | -0.0433 0.0356 1.0000 . pwcorr milk120 parity herd_size, obs star(0.05) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 | 1536 | parity | 0.3821* 1.0000 | 1536 1574 | herd_size | -0.0433 0.0386 1.0000 | 1536 1574 1574 | . . * indices . * no analyses . . * unconditional associations . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . reg cf i.vag_disch Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 0.49 Model | 236.166881 1 236.166881 Prob > F = 0.4831 Residual | 747744.425 1,558 479.938656 R-squared = 0.0003 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.908 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | yes | 1.743523 2.485484 0.70 0.483 -3.131724 6.61877 _cons | 71.21989 .5698436 124.98 0.000 70.10215 72.33763 ------------------------------------------------------------------------------ . . * principal components / factor analysis / correspondence analysis . * not covered . . * Functional Form of predictors . * residual plots . reg cf milk120 Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(1, 1523) = 0.72 Model | 329.282002 1 329.282002 Prob > F = 0.3977 Residual | 700869.964 1,523 460.19039 R-squared = 0.0005 -------------+---------------------------------- Adj R-squared = -0.0002 Total | 701199.246 1,524 460.104492 Root MSE = 21.452 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120 | .0006667 .0007881 0.85 0.398 -.0008793 .0022126 _cons | 68.95612 2.595207 26.57 0.000 63.86556 74.04667 ------------------------------------------------------------------------------ . predict stdres, rsta (49 missing values generated) . . * lowess smoother . twoway (scatter stdres milk120) (lowess stdres milk120) . . * Detecting and Correcting for non-linearity (transformation of X) . * categorization of predictor . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . egen parity_c6=cut(parity), at(0 1 2 3 4 5 6 15) icodes . tab parity parity_c6 Lactation | parity_c6 number | 1 2 3 4 5 6 | Total -----------+------------------------------------------------------------------+---------- 1 | 417 0 0 0 0 0 | 417 2 | 0 374 0 0 0 0 | 374 3 | 0 0 319 0 0 0 | 319 4 | 0 0 0 222 0 0 | 222 5 | 0 0 0 0 169 0 | 169 6 | 0 0 0 0 0 69 | 69 7 | 0 0 0 0 0 4 | 4 -----------+------------------------------------------------------------------+---------- Total | 417 374 319 222 169 73 | 1,574 . reg cf i.parity_c6 Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(5, 1554) = 0.91 Model | 2174.44893 5 434.889785 Prob > F = 0.4761 Residual | 745806.143 1,554 479.926733 R-squared = 0.0029 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.907 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_c6 | 2 | -.6468817 1.568168 -0.41 0.680 -3.722829 2.429066 3 | -3.181369 1.635853 -1.94 0.052 -6.390081 .027343 4 | -1.528531 1.831255 -0.83 0.404 -5.120523 2.063462 5 | -2.322236 2.004684 -1.16 0.247 -6.254406 1.609935 6 | -.9349232 2.781437 -0.34 0.737 -6.390688 4.520841 | _cons | 72.61985 1.077984 67.37 0.000 70.5054 74.73431 ------------------------------------------------------------------------------ . di "F-statistic = " ((747280.9-745806.1)/(1558-1554))/480 F-statistic = .768125 . di "P-value = " Ftail(4, 1554, ((747280.9-745806.1)/(1558-1554))/480) // R model is valid but term is NS P-value = .54594195 . . . *box cox did not work with raw need to use log(cf) . gen ln_cf=ln(cf) (14 missing values generated) . boxcox ln_cf milk120, model(rhs) Fitting full model Iteration 0: log likelihood = -183.07949 (not concave) Iteration 1: log likelihood = -180.48708 (not concave) Iteration 2: log likelihood = -180.09968 Iteration 3: log likelihood = -180.01808 Iteration 4: log likelihood = -180.01807 (not concave) Iteration 5: log likelihood = -180.01807 Iteration 6: log likelihood = -180.01806 Iteration 7: log likelihood = -180.01804 Iteration 8: log likelihood = -180.01803 (38 missing values generated) (38 missing values generated) (38 missing values generated) Number of obs = 1,525 LR chi2(2) = 7.91 Log likelihood = -180.01803 Prob > chi2 = 0.019 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- /lambda | -2.997824 .0059421 -504.50 0.000 -3.009471 -2.986178 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coef. -------------+-------------- Notrans | _cons | -4.74e+08 -------------+-------------- Trans | milk120 | 1.42e+09 -------------+-------------- /sigma | .2722883 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic P-value H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- lambda = -1 -181.5923 3.15 0.076 lambda = 0 -182.42186 4.81 0.028 lambda = 1 -183.07949 6.12 0.013 --------------------------------------------------------- . . * quadratic function of X . reg ln_cf c.milk120k##c.milk120k Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133443 2 .354566722 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------------+---------------------------------------------------------------- milk120k | .2056932 .0697547 2.95 0.003 .0688677 .3425186 | c.milk120k#c.milk120k | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 3.883482 .1122207 34.61 0.000 3.663358 4.103605 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- milk120k | 48.58 0.020585 c.milk120k#| c.milk120k | 48.58 0.020585 -------------+---------------------- Mean VIF | 48.58 . vce, corr Correlation matrix of coefficients of regress model | c.m~120k# e(V) | milk120k c.m~120k _cons -------------+------------------------------ milk120k | 1.0000 c.milk120k#| c.milk120k | -0.9897 1.0000 _cons | -0.9872 0.9559 1.0000 . . * re-doing the analysis with milk120 centred . summ milk120k Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- milk120k | 1,536 3.215096 .6981316 1.1102 5.6303 . gen m120k_ct=(milk120k-r(mean)) /*r(mean) - memory variable created by summ command that store the mean of the variable*/ (38 missing values generated) . reg ln_cf c.m120k_ct##c.m120k_ct Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133424 2 .354566712 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------------+---------------------------------------------------------------- m120k_ct | .0159242 .0100484 1.58 0.113 -.0037859 .0356344 | c.m120k_ct#c.m120k_ct | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 4.239742 .0086679 489.13 0.000 4.22274 4.256745 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- m120k_ct | 1.01 0.992010 c.m120k_ct#| c.m120k_ct | 1.01 0.992010 -------------+---------------------- Mean VIF | 1.01 . vce, corr Correlation matrix of coefficients of regress model | c.m120~t# e(V) | m120k_ct c.m120~t _cons -------------+------------------------------ m120k_ct | 1.0000 c.m120k_ct#| c.m120k_ct | -0.0894 1.0000 _cons | 0.0494 -0.5936 1.0000 . . capture drop stdres . predict stdres, rsta (49 missing values generated) . twoway (scatter stdres m120k_ct) (lowess stdres m120k_ct) . . * fractional polynomials . fp , scale center replace: reg ln_cf (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------- milk120k | df Deviance Res. s.d. Dev. dif. P(*) Powers -------------+----------------------------------------------------------------- omitted | 0 367.951 0.273 10.533 0.033 linear | 1 366.159 0.273 8.741 0.033 1 m = 1 | 2 361.396 0.273 3.978 0.138 -2 m = 2 | 4 357.418 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on F with 1520 denominator dof. Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 5.27 Model | .782289941 2 .391144971 Prob > F = 0.0052 Residual | 112.870944 1,522 .074159622 R-squared = 0.0069 -------------+---------------------------------- Adj R-squared = 0.0056 Total | 113.653234 1,524 .074575613 Root MSE = .27232 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5301266 .1654544 -3.20 0.001 -.8546694 -.2055839 milk120k_2 | -.0008516 .0004271 -1.99 0.046 -.0016894 -.0000138 _cons | 4.238434 .008295 510.96 0.000 4.222163 4.254704 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . fp plot, r(residuals) ytitle(Predicted Ln(cf)) /*plot with residuals*/ . . *more predictors . fp , scale center replace: reg ln_cf parity rp (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------- milk120k | df Deviance Res. s.d. Dev. dif. P(*) Powers -------------+----------------------------------------------------------------- omitted | 0 361.366 0.273 12.986 0.012 linear | 1 357.527 0.272 9.147 0.028 1 m = 1 | 2 351.183 0.272 2.803 0.248 -2 m = 2 | 4 348.380 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on F with 1518 denominator dof. Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(4, 1520) = 4.91 Model | 1.44923978 4 .362309946 Prob > F = 0.0006 Residual | 112.203994 1,520 .073818417 R-squared = 0.0128 -------------+---------------------------------- Adj R-squared = 0.0102 Total | 113.653234 1,524 .074575613 Root MSE = .2717 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5748179 .1665152 -3.45 0.001 -.9014417 -.2481941 milk120k_2 | -.0007211 .0004312 -1.67 0.095 -.001567 .0001247 parity | -.0097322 .005001 -1.95 0.052 -.0195417 .0000773 rp | .053428 .0236505 2.26 0.024 .007037 .099819 _cons | 4.260094 .0162315 262.46 0.000 4.228255 4.291932 ------------------------------------------------------------------------------ . fp , scale center replace: reg ln_cf milk120k_1 milk120k_2 rp (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------- parity | df Deviance Res. s.d. Dev. dif. P(*) Powers -------------+----------------------------------------------------------------- omitted | 0 352.175 0.272 11.021 0.027 linear | 1 348.380 0.272 7.227 0.066 1 m = 1 | 2 344.414 0.271 3.261 0.198 -2 m = 2 | 4 341.154 0.271 0.000 -- 2 2 ------------------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on F with 1517 denominator dof. Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(5, 1519) = 5.39 Model | 1.97968732 5 .395937465 Prob > F = 0.0001 Residual | 111.673547 1,519 .073517806 R-squared = 0.0174 -------------+---------------------------------- Adj R-squared = 0.0142 Total | 113.653234 1,524 .074575613 Root MSE = .27114 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k_1 | -.609147 .1666002 -3.66 0.000 -.9359378 -.2823563 milk120k_2 | -.0005881 .0004331 -1.36 0.175 -.0014377 .0002615 parity_1 | -.020564 .0064416 -3.19 0.001 -.0331993 -.0079287 parity_2 | .010903 .0035577 3.06 0.002 .0039244 .0178816 rp | .0572608 .0236445 2.42 0.016 .0108815 .1036401 _cons | 4.216664 .0107005 394.06 0.000 4.195674 4.237653 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . fp plot, r(residuals) ytitle(Predicted Ln(cf)) /*plot with residuals*/ . . *or you can predict fitted values and standardized residuals and plot them . capture drop fit1 . capture drop stdres . predict fit1 (option xb assumed; fitted values) (49 missing values generated) . predict stdres, rstand (49 missing values generated) . twoway (scatter stdres milk120k) (lowess stdres milk120k) . . * interactions . * 2-way . reg wpc_sqrt c.hs_ct##c.hs_ct i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124973 .0012109 10.32 0.000 .0101222 .0148724 | c.hs_ct#c.hs_ct | .0000729 .0000174 4.20 0.000 .0000389 .0001069 | 1.aut_clv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 | twin | yes | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 7.631652 .1213388 62.90 0.000 7.393648 7.869655 --------------------------------------------------------------------------------- . . * Selection criteria . * non-statistical . * no analyses . . * statistical - nested models . * Wald test . reg wpc_sqrt c.hs_ct##c.hs_ct i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124973 .0012109 10.32 0.000 .0101222 .0148724 | c.hs_ct#c.hs_ct | .0000729 .0000174 4.20 0.000 .0000389 .0001069 | 1.aut_clv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 | twin | yes | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 7.631652 .1213388 62.90 0.000 7.393648 7.869655 --------------------------------------------------------------------------------- . testparm i.dyst#i.vag_disch ( 1) 1.dyst#1.vag_disch = 0 F( 1, 1566) = 6.62 Prob > F = 0.0102 . . * AIC and BIC . * statistical non-nested - this in logistic regression . . *Selection strategy . use coleman.dta, clear . rename x1 x1_staff_sal . rename x2 x2_father_job . rename x3 x3_ses . rename x4 x4_test_teach . rename x5 x5_edu_mother . rename y y_test_scr . * forward selection . stepwise, pe(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother begin with empty model p = 0.0000 < 0.1000 adding x3_ses p = 0.0566 < 0.1000 adding x4_test_teach Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * backward elimination . stepwise, pr(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother begin with full model p = 0.4267 >= 0.1000 removing x2_father_job p = 0.6863 >= 0.1000 removing x5_edu_mother p = 0.1616 >= 0.1000 removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * stepwise selection . stepwise, pe(0.1) pr(0.11): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother begin with full model p = 0.4267 >= 0.1100 removing x2_father_job p = 0.6863 >= 0.1100 removing x5_edu_mother p = 0.1616 >= 0.1100 removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . . * Reliability . **leave-one-out . ***computing "n" regression models without obs "i" using residual and leverage values . ***and then calculating PRESS (predicted residual sum of squares) . use daisy2red1, clear . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . capture drop res lev eq1 . predict res, res . predict lev ,lev . gen eq1=(res/(1-lev))^2 . summ eq1 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- eq1 | 1,574 7.851331 10.9765 1.85e-07 83.92817 . di "PRESS = " r(sum) //error sum square from predicted points PRESS = 12357.995 . scalar PRESS = r(sum) . . di "RSS full= "e(rss) //residuals sum square from full model RSS full= 12203.975 . scalar TSS = (e(mss) + e(rss)) . . di "R2(pred) = "1-(PRESS/ TSS) R2(pred) = .07126506 . di "R2 full model = " e(r2) R2 full model = .08284007 . . * Presenting results . * standardize coefficients . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . * rescale coefficient by SD of the predictor . matrix define coeff = e(b)' . capture drop coeff* . svmat coeff . summ hs_ct parity1 aut_clv twin dyst vag_disch Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- hs_ct | 1,574 .0076239 62.01692 -126 82 parity1 | 1,574 1.729987 1.493841 0 6 aut_clv | 1,574 .4720457 .4993766 0 1 twin | 1,574 .0171537 .1298854 0 1 dyst | 1,574 .0597205 .2370435 0 1 -------------+--------------------------------------------------------- vag_disch | 1,574 .0520966 .2222924 0 1 . . * iqr . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . summ parity1,d /* note IQR = 3 - 0 = 3 */ parity1 ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 1 Mean 1.729987 Largest Std. Dev. 1.493841 75% 3 6 90% 4 6 Variance 2.23156 95% 4 6 Skewness .5450922 99% 5 6 Kurtosis 2.315593 . /* coefficient for parity = 0.062 */ . end of do-file . log close name: log: c:\vhm812-data\l3b_model_build_II.txt log type: text closed on: 22 Jan 2019, 09:08:04 ------------------------------------------------------------------------------------------------------------------------------------------------------------