Supplementary Exercise 2.50 of IPS7e
------------------------------------

Measurements of the concentration of fenthion (a pesticide) in olive oil after 28-365 days (presumably after exposure). On each measurement day, 5 concentration values were obtained; it is not clear whether these are multiple measurements of the same sample or different samples, or whether the samples were measured repeatedly over time.

The simple exponential decay model would correspond to a linear relation for logarithmic concentrations: ln(conc) = ln(C0) - k*t, and the interest is in fitting a linear regression model to logarithmic concentrations as a function of time. The model assumes

   ln(conc_i) = beta_0 + beta_1*day_i + epsilon_i,  i=1,...,25

where the errors (epsilon_i) are i.i.d. from N(0,sigma). Note that the replication of observations for 5 different days has no direct impact on the regression model and its assumptions.

(a)+(b) Minitab commands and output:

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 2\ex02_050.mtw".
Retrieving worksheet from file: 'H:\VHM\VHM801\Datasets\Minitab\Chapter 2\ex02_050.mtw'
Worksheet was saved on 15/11/2014

MTB > Name C3 'lnconc'
MTB > Let 'lnconc' = ln('fenthion')
MTB > Fitline 'lnconc' 'days';
SUBC>   GFourpack;
SUBC>   RType 2;
SUBC>   Confidence 95.0.

Regression Analysis: lnconc versus days

The regression equation is
lnconc = - 0.03412 - 0.000507 days

S = 0.0243845   R-Sq = 87.6%   R-Sq(adj) = 87.0%

Analysis of Variance

Source      DF        SS         MS       F      P
Regression   1  0.096286  0.0962858  161.93  0.000
Error       23  0.013676  0.0005946
Total       24  0.109962

Fitted Line: lnconc versus days

Residual Plots for lnconc

MTB > Fitline 'lnconc' 'days';
SUBC>   Poly 2;
SUBC>   GFourpack;
SUBC>   RType 2;
SUBC>   Confidence 95.0.

Polynomial Regression Analysis: lnconc versus days

The regression equation is
lnconc = - 0.01769 - 0.000786 days + 0.000001 days^2

S = 0.0233227   R-Sq = 89.1%   R-Sq(adj) = 88.1%

Analysis of Variance

Source      DF        SS         MS      F      P
Regression   2  0.097995  0.0489974  90.08  0.000
Error       22  0.011967  0.0005439
Total       24  0.109962

Sequential Analysis of Variance

Source     DF         SS       F      P
Linear      1  0.0962858  161.93  0.000
Quadratic   1  0.0017090    3.14  0.090

Fitted Line: lnconc versus days

Residual Plots for lnconc

Answers to questions:

(a) The fitted line plot shows a fairly equal scatter of points around the estimated line for each of the 5 time points. In fact, the line has two points above and two points below it for all days except day 183; this may seem curious with 5 observations per time point, but there is one set of tied values for each time point. It is not obvious from the graph whether the fit can be improved by allowing for a non-linear relation; we will explore this question below in (b). Our first assessment, however, is that the data are described reasonably well by the linear relation.

(b) The estimated linear relation is: lnconc = -0.03412 - 0.000507 days. The estimated slope gives our value for k, except for a sign change: k=0.000507. The value of k is numerically very small, but this reflects that the change in concentration is also quite small over a long storage time. The estimated intercept can be exponentiated to give an estimate of the constant C0: C0 = exp(-0.03412) = 0.966.

One simple way of exploring whether the fit of the model can be improved by allowing for a non-linear relation with time is to add a quadratic term (of time) to the equation:

   ln(conc_i) = beta_0 + beta_1*day_i + beta_2*(day_i^2) + epsilon_i.

The fit of this model is also shown above, and visually there seems to be some improvement from allowing for curvature (which turns out to be positive, corresponding to an upward-bending curve).
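
Returning briefly to the linear fit in (b): the back-transformation from the regression estimates to the decay parameters can be sketched in a few lines of Python. This is only an illustration alongside the Minitab analysis, not part of it. The coefficients are copied from the output above; the function name predicted_conc and the choice to evaluate the curve at days 28 and 365 (the first and last storage times) are assumptions made for the example.

```python
import math

# Estimates copied from the linear fit of ln(conc) on days (Minitab output above).
intercept = -0.03412   # estimate of ln(C0)
slope = -0.000507      # estimate of -k

k = -slope                   # decay constant: the slope with its sign changed
c0 = math.exp(intercept)     # back-transformed intercept, about 0.966

def predicted_conc(day):
    """Concentration predicted by the exponential decay model C0 * exp(-k * day)."""
    return c0 * math.exp(-k * day)

# Illustrative evaluation at the first and last storage times in the data set.
for day in (28, 365):
    print(f"day {day}: predicted concentration {predicted_conc(day):.3f}")
```

The predictions are on the original concentration scale, which makes it easier to see how small the fitted decline is over the storage period.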
The Minitab listing also gives a significance test for this added quadratic term: F=3.14, P=0.090. Without going into the details of how this is computed (this is discussed in the VHM 802 and VHM 812 courses), we can conclude from the non-significant test result that there is no clear evidence of non-linearity in the relation.
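
For readers who want a preview of where the F statistic comes from, the following Python sketch reproduces it from the two Analysis of Variance tables above, as the reduction in error sum of squares from adding the quadratic term divided by the error mean square of the quadratic model. It is an illustration under stated assumptions, not a description of Minitab's internals: the helper functions t_density and two_sided_t_pvalue are hypothetical names written for this example, and the P-value is obtained by simple numerical integration, using the fact that an F statistic with 1 numerator degree of freedom is the square of a t statistic.

```python
import math

# Values copied from the Minitab ANOVA tables above.
sse_linear = 0.013676       # Error SS of the linear model (23 df)
sse_quadratic = 0.011967    # Error SS of the quadratic model (22 df)
df_error_quadratic = 22

# Extra sum of squares explained by the quadratic term (1 df).
ss_quadratic_term = sse_linear - sse_quadratic
mse_quadratic = sse_quadratic / df_error_quadratic
f_stat = ss_quadratic_term / mse_quadratic

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_t_pvalue(t, df, n=2000):
    """P(|T| > t), integrating the density over [0, t] with Simpson's rule (n even)."""
    h = t / n
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1.0 - 2.0 * (h / 3.0) * total

# With 1 numerator df, F = t^2, so the F-test P-value equals the two-sided t P-value.
p_value = two_sided_t_pvalue(math.sqrt(f_stat), df_error_quadratic)
print(f"F = {f_stat:.2f}, P = {p_value:.3f}")
```

Because the sums of squares are taken from the rounded listing, the result can differ from Minitab's in the last printed digit.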