Supplementary Exercises 2.12, 2.27, 2.28 and 2.60 of IPS7e
----------------------------------------------------------

Data on a car's (a British Ford Escort) speed and fuel consumption. The fuel consumption is the response variable, whereas the speed is a controlled variable (set at 10 to 150 km/h in steps of 10 km/h) and is therefore an explanatory variable.

2.12:
-----
Minitab commands for the requested plot.

MTB > WOpen "H:\VHM\VHM801\Datasets\Minitab\Chapter 2\ex02_012.mtw".
Retrieving worksheet from file: 'H:\VHM\VHM801\Datasets\Minitab\Chapter 2\ex02_012.mtw'
Worksheet was saved on 07/11/2014
MTB > Plot 'fuel'*'speed';
SUBC> Symbol.

Scatterplot of fuel vs speed

Answers to questions:
(a) We put the speed on the x-axis because it is the explanatory variable.
(b) The curve consists of a decreasing part followed by an increasing part, and is therefore not linear. The fuel consumption is minimal at a speed of around 60 km/h. Above 60 km/h the fuel consumption increases almost linearly with speed, whereas below 60 km/h it increases along a distinctly curved path as the speed decreases. The relationship probably makes sense because car engines are constructed to perform optimally at a certain speed.
(c) The association is positive above 60 km/h and negative below.
(d) The association seems quite strong because the points follow a very regular pattern (which is just not linear).

2.27:
-----
Before carrying out the requested computation, we add a note about the interpretation of a correlation when one variable is an explanatory variable. Strictly speaking, the correlation only makes sense for a pair of response variables (because the explanatory variable, in this case the speed, does not have a sampling distribution). For one explanatory and one response variable it is more appropriate to compute the linear regression equation. The correlation has strong links with that equation and may therefore be interpreted indirectly from the linear regression equation.
The P-value is the same for the correlation and the linear regression, and is therefore also valid. For the sake of this exercise, we ignore the problem with interpreting the correlation when one variable is explanatory (controlled), but in a real example this use of the correlation is not recommended.

Minitab commands and output (continuing on the same worksheet as above):

MTB > Correlation 'speed' 'fuel'.

Correlation: speed, fuel

Pearson correlation of speed and fuel = -0.172
P-Value = 0.541

Answer to question:
-------------------
The correlation is quite close to zero because there is no strong linear association; the association is clearly non-linear. In this case, the correlation does not give an adequate summary of the strength and direction of the association.

2.28:
-----
(a) To measure the speed in miles per hour, divide the values by 1.609. To measure the fuel consumption in gallons per mile, multiply the values by 1.609/3.785. (Note that if the fuel values are recorded in liters per 100 km, this factor yields gallons per 100 miles; the remaining constant factor of 100 changes neither the pattern nor the correlation.) The pattern in the graph is unchanged (only the axis scales have changed), and the correlation is the same as above, because the correlation is not affected by a change of units.

Minitab commands and output (continuing on the same worksheet as above):

MTB > Name C3 'speed_mph'
MTB > Let 'speed_mph' = 'speed'/1.609
MTB > Name C4 'fuel_gpm'
MTB > Let 'fuel_gpm' = 'fuel'*(1.609/3.785)
MTB > Plot 'fuel_gpm'*'speed_mph';
SUBC> Symbol.

Scatterplot of fuel_gpm vs speed_mph

MTB > Correlation 'speed_mph' 'fuel_gpm'.

Correlation: speed_mph, fuel_gpm

Pearson correlation of speed_mph and fuel_gpm = -0.172
P-Value = 0.541

(b) Minitab commands and output (continuing on the same worksheet as above):

MTB > Name C5 'milespergallon'
MTB > Let 'milespergallon' = 1/'fuel_gpm'
MTB > Plot 'Milespergallon'*'speed';
SUBC> Symbol.

Scatterplot of Milespergallon vs speed

MTB > Correlation 'Milespergallon' 'speed'.

Correlation: Milespergallon, speed

Pearson correlation of Milespergallon and speed = -0.043
P-Value = 0.879

Answer to question:
-------------------
The correlation is now -0.043, and has changed from above. Because taking the reciprocal is a non-linear transformation, there is no easy formula for the new value; it has to be found by carrying out the transformation and computing the correlation directly. However, the correlation still does not give an adequate impression of the strength and direction of the association.

2.60:
-----
We fit a linear regression model (for demonstration purposes only, because we have already shown that this is not an appropriate model):

Minitab commands and output (continuing on the same worksheet as above):

MTB > Name c4 "RESI1"
MTB > Fitline 'fuel' 'speed';
SUBC> GFourpack;
SUBC> GVars 'speed';
SUBC> RType 2;
SUBC> Confidence 95.0;
SUBC> Resid 'RESI1'.

Regression Analysis: fuel versus speed

The regression equation is
fuel = 11.06 - 0.01466 speed

S = 3.90475   R-Sq = 2.9%   R-Sq(adj) = 0.0%

Analysis of Variance

Source      DF       SS       MS     F      P
Regression   1    6.015   6.0153  0.39  0.541
Error       13  198.211  15.2470
Total       14  204.227

Fitted Line: fuel versus speed

Residual Plots for fuel

Residuals from fuel vs speed

MTB > Sum 'RESI1'.

Sum of RESI1

Sum of RESI1 = -9.23706E-14

Answers to questions:
---------------------
(a) We used the Fitted Line Plot menu to plot the observations with the regression line overlaid.
(b) The calculation above (from the Calc-Column Statistics menu) shows that the sum of the residuals (stored from the regression in the Fitted Line Plot menu) is essentially zero (up to the numerical precision of the software).
(c) The residual plot (requested in the Graphs submenu of the Fitted Line Plot menu) shows a curved pattern similar to that of the original scatterplot. Note that when the residuals are plotted against the fitted values, the pattern is reversed along the x-axis relative to the original data, because the estimated slope is negative.
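
The two statements in (b) and (c) can also be checked numerically outside Minitab. The following is a minimal sketch in Python with NumPy; it is not part of the Minitab session above, and the data are made-up curved values (not the ex02_012 worksheet) chosen only so that the fitted slope is negative. The variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical curved data, NOT the ex02_012 worksheet.
speed = np.arange(10.0, 160.0, 10.0)   # 10, 20, ..., 150
fuel = 150.0 / speed + 0.02 * speed    # made-up curve with a negative fitted slope

# Least-squares line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(speed, fuel, 1)
fitted = intercept + slope * speed
resid = fuel - fitted

print(f"slope = {slope:.5f}")
print(f"sum of residuals = {resid.sum():.3e}")

# With a negative slope, the fitted values decrease as speed increases,
# so sorting by fitted value visits the observations in reverse speed order.
order_by_speed = np.argsort(speed)
order_by_fitted = np.argsort(fitted)
reversed_order = bool(np.array_equal(order_by_fitted, order_by_speed[::-1]))
print("order reversed:", reversed_order)
```

The sum of the residuals is zero only up to rounding error, as in the Minitab output, and the reversed ordering is what mirrors the residual pattern when it is plotted against the fitted values instead of the speed.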
As expected, the output from the regression model also shows that the model is inappropriate, because one of its major assumptions, a linear relation between x and y, is violated.
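
As a closing check on 2.27, 2.28 and 2.60, the behaviour of the correlation under the transformations used above can be reproduced in a few lines. The sketch below is in Python with NumPy rather than Minitab, and it uses made-up curved data (not the ex02_012 worksheet); the helper `pearson` and all variable names are illustrative assumptions. It shows that rescaling both variables by constants leaves the correlation unchanged, that taking a reciprocal changes it, and that R-Sq of the straight-line fit equals the squared correlation (in the output above, (-0.172)^2 is about 0.030, in line with R-Sq = 2.9%).

```python
import numpy as np

# Hypothetical curved data, NOT the ex02_012 worksheet.
speed_kmh = np.arange(10.0, 160.0, 10.0)      # 10, 20, ..., 150
fuel = 150.0 / speed_kmh + 0.02 * speed_kmh   # made-up values


def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


r_original = pearson(speed_kmh, fuel)

# Linear changes of units (same constants as in 2.28a).
speed_mph = speed_kmh / 1.609
fuel_rescaled = fuel * (1.609 / 3.785)
r_rescaled = pearson(speed_mph, fuel_rescaled)

# Non-linear transformation (reciprocal, as in 2.28b).
r_reciprocal = pearson(speed_kmh, 1.0 / fuel_rescaled)

# R-squared of the least-squares line, computed from the residuals.
slope, intercept = np.polyfit(speed_kmh, fuel, 1)
resid = fuel - (intercept + slope * speed_kmh)
r_squared = 1.0 - (resid @ resid) / ((fuel - fuel.mean()) ** 2).sum()

print(f"r (original units)  = {r_original:.4f}")
print(f"r (rescaled units)  = {r_rescaled:.4f}")
print(f"r (reciprocal fuel) = {r_reciprocal:.4f}")
print(f"R-sq = {r_squared:.4f}, r^2 = {r_original**2:.4f}")
```

The numerical values printed here belong to the made-up data and should not be compared with the Minitab results; only the qualitative behaviour carries over.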