Assignment III for Biostats Course VHM 801 at AVC - Fall semester 2022 (slightly updated)

The assignment is worth either 10% or 15% of the final course mark. Questions 1-2 constitute an assignment for 10%, whereas Questions 1-3 constitute an assignment for 15%. You must choose the version (percentage) you did not choose for home assignment II. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

The assignment is a continuation of the first home assignment about a study of measurements on, and gradings of, beef cattle carcasses. You may want to revisit the first home assignment in preparation for this assignment, and you will use the dataset described and supplied previously. A full multivariable analysis of the data is beyond the scope of the course. We will instead use the data to address specific questions that would normally not be addressed separately but would be either a preparation for, or part of, a full analysis.

  1. The focus of the study is on the carcass grades and their association with the different quantitative measurements. As we do not have the statistical tools to study the effects of quantitative variables on a categorical outcome (such as the multinomial regressions reported in the journal article), we will assess these associations using less direct approaches. The first such approach uses the outcome variable of interest (i.e., grade) to define groups among which a continuous variable can be compared. (As a general example, to assess an association between "age" and a dichotomous outcome taking the values "yes" and "no", compare the age distribution among subjects with outcome "yes" with the age distribution among subjects with outcome "no". A difference in the age distribution between these two groups would correspond to an association between "age" and the dichotomous outcome.)

    This question involves two of the quantitative variables for the carcasses. For one of them you are required to focus on inference for the mean(s), and for the other one you will focus on the median(s). You may select the variables as you wish, but you should critically discuss the validity of the assumptions needed for your analyses, and ideally you should choose variables for which you think those assumptions are satisfied to a reasonable degree. For each of your two selected variables, complete the following steps.

    1. Compute 95% confidence intervals, separately for each of the grade categories, for the parameter in question representing the center of the distribution of your selected variable. Discuss whether the assumptions for the confidence intervals are met. You may carry out your calculations on a transformed scale, but you then need to bring the estimates and confidence intervals back to the original scale. (Software hint: To carry out separate analyses for each grade, the Data-Split Worksheet or the Data-Unstack menus in Minitab, or the bysort variable: prefix to Stata commands, may come in handy.)

    2. Next, use a (single) statistical test to compare your variable and parameter in question among the grade categories. As grade has three categories, you may either use methods to compare multiple samples (covered in Session 9 of VHM 801) or create a dichotomous version of the grading, by either combining two categories or omitting one category. If you choose to work with a dichotomous grade, explain and justify your chosen data modification. Discuss whether the assumptions for the test are met. Summarize your analysis in a conclusion about the relation between your selected variable and the carcass grade. Your conclusion should cover both the statistical significance and what the analysis tells you about how the carcass grade and your selected variable are related.

  2. Another approach for studying the relation between a quantitative and a categorical variable is to construct a categorical version of the quantitative variable (whereby the task becomes to study the relation between two categorical variables). The categorical version is determined from cutpoints selected within the range of the quantitative variable. (As a general example, "age" (of adult persons) could be categorized into the categories "<30", "30-49", "50-69" and ">=70".) The cutpoints may be chosen as biologically relevant or may be chosen from percentiles in the variable's distribution. One should select the (number of) categories so as to retain as much information as possible while avoiding categories with too few observations.

    Complete the following steps for each of the two variables from the first question. Divide its range into suitable categories (as described above), and assess its association with carcass grade. Your analysis should include both a descriptive component and a test of the statistical significance of the association. Include estimates and confidence intervals for the proportion of grade AAA carcasses in each of your variable's categories. Draw conclusions, both about the statistical significance and about the relation between your variable and grade, and also compare your findings with those of the previous analysis of the same variable. (Software hint: For construction of a categorical version of a variable, the Data-Recode menu in Minitab, or the cut() function of the egen command in Stata, may come in handy.) Note: The confidence intervals for the proportion of grade AAA carcasses may be limited to one of your variables.

  3. In the last question of home assignment I, you discussed the relation between implant and grade and how this relation might be affected by lurking variables. For a categorical lurking variable, we now have the tools to study the associations involved quantitatively, based on the observed data. Select a candidate lurking variable among those present in the data, and carry out statistical analyses to explore all three relations in the causal diagram established previously. You are not limited to lurking variables that you included in your first home assignment. As described in the solution for home assignment I, one plausible lurking variable is the farm; however, you may choose any categorical variable in the dataset as your candidate lurking variable. Make sure to include detailed conclusions from your analyses. You may not be able to reach a definitive conclusion about whether the lurking variable affects the relation between implant and grade, but you should conclude as much as you can from your analyses.
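
For Question 1, step 1, applied to a mean, the following Python sketch computes a t-based 95% confidence interval for one grade category. Python stands in for Minitab or Stata here. The interval can optionally be computed on the log scale and then back-transformed. The Python standard library has no t quantile function, so the hypothetical helpers `t_cdf` and `t_quantile` compute one numerically. The input values in the usage line are invented. A back-transformed log-scale interval refers to the geometric mean, not the arithmetic mean.

```python
import math
import statistics


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Student t CDF via Simpson's rule on the density over [0, |x|] (steps must be even)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t: float) -> float:
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    h = abs(x) / steps
    total = pdf(0.0) + pdf(abs(x))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Quantile of the t distribution for 0.5 < p < 1, by bisection on t_cdf."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(values, level=0.95, log_scale=False):
    """t-based CI for the mean; with log_scale=True, computed on log(values) and back-transformed."""
    data = [math.log(v) for v in values] if log_scale else list(values)
    n = len(data)
    center = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    t = t_quantile(1 - (1 - level) / 2, n - 1)
    est, lo, hi = center, center - t * se, center + t * se
    if log_scale:
        # exp(mean of logs) is the geometric mean; it estimates the median (not the
        # arithmetic mean) when the log-transformed values are symmetric.
        est, lo, hi = math.exp(est), math.exp(lo), math.exp(hi)
    return est, lo, hi


# Hypothetical values for one grade category (illustration only).
print(mean_ci([3.1, 3.5, 3.9, 4.4, 4.0, 3.7], log_scale=True))
```

The interval relies on approximate normality of the (possibly transformed) values within each grade, or on a sample size large enough for the central limit theorem.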
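
For the variable analyzed through its median in Question 1, step 1, one distribution-free option is the sign-test interval based on order statistics. This choice of method is an assumption, not a requirement of the assignment. Let x_(1) <= ... <= x_(n) be the sorted values and let B ~ Binomial(n, 1/2). For a continuous variable, the interval [x_(k), x_(n-k+1)] covers the population median with probability 1 - 2 P(B <= k - 1). The sketch picks the largest k that gives at least 95% coverage and reports the coverage actually achieved. The function names and the data in the usage line are hypothetical. Software such as Minitab may also interpolate between order statistics, so its limits can differ slightly.

```python
import math
import statistics


def binom_cdf_half(k: int, n: int) -> float:
    """P(B <= k) for B ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n


def median_ci(values, level=0.95):
    """Sign-test CI for the median: (sample median, lower, upper, achieved coverage).

    Coverage is exact for a continuous variable; ties in the data make it approximate.
    """
    x = sorted(values)
    n = len(x)
    alpha = 1 - level
    # Largest k such that [x_(k), x_(n-k+1)] has coverage 1 - 2 * P(B <= k - 1) >= level.
    k = 0
    while 2 * binom_cdf_half(k, n) <= alpha:
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    coverage = 1 - 2 * binom_cdf_half(k - 1, n)
    return statistics.median(x), x[k - 1], x[n - k], coverage


# Hypothetical values for one grade category (illustration only).
print(median_ci([310, 295, 342, 330, 301, 318, 355, 327, 290, 336]))
```

This interval needs only independent observations, not normality. Small groups, however, give wide intervals, and fewer than 6 observations cannot reach 95% coverage.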
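
For Question 1, step 2, keeping all three grade categories, one rank-based test is the Kruskal-Wallis test. It compares whole distributions, so it can be read as a comparison of medians only if the groups' distributions have a similar shape. The sketch computes H with the usual correction for ties. The p-value comes from the chi-square approximation with (number of groups - 1) degrees of freedom. `chi2_sf` is a hypothetical helper that computes the chi-square upper tail through the incomplete gamma series. The grade data in the usage line are invented.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), using the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)  # very small p-values are reported as about 0


def kruskal_wallis(*groups):
    """Kruskal-Wallis H (with tie correction), its df, and the chi-square p-value."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n_total = len(pooled)
    ranks = [0.0] * n_total
    tie_sum = 0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # tied values share the average rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    h = 12 / (n_total * (n_total + 1)) * sum(
        r_sum ** 2 / len(grp) for r_sum, grp in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


# Hypothetical measurements for grades A, AA and AAA (illustration only).
print(kruskal_wallis([12, 15, 14, 11], [16, 18, 15, 17], [20, 19, 22, 21]))
```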
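
For Question 2, cutpoints can be taken from percentiles of the variable's distribution. The sketch forms percentile-based categories (tertiles by default) with `statistics.quantiles`. Its default "exclusive" method may give cutpoints slightly different from those of Minitab or Stata. For each category, the sketch reports the proportion of grade AAA carcasses with a 95% Wilson score interval. The Wilson interval is one choice among several; the Wald and exact binomial intervals are common alternatives. It behaves better than the Wald interval for small counts or for proportions near 0 or 1. The function names and the grade labels "A", "AA" and "AAA" are assumptions for illustration.

```python
import math
import statistics


def percentile_categories(values, n_groups=3):
    """Cutpoints at empirical percentiles, plus a category index (0 = lowest) per value."""
    cuts = statistics.quantiles(values, n=n_groups)
    # A value equal to a cutpoint goes into the lower category.
    return cuts, [sum(v > c for c in cuts) for v in values]


def wilson_ci(successes: int, n: int, level: float = 0.95):
    """Estimate and Wilson score confidence interval for a binomial proportion."""
    z = statistics.NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return p_hat, center - half, center + half


def aaa_by_category(values, grades, n_groups=3):
    """Per category: (number of carcasses, Wilson estimate and CI for the AAA proportion)."""
    cuts, cats = percentile_categories(values, n_groups)
    table = {}
    for c in range(n_groups):
        in_cat = [g for g, k in zip(grades, cats) if k == c]
        if in_cat:
            n_aaa = sum(g == "AAA" for g in in_cat)
            table[c] = (len(in_cat), wilson_ci(n_aaa, len(in_cat)))
    return cuts, table
```

In use, `values` would hold one quantitative measurement and `grades` the corresponding carcass grades, in the same order.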
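
In Question 2, the statistical significance of the association can be assessed with Pearson's chi-square test of independence on the table of variable categories by grade. The sketch returns four values: the statistic, its degrees of freedom (r - 1)(c - 1), an approximate p-value, and the smallest expected count. The smallest expected count lets you check the common rule of thumb that expected counts should be at least 5. `chi2_sf` is a hypothetical helper that computes the chi-square upper tail through the incomplete gamma series. The counts in the usage line are invented for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), using the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)  # very small p-values are reported as about 0


def chi2_independence(table):
    """Pearson chi-square test for an r x c table: (statistic, df, p-value, min expected)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    min_expected = math.inf
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), min_expected


# Hypothetical counts: rows = low/middle/high category, columns = grades A, AA, AAA.
print(chi2_independence([[12, 20, 8], [9, 18, 13], [5, 14, 21]]))
```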
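
For Question 3, a descriptive starting point is to tabulate proportions for each of the three relations. The relations are implant and grade, lurking variable and implant, and lurking variable and grade. The implant and grade comparison can then be repeated within each level of the candidate lurking variable. The sketch assumes the data are available as a list of records. The field names `farm`, `implant` and `grade`, the level labels, and the four example records are all hypothetical. Each table could then be followed by a significance test such as the chi-square test.

```python
from collections import defaultdict


def proportion_table(records, outcome, level, by):
    """Per combination of the `by` fields: (count with outcome == level, total, proportion)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [n_with_level, n_total]
    for rec in records:
        key = tuple(rec[field] for field in by)
        counts[key][0] += rec[outcome] == level
        counts[key][1] += 1
    return {key: (hit, tot, hit / tot) for key, (hit, tot) in sorted(counts.items())}


# Hypothetical records (illustration only); the field names are assumptions.
records = [
    {"farm": "F1", "implant": "yes", "grade": "AAA"},
    {"farm": "F1", "implant": "no", "grade": "AA"},
    {"farm": "F2", "implant": "yes", "grade": "AA"},
    {"farm": "F2", "implant": "no", "grade": "AAA"},
]
print(proportion_table(records, "grade", "AAA", by=("implant",)))          # implant vs grade
print(proportion_table(records, "implant", "yes", by=("farm",)))           # farm vs implant
print(proportion_table(records, "grade", "AAA", by=("farm",)))             # farm vs grade
print(proportion_table(records, "grade", "AAA", by=("farm", "implant")))   # implant vs grade within farm
```

Compare the within-farm proportions with the crude implant and grade proportions. A noticeable difference suggests that the candidate lurking variable may affect the relation.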

Henrik Stryhn (hstryhn@upei.ca) 2022-11-09