Assignment III for Biostats Course VHM 801 at AVC - Fall semester 2022 (slightly updated)
The assignment is worth either 10% or 15% of the final course mark. Questions 1-2 constitute an
assignment for 10%, whereas Questions 1-3 constitute an assignment for 15%. You must choose the version (percentage)
that you did not choose for home assignment II. Please be aware that by handing
in the home assignment you implicitly acknowledge having read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
The assignment is a continuation of the first home
assignment about a study of measurements on and gradings of beef cattle carcasses.
You may want to revisit the first home assignment as a preparation
for this assignment, and you will use the previously described and
supplied dataset. A full multivariable analysis of the data is beyond
the scope of the course. We will instead use the data to address
specific questions that would normally not be addressed separately but would
instead form either a preparation for, or part of, a full analysis.
-
The focus of the study is on the carcass grades and their association with
the different quantitative measurements. As we do not have the
statistical tools to study the effects of quantitative variables on a
categorical outcome (such as the multinomial regressions reported in
the journal article), we will study these effects by less direct approaches. The
first such approach will use the outcome
variable of interest (i.e., grade) to define groups among which a continuous variable can be
compared. (As a general example, to assess an association between "age"
and a dichotomous outcome taking the values "yes" and "no", compare the
age distribution among subjects with outcome "yes" and the age distribution
among subjects with outcome "no". A difference in the age distribution between these two
groups would correspond to an association between "age" and the dichotomous outcome.)
This question will involve two of the quantitative variables for the
carcasses. For one of them you are required to focus on inference for the mean(s)
and for the other one you will focus on the median(s). You may select the
variables freely, but you should critically discuss the validity of
the assumptions needed for your analyses, and ideally you should choose
variables where you think those assumptions are satisfied to a
reasonable degree. For each of your two selected variables, complete the
following steps.
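As a starting point for the group comparison described above, the following Python sketch splits one quantitative variable by grade and reports simple per-group summaries. It assumes the data have been exported as a list of dictionaries (one per carcass). The keys `grade` and `weight` and the grade label `other` are hypothetical placeholders, not names from the supplied dataset, and the numbers are toy values.

```python
import statistics
from collections import defaultdict


def split_by_group(records, group_key, value_key):
    """Collect the values of one quantitative variable for each outcome group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return dict(groups)


def summarize(groups):
    """Per-group n, mean, SD and median (SD is NaN for a single observation)."""
    return {
        name: {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "median": statistics.median(values),
        }
        for name, values in groups.items()
    }


# Toy usage: hypothetical column names and made-up values.
carcasses = [
    {"grade": "AAA", "weight": 350.0},
    {"grade": "AAA", "weight": 370.0},
    {"grade": "other", "weight": 330.0},
    {"grade": "other", "weight": 340.0},
    {"grade": "other", "weight": 360.0},
]
by_grade = split_by_group(carcasses, "grade", "weight")
print(summarize(by_grade))
```

Comparing these summaries (and plots of each group's distribution) across grades is the descriptive counterpart of the inference asked for in the steps that follow.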
-
Compute 95% confidence intervals, separately for each of the grade categories, for the parameter
in question representing the center of the distribution of your selected variable. Discuss whether
the assumptions for the confidence intervals are met. You may carry out your calculations
on a transformed scale, but you then need to bring the estimates and confidence intervals
back to the original scale. (Software hint: To carry out separate analyses for each grade,
the Data-Split Worksheet or the Data-Unstack menus in Minitab, or
the bysort variable: prefix to Stata commands, may come in handy.)
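A minimal Python sketch of the two interval types, to be applied to each grade's values separately. `log_scale_mean_ci` computes a t-based interval for the mean of the log-transformed values and back-transforms it; the result is an interval for the geometric mean (which estimates the median on the original scale when the log-values are roughly symmetric), not for the arithmetic mean. Because the standard library has no t distribution, the t critical value is an assumed input taken from tables or software. `median_ci` is one common distribution-free interval based on order statistics; the course may present a different median interval.

```python
import math
import statistics


def log_scale_mean_ci(values, t_crit):
    """CI for the mean of log(values), back-transformed to the original scale.

    Returns (geometric mean, lower, upper). All values must be > 0.
    t_crit is the t quantile with n - 1 df (e.g. 2.262 for n = 10 at 95%).
    """
    logs = [math.log(v) for v in values]
    centre = statistics.mean(logs)
    se = statistics.stdev(logs) / math.sqrt(len(logs))
    return (math.exp(centre),
            math.exp(centre - t_crit * se),
            math.exp(centre + t_crit * se))


def median_ci(values, level=0.95):
    """Distribution-free CI for the median based on order statistics.

    Uses [x_(k), x_(n-k+1)] with the largest k such that
    P(Binomial(n, 1/2) <= k - 1) <= (1 - level) / 2,
    so the actual coverage is at least `level`.
    """
    x = sorted(values)
    n = len(x)
    tail = (1 - level) / 2
    cdf, k = 0.0, 0
    for j in range(n):
        cdf += math.comb(n, j) / 2 ** n  # P(Binomial(n, 1/2) <= j)
        if cdf > tail:
            break
        k = j + 1
    if k == 0:
        raise ValueError("too few observations for this confidence level")
    return x[k - 1], x[n - k]
```

The point estimate to report alongside `median_ci` is the sample median, `statistics.median(values)`. For n = 10 the sketch returns the 2nd and 9th smallest observations, with actual coverage of about 97.9%, which shows how conservative order-statistic intervals can be for small groups.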
-
Next, use a (single) statistical test to compare your variable and parameter
in question among the grade categories. As grade has three categories, you may
either use methods to compare multiple samples (covered in Session
9 of VHM 801) or create a dichotomous version of the grading, by either combining
two categories or omitting one category. If you choose to work with a dichotomous grade,
explain and justify your chosen data modification. Discuss whether
the assumptions for the test are met. Summarize your
analysis in a conclusion about the relation between your selected variable and the carcass grade; note that
your conclusion should cover both the statistical significance and a summary of what the analysis tells you
about how the carcass grade and your selected variable are related.
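For the variable analysed through medians, one multiple-sample option is the Kruskal-Wallis test. Whether it is among the methods covered in Session 9 is not stated here, so treat it as one possibility. The Python sketch below computes the tie-corrected H statistic and its large-sample chi-square p-value. With three grade categories the chi-square approximation has df = 2, and the chi-square(2) upper tail is exactly exp(-H/2), which keeps the sketch free of third-party libraries. Reading a significant result as a difference in medians assumes the groups' distributions have similar shapes.

```python
import math


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and chi-square(2) p-value for 3 groups.

    groups: three non-empty lists of numbers, one per grade category.
    """
    if len(groups) != 3:
        raise ValueError("the closed-form p-value assumes exactly 3 groups")
    # Pool all observations, remembering which group each came from.
    pooled = sorted((v, g) for g, vals in enumerate(groups) for v in vals)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are tied")
    rank_sums = [0.0] * 3
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(groups[g]) for g in range(3)
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)  # correction for ties
    return h, math.exp(-h / 2)  # upper tail of chi-square with 2 df
```

For small groups the chi-square approximation can be rough, which is worth mentioning when you discuss the test's assumptions.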
-
Another approach for studying the relation between a quantitative
and a categorical variable is to construct a categorical version of the
quantitative variable (whereby the task becomes to study the relation
between two categorical variables). The categorical version is
determined from cutpoints selected within the range of the quantitative variable.
(As a general example, "age" (of adult persons) could be categorized
into the categories "<30", "30-49", "50-69" and ">=70".) The cutpoints
may be chosen as biologically relevant or may be chosen from percentiles
in the variable's distribution. One should select the (number of) categories so as to retain as much
information as possible while avoiding categories with too few observations.
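A Python sketch of the percentile-based option, assuming the variable's values are available as a list of numbers. `statistics.quantiles` returns the cutpoints; `method="inclusive"` is one choice among several percentile definitions, and Minitab or Stata may place cutpoints slightly differently. Biologically motivated cutpoints can be passed to `categorize` directly as a sorted list. `category_counts` helps spot categories with too few observations.

```python
import bisect
import statistics
from collections import Counter


def percentile_cutpoints(values, n_categories):
    """Cutpoints splitting the data into n_categories groups of roughly equal size."""
    return statistics.quantiles(values, n=n_categories, method="inclusive")


def categorize(value, cutpoints):
    """Category index 0..len(cutpoints); a value equal to a cutpoint goes up."""
    return bisect.bisect_right(cutpoints, value)


def category_counts(values, cutpoints):
    """Number of observations in each category, in category order."""
    counts = Counter(categorize(v, cutpoints) for v in values)
    return [counts.get(i, 0) for i in range(len(cutpoints) + 1)]
```

With tertile cutpoints each category holds about a third of the observations, whereas fixed cutpoints may leave some categories sparse, so it is worth checking the counts either way.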
Complete the following steps for each of the two variables from the first question. Divide its range into suitable categories (as described above),
and assess its association with carcass grade. Your analysis should include both a
descriptive component and an assessment of the statistical
significance of the association studied. Include estimates and confidence intervals for the proportion
of grade AAA carcasses for each of your variable's categories. Draw conclusions, both about
the statistical significance and about the relation between your variable and grade, and also compare your findings with
those of the previous analysis for the same variable.
(Software hint: For construction
of a categorical version of a variable, the Data-Recode menu in Minitab, or
the cut() function of the egen command in Stata, may come in handy.) Note: The calculation of confidence intervals for the grade AAA proportion may
be limited to one of your variables.
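A Python sketch of the statistical part of this question, under stated assumptions. `chi_square_test` takes a table of observed counts (rows: categories of your variable, columns: the three grades) and returns Pearson's chi-square statistic, its degrees of freedom, the expected counts (for checking that none are too small) and the p-value. With three grade columns, df = 2(r - 1) is always even, so the chi-square upper tail has a closed form and no external library is needed. `wilson_ci` gives a Wilson score interval for the proportion of AAA carcasses within one category. This is one of several interval methods (Wald and exact intervals are common alternatives), and the course may prefer another.

```python
import math
from statistics import NormalDist


def chi2_upper_tail_even_df(x, df):
    """P(chi-square(df) > x) for even df, via the closed-form Poisson sum."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi_square_test(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - expct) ** 2 / expct
        for obs_row, exp_row in zip(table, expected)
        for obs, expct in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected, chi2_upper_tail_even_df(stat, df)


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half
```

For example, 5 AAA carcasses out of 10 in a category gives a Wilson interval of roughly 0.24 to 0.76, a reminder of how wide these intervals are for small categories.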
-
In the last question of home assignment I, you
discussed the relation between implant and grade and how
this relation might be affected by lurking variables. For a categorical
lurking variable, we now have the
tools to quantitatively study the associations involved based on the
observed data. Select a candidate lurking variable among
those present in the data, and carry out statistical analyses to
explore all (three) relations involved in the causal diagram previously
established. You are not limited to lurking variables that you
included in your first home assignment, and as described in the
solution for home assignment I, one plausible
lurking variable is the farm; however, you may choose any categorical
variable in the dataset as your candidate lurking variable. Make sure
to include detailed conclusions from your analyses. Note that you may
not be able to reach a definitive conclusion about whether the lurking
variable affects the relation between implant and grade,
but you should conclude as much as you can from your analyses.
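A Python sketch for organising the counts behind the three relations. It assumes one dictionary per carcass, with hypothetical keys `implant` and `grade` and a candidate lurking variable such as `farm`. The key names are placeholders, not the dataset's actual variable names. Each resulting two-way table can then be analysed with the same methods as for two categorical variables in the previous question. Comparing the implant-grade table within each level of the lurking variable with the overall table is one way to judge whether the lurking variable changes the picture.

```python
from collections import Counter


def crosstab(records, row_key, col_key, keep=lambda record: True):
    """Counts of (row value, column value) pairs among the kept records."""
    return Counter((r[row_key], r[col_key]) for r in records if keep(r))


def three_relations(records, lurking, exposure="implant", outcome="grade"):
    """The three two-way tables behind the causal diagram."""
    return {
        "exposure-outcome": crosstab(records, exposure, outcome),
        "lurking-exposure": crosstab(records, lurking, exposure),
        "lurking-outcome": crosstab(records, lurking, outcome),
    }


def stratified_tables(records, lurking, exposure="implant", outcome="grade"):
    """Exposure-by-outcome table within each level of the lurking variable."""
    levels = sorted({r[lurking] for r in records})
    return {
        level: crosstab(records, exposure, outcome,
                        keep=lambda r, level=level: r[lurking] == level)
        for level in levels
    }
```

A `Counter` returns 0 for combinations that do not occur, so empty cells in sparse strata show up as zeros rather than errors.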
Henrik Stryhn
(hstryhn@upei.ca) 2022-11-09