Assignment II for Biostats Course VHM 801 at AVC - Fall semester 2024

The assignment is worth either 10% or 15% of the final course mark. Questions 1-4 constitute an assignment for 10%, whereas Questions 1-6 constitute an assignment for 15%. Home assignment III will be worth either 15% or 10%, depending on whether you choose the 10% or 15% version of this assignment, respectively. You need to indicate clearly which version of the assignment you are answering. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

The assignment is a continuation of the first home assignment on blood cholesterol measurements for a subset of participants in the Framingham heart study. For the first two questions, we will continue to work with the data provided for the first home assignment (Minitab and .csv formats), whereas the last four questions will use a different version of the data (described below).

  1. Question 3 of the first home assignment included informal comparisons between the cholesterol values in the Framingham dataset for women and men in the age group 35-44 years and the respective mean values reported in a national survey for a comparable time period. The task now is to carry out statistical analyses (separately for men and women) to investigate whether the Framingham data in this age group seem to differ in their means from the corresponding values in the national survey. State your statistical models/assumptions explicitly and discuss critically to what extent these seem to be met. Based on this discussion, assess whether the statistical results should be considered as exact or approximate. Give 95% confidence intervals for the means (for men and women) of the Framingham population and interpret these carefully. Additionally, carry out statistical tests of relevant hypotheses to investigate the question of interest. Draw conclusions from the statistical analysis, and indicate how confident you are in your conclusions (e.g., weak or strong confidence).

  2. Continuing with the data of the previous question, carry out a statistical analysis to compare the mean cholesterol levels for men and women in the 35-44 years age group in the Framingham data. Include also here the statistical model/assumptions with the specific discussions outlined in the previous question, a 95% confidence interval and a statistical test relevant for the question studied, and draw conclusions.
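As a rough illustration of the quantities involved in Questions 1 and 2, the following Python sketch computes a one-sample t statistic with a confidence interval, and a two-sample t statistic that does not assume equal variances. The function names are hypothetical, the unequal-variance (Welch) form is only one possible choice and is not prescribed by the assignment, and the critical value `t_crit` is assumed to be looked up by the user (e.g., in a t table or in Minitab). The sketch does not replace the discussion of models and assumptions asked for above.

```python
import math
import statistics


def one_sample_t(values, mu0, t_crit):
    """Return (t statistic, (lower, upper)) for the mean of `values`.

    mu0 is the reference mean being compared against.
    t_crit is the two-sided critical value for n - 1 degrees of freedom,
    supplied by the caller (assumption: taken from a table or software).
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    t_stat = (mean - mu0) / se
    return t_stat, (mean - t_crit * se, mean + t_crit * se)


def welch_t(x, y):
    """Return (t statistic, approximate df) for two independent samples,
    without assuming equal variances (Welch-Satterthwaite approximation)."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t_stat = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_stat, df
```

The functions take plain lists of numbers, so the cholesterol values for one sex and age group would first have to be extracted from the .csv file; how to do that depends on the column names in the file.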

The next four questions will use a version of the data extracted from the Framingham heart study that includes cholesterol values measured for the same subjects bi-yearly (i.e., every two years) over a 10-year period (Minitab and .csv formats). The values included in the first dataset were those obtained at the first measurement. In the expanded dataset, these values are included in the variable chol0; the values at subsequent years (2,4,6,8,10) are denoted chol2,...,chol10. Only subjects with a complete series of measurements are included, reducing the size of the dataset to 133 subjects. The other variables are unchanged from the first dataset.

  3. Discuss in general terms the implications (factual and/or potential) of restricting an analysis to the subset of the original data consisting of all complete series over 10 years. You could for example address the reference population for the analysis and potential biases or confounding/lurking variables. For this question you are not expected to carry out any major analyses, although calculations supporting your arguments are allowed.

  4. Using data across all ages but separately for women and men, carry out statistical analyses to compare the cholesterol values for year 10 to baseline (i.e., the values at year 0) in order to investigate whether the cholesterol values show a change in their mean over these 10 years (as the study persons grew older). Include also here in each analysis the statistical model/assumptions and discussions along the lines of the previous questions as well as a relevant confidence interval and test. Draw conclusions and, if you're not continuing to Questions 5-6, summarize your findings from Questions 1-4 in a brief statement about the findings of your analyses and what they tell us about the cholesterol levels among individuals in the Framingham population.
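Because year 0 and year 10 values are measured on the same subjects, one natural approach to the comparison with baseline is an analysis of within-subject differences. The Python sketch below computes the basic quantities of such a paired analysis. It is only an illustration under that assumption; the function name is hypothetical, and the two input lists are assumed to hold the `chol0` and `chol10` values for the same subjects in the same order.

```python
import math
import statistics


def paired_t(baseline, followup):
    """Return (mean difference, standard error, t statistic, df)
    for follow-up minus baseline, measured on the same subjects."""
    if len(baseline) != len(followup):
        raise ValueError("paired samples must have equal length")
    # One difference per subject: follow-up value minus baseline value.
    diffs = [f - b for b, f in zip(baseline, followup)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, se, mean_diff / se, n - 1
```

A confidence interval for the mean change is then the mean difference plus or minus the appropriate t critical value (for the returned degrees of freedom) times the standard error.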

The final two questions are for the 15% version of the home assignment only. Unless you have indicated otherwise, if you submit answers to any of these questions your assignment will be evaluated towards 15%.

  5. Continuing from Question 4, still using data across all ages and both gender groups (separately), carry out statistical analyses to compare the cholesterol values for years 2,...,10 to baseline (i.e., the values at year 0) in order to investigate whether the cholesterol values show a change in their mean over these years. You are expected to analyse each of the years 2,...,10 separately (i.e., not include them all in a combined analysis); you don't need to repeat the analyses for year 10 from Question 4 here. Determine, if possible, when such a change can first be established. Include also here the statistical model/assumptions and discussions along the lines of the previous questions; however, confine the discussions to the most important points. Summarise your findings across the different analyses to answer the main questions posed for this point.

  6. Irrespective of your results in the previous question, carry out additional statistical analyses to compare the changes at years 2,...,10 relative to baseline between women and men. Hence, the objective of these analyses is to investigate whether the cholesterol levels change relative to baseline in the same way for women and men, or whether for example stronger changes can be determined in one of the gender groups. Analyse also here the years 2,...,10 separately, and include the statistical model/assumptions and discussions along the lines of the previous questions, confining yourself to the most important points. Summarize your findings from Questions 1-6 in a brief statement about the findings of your analyses and what they tell us about the cholesterol levels among individuals in the Framingham population.

Henrik Stryhn (hstryhn@upei.ca) 2024-10-09