Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2019

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

This assignment is based on a subset of data that was collected in aquaculture clinical field trials in the Bay of Fundy, Canada, during the years 2004-2007. The trials were conducted by CAHS (Centre for Aquatic Health Sciences), and the research formed part of the PhD project of Tim Burnley at AVC. The objective of the research was to compare the performance of different vaccines on the growth and survival of Atlantic salmon under standard production conditions. The data here originate from a single cage of salmon that were individually tagged in the winter of 2004 and followed through to harvest. Vaccines were randomly allocated to the fish, and the different vaccine groups were held within the same cage throughout the production. We consider here measurements taken on the fish either at vaccination or at transfer from the hatchery to a sea cage in the late summer of 2004. The following variables are included in the dataset:

For the purpose of this assignment, we assume that the information about the physical characteristics of the fish (operc, jaw, prec) was obtained by inspection of the fish at vaccination.

In order to keep the dataset at a manageable size, only 200 fish from each of the six vaccine groups are included in the present data. The dataset is available in Minitab format and as a comma-separated file, for import into Stata and other statistical software.
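
As one possible illustration of working with the comma-separated file outside Minitab or Stata, the sketch below reads such a file in Python and computes basic descriptive statistics for one numeric column. The column names (`vaccine`, `weight`) and the inline sample values are made up for illustration and are not taken from the actual dataset; with the real file, an opened file handle would replace the in-memory sample.

```python
import csv
import io
import statistics


def read_rows(source):
    """Read a comma-separated file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Made-up sample standing in for the real file; column names are hypothetical.
SAMPLE = io.StringIO(
    "vaccine,weight\n"
    "A,40.0\n"
    "A,42.0\n"
    "B,44.0\n"
    "B,46.0\n"
    "C,48.0\n"
    "C,50.0\n"
)

rows = read_rows(SAMPLE)
weights = [float(row["weight"]) for row in rows]
summary = describe(weights)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Quartile definitions differ between software packages, so the quartiles computed here may not match those reported by Minitab or Stata exactly.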

The home assignment has four questions which should all be answered.

  1. Select three variables in the dataset: one continuous variable, one categorical variable (with more than two categories), and one dichotomous (or binary) variable. Apart from this restriction on the variable types, you are free to select the variables as you wish. First, briefly describe the variable type of each selected variable in more detail, e.g. using one or several of the descriptors: nominal, ordinal, discrete, continuous. Next, carry out a descriptive analysis for each of your three selected variables, including both a graphical representation and descriptive statistics. Choose the graphical representation and the statistics you find most useful for showing the distributions, taking into account each variable's type and range of values. Where appropriate, comment specifically on the distribution's center, spread and shape. If your descriptive analysis identifies any 'potential outliers', discuss whether these should be considered as truly outlying observations, in the sense that they don't really belong to the distribution, or whether they should be considered as part of the distribution.

  2. Continue your analysis from Question 1 by computing descriptive statistics and graphical displays to illustrate any differences in the distribution of your chosen continuous variable across the categories of either your categorical or your binary variable. For example, if you chose the variables weight and sex, you should compare the distribution of weights (at vaccination) between males and females. Describe your findings and try to draw conclusions. Note that you are not expected to compute any statistical tests to compare the distributions.

  3. Compute for each fish the weight gain from vaccination to transfer. Also carry out a descriptive analysis for this variable, and include in your analysis an assessment of whether it would seem reasonable to assume the values of this variable to be normally distributed. If you conclude that the variable is not normally distributed, describe how its distribution seems to differ from a normal distribution. Here too, if your descriptive analysis identifies any 'potential outliers', discuss whether these should be considered as truly outlying observations, in the sense that they don't really belong to the distribution, or whether they should be considered as part of the distribution.

  4. Describe how you would randomize the fish to the six vaccine groups, in a situation where the total number of fish available in a tank is not known exactly (the actual scenarios in the trials were different, but would have similar issues). Say, for simplicity, that the maximal number would be 2100 fish, but due to mortalities and the logistical difficulties in ensuring that every fish in the tank gets caught and vaccinated, the actual number of fish included in the trial will be somewhat smaller. Furthermore, that number will not be known until the vaccinations have been completed. For the description of your randomization procedure, you may assume that the fish are sampled from the tank one by one, and for each fish sampled you will need to decide which vaccination it should receive. Note, however, that it is virtually impossible to ensure that the order in which the fish are sampled from the tank is completely random. It is suggested that you illustrate your procedure in a spreadsheet with a smaller number of fish than 2100. After you have described your procedure, answer the following two questions for the case that the actual number of fish included in the study happened to be exactly 2000:
    1. Give the expected (mean) number of fish in each of the six vaccine groups.
    2. Give the possible range or the standard deviation (depending on your procedure, only one of these two values may be meaningful to compute) for the number of fish in each of the six vaccine groups.
    Note: Several approaches may be taken for the randomization, and a careful description of a valid approach, together with a correct answer to just one of the two questions above (1+2), will be considered as a fully acceptable answer to Question 4.
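
As an alternative to a spreadsheet illustration, the following Python sketch shows two of the several possible approaches to Question 4: simple randomization (each fish is assigned independently, with equal probability for each vaccine) and permuted-block randomization (every consecutive block of six fish contains each vaccine exactly once, in random order). It is only an illustration under stated assumptions, not a model answer: the group labels `A` to `F` and all function names are hypothetical. In both cases the allocation list is prepared in advance for the maximal number of fish, so that it does not depend on the order in which the fish happen to be caught.

```python
import math
import random

GROUPS = ["A", "B", "C", "D", "E", "F"]  # placeholder labels for the six vaccines


def simple_allocation(n_max, groups=GROUPS, seed=None):
    """Assign each of n_max fish independently and uniformly to a group."""
    rng = random.Random(seed)
    return [rng.choice(groups) for _ in range(n_max)]


def block_allocation(n_max, groups=GROUPS, seed=None):
    """Build an allocation list of at least n_max entries from permuted blocks.

    Each block holds every group exactly once, in random order.
    """
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_max:
        block = list(groups)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation


def group_counts(allocation, n_actual, groups=GROUPS):
    """Count fish per group when only the first n_actual entries are used."""
    used = allocation[:n_actual]
    return {g: used.count(g) for g in groups}


def binomial_mean_sd(n, k):
    """Mean and SD of one group's count under simple randomization.

    With k equally likely groups, the count is Binomial(n, 1/k).
    """
    p = 1.0 / k
    return n * p, math.sqrt(n * p * (1.0 - p))


# Small illustration with 20 fish caught out of a planned maximum of 24.
plan_simple = simple_allocation(24, seed=801)
plan_blocks = block_allocation(24, seed=801)
print("simple:", group_counts(plan_simple, 20))
print("blocks:", group_counts(plan_blocks, 20))
print("binomial mean and SD for 20 fish:", binomial_mean_sd(20, len(GROUPS)))
```

With permuted blocks, the group sizes can differ by at most one fish wherever the list is cut off, so the possible range is the natural summary. With simple randomization, each group size follows a binomial distribution, so the standard deviation is the natural summary.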

Henrik Stryhn (hstryhn@upei.ca) 2019-09-26