Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2019
The assignment is worth 10% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments described
on the VHM 801 homepage.
This assignment is based on a subset of data that was collected in aquaculture
clinical field trials in the Bay of Fundy, Canada, during the years
2004-2007. The trials were conducted by CAHS (Centre for Aquatic Health Sciences),
and the research formed part of the PhD project of Tim Burnley at AVC.
The objective of the research was to compare the performance
of different vaccines on the growth and survival of Atlantic salmon
under standard production conditions. The data here originate from a
single cage of salmon that were individually tagged in the winter of 2004
and followed through to harvest. Vaccines were randomly allocated to the fish, and
the different vaccine groups were held within
the same cage throughout the production. We consider here measurements taken
on the fish either at vaccination or at transfer from the hatchery to a sea cage
in the late summer of 2004. The following variables are included in the
dataset:
- fishid: fish number (with no intrinsic meaning),
- weight: weight at vaccination (g),
- length: length at vaccination (cm),
- condfac: condition factor (weight divided by cubed length, times 100) at vaccination,
- weight2: weight at transfer to sea cage (g),
- sex: sex (0/1 ~ female/male),
- operc: shortened operculum (bony flap covering the gills) (0/1 ~ no/yes),
- jaw: jaw deformity (0/1 ~ no/yes),
- prec: precocious parr (early maturation in males) (0/1 ~ no/yes),
- sex3: sex (0/1/2 ~ female/non-precocious male/precocious male),
- vaccine: vaccine group (1-6).
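As a quick check on the definition of condfac, the formula can be sketched in Python. The function name and the example values below are hypothetical and are not taken from the dataset.

```python
def condition_factor(weight_g: float, length_cm: float) -> float:
    """Condition factor: weight divided by cubed length, times 100."""
    return 100.0 * weight_g / length_cm ** 3


# Hypothetical fish: 40 g and 15 cm at vaccination.
print(round(condition_factor(40.0, 15.0), 3))  # prints 1.185
```

Recomputing the value from weight and length and comparing it with the condfac column is one simple way to confirm that the data were imported correctly.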
For the purpose of this assignment, we assume that the information about the physical characteristics of the fish
(operc, jaw, prec) was obtained by inspection of
the fish at vaccination.
In order to keep the dataset at a manageable size, only 200 fish from each
of the six vaccine groups are included in the present data.
The dataset is available in Minitab format and as a comma-separated file, for
import into Stata and other statistical software.
The home assignment has four questions, all of which should be answered.
- Select three variables in the dataset: one continuous variable, one categorical variable (with more than
two categories), and one dichotomous (or binary) variable. Apart from this restriction on the variable types
you are free to select the variables as you want. First, briefly describe the type of each of your
selected variables in more detail, e.g. using one or more of the descriptors: nominal, ordinal, discrete, continuous.
Next, carry out a descriptive analysis for each of
your three selected variables including both a graphical representation and
descriptive statistics. Choose the graphical representation and the statistics you find most
useful to show the distributions,
in consideration of the variable's type and range of values. Where appropriate,
comment specifically on the distribution's center, spread and shape.
If your descriptive analysis identifies any 'potential outliers', discuss
whether these should be considered as truly outlying observations, in the sense that they
don't really belong to the distribution, or whether they should be considered as part
of the distribution.
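For a continuous variable, the kind of numerical summary asked for here might be sketched as follows, using only the Python standard library. This is one possible sketch, not a required method: the 1.5 x IQR rule for flagging 'potential outliers' is an assumption (it is a common boxplot convention, not something prescribed by the assignment), the function name is hypothetical, and the weights are made up. Note also that statistical packages differ slightly in how they define quartiles.

```python
import statistics


def describe(values):
    """Basic descriptive statistics plus values flagged by the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "flagged": [v for v in values if v < low or v > high],
    }


# Made-up weights (g), not taken from the dataset.
weights = [38.0, 41.5, 40.2, 39.8, 42.1, 37.6, 40.9, 65.0]
summary = describe(weights)
print(summary["flagged"])  # prints [65.0]
```

A flagged value is only a candidate: whether it belongs to the distribution still has to be argued from the context, as the question asks.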
- Continue your analysis from Question 1 by computing descriptive statistics and graphical displays
to illustrate any differences in the distribution of your chosen continuous variable across the
categories of either your categorical or your binary variable. For example, if you chose the variables
weight and sex, you should compare the distribution of weights (at vaccination) between males and females.
Describe your findings and try to draw conclusions. Note that you are not expected to
carry out any statistical tests to compare the distributions.
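The group-wise comparison could be prepared along the following lines. This is a minimal sketch with made-up records and a hypothetical helper name; it only computes descriptive statistics per group and performs no statistical test.

```python
import statistics
from collections import defaultdict


def summarize_by_group(records, value_key, group_key):
    """Descriptive statistics of one variable within each category of another."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    return {
        group: {
            "n": len(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
            "sd": statistics.stdev(vals),  # needs at least two values per group
        }
        for group, vals in sorted(groups.items())
    }


# Made-up records, not taken from the dataset (sex: 0 = female, 1 = male).
fish = [
    {"weight": 38.0, "sex": 0},
    {"weight": 42.0, "sex": 0},
    {"weight": 40.0, "sex": 0},
    {"weight": 45.0, "sex": 1},
    {"weight": 47.0, "sex": 1},
    {"weight": 43.0, "sex": 1},
]
by_sex = summarize_by_group(fish, "weight", "sex")
for group, stats in by_sex.items():
    print(group, stats)
```

Side-by-side boxplots or histograms per group would complement such a table in the graphical part of the answer.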
- Compute for each fish the weight gain from vaccination to transfer. Also carry out a descriptive
analysis for this variable, and include in your analysis an assessment of whether it would
seem reasonable to assume the values of this variable to be normally
distributed. If you conclude that the variable is not normally distributed, describe how its distribution seems to differ
from a normal distribution.
Here too, if your descriptive analysis identifies any 'potential outliers', discuss
whether these should be considered as truly outlying observations, in the sense that they
don't really belong to the distribution, or whether they should be considered as part
of the distribution.
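The weight gain is simply weight2 minus weight. The sketch below computes it for a few made-up fish and adds a moment-based skewness coefficient as one numerical aid for judging symmetry. The choice of skewness is an assumption made for illustration; a histogram or a normal probability plot would normally carry most of the assessment.

```python
import statistics


def weight_gain(weight, weight2):
    """Weight gain (g) from vaccination to transfer to the sea cage."""
    return weight2 - weight


def skewness(values):
    """Moment-based sample skewness; close to 0 for symmetric data."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up (weight, weight2) pairs in grams, not taken from the dataset.
pairs = [(40.0, 70.0), (38.0, 66.0), (42.0, 75.0), (41.0, 72.0), (39.0, 95.0)]
gains = [weight_gain(w, w2) for w, w2 in pairs]
print(gains)  # prints [30.0, 28.0, 33.0, 31.0, 56.0]
print(skewness(gains) > 0)  # prints True: the long tail is to the right
```

A clearly positive or negative skewness, or a mean that sits well away from the median, suggests a departure from normality that should then be described in words.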
- Describe how you would randomize the fish to the six vaccine groups in a situation where
the total number of fish available in a tank is not known exactly (the actual scenarios in the
trials were different, but would have similar issues). Say, for simplicity, that the maximum
number would be 2100 fish, but due to mortalities and the logistical difficulties in ensuring that
every fish in the tank gets caught and vaccinated, the actual number of
fish included in the trial will be somewhat smaller. Furthermore, that number will not be known
until the vaccinations have been
completed. For the description of your randomization procedure, you may
assume that the fish are sampled from the tank one by one, and for each
fish sampled you will need to decide which vaccine it should
receive. Note, however, that it is virtually impossible to ensure that the
order in which the fish are sampled from the tank is completely random.
It is suggested that you illustrate your procedure in a spreadsheet with a smaller number
of fish than 2100. After you have described your procedure, answer the
following two questions for the case that the actual number of fish included in the study
happened to be exactly 2000:
1. Give the expected (mean) number of fish in each of the six vaccine groups.
2. Give the possible range or the standard deviation (depending on your
procedure, only one of these two values may be meaningful to compute) for the
number of fish in each of the six vaccine groups.
Note: Several approaches may be taken for the randomization, and a careful
description of a valid approach, together with a correct answer to just one of the two
questions above (1+2), will be considered as a fully acceptable answer to Question 4.
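To make the idea of a procedure concrete, two common approaches are sketched below for a small number of fish: an independent random draw for each fish, and permuted blocks of six. Both are illustrations only and are not presented as the expected answer; the function names, the seed and the choice of 20 fish are assumptions made for the example.

```python
import random
from collections import Counter

N_GROUPS = 6


def simple_allocation(n_fish, rng):
    """Independent draw per fish: each vaccine group has probability 1/6."""
    return [rng.randint(1, N_GROUPS) for _ in range(n_fish)]


def blocked_allocation(n_fish, rng):
    """Permuted blocks of six: each complete block contains every group once."""
    allocation = []
    while len(allocation) < n_fish:
        block = list(range(1, N_GROUPS + 1))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_fish]  # the last block may be incomplete


rng = random.Random(2019)  # arbitrary seed, used only for reproducibility
n_fish = 20  # small illustrative number, as suggested for a spreadsheet

simple_counts = Counter(simple_allocation(n_fish, rng))
blocked_counts = Counter(blocked_allocation(n_fish, rng))
print(sorted(simple_counts.items()))
print(sorted(blocked_counts.items()))
```

Because the final number of fish is unknown, an allocation list of either kind could be generated in advance for the maximum number of fish and then used from the top until the last fish has been vaccinated. With independent draws the group sizes vary at random, whereas with permuted blocks they can differ by at most one, which is why only one of the two quantities in the questions above is meaningful for a given procedure.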
Henrik Stryhn
(hstryhn@upei.ca) 2019-09-26