Here are brief solutions for Assignment 1.
OpenIntroStats refers to:
Diez, DM, Barr, CD, and Çetinkaya-Rundel, M (2012). OpenIntro Statistics
(2nd edition).
1. There's a header line (the variable names appear on the first line of the
file), missing data are coded as '.', and values are separated by a single
space. The space delimiter rules out every solution that uses read.csv()
without changing its default separator (a comma), and the missing-value code
should be passed through the na.strings= option. So only option D is valid.
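As an illustration of the reasoning above, here is a minimal sketch of reading
such a file. The file contents and column names are made up for the example,
and this is not necessarily the exact call given in option D.

```r
# Hypothetical file contents: header line, '.' for missing values,
# single-space delimiter (column names and values are invented).
raw <- "id time1 time2
1 10.2 11.5
2 . 9.8
3 8.7 ."

# read.table() splits on whitespace by default (sep = ""),
# so only header= and na.strings= need to be set explicitly.
d <- read.table(text = raw, header = TRUE, na.strings = ".")

str(d)
colSums(is.na(d))  # number of missing values per column
```

With read.csv(), the same result would require overriding the separator as
well, e.g. sep = " ", since its default is a comma.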
2. Option C is the correct one. Option A does not reference the variable "id"
correctly (it should be d$id), while option B does not specify that single
imputation should be applied to variable "time3".
3. Option A is incorrect because there's no such option (levels=). Option B
is correct even though it uses repeated labels in the first assignment.
4. The quantity sd(x)/sqrt(n) represents the standard error of the mean, so
the correct answer is B, see OpenIntroStats, 2nd ed., p. 170.
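The formula can be checked on simulated data. The sample below is invented
for the example (the seed, mean and standard deviation are arbitrary
assumptions).

```r
set.seed(101)
x <- rnorm(30, mean = 50, sd = 10)  # simulated sample, not assignment data
n <- length(x)

# Standard error of the mean: sample standard deviation over sqrt(n)
se <- sd(x) / sqrt(n)

c(sd = sd(x), se = se)
```

The standard error is always smaller than the standard deviation when n > 1,
since it describes the variability of the sample mean rather than that of
individual observations.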
5. The correct answer is A, because the density function for a
t-distribution with 29 degrees of freedom has slightly heavier tails
than that of a standard normal distribution, hence more extreme values
(larger in absolute value) for the corresponding quantiles, e.g.
> qnorm(0.025)
[1] -1.959964
> qt(0.025, 29)
[1] -2.04523
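The same comparison can be extended to other degrees of freedom. This is only
a sketch; the values chosen for df are arbitrary.

```r
# Lower 2.5% quantiles of t-distributions with increasing degrees of freedom
df <- c(5, 29, 100, 1000)
q <- qt(0.025, df)
names(q) <- df

q             # quantiles get closer to the normal one as df grows
qnorm(0.025)  # reference value for the standard normal
```

Each t quantile lies further from 0 than the normal quantile, and the gap
shrinks as the degrees of freedom increase.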
6. Options A and B are incorrect interpretations of Type I (rejecting the
null when we shouldn't) and Type II (not rejecting the null when we should)
errors. The correct interpretation is C: there is a 100-80 = 20% risk that a
true difference exists but we fail to demonstrate it using this sample.
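The link between power and Type II error can be sketched with power.t.test()
from the stats package. The effect size (delta), standard deviation and
significance level below are assumptions chosen for illustration; only the
80% power comes from the question.

```r
# Two-sample t-test design with 80% power; delta and sd are hypothetical
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.80)

# Type II error rate: probability of missing a true difference of size delta
beta <- 1 - res$power

res$n  # sample size per group required under these assumptions
beta
```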
7. If there are 15 individuals, the degrees of freedom of a t-test for
paired samples will be 15-1 = 14, while for independent samples it would be
15*2-2 = 28. Only option C takes into account that the observations are paired.
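The two degrees of freedom can be recovered from t.test() on simulated data.
The variable names and values are invented for the example, and var.equal =
TRUE is assumed for the independent-samples case so that the classical n1+n2-2
formula applies (the default Welch test would give a fractional value).

```r
set.seed(42)
before <- rnorm(15, mean = 100, sd = 15)          # hypothetical first measure
after  <- before + rnorm(15, mean = 5, sd = 5)    # hypothetical second measure

# Paired test: one difference per individual, df = n - 1
df_paired <- t.test(before, after, paired = TRUE)$parameter

# Independent samples with pooled variance: df = n1 + n2 - 2
df_indep <- t.test(before, after, var.equal = TRUE)$parameter

c(paired = unname(df_paired), independent = unname(df_indep))
```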
8. In a frequentist framework, the parameter of interest is a fixed quantity,
not a random variable, so no probability can be attached to it and option A
is incorrect. Instead, option B correctly reflects the sampling process.
9. The 95% CI doesn't cover the value 0, so the observed effect can be
regarded as significant at the 5% level. However, this CI is rather wide, which
suggests that the estimate is rather imprecise (low sample size?). As the
observed difference is far from 0 (and twice the expected decrease in
mortality), the effect might be considered as large. So the correct
answer is B.
10. Working with the differences of scores (d) amounts to testing H0 : mu_d =
0 (the mean difference is zero), which is equivalent to H0 : mu1 = mu2 when
the two samples are paired, see OpenIntroStats, 2nd ed., p. 214.
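This equivalence can be verified numerically. The sketch below uses simulated
scores (names and values are assumptions) and compares a paired t-test with a
one-sample t-test on the differences.

```r
set.seed(7)
x1 <- rnorm(20, mean = 10, sd = 2)        # hypothetical first score
x2 <- x1 + rnorm(20, mean = 0.5, sd = 1)  # hypothetical second score
d  <- x1 - x2                             # difference of scores

t_paired <- t.test(x1, x2, paired = TRUE)  # H0: mu1 = mu2 (paired)
t_diff   <- t.test(d, mu = 0)              # H0: mean of d = 0

# Both approaches yield the same statistic, df and p-value
all.equal(unname(t_paired$statistic), unname(t_diff$statistic))
all.equal(unname(t_paired$parameter), unname(t_diff$parameter))
all.equal(t_paired$p.value, t_diff$p.value)
```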