Here are brief solutions for Assignment 2.
OpenIntroStats refers to:
Diez, DM, Barr, CD, and Çetinkaya-Rundel, M (2012). OpenIntro Statistics
(2nd edition).
1. There are three groups, so DF = 3-1 = 2 and F = 30.708/1.019 ≃ 30.13
(answer B).
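As a quick check of the arithmetic in question 1, the following Python
sketch recomputes the DF and the F ratio from the two mean squares quoted
above. Python is used here only as a calculator, and the variable names are
illustrative, not taken from the assignment. Because the mean squares are
rounded, the ratio matches the reported value only to within rounding.

```python
# Mean squares quoted in the solution to question 1.
ms_between = 30.708  # mean square for the grouping factor
ms_within = 1.019    # residual (within-group) mean square

n_groups = 3
df_between = n_groups - 1         # DF for the factor: number of groups minus 1

f_ratio = ms_between / ms_within  # F = MS(between) / MS(within)
print(df_between, f_ratio)
```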
2. The residual degrees of freedom are computed as Total DF - Cells DF,
i.e., (N-1) - (ab-1) = N - ab, where a and b are the numbers of levels of
the two factors. With no missing data the residual DF would be 100 - 4 = 96.
The reported residual DF is two less than that, so there are two missing
values (answer B).
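The DF bookkeeping for question 2 can be sketched in Python as follows. The
function name residual_df is hypothetical, and the 2 x 2 layout is inferred
from ab = 4 (each factor needs at least two levels).

```python
def residual_df(n_obs: int, a_levels: int, b_levels: int) -> int:
    """Residual DF of a two-way ANOVA with interaction: (N-1) - (ab-1)."""
    total_df = n_obs - 1
    cells_df = a_levels * b_levels - 1
    return total_df - cells_df

df_complete = residual_df(100, 2, 2)  # all 100 cases available
df_two_lost = residual_df(98, 2, 2)   # two cases dropped as missing

print(df_complete, df_two_lost)
```

Each missing case lowers N, and hence the residual DF, by one.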
3. A partial effect-size measure (partial eta squared) is defined as
SS(effect) / (SS(effect) + SS(error)), so in this case it is
3255 / (3255+20688) ≃ 0.136 (answer B).
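A minimal Python check of the ratio in question 3, using the two sums of
squares quoted above. The helper name partial_eta_squared is an
illustrative choice, not an R or library function.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS(effect) / (SS(effect) + SS(error))."""
    return ss_effect / (ss_effect + ss_error)

eta_p2 = partial_eta_squared(3255, 20688)
print(round(eta_p2, 3))
```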
4. The regression coefficient represents the deviation from the
intercept. With a binary variable (A vs. B), the intercept reflects the mean
of A, so the coefficient equals mean(B) - mean(A) = 1.518 (answer B).
5. Yes, in the case of a two-group comparison and R's default contrast
coding, a t-test assuming equal variances (var.equal=TRUE) and the summary
of a regression analysis will yield identical results (answer A).
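The equivalence used in questions 4 and 5 can be illustrated with a short
Python sketch. The data below are made up for illustration only (they are
not the assignment's data), and the 0/1 coding of the group mimics a
treatment contrast with A as the reference level. The sketch compares the
pooled-variance t statistic with the t statistic of the OLS slope.

```python
from math import sqrt
from statistics import mean, variance

# Made-up illustrative data, NOT the assignment's data.
group_a = [4.1, 5.0, 3.8, 4.6, 5.2]
group_b = [6.0, 5.7, 6.9, 6.3, 5.5, 6.4]

def pooled_t(a, b):
    """Two-sample t statistic with a pooled (equal) variance estimate."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(sp2 * (1 / na + 1 / nb))

def slope_t(a, b):
    """OLS fit of y on a 0/1 dummy (A = 0, B = 1): slope, intercept, t."""
    x = [0] * len(a) + [1] * len(b)
    y = a + b
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)  # residual DF is n - 2
    return slope, intercept, slope / se_slope

t_pooled = pooled_t(group_a, group_b)
slope, intercept, t_slope = slope_t(group_a, group_b)
print(intercept, slope, t_pooled, t_slope)
```

The intercept equals mean(A), the slope equals mean(B) - mean(A), and the
two t statistics coincide up to floating-point error.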
6. The first two solutions are incorrect as they do not take into account
missing values, which do not contribute to SS. So answer C is correct as it
computes the DF for the residuals as N - 5 where N = total number of
available cases.
7. The function confint() does not apply to a correlation coefficient as
returned by the cor() function, and there is no conf.level= option in
cor(). So answer C is correct.
8. The relation between the slope of a regression line (s) and the
correlation coefficient (r) is simply r = s × Sx/Sy, where Sx and Sy
represent the standard deviations of X and Y. Here the variances of X and Y
are 1.374 and 0.642, so r equals 0.5053 × sqrt(1.374)/sqrt(0.642) = 0.739
(answer A).
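The conversion in question 8 can be checked with a few lines of Python. The
variable names are illustrative, and the values under the square roots are
treated as the variances of X and Y.

```python
from math import sqrt

slope = 0.5053  # regression slope of Y on X
var_x = 1.374   # variance of X
var_y = 0.642   # variance of Y

# r = s * Sx / Sy, with Sx and Sy the standard deviations of X and Y.
r = slope * sqrt(var_x) / sqrt(var_y)
print(round(r, 3))
```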
9. No, because the degrees of freedom, and hence the p-value associated with
the F-test, would be different (4-1 = 3 instead of 1). So answer B is
correct.
10. No assumption regarding the normality of X or Y is necessary to compute
a regression line by the OLS method (answer D).