Assignment 1

This is the first quiz for the cogmaster-stats course. There are 10 multiple-choice questions in total, each with only one correct response. Rather than answering at random, please check the "Don't know" option. When you are done with the quiz, you can validate your responses by clicking on the Validate button at the bottom of the page.
Responses are not saved as you type them, so be sure to check all questions before submitting your answers.
If you have any questions, you can ask on Twitter: @cogmasterstats, or by email.
Due date: November 5.

Date (E.g., 21/08/2008; this is also checked internally.)

1. We would like to read the following data set into R.
id gender time1 time2 time3 time4
1 2 229.7 227 250.5 199.6
2 2 202.2 248.7 224.1 183.8
3 2 221 234.2 . 257.5
4 1 259 272.7 247.6 207.6
5 2 227.9 224.5 256.3 196.8
6 1 208.4 248.3 321.1 214.6
7 1 209.6 314.8 218.7 191.4
8 1 264.4 232.4 262.6 215.8
9 1 299.4 258.1 321.1 257.6
10 1 222.3 255.1 308.4 213.2
These are reaction times collected on 10 subjects (male=1, female=2) at four different time points (this defines the experimental condition, time, with 4 levels). There is one missing value, which was coded as ".". Assuming that this data file is named results.dat and is available in your working directory, what command would you use to import the data?
read.csv("results.dat", na=".")
read.table("results.dat", na=".")
read.csv("results.dat", header=TRUE, na.strings=".")
read.table("results.dat", header=TRUE, na.strings=".")
Don't know.
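To try the candidate commands yourself, you first need the file on disk. The following is a minimal sketch, assuming you are happy to write results.dat into your current working directory; it copies the data shown in the question verbatim and does not import it.

```r
# Recreate the data file from question 1 in the working directory.
# The content is copied from the listing above; "." marks the missing value.
lines <- c(
  "id gender time1 time2 time3 time4",
  "1 2 229.7 227 250.5 199.6",
  "2 2 202.2 248.7 224.1 183.8",
  "3 2 221 234.2 . 257.5",
  "4 1 259 272.7 247.6 207.6",
  "5 2 227.9 224.5 256.3 196.8",
  "6 1 208.4 248.3 321.1 214.6",
  "7 1 209.6 314.8 218.7 191.4",
  "8 1 264.4 232.4 262.6 215.8",
  "9 1 299.4 258.1 321.1 257.6",
  "10 1 222.3 255.1 308.4 213.2"
)
writeLines(lines, "results.dat")

# Sanity check: one header line plus 10 data rows.
n_lines <- length(readLines("results.dat"))
n_lines
```

After running a candidate command, str() and summary() on the result are a quick way to see whether the column names and the missing value were handled as intended.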
2. With the same data set, we would like to replace the missing individual reaction time with the average reaction time for that particular condition. In what follows, we assume that the data were correctly imported into R and that a data frame named d is available in the workspace. How would you proceed?
d["id"==3,"time3"] <- mean(d[,"time3"], na.rm=TRUE)
d$time3["3",] <- mean(d$time3, na.rm=TRUE)
d$time3[is.na(d$time3)] <- mean(d$time3, na.rm=TRUE)
Don't know.
3. Let's consider a categorical variable coded as an R factor named grp with the following levels (16 observations in total):
> grp
 [1] A A B B C C D D A A B B C C D D
Levels: A B C D
> str(grp)
 Factor w/ 4 levels "A","B","C","D": 1 1 2 2 3 3 4 4 1 1 ...
What command could be used to recode this variable into a factor with the following ordered labels: A = "negative", B and C = "neutral", D = "positive" (negative < neutral < positive)?
grp <- as.ordered(grp, levels=c("negative", "neutral", "neutral", "positive"))
levels(grp) <- c("negative", "neutral", "neutral", "positive"); grp <- factor(grp, ordered=TRUE)
levels(grp)[2:3] <- "neutral"; grp <- ordered(grp, levels=c(1,3), labels=c("negative", "positive"))
Don't know.
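To experiment with the candidate commands, the factor can be rebuilt as follows. This is only a sketch of one way to obtain the same 16 observations; the use of rep() is an assumption, not necessarily how grp was originally created.

```r
# Rebuild the factor from question 3: A A B B C C D D, repeated twice.
grp <- factor(rep(rep(c("A", "B", "C", "D"), each = 2), times = 2))

grp        # prints the 16 observations and the 4 levels
str(grp)   # shows the underlying integer codes
```

Since each candidate overwrites grp, re-run the first line before trying the next one, and inspect the result with levels(grp) and is.ordered(grp).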
4. A small sample of 30 observations was simulated as follows:
> n <- 30
> x <- rnorm(n, mean=12, sd=2)
> mean(x) + qnorm(c(0.025, 0.975)) * sd(x)/sqrt(n)
[1] 11.75140 13.13708
The last command displays a 95% confidence interval (CI) for the population parameter (here, the mean). What is the name of qnorm(0.975) * sd(x)/sqrt(n)?
The upper bound of the 95% CI.
The margin of error.
The standard error of the mean.
Don't know.
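The simulation in question 4 can be re-run as sketched below. The seed value is an arbitrary assumption (the original seed is not given), so the interval you obtain will differ from the one printed above.

```r
set.seed(101)  # arbitrary seed, chosen here only for reproducibility
n <- 30
x <- rnorm(n, mean = 12, sd = 2)

# Same normal-based 95% interval as in the question.
ci <- mean(x) + qnorm(c(0.025, 0.975)) * sd(x) / sqrt(n)
ci
```

Printing mean(x), sd(x)/sqrt(n) and qnorm(0.975) separately can help you see how each piece contributes to the two bounds.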
5. Instead of relying on the Normal distribution to compute a 95% CI with the preceding data, we want to use a Student distribution with 29 degrees of freedom. Is it reasonable to expect a wider confidence interval for the same parameter?
Don't know.
6. At the end of a randomized clinical trial, with a fixed Type I error of 5% and a power of 80%, researchers have failed to demonstrate a significant difference in the measured outcome between the two groups that received different treatments. This means that:
We can be 80% confident that the two treatments perform equally well.
There is a 5% risk that a true difference between these two treatments does exist.
There is a 20% risk that a true difference between these two treatments does exist.
Don't know.
7. Here is some output produced by R when processing a data set composed of 15 individuals who were enrolled in a memory recall task where two series of measurements were collected, x1 and x2. It can be assumed that both variables can be accessed in the current workspace through a data frame named d. What command produced the following result?
t = -5.1321, df = 14, p-value = 0.0001524
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.9006892 -0.3697498
t.test(x1 ~ x2, data=d, var.equal=TRUE)
with(d, t.test(x1, x2, var.equal=FALSE))
t.test(d$x1, d$x2, paired=TRUE)
Don't know.
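The original measurements are not provided, but the candidate commands can still be compared on placeholder data. The sketch below builds a hypothetical data frame d with 15 rows; the seed, means and standard deviations are made-up assumptions, so the statistics will not match the output shown in the question. What can be compared is the structure of the output, in particular the degrees of freedom.

```r
set.seed(42)  # arbitrary seed; all values below are simulated, not the real data

# Hypothetical stand-in for d: 15 individuals, two series of measurements.
d <- data.frame(x1 = rnorm(15, mean = 10, sd = 1))
d$x2 <- d$x1 + rnorm(15, mean = 0.6, sd = 0.5)

dim(d)
head(d)
```

Wrapping a candidate call in try() lets you run all of them in sequence even if one raises an error.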
8. The means of two groups were compared using a Student t-test, and the following results were observed: the difference of the two means (which is the parameter of interest) was estimated at 2.3 with a 95% confidence interval of [1.6;3.0]. Which assertion is correct in this case?
The population parameter has a probability 0.95 of lying between 1.6 and 3.0; i.e., if we could draw 100 samples, the parameter of interest would be in [1.6;3.0] with probability 0.95.
We can be 95% confident that the population parameter is between 1.6 and 3.0; i.e., if we could draw 100 samples, 95% of the time the CI would cover the parameter of interest.
Neither of the above propositions.
Don't know.
9. Results from a randomized clinical trial suggest that a new treatment reduces the mortality at 6 months by -13% (95% CI [-23;-3]) compared to the treatment in use. Given that a 5% decrease in mortality is usually considered a positive criterion when switching from one treatment to the other, what can be concluded from this study?
There exists a statistically significant effect, which is large and precisely estimated.
There exists a statistically significant effect, which is large but imprecisely estimated.
There exists a statistically significant effect, which is small and precisely estimated.
There exists a statistically significant effect, which is small and imprecisely estimated.
There exists a statistically non-significant effect, which is small and precisely estimated.
Don't know.
10. To compare results (continuous response variable) on a sample of subjects, before (pre) and after (post) an intervention, using a Student t-test for paired samples is equivalent to using a one-sample t-test on the difference post minus pre.
Don't know.

When you are done, please validate your answers by clicking on the button below.
Be careful: submitting this form is final, and you won't be able to check your answers again. Please check them one last time.