1. 
We would like to read the following data set into R.
id gender time1 time2 time3 time4
1 2 229.7 227 250.5 199.6
2 2 202.2 248.7 224.1 183.8
3 2 221 234.2 . 257.5
4 1 259 272.7 247.6 207.6
5 2 227.9 224.5 256.3 196.8
6 1 208.4 248.3 321.1 214.6
7 1 209.6 314.8 218.7 191.4
8 1 264.4 232.4 262.6 215.8
9 1 299.4 258.1 321.1 257.6
10 1 222.3 255.1 308.4 213.2
These are reaction times collected on 10 subjects (male=1, female=2) at four
different time points (this defines the experimental
condition, time, with 4 levels). There
is one missing value, which was coded as ".". Assuming that this data file
is named results.dat and
is available in your working directory, what command would you use to import
the data? 

read.csv("results.dat", na=".") 

read.table("results.dat", na=".") 

read.csv("results.dat",
header=TRUE, na.strings=".") 

read.table("results.dat",
header=TRUE, na.strings=".") 

Don't know. 
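
One way to compare the candidate commands is to try them on a copy of the file and inspect the result. The sketch below is only an illustration: it recreates the data in a temporary file (the temporary path and the object names d_table and d_csv are assumptions made for this example) and checks the dimensions and column types returned by each reader.

```r
# Recreate the data file in a temporary directory (hypothetical location).
lines <- c(
  "id gender time1 time2 time3 time4",
  "1 2 229.7 227 250.5 199.6",
  "2 2 202.2 248.7 224.1 183.8",
  "3 2 221 234.2 . 257.5",
  "4 1 259 272.7 247.6 207.6",
  "5 2 227.9 224.5 256.3 196.8",
  "6 1 208.4 248.3 321.1 214.6",
  "7 1 209.6 314.8 218.7 191.4",
  "8 1 264.4 232.4 262.6 215.8",
  "9 1 299.4 258.1 321.1 257.6",
  "10 1 222.3 255.1 308.4 213.2"
)
path <- file.path(tempdir(), "results.dat")
writeLines(lines, path)

# read.table() splits fields on whitespace by default.
d_table <- read.table(path, header = TRUE, na.strings = ".")
str(d_table)

# read.csv() expects comma-separated fields.
d_csv <- read.csv(path, header = TRUE, na.strings = ".")
dim(d_csv)
```

Comparing str() and dim() across the two calls shows how the field separator and the header argument affect the imported data frame.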
2. 
With the same data set, we would like to replace the missing
individual reaction time with the average reaction time for that
particular condition. In what follows, we assume that data were
correctly imported in R and that a data frame named d is
available in the workspace. How would you proceed? 

d["id"==3,"time3"] <- mean(d[,"time3"], na.rm=TRUE) 

d$time3["3",] <- mean(d$time3, na.rm=TRUE) 

d$time3[is.na(d$time3)] <- mean(d$time3, na.rm=TRUE) 

Don't know. 
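
As a rough illustration of mean imputation, the sketch below builds a reduced version of the data frame containing only id and time3 (an assumption made to keep the example short) and replaces the missing value using logical indexing. The name m is introduced here for illustration only.

```r
# Reduced data frame: only id and the time3 column from the data set above.
d <- data.frame(
  id = 1:10,
  time3 = c(250.5, 224.1, NA, 247.6, 256.3, 321.1, 218.7, 262.6, 321.1, 308.4)
)

# Mean of the observed values; na.rm = TRUE drops the missing one.
m <- mean(d$time3, na.rm = TRUE)

# is.na() returns a logical vector that selects the missing entries.
d$time3[is.na(d$time3)] <- m

# No missing value should remain.
anyNA(d$time3)
```

Logical indexing has the advantage of working wherever the missing value is located, without hard-coding a row number.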
3. 
Let's consider a categorical variable coded as an R
factor named grp with the following levels (16
observations in total) :
> grp
[1] A A B B C C D D A A B B C C D D
Levels: A B C D
> str(grp)
Factor w/ 4 levels "A","B","C","D": 1 1 2 2 3 3 4 4 1 1 ...
What command could be used to recode this variable into a factor with
the following ordered labels : A = "negative", B and C =
"neutral", D = "positive" (negative < neutral < positive)? 

grp <- as.ordered(grp, levels=c("negative", "neutral", "neutral", "positive")) 

levels(grp) <- c("negative", "neutral", "neutral", "positive"); grp <- factor(grp, ordered=TRUE) 

levels(grp)[2:3] <- "neutral"; grp <- ordered(grp, levels=c(1,3), labels=c("negative", "positive")) 

Don't know. 
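
Independently of the options listed above, a minimal sketch of one possible recoding uses the named-list form of levels<-, which maps each new label to the old levels it should absorb. The factor is rebuilt here from the printed values.

```r
# Rebuild the factor shown above: A A B B C C D D, repeated twice.
grp <- factor(rep(rep(c("A", "B", "C", "D"), each = 2), times = 2))

# Named list: new label = old level(s) to merge under that label.
levels(grp) <- list(negative = "A", neutral = c("B", "C"), positive = "D")

# Turn the result into an ordered factor, keeping the level order above.
grp <- factor(grp, ordered = TRUE)

levels(grp)
table(grp)
```

The order of the list entries determines the order of the new levels, so the list should be written from lowest to highest.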
4. 
A small sample of 30 observations was simulated as follows:
> n <- 30
> x <- rnorm(n, mean=12, sd=2)
> mean(x) + qnorm(c(0.025, 0.975)) * sd(x)/sqrt(n)
[1] 11.75140 13.13708
The last command displays a 95% confidence interval (CI) for the population
parameter (here, the mean). What is the name of qnorm(0.975) *
sd(x)/sqrt(n) ? 

The upper bound of the 95% CI. 

The margin of error. 

The standard error of the mean. 

Don't know. 
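
To see how the interval is assembled, the following sketch splits the computation into its pieces. The seed is an assumption added for reproducibility, so the numbers will not match the output quoted above; the names centre, half_width and ci are introduced here for illustration.

```r
set.seed(1)  # assumed seed; the original simulation did not set one
n <- 30
x <- rnorm(n, mean = 12, sd = 2)

centre <- mean(x)                              # point estimate
half_width <- qnorm(0.975) * sd(x) / sqrt(n)   # quantity asked about in the question

# The interval is symmetric around the sample mean.
ci <- centre + c(-1, 1) * half_width

# Same result as the one-line formula used above.
all.equal(ci, mean(x) + qnorm(c(0.025, 0.975)) * sd(x) / sqrt(n))
```

The decomposition works because qnorm(0.025) equals -qnorm(0.975).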
5. 
Instead of relying on the Normal distribution to compute a 95% CI
with the preceding data, we want to use a Student distribution with
29 degrees of freedom. Is it reasonable to expect a wider confidence
interval for the same parameter? 

Yes. 

No. 

Don't know. 
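
The two approaches differ only in the quantile that multiplies the standard error, so a quick way to explore the question is to compare the quantiles directly. The names z and tq below are illustrative.

```r
# 97.5% quantile of the standard Normal distribution.
z <- qnorm(0.975)

# 97.5% quantile of Student's t distribution with 29 degrees of freedom.
tq <- qt(0.975, df = 29)

c(normal = z, student = tq)

# Ratio of the two interval widths for the same sample.
tq / z
```

Since the standard error sd(x)/sqrt(n) is the same in both cases, the ratio of the quantiles is also the ratio of the interval widths.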
6. 
At the end of a randomized clinical trial, with a fixed Type I
error of 5% and a power of 80%, researchers have failed
to demonstrate a significant difference in the measured outcome between
the two groups that received different treatments. This means that: 

We can be 80% confident that the
two treatments perform equally well. 

There is a 5% risk that a true
difference between these two treatments does exist. 

There is a 20% risk that a true
difference between these two treatments does exist. 

Don't know. 
7. 
Here is some output produced by R when processing a data set
composed of 15 individuals who were enrolled in a memory recall task where two
series of measurements were collected, x1
and x2. It can be assumed that both variables can be accessed
in the current workspace through a named data
frame, d. What command produced the following result?
t = -5.1321, df = 14, p-value = 0.0001524
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9006892 -0.3697498


t.test(x1 ~ x2, data=d, var.equal=TRUE) 

with(d, t.test(x1, x2, var.equal=FALSE)) 

t.test(d$x1, d$x2, paired=TRUE) 

Don't know. 
8. 
The means of two groups were compared using Student's t-test, and
the following results were observed: the difference of the two means
(which is the parameter of interest) was estimated at 2.3 with a 95%
confidence interval of [1.6;3.0]. What assertion is correct in this
case:


The population parameter has a
probability 0.95 of being comprised between 1.6 and 3.0; i.e., if we
could draw 100 samples, the parameter of interest would be in
[1.6;3.0] with probability 0.95. 

We can be 95% confident that the
population parameter is between 1.6 and 3.0; i.e., if we could draw 100
samples, 95% of the time the CI would cover the parameter of interest. 

Neither of the above propositions. 

Don't know. 
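
The long-run behaviour of a confidence interval can be explored by simulation. The sketch below is an illustration under assumed settings (a Normal population with mean 12 and standard deviation 2, samples of size 30, 2000 replications, and an arbitrary seed): it draws repeated samples and records how often the computed 95% interval contains the true mean.

```r
set.seed(42)   # assumed seed, for reproducibility only
mu <- 12       # true population mean (assumption)
n <- 30        # sample size (assumption)
n_rep <- 2000  # number of simulated samples (assumption)

covered <- replicate(n_rep, {
  x <- rnorm(n, mean = mu, sd = 2)
  ci <- t.test(x)$conf.int          # 95% CI by default
  ci[1] <= mu && mu <= ci[2]        # TRUE if the interval contains mu
})

# Proportion of intervals that contain the true mean.
mean(covered)
```

Note that the parameter mu is fixed throughout; what varies from one replication to the next is the interval.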
9. 
Results from a randomized clinical trial suggest that a new treatment reduces the mortality at 6 months by 13% (95% CI [23;3]) compared to the treatment in use. Given that a 5% decrease in mortality is usually considered a positive criterion when switching from one treatment to the other, what can be concluded from this study?


There exists a statistically
significant effect, which is large and accurate. 

There exists a statistically
significant effect, which is large but poorly accurate. 

There exists a statistically
significant effect, which is small and accurate. 

There exists a statistically
significant effect, which is small and poorly accurate. 

There exists a statistically
nonsignificant effect, which is small and accurate. 

Don't know. 
10. 
To compare results (continuous response variable) on a sample of
subjects, before (pre) and after (post) an intervention, it is equivalent
to use Student's t-test for paired samples and to use a one-sample t-test
based on the difference post minus pre. 

Yes. 

No. 

Don't know. 
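
The statement can be checked numerically on simulated data. In the sketch below, the pre and post vectors are made up for illustration (15 subjects, arbitrary seed and parameters); both tests are run and their test statistics, degrees of freedom and p-values are compared.

```r
set.seed(123)  # assumed seed
pre <- rnorm(15, mean = 50, sd = 10)        # simulated baseline scores
post <- pre + rnorm(15, mean = 2, sd = 5)   # simulated follow-up scores

# Paired t-test on the two series.
res_paired <- t.test(post, pre, paired = TRUE)

# One-sample t-test on the differences post minus pre.
res_diff <- t.test(post - pre, mu = 0)

# Compare the key quantities returned by the two calls.
rbind(
  paired = c(res_paired$statistic, res_paired$parameter, p = res_paired$p.value),
  diff   = c(res_diff$statistic, res_diff$parameter, p = res_diff$p.value)
)
```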