
Testlet response theory

August 27, 2012

Here is a brief overview of Testlet Response Theory and its Applications, by Wainer, Bradlow, and Wang (Cambridge University Press, 2007).

This book provides a very nice introduction to true score theory (which focuses on test scores) and item response theory (which focuses on item scores), and discusses the advantages of using testlets as the basis of measurement. I like such clear overviews of the main concepts that form the basis of one’s field of study: no unnecessary maths, just facts, good references, supporting examples, and nice visual illustrations. Along the same lines, I enjoyed reading Chapter 2 of Jenkinson’s Measuring Health and Medical Outcomes (UCL Press, 1994), which offers a historical overview of subjective health assessment. A review (PDF) of the book was published in Quality of Life Research.

A testlet is defined as “a group of items related to a single content area that is developed as a unit and contains a fixed number of predetermined paths that an examinee may follow.” Classical test (or true score) theory considers the whole test as its fungible unit, while IRT models focus on the item as the basic unit of analysis. The authors recommend a “middle path”, termed testlet response theory, which uses:

pieces of the test (as its unit of measurement) that are simultaneously small enough to be usefully adaptive and large enough to maintain some stability.

Testlets are thought to overcome several limitations of classical linear and adaptive testing forms, especially context effects (cross-information, unbalanced content, robustness, order effects). They are also meant to ease the usual assumptions made by common IRT models, including conditional independence (i.e., the probability of answering a particular item correctly is independent of responses to any of the other items, conditional on proficiency), which is hard to control in computerized adaptive testing.

Starting with Chapter 3, the authors highlight an interesting and nicely illustrated connection between the usual IRT model parameter estimates obtained by the marginal likelihood method and Bayes modal estimates, where “the maximum likelihood estimator is conceptually the same as a Bayes modal estimator with an improper uniform prior on proficiency (p. 32).”

In words, the likelihood (the probability of observing a given response pattern, viewed as a function of proficiency, θ) is obtained by multiplying the corresponding item characteristic curves (which describe the probability of endorsing, positively or negatively, a given item as generated by the IRT model). Of course, in order to write the conditional probability of $x_i$ given $\theta_i$ and $\beta$ (i.e., the likelihood), with $\beta_j$ the item parameter vector ($a_j, b_j, c_j$) for item $j$, as

$$ \Pr(x_i\mid\theta_i,\beta)=\prod_j P_j(\theta_i)^{x_{ij}}Q_j(\theta_i)^{1-x_{ij}}, $$

where $P_j(\theta_i)$ is the item characteristic curve of item $j$ and $Q_j(\theta_i)=1-P_j(\theta_i)$, conditional independence must hold. From a Bayesian perspective, the Bayes modal estimate is based on the posterior distribution

$$ \Pr(\theta\mid x_i)\propto L(\theta\mid x_i)p(\theta), $$

with p(θ) reflecting our knowledge about θ before observing the results, i.e. the prior distribution. The latter is treated as one more item in the estimation scheme: it is multiplied with the likelihood and everything else, yielding again the posterior distribution. Here, the choice of the prior distribution matters. When using a uniform prior, which basically amounts to saying that p(θ) takes the same value for all θ, the posterior distribution for θ will be proportional to the likelihood function. If we use a Gaussian distribution instead of a uniform one, there might be more subtle effects on the results. In any case, we can interpret this prior as the correct answer to the question: “Are you part of the population whose proficiency distribution is N(0,1)?” (Remember that the prior distribution is treated as one supplementary item in this estimation framework.)
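To make the link between the likelihood and the Bayes modal estimate more concrete, here is a minimal Python sketch. It is not taken from the book: it assumes a three-parameter logistic (3PL) form for the item characteristic curves, uses made-up item parameters and a made-up response pattern, and finds the modes by a simple grid search. The function names `icc_3pl` and `log_likelihood` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """Assumed 3PL curve: P_j(theta) = c_j + (1 - c_j) / (1 + exp(-a_j (theta - b_j)))."""
    theta = np.asarray(theta, dtype=float)[..., None]  # add an axis to broadcast over items
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def log_likelihood(theta, x, a, b, c):
    """Log of prod_j P_j^x_j * Q_j^(1 - x_j); the product form relies on conditional independence."""
    p = icc_3pl(theta, a, b, c)
    return np.sum(x * np.log(p) + (1 - x) * np.log1p(-p), axis=-1)

# Hypothetical item parameters (a_j, b_j, c_j) and one response pattern x_i.
a = np.array([1.0, 1.2, 0.8, 1.5, 1.0])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
c = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
x = np.array([1, 1, 1, 0, 1])

# Evaluate everything on a grid of proficiency values.
grid = np.linspace(-4.0, 4.0, 801)
loglik = log_likelihood(grid, x, a, b, c)

# Log-priors up to an additive constant: flat (uniform) and standard normal.
log_prior_uniform = np.zeros_like(grid)
log_prior_normal = -0.5 * grid**2

theta_ml = grid[np.argmax(loglik)]                               # maximum likelihood
theta_map_uniform = grid[np.argmax(loglik + log_prior_uniform)]  # Bayes modal, flat prior
theta_map_normal = grid[np.argmax(loglik + log_prior_normal)]    # Bayes modal, N(0,1) prior

print("ML estimate:            ", theta_ml)
print("Bayes modal (uniform):  ", theta_map_uniform)
print("Bayes modal (N(0,1)):   ", theta_map_normal)
```

With the flat prior the two estimates coincide, since adding a constant to the log-likelihood does not move its maximum. With the N(0,1) prior the mode is pulled toward zero, which is one way to see the prior acting as an extra "item".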


  1. Wainer, H. and Kiely, G.L. (1987). Item Clusters and Computerized Adaptive Testing: A Case for Testlets. Journal of Educational Measurement, 24(3), 185–201.
  2. Wainer, H. and Lewis, C. (1990). Toward a Psychometrics for Testlets. Journal of Educational Measurement, 27, 1–14.
  3. Wang, X., Bradlow, E.T., and Wainer, H. (2002). A General Bayesian Model for Testlets: Theory and Applications. ETS Research Report 02-02.
  4. Lu, Y. and Wang, X. (2006). A Hierarchical Bayesian Framework for Item Response Theory Models with Applications in Ideal Point Estimation.
  5. Glas, C.A.W., Wainer, H., and Bradlow, E.T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W.J. van der Linden and C.A.W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271–288). Boston, MA: Kluwer Academic Publishers.