- Dings:2002
-
The effects of matrix sampling on student score comparability in constructed-response and multiple-choice assessments
J. Dings and R. Childs and N. Kingston
(2002)
- Thomas:2002b
-
Embedding IRT in Structural Equation Models: A Comparison with Regression Based on IRT Scores
D. R. Thomas and I. R. R. Lu and B. D. Zumbo
(2002)
- Thamerus:1996
-
Fitting a finite mixture distribution to a variable subject to heteroscedastic measurement error
M. Thamerus
(1996)
- Yamamoto:1999
-
Scaling Methodology and Procedures for the TIMSS Mathematics and Science Scales
K. Yamamoto and E. Kulick
(1999)
- Yeh:2007
-
Using Trapezoidal Rule for the Area Under a Curve Calculation
S. Yeh
(2007)
- Hardouin:2007b
-
The SAS Macro-Program %AnaQol to Estimate the Parameters of Item Responses Theory Models
J. Hardouin
Communications in Statistics - Simulation and Computation
36
437-453
(2007)
- Fox:2007a
-
Multilevel IRT Model Assessment
J. Fox
(2007)
- Fox:2007
-
Modeling Measurement Error in Structural Multilevel Models
J. Fox and C. A. W. Glas
(2007)
- Ark:2005b
-
The Effect of Missing Data Imputation on Mokken Scale Analysis
L. A. van der Ark and K. Sijtsma
(2005)
- Ark:2005a
-
Statistical Models for Categorical Variables
L. A. van der Ark and M. A. Croon and K. Sijtsma
(2005)
- Ark:2002
-
Hierarchically Related Nonparametric IRT Models, and Practical Data Analysis Methods
L. A. van der Ark and B. T. Hemker and K. Sijtsma
(2002)
- Sijtsma:2001
-
Progress in NIRT Analysis of Polytomous Item Scores: Dilemmas and Practical Solutions
K. Sijtsma and L. A. van der Ark
(2001)
This paper discusses three open problems in nonparametric polytomous item response theory: (1) theoretically, the latent trait $\theta$ is not stochastically ordered by the observed total score X+; (2) the models do not imply an invariant item ordering; and (3) the regression of an item score on the total score X+ or on the restscore R is not a monotone nondecreasing function and, as a result, it cannot be used for investigating the monotonicity of the item step response function. Tentative solutions for these problems are discussed. The computer program MSP for nonparametric IRT analysis is based on models which neither imply the stochastic ordering property nor an invariant item ordering. Also, MSP uses item-restscore regression for investigating item step response functions. It is discussed whether computer programs may be based (temporarily) on models which lack desirable properties and use methods which are not (yet) supported by sound psychometric theory.
- Ark:1999
-
Contributions to Latent Budget Analysis: A Tool For the Analysis of Compositional Data.
L. A. van der Ark
(1999)
- Ark:1998
-
Graphical Display of Latent Budget Analysis and Latent Class Analysis, with Special Reference to Correspondence Analysis
L. A. van der Ark and P. G. M. van der Heijden
(1998)
- Heijden:2002
-
Some Examples of Latent Budget Analysis and its Extensions
P. G. M. van der Heijden and L. A. van der Ark and A. Mooijaart
(2002)
- Thomas:2002a
-
Applying Item Response Theory Methods to Complex Survey Data
D. R. Thomas and A. Cyr
(2002)
- Carletta:1996
-
Assessing agreement on classification tasks: the kappa statistic
J. Carletta
Computational Linguistics
22
(1996)
- Zou:2004
-
Sparse principal component analysis
H. Zou and T. Hastie and R. Tibshirani
(2004)
- Bond:2003b
-
Measuring Client Satisfaction with Public Education III: Group Effects in Client Satisfaction
T. G. Bond and J. A. King
Journal of Applied Measurement
4
326-334
(2003)
- Bond:2003a
-
Measuring Client Satisfaction with Public Education II: Comparing Schools with State Benchmarks
T. G. Bond and J. A. King
Journal of Applied Measurement
4
258-268
(2003)
- King:2003
-
Measuring Client Satisfaction with Public Education I: Meeting Competing Demands in Establishing State-wide Benchmarks
J. A. King and T. G. Bond
Journal of Applied Measurement
4
111-123
(2003)
- Smits:2003a
-
A Componential IRT Model for Guilt
D. J. M. Smits and P. De Boeck
Multivariate Behavioral Research
38
161-188
(2003)
- Jehangir:2005
-
Evaluation of Relations between Scales in an IRT Framework
K. Jehangir
(2005)
- Schumacher:1996
-
Neural network and logistic regression. Part I
M. Schumacher and R. Rossner and W. Vach
(1996)
- Tricot:2000
-
Un modèle de réponses aux items. Propriétés et comparaison de groupes de traitement en épidémiologie
J. Tricot and M. Mesbah
Revue de Statistique Appliquée
48
29-39
(2000)
- Ricker:2003
-
Setting Cut Scores: Critical Review of Angoff and Modified-Angoff Methods
K. L. Ricker
(2003)
This paper presents a critical review of the Angoff (1971) and Angoff-derived methods, according to criteria for assessing cut score setting methods originally proposed by Berk (1986) and further recommendations by Hambleton (2001). The criteria have been updated to reflect the progress that has been made in standard setting research over the past 17 years. The paper also discusses the assumptions of the Angoff method, and other current issues surrounding this method. Recommendations for using the Angoff method are made.
- Sheng:2005
-
Bayesian Analysis of Hierarchical IRT Models: Comparing and Combining the Unidimensional & Multi-unidimensional IRT Models
Y. Sheng
(2005)
- Verstralen:2000
-
IRT models for subjective weights of options of multiple choice questions
H. H. F. M. Verstralen and N. D. Verhelst
(2000)
- Lauritzen:2007
-
Exchangeable Rasch Matrices
S. L. Lauritzen
(2007)
- Davidson:2006
-
Bootstrap Inference in a Linear Equation Estimated by Instrumental Variables
R. Davidson and J. MacKinnon
(2006)
- Festy:2008
-
Mesures, formes et facteurs de la pauvreté. Approches comparatives
P. Festy and L. Prokofieva
(2008)
- Ward:2008
-
Presence-only data and the EM algorithm
G. Ward and T. Hastie and S. C. Barry and J. Elith and J. R. Leathwick
Biometrics
(2008)
- Ponocny:2002
-
On the applicability of some IRT models for repeated measurement designs: Conditions, consequences, and Goodness-of-Fit tests
I. Ponocny
Methods of Psychological Research Online
7
21-40
(2002)
- Rouder:2005
-
A hierarchical model for estimating response time distributions
J. N. Rouder and J. Lu and P. Speckman and D. Sun and Y. Jiang
Psychonomic Bulletin & Review
12
195-223
(2005)
- Castelloe:2007
-
Power and Sample Size Determination for Linear Models
J. M. Castelloe and R. G. O'Brien
(2007)
- Zubicaray:2007
-
Support for an auto-associative model of spoken cued recall: Evidence from fMRI
G. de Zubicaray and K. McMahon and M. Eastburn and A. J. Pringle and L. Lorenz and M. S. Humphreys
Neuropsychologia
45
824-835
(2007)
- Gibbons:2007
-
The Added Value of Multidimensional IRT Models
R. D. Gibbons and J. C. Immekus and R. D. Bock
(2007)
- Diaz:2006
-
NAEP-QA FY06 Special Study: 12th Grade Math Trend Estimates
T. E. Diaz and H. A. Le and L. L. Wise
(2006)
- Keerthi:2002
-
A fast dual algorithm for kernel logistic regression
S. S. Keerthi and K. Duan and S. K. Shevade and A. N. Poo
(2002)
- Bystrom:2007
-
Task Complexity Affects Information Seeking and Use
K. Byström and K. Järvelin
(2007)
- Saxton:2005
-
Development of a Short Form of the Severe Impairment Battery
J. Saxton and K. B. Kastango and L. Hugonot-Diener and F. Boller and M. Verny and C. E. Sarles and R. R. Girgis and E. Devouche and P. Mecocci and B. G. Pollock and S. T. DeKosky
American Journal of Geriatric Psychiatry
13
(2005)
- Assaf:2007
-
A new approach for interexaminer reliability data analysis on dental caries calibration
A. V. Assaf and E. P. da Silva Tagliaferro and M. de Castro Meneghim and C. Tengan and A. C. Pereira and G. M. B. Ambrosano and F. L. Mialhe
Journal of Applied Oral Science
15
(2007)
- Devouche:2003
-
Les banques d'items. Construction d'une banque pour le Test de Connaissance du Français
E. Devouche
Psychologie et Psychométrie
24
57-88
(2003)
- Postlethwaite:1993
-
Torsten Husén
T. N. Postlethwaite
Perspectives : revue trimestrielle d'éducation comparée
XXIII
697-707
(1993)
- Roju:1995
-
IRT-Based Internal Measures of Differential Functioning of Items and Tests
N. S. Raju and W. J. van der Linden and P. F. Fleer
Applied Psychological Measurement
19
353-368
(1995)
- Jacobusse:2006
-
An interval scale for development of children aged 0-2 years
G. Jacobusse and S. van Buuren and P. H. Verberk
Statistics in Medicine
25
2272-2283
(2006)
- Howell:2005
-
A model of grounded language acquisition: Sensorimotor features improve lexical and grammatical learning
S. R. Howell and D. Jankowicz and S. Becker
Journal of Memory and Language
53
258-276
(2005)
- Hastedt:2007
-
Differences between multiple-choice and constructed response items in PIRLS 2001
D. Hastedt
(2007)
- Devroye:2007
-
Non-Uniform Random Variate Generation
L. Devroye
(2007)
This chapter provides a survey of the main methods in non-uniform random variate generation, and highlights recent research on the subject. Classical paradigms such as inversion, rejection, guide tables, and transformations are reviewed. We provide information on the expected time complexity of various algorithms, before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods.
- Goldstein:1999
-
Multilevel statistical models
H. Goldstein
(1999)
- Michailidis:2007
-
Multilevel Homogeneity Analysis
G. Michailidis
(2007)
- CoE:2005
-
The Common European Framework
CoE
(2005)
- Meade:2004
-
Exploratory Measurement Invariance: A New Method Based on Item Response Theory
A. W. Meade and J. K. Ellington and S. B. Craig
(2004)
- Lunz:2007
-
Examination Development Guidelines
M. E. Lunz
(2007)
- Courville:2004
-
An empirical comparison of item response theory and classical test theory item/person statistics
T. G. Courville
(2004)
- Keller:2002
-
Annual College of Education Educational Research Exchange
(2002)
- Stage:2007
-
A Comparison Between Item Analysis Based on Item Response Theory and Classical Test Theory. A Study of the SweSAT Subtest WORD
C. Stage
(2007)
- Stage:2003
-
Classical Test Theory or Item Response Theory: The Swedish Experience
C. Stage
(2003)
- Yu:2007
-
Automation and visualization of distractor analysis using SAS/GRAPH
C. H. Yu
(2007)
- Robinson:2000
-
Canadian Journal of Education
S. Robinson
25
(2000)
- Mojduszka:2000
-
Consumer Choice of Food Products and the Implications for Price Competition and Government Labeling Policy
E. M. Mojduszka and J. A. Caswell and J. M. Harris
(2000)
- Garcia-Perez:1999
-
Fitting Logistic IRT Models: Small Wonder
M. A. Garcia-Perez
The Spanish Journal of Psychology
2
74-94
(1999)
- Zubairi:2006
-
Classical And Rasch Analyses Of Dichotomously Scored Reading Comprehension Test Items
A. M. Zubairi and N. L. A. Kassim
Malaysian Journal of ELT Research
2
(2006)
- Yamamoto:2002
-
Estimating PISA students on the IALS prose literacy scale
K. Yamamoto
(2002)
- Stewart:2005
-
Absolute Identification by Relative Judgment
N. Stewart and G. D. A. Brown and N. Chater
Psychological Review
112
881-911
(2005)
- Cazievel:2000
-
Estimation for the Rasch Model under a linkage structure: a case study
V. Cazievel
(2000)
- Hochheiser:1999
-
Performance Benefits of Simultaneous over Sequential Menus As Task Complexity Increases
H. Hochheiser and B. Shneiderman
(1999)
- E-V-Smith:2006
-
Book Review: Developing and Validating Multiple-Choice Test Items (3rd ed.)
E. V. Smith Jr.
Applied Psychological Measurement
30
69-72
(2006)
- Chen:2006
-
Verification of Cognitive Attributes Required to Solve the TIMSS-1999 Mathematics Items for Taiwanese Students
Y. Chen and J. Gorin and M. Thompson
(2006)
- Shigemasu:2000
-
Bayesian hierarchical analysis of polytomous item responses
K. Shigemasu and O. Yoshimura and T. Nakamura
Behaviormetrika
27
51-65
(2000)
- Schwarz:1995
-
What respondents learn from questionnaires: The survey interview and the logic
N. Schwarz
International Statistical Review
63
153-177
(1995)
- Bryce:1981
-
Rasch-Fitting
T. G. K. Bryce
British Educational Research Journal
7
(1981)
- Adams:1997
-
The Multidimensional Random Coefficients Multinomial Logit Model
R. J. Adams and M. Wilson and W. Wang
Applied Psychological Measurement
21
1-24
(1997)
- Monseur:2007
-
Equating errors in international surveys in education
C. Monseur and H. Sibbens and D. Hastedt
(2007)
- Brown:2005
-
The Multidimensional Measure of Conceptual Complexity
N. J. S. Brown
(2005)
- Mitkov:2005
-
A computer-aided environment for generating multiple-choice test items
R. Mitkov and L. A. Ha and N. Karamanis
Natural Language Engineering
1
1-17
(2005)
- Wu:2006
-
Modelling Mathematics Problem Solving Item Responses Using a Multidimensional IRT Model
M. Wu and R. Adams
Mathematics Education Research Journal
18
93-113
(2006)
- Watson:2006
-
A Longitudinal Study of Student Understanding of Chance and Data
J. Watson and B. Kelly
Mathematics Education Research Journal
18
40-55
(2006)
- Stacey:2006
-
A Case of the Inapplicability of the Rasch Model: Mapping Conceptual Learning
K. Stacey and V. Steinle
Mathematics Education Research Journal
18
77-92
(2006)
- Grimbeek:2006
-
Surveying Primary Teachers about Compulsory Numeracy Testing: Combining Factor Analysis with Rasch Analysis
P. Grimbeek and S. Nisbet
Mathematics Education Research Journal
18
27-39
(2006)
- Doig:2006
-
Easier Analysis and Better Reporting: Modelling Ordinal Data in Mathematics Education Research
B. Doig and S. Groves
Mathematics Education Research Journal
18
56-76
(2006)
- Bradley:2006
-
Applying the Rasch Rating Scale Model to Gain Insights into Students' Conceptualisation of Quality Mathematics Instruction
K. Bradley and S. Sampson and K. Royal
Mathematics Education Research Journal
18
11-26
(2006)
- Willms:2007
-
A Manual for Conducting Analyses with Data from TIMSS and PISA
J. D. Willms and T. Smith
(2007)
- Dray:2003
-
Co-inertia analysis and the linking of ecological data tables
S. Dray and D. Chessel and J. Thioulouse
Ecology
84
3078-3089
(2003)
- Leeuw:1986
-
Random coefficient models for multilevel analysis
J. de Leeuw and I. Kreft
Journal of Educational Statistics
11
57-85
(1986)
- Benjamini:2002
-
John W. Tukey's contributions to multiple comparisons
Y. Benjamini and H. Braun
The Annals of Statistics
30
1576-1594
(2002)
- Holmes:2005
-
Multivariate data analysis: The French way
S. Holmes
(2005)
- Davier:1997
-
WINMIRA -- program description and recent enhancements
M. von Davier
Methods of Psychological Research - Online
2
25-28
(1997)
- Hugonot-Diener:2003
-
Version abrégée de la severe impairment battery (SIB)
L. Hugonot-Diener and M. Verny and E. Devouche and J. Saxton and P. Mecocci and F. Boller
Psychologie & Neuropsychiatrie du Vieillissement
1
273-283
(2003)
- CJE:2000
-
Canadian Journal of Education
25
(2000)
- Antonietti:2006
-
Mesures objectives de traits latents
J. Antonietti
(2006)
- Antonietti:2004
-
Comment s'assurer de l'alignement d'un ensemble d'items
J. Antonietti
(2004)
- Antonietti:2003b
-
Designs de testage incomplets et modèle non-paramétrique de la réponse à l'item
J. Antonietti
(2003)
- Antonietti:2003a
-
Comment mesurer la similarité entre deux structures factorielles latentes
J. Antonietti
(2003)
- Christensen:2003a
-
SAS macros for Rasch based latent variable modelling
K. B. Christensen and J. B. Bjorner
(2003)
- Christensen:2003
-
Latent Covariates in Generalized Linear Models
K. B. Christensen and M. L. Nielsen and L. Smith-Hansen
(2003)
- Walker:2000
-
Forecasting the political behavior of leaders with the verbs in context system of operational code analysis
S. G. Walker
(2000)
- Camiz:2005
-
Application de l'analyse factorielle multiple pour le traitement de caractères en échelle dans les enquêtes
S. Camiz and J. Pagès
(2005)
- Claeskens:2007
-
On local estimating equations in additive multiparameter models
G. Claeskens and M. Aerts
(2007)
- Al-Kandari:1993
-
Variable Selection and Principal Component Analysis
N. Al-Kandari
(1993)
- Calvo:2007
-
A Comparative Study of Principal Component Analysis Techniques
R. A. Calvo and M. Partridge and M. A. Jabri
(2007)
- Goodman:2002
-
Applied Latent Class Analysis
L. A. Goodman
(2002)
- Balidis:2002
-
Intraobserver and interobserver reliability of the R/D score for evaluation of iris configuration by ultrasound biomicroscopy, in patients with pigment dispersion syndrome
M. O. Balidis and C. Bunce and K. Boboridis and J. Salzman and R. P. L. Wormald and M. H. Miller
Eye
16
722-726
(2002)
- Birkett:1986
-
Selecting the number of response categories for a Likert-type scale
N. J. Birkett
(1986)
- Bhakta:2005
-
Using item response theory to explore the psychometric properties of extended matching questions examination in undergraduate medical education
B. Bhakta and A. Tennant and M. Horton and G. Lawton and D. Andrich
BMC Medical Education
5
(2005)
- Kadouri:2007
-
The improved Clinical Global Impression Scale (iCGI): development and validation in depression
A. Kadouri and E. Corruble and B. Falissard
BMC Psychiatry
7
(2007)
- Revah-Levy:2007
-
The Adolescent Depression Rating Scale (ADRS): a validation study
A. Revah-Levy and B. Birmaher and I. Gasquet and B. Falissard
BMC Psychiatry
7
(2007)
- Montanari:2000
-
Independent Factor Discriminant Analysis
A. Montanari and D. G. Calo and C. Viroli
(2000)
- Schafer:2002
-
Computational strategies for multivariate linear mixed-effects models with missing values
J. L. Schafer and R. M. Yucel
Journal of Computational and Graphical Statistics
11
437-457
(2002)
- Ackerman:1996
-
Graphical Representation of Multidimensional Item Response Theory Analyses
T. Ackerman
Applied Psychological Measurement
20
311-329
(1996)
- Stein:2007
-
Calculation of the Kappa Statistic for Inter-rater Reliability: The Case Where Raters Can Select Multiple Responses from a Large Number of Categories
C. R. Stein and R. B. Devore and B. E. Wojcik
(2007)
- Leeuw:2007
-
Statistics and Probability
J. de Leeuw
(2007)
- Graves:1995
-
The pseudoscience of psychometry and the Bell Curve
J. L. Graves
Journal of Negro Education
64
277-
(1995)
- Cikrikci-Demirtasli:2000
-
A study of Raven Standard Progressive Matrices test's item measures under classic and item response models: An empirical comparison
N. Cikrikci-Demirtasli
(2000)
- Stuger:2006
-
Asymmetric Loss Functions and Sample Size Determination: A Bayesian Approach
H. P. Stüger
Austrian Journal of Statistics
35
57-66
(2006)
- Antonietti:2003
-
Evaluation des compétences en mathématiques en fin de 2e année primaire
J. Antonietti and N. Guignard and A. Mudry and L. Ntamakiliro and W. Rieben and C. T. Christinat and A. van der Klink
(2003)
- Charland:1996
-
Fidélité et validité de la version française du "Children of Alcoholics Screening Test" (CAST)
H. Charland and G. Côté
Revue québécoise de psychologie
17
45-62
(1996)
- Wu:2005
-
Algorithmes et codes R pour la méthode de la pseudo-vraisemblance empirique dans les sondages
C. Wu
Techniques d'enquête
31
261-266
(2005)
- Grim:2005
-
Checking for Nonresponse Bias in Web-Only Surveys of Special Populations using a Mixed-Mode (Web-with-Mail) Design
B. J. Grim and L. M. Semali
(2005)
- Youngstrom:2002
-
Reliability Generalization of self-report of emotions when using the Differential Emotions Scale
E. A. Youngstrom and K. W. Green
Educational and Psychological Measurement
62
(2002)
- Yin:2000
-
Assessing the reliability of Beck Depression Inventory scores: Reliability Generalization across studies
P. Yin and X. Fan
Educational and Psychological Measurement
60
201-223
(2000)
- Wallace:2002
-
Reliability Generalization of the Life Satisfaction Index
K. A. Wallace and A. J. Wheeler
Educational and Psychological Measurement
62
(2002)
- Viswesvaran:2000
-
Measurement error in "Big Five Factors" personality assessment: Reliability Generalization across studies and measures
C. Viswesvaran and D. Ones
Educational and Psychological Measurement
60
224-235
(2000)
- Vacha-Haase:2001a
-
Reliability generalization: Exploring reliability variations on MMPI/MMPI-2 Validity scale scores
T. Vacha-Haase and C. R. Tani and L. R. Kogan and R. A. Woodall and B. Thompson
Assessment
8
391-401
(2001)
- Vacha-Haase:2001
-
Reliability generalization: Exploring reliability coefficients of MMPI clinical scales scores
T. Vacha-Haase and L. Kogan and C. R. Tani and R. A. Woodall
Educational and Psychological Measurement
61
45-59
(2001)
- Vacha-Haase:2002
-
Reliability Generalization: Moving toward improved understanding and use of score reliability
T. Vacha-Haase and R. K. Henson and J. Caruso
Educational and Psychological Measurement
62
(2002)
- Vacha-Haase:1998
-
Reliability generalization: Exploring variance in measurement error affecting score reliability across studies
T. Vacha-Haase
Educational and Psychological Measurement
58
6-20
(1998)
- Thompson:2002b
-
Stability of the reliability of LibQUAL+™ scores: A "Reliability Generalization" meta-analysis study
B. Thompson and C. Cook
Educational and Psychological Measurement
62
(2002)
- Reese:2002
-
A Reliability Generalization study of select measures of adult attachment style
R. J. Reese and K. M. Kieffer and B. K. Briggs
Educational and Psychological Measurement
62
(2002)
- Nilsson:2002
-
Reliability Generalization: An examination of the Career Decision-making Self-efficacy Scale
J. E. Nilsson and C. K. Schmidt and W. D. Meek
Educational and Psychological Measurement
62
(2002)
- Lane:2002
-
Expanding reliability generalization methods with KR-21 estimates: An RG study of the Coopersmith Self-esteem Inventory
G. G. Lane and A. E. White and R. K. Henson
Educational and Psychological Measurement
62
(2002)
- Kieffer:2002
-
A Reliability Generalization study of the Geriatric Depression Scale (GDS)
K. M. Kieffer and R. J. Reese
Educational and Psychological Measurement
62
(2002)
- Henson:2001a
-
Characterizing measurement error in scores across studies: Some recommendations for conducting "Reliability Generalization" (RG) studies
R. K. Henson and B. Thompson
(2001)
Given the potential value of reliability generalization (RG) studies in the development of cumulative psychometric knowledge, the purpose of this paper is to provide a tutorial on how to conduct such studies and to serve as a guide for researchers wishing to use this methodology. After some brief comments on classical test theory, the paper provides a practical framework for structuring an RG study, including: (1) test selection with an eye toward frequency of test use and reporting practices by authors; (2) development of a coding sheet that will capture potential variation in score reliability across studies; (3) procedural recommendations regarding data collection; (4) identification and use of potential dependent variables; and (5) application of general linear model analyses to the data.
- Henson:2001
-
A reliability generalization study of the Teacher Efficacy Scale and related instruments
R. K. Henson and L. R. Kogan and T. Vacha-Haase
Educational and Psychological Measurement
61
(2001)
- Henson:2002
-
Variability and prediction of measurement error in Kolb's Learning Style Inventory scores: A reliability generalization study
R. K. Henson and D. Hwang
Educational and Psychological Measurement
62
(2002)
- Helms:1999
-
Another meta-analysis of the White Racial Identity Attitude Scale's Cronbach alphas: Implications for validity
J. E. Helms
Measurement and Evaluation in Counseling and Development
32
122-137
(1999)
- Hanson:2002
-
Reliability Generalization of Working Alliance Inventory scale scores
W. E. Hanson and K. T. Curry and D. L. Bandalos
Educational and Psychological Measurement
62
(2002)
- Dimitrov:2002
-
Reliability: Arguments for multiple perspectives and potential problems with generalization across studies
D. M. Dimitrov
Educational and Psychological Measurement
62
(2002)
- Deditius-Island:2002
-
An examination of the reliability of scores from Zuckerman's Sensation Seeking Scales
H. K. Deditius-Island and J. C. Caruso
Educational and Psychological Measurement
62
(2002)
- Caruso:2001a
-
Reliability of scores from the Eysenck Personality Questionnaire: A Reliability Generalization (RG) study
J. C. Caruso and K. Witkiewitz and A. Belcourt-Dittloff and J. Gottlieb
Educational and Psychological Measurement
61
675-682
(2001)
- Caruso:2001
-
Reliability Generalization of the Junior Eysenck Personality Questionnaire
J. C. Caruso and S. Edwards
Personality and Individual Differences
31
173-184
(2001)
A reliability generalization was conducted on the Psychoticism (P), Extraversion (E), Neuroticism (N) and Lie (L) scales of the Junior Eysenck Personality Questionnaire (J-EPQ). Twenty-three studies provided data on 44 samples of children who had been administered the J-EPQ. Score reliability was found to vary significantly both between and within scales. N and L provided the most reliable scores (with median reliabilities of 0.80 and 0.79 respectively) followed by E (median RELIABILITY=0.73) and P (median RELIABILITY=0.68). Scale length was the best predictor of score reliability, but sample gender makeup, language of administration, and the amount of variation in the ages of children in each sample were also significant predictors of reliability for various J-EPQ scales. The results highlight the importance of considering reliability to be a property of scores for a particular group, as opposed to a property of a test generally.
- Caruso:2000
-
Reliability Generalization of the NEO personality scales
J. C. Caruso
Educational and Psychological Measurement
60
236-254
(2000)
- Capraro:2002
-
Myers-Briggs Type Indicator score reliability across studies: A meta-analytic Reliability Generalization study
R. M. Capraro and M. M. Capraro
Educational and Psychological Measurement
62
659-673
(2002)
- Voelkle:2007
-
Effect sizes and F ratios < 1.0
M. C. Voelkle and P. L. Ackerman and W. W. Wittmann
Methodology
3
35-46
(2007)
Standard statistics texts indicate that the expected value of the F ratio is 1.0 (more precisely: N/(N-2)) in a completely balanced fixed-effects ANOVA, when the null hypothesis is true. Even though some authors suggest that the null hypothesis is rarely true in practice (e.g., Meehl, 1990), F ratios < 1.0 are reported quite frequently in the literature. However, standard effect size statistics (e.g., Cohen's f) often yield positive values when F < 1.0, which appears to create confusion about the meaningfulness of effect size statistics when the null hypothesis may be true. Given the repeated emphasis on reporting effect sizes, it is shown that in the face of F < 1.0 it is misleading to only report sample effect size estimates as often recommended. Causes of F ratios < 1.0 are reviewed, illustrated by a short simulation study. The calculation and interpretation of corrected and uncorrected effect size statistics under these conditions are discussed. Computing adjusted measures of association strength and incorporating effect size confidence intervals are helpful in an effort to reduce confusion surrounding results when sample sizes are small. Detailed recommendations are directed to authors, journal editors, and reviewers.
- Capraro:2001
-
Measurement error of scores on the Mathematics Anxiety Rating Scale across studies.
M. M. Capraro and R. M. Capraro and R. K. Henson
Educational and Psychological Measurement
61
373-386
(2001)
- Beretvas:2002
-
Using mixed-effects models in Reliability Generalization studies
S. N. Beretvas and D. A. Pastor
Educational and Psychological Measurement
62
(2002)
- Beretvas:2002a
-
A Reliability Generalization study of the Marlowe-Crowne Social Desirability Scale
S. N. Beretvas and J. L. Meyers and W. L. Leite
Educational and Psychological Measurement
62
(2002)
- Barnes:2002
-
Reliability Generalization of scores on the Spielberger State-trait Anxiety Inventory
L. L. B. Barnes and D. Harp and W. S. Jung
Educational and Psychological Measurement
62
(2002)
- Steiger:1992
-
R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation
J. H. Steiger and R. T. Fouladi
Behavior Research Methods, Instruments, and Computers
4
581-582
(1992)
- Cumming:2008
-
Inference by eye: Confidence intervals, and how to read pictures of data
G. Cumming and S. Finch
American Psychologist
(2008)
- Cumming:2001
-
A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions
G. Cumming and S. Finch
Educational and Psychological Measurement
61
532-575
(2001)
- Algina:2003
-
Approximate confidence intervals for effect sizes
J. Algina and H. J. Keselman
Educational and Psychological Measurement
63
537-553
(2003)
- Vacha-Haase:2004
-
How to estimate and interpret various effect sizes
T. Vacha-Haase and B. Thompson
Journal of Counseling Psychology
51
473-481
(2004)
- Thompson:2008a
-
Complementary methods for research in education
B. Thompson
(2008)
- Thompson:2008
-
Research in organizations: Foundational principles, processes, and methods of inquiry
B. Thompson
(2008)
- Thompson:2002a
-
What future quantitative social science research could look like: Confidence intervals for effect sizes
B. Thompson
Educational Researcher
31
24-31
(2002)
- Thompson:2002
-
"Statistical," "practical," and "clinical": How many kinds of significance do counselors need to consider?
B. Thompson
Journal of Counseling and Development
80
64-71
(2002)
- Snyder:1993
-
Evaluating results using corrected and uncorrected effect size estimates
P. Snyder and S. Lawson
Journal of Experimental Education
61
334-349
(1993)
- Rosenthal:1994
-
The handbook of research synthesis
R. Rosenthal
(1994)
- Olejnik:2000
-
Measures of effect size for comparative studies: Applications, interpretations, and limitations
S. Olejnik and J. Algina
Contemporary Educational Psychology
25
241-286
(2000)
- Kline:2004
-
Beyond significance testing: Reforming data analysis methods in behavioral research
R. Kline
(2004)
- Kirk:2003
-
Handbook of research methods in experimental psychology
R. E. Kirk
83-105
(2003)
- Kirk:1996
-
Practical significance: A concept whose time has come
R. Kirk
Educational and Psychological Measurement
56
746-759
(1996)
- Hill:2004
-
Higher education: Handbook of theory and research
C. R. Hill and B. Thompson
19
175-196
(2004)
- Cortina:2000
-
Effect size for ANOVA designs
J. M. Cortina and H. Nouri
(2000)
- Thompson:1994
-
The Concept of Statistical Hypothesis Testing
B. Thompson
Measurement Update
4
5-6
(1994)
http://www.coe.tamu.edu/~bthompson/hyptest1.htm
- Thompson:1998a
-
Five methodology errors in educational research: The pantheon of statistical significance and other faux pas
B. Thompson
(1998)
http://www.coe.tamu.edu/~bthompson/aeraaddr.htm
- Thompson:1999
-
Common methodology mistakes in educational research, revisited, along with a primer on both effect sizes and the bootstrap
B. Thompson
(1999)
http://www.coe.tamu.edu/~bthompson/aeraad99.htm
- Thompson:1998
-
Statistical significance and effect size reporting: Portrait of a possible future
B. Thompson
Research in the Schools
5
33-38
(1998)
- Moore:1991
-
A confirmatory factor analysis of the Threat Index
M. K. Moore and R. A. Neimeyer
Journal of Personality and Social Psychology
60
122-129
(1991)
The Threat Index (TI), a measure of death concern grounded in personal construct theory, was submitted to psychometric refinement. The factorability of the TI using the traditional split-match scoring was compared with methods based on Manhattan, Euclidian, standardized Euclidian, and Mahalanobis distance formulas. Statistical and substantive interpretability were enhanced with the standardized Euclidian factor structure. The LISREL VI program was used to determine the best model for the scale in an exploratory factor analysis. A nonhierarchical, G + 3 model met the criterion of goodness of fit >0.9 for the 1st subsample (n = 405). In a confirmatory factor analysis with a 2nd subsample (n = 405), the model was confirmed. Internal consistency and test-retest reliability were acceptable for Global Threat and 3 subfactors--Threat to Well-Being, Uncertainty, and Fatalism--and all subfactors were found to be independent of social desirability.
- Agresti:2000
-
Random effects modeling of categorical response data
A. Agresti and J. G. Booth and J. P. Hobert and B. Caffo
(2000)
- From:2006
-
Estimation of the parameters of the Birnbaum-Saunders distribution
S. G. From and L. Li
Communications in Statistics -- Theory and Methods
35
2157-2169
(2006)
- Berg:2007
-
Variance decomposition using an IRT measurement model
S. M. van den Berg and C. A. W. Glas and D. I. Boomsma
Behavior Genetics
37
604-616
(2007)
- Jackel:2003
-
A note on multivariate Gauss-Hermite quadrature
P. Jäckel
(2003)
- Boeck:2005
-
Conceptual and psychometric framework for distinguishing categories and dimensions
P. D. Boeck and M. Wilson and G. S. Acton
Psychological Review
112
129-158
(2005)
- Presnell:1994
-
Resampling methods for sample survey
B. Presnell and J. G. Booth
(1994)
- Gonzalez:2006
-
Numerical integration in logistic-normal models
J. González and F. Tuerlinckx and P. D. Boeck and R. Cools
Computational Statistics & Data Analysis
51
1535-1548
(2006)
- Ip:2004
-
Locally dependent latent trait model for polytomous responses with application to inventory of hostility
E. H. Ip and Y. J. Wang and P. D. Boeck
Psychometrika
69
191-216
(2004)
- Hedeker:2000
-
Application of item response theory models for longitudinal data
D. Hedeker and R. J. Mermelstein and B. R. Flay
(2000)
- Janssen:1999
-
Confirmatory analyses of componential test structure using multidimensional item response theory
R. Janssen and P. D. Boeck
Multivariate Behavioral Research
34
245-268
(1999)
- Komarek:2003
-
Fast robust logistic regression for large sparse datasets with binary outputs
P. R. Komarek and A. W. Moore
(2003)
- Lubke:2000
-
Factor-analyzing Likert-scale data under the assumption of multivariate normality complicates a meaningful comparison of observed groups or latent classes
G. Lubke and B. Muthén
(2000)
- Leenen:2001
-
Models for ordinal hierarchical classes analysis
I. Leenen and I. V. Mechelen and P. D. Boeck
Psychometrika
66
389-404
(2001)
- Meulders:2003
-
A taxonomy of latent structure assumptions for probability matrix decomposition models
M. Meulders and P. D. Boeck and I. V. Mechelen
Psychometrika
68
61-77
(2003)
- Meulders:2005
-
Latent variable models for partially ordered responses and trajectory analysis of anger-related feelings
M. Meulders and E. H. Ip and P. D. Boeck
British Journal of Mathematical and Statistical Psychology
58
117-143
(2005)
- Lawrence:2000
-
Bayesian inference for ordinal data using multivariate probit models
E. Lawrence and D. Bingham and C. Liu and V. N. Nair
(2000)
- Wilson:2003
-
On choosing a model for measuring
M. Wilson
Methods of Psychological Research Online
8
1-22
(2003)
- Wermuth:2000
-
Analysing social science data with graphical Markov models
N. Wermuth
(2000)
- Tay-Lim:2000
-
Generating item responses for balanced-incomplete-block (BIB) design using the generalized partial credit model (GPCM)
B. S. Tay-Lim
(2000)
- Revelle:1979
-
Very Simple Structure: An alternative procedure for estimating the optimal number of interpretable factors
W. Revelle and T. Rocklin
Multivariate Behavioral Research
14
403-414
(1979)
- Rupp:2007
-
The development, calibration, and inferential validation of standards-based assessments for English as a first foreign language at the IQB
A. A. Rupp and M. Vock and C. Harsch
(2007)
- Thacher:2005
-
Using patient characteristics and attitudinal data to identify depression treatment preference groups: A latent-class model
J. A. Thacher and E. Morey and W. E. Craighead
(2005)
- Teresi:2004
-
Differential item functioning and health assessment
J. Teresi
(2004)
- Hardouin:2007a
-
Non parametric item response theory with SAS and Stata
J. Hardouin
Journal of Statistical Software
(2007)
- Norquist:2003
-
Rasch measurement in the assessment of amyotrophic lateral sclerosis patients
J. M. Norquist and R. Fitzpatrick and C. Jenkinson
Journal of Applied Measurement
4
249-257
(2003)
- Groenen:2006
-
Visions of 70 years of psychometrics: the past, present, and future
P. J. F. Groenen and L. A. van der Ark
Statistica Neerlandica
60
135-144
(2006)
- Martin:2007
-
On the analysis of Bayesian semiparametric IRT-type models
E. S. Martin and A. Jara and J. Rolin and M. Mouchart
(2007)
- Fox:2005b
-
Bayesian modification indices for IRT models
J. Fox and C. A. W. Glas
Statistica Neerlandica
59
95-106
(2005)
- Garcia-Zattera:2005
-
Conditional independence of multivariate binary data with an application in caries research
M. J. Garcia-Zattera and A. Jara and E. Lesaffre and D. Declerck
(2005)
- Finkelman:2007
-
Using person fit in a body of work standard setting
M. Finkelman and W. Kim
(2007)
- Hamon:2000
-
Modèle de Rasch et validation de questionnaires de qualité de vie
A. Hamon
(2000)
- Hamon:2002
-
Statistical Methods for Quality of Life Studies. Design, Measurement and Analysis
A. Hamon and M. Mesbah
(2002)
- Thomas:2002
-
Intégration de la théorie de la réponse aux items aux modèles par équations structurelles: Comparaison avec une régression fondée sur des scores TRI
D. R. Thomas and I. R. R. Lu and B. D. Zumbo
(2002)
- Fox:2001a
-
Multilevel IRT: A Bayesian perspective on estimating parameters and testing statistical hypotheses
J. Fox
(2001)
- Vermunt:2007
-
Latent class and finite mixture models for multilevel data sets
J. K. Vermunt
Statistical Methods in Medical Research
(2007)
- Vermunt:2001a
-
Modeling joint and marginal distributions in the analysis of categorical panel data
J. K. Vermunt and M. F. Rodrigo and M. Ato-Garcia
Sociological Methods and Research
30
170-196
(2001)
- Vermunt:2001
-
The use of restricted latent class models for defining and testing nonparametric and parametric IRT models
J. K. Vermunt
Applied Psychological Measurement
25
283-294
(2001)
- Ark:2005
-
Stochastic Ordering of the Latent Trait by the Sum Score Under Various Polytomous IRT Models
L. A. van der Ark
Psychometrika
70
283-304
(2005)
- Fox:2005a
-
Multilevel IRT using dichotomous and polytomous response data
J. Fox
British Journal of Mathematical and Statistical Psychology
58
145-172
(2005)
- Cui:2006
-
The hierarchy consistency index: A person-fit statistic for the attribute hierarchy model
Y. Cui and J. P. Leighton and M. J. Gierl and S. M. Hunka
(2006)
- Zijlstra:2007
-
Outlier detection in test and questionnaire data
W. P. Zijlstra and L. A. van der Ark and K. Sijtsma
Multivariate Behavioral Research
(2007)
- Ginkel:2007b
-
Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results
J. R. van Ginkel and L. A. van der Ark and K. Sijtsma
Multivariate Behavioral Research
42
387-414
(2007)
The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at random, or not missing at random. Cronbach's alpha, Loevinger's scalability coefficient H, and the item cluster solution from Mokken scale analysis of the complete data were compared with the corresponding results based on the data including imputed scores. The multiple-imputation methods, two-way with normally distributed errors, corrected item-mean substitution with normally distributed errors, and response function, produced discrepancies in Cronbach's coefficient alpha, Loevinger's coefficient H, and the cluster solution from Mokken scale analysis, that were smaller than the discrepancies in upper benchmark multivariate normal imputation.
- Ginkel:2007a
-
Multiple imputation for item scores when test data are factorially complex
J. R. van Ginkel and L. A. van der Ark and K. Sijtsma
British Journal of Mathematical and Statistical Psychology
(2007)
Multiple imputation under a two-way model with error is a simple and effective method that has been used to handle missing item scores in unidimensional test and questionnaire data. Extensions of this method to multidimensional data are proposed. A simulation study is used to investigate whether these extensions produce biased estimates of important statistics in multidimensional data, and to compare them with lower benchmark listwise deletion, two-way with error and multivariate normal imputation. The new methods produce smaller bias in several psychometrically interesting statistics than the existing methods of two-way with error and multivariate normal imputation. One of these new methods clearly is preferable for handling missing item scores in multidimensional test data.
- Rabe-Hesketh:2001b
-
Multilevel modeling of cognitive function in schizophrenic patients and their first degree relatives
S. Rabe-Hesketh and T. Toulopoulou and R. M. Murray
Multivariate Behavioral Research
36
279-298
(2001)
- Rossi:2007
-
Factor analysis of the Dutch-Language version of the MCMI-III
G. Rossi and L. A. van der Ark and H. Sloore
Journal of Personality Assessment
88
144-157
(2007)
- Fox:2005
-
Randomized item response theory models
J. Fox
Journal of Educational and Behavioral Statistics
30
1-24
(2005)
- Jong:2007
-
Using item response theory to measure extreme response style in marketing research: A global investigation
M. G. D. Jong and J. B. E. M. Steenkamp and J. Fox
Journal of Marketing Research
(2007)
- Petridou:2006
-
Instability of person misfit and ability estimates subject to assessment modality
A. Petridou and J. Williams
(2006)
- Hardouin:2007
-
Mathematical methods for survival analysis, reliability and Quality of life
J. Hardouin and M. Mesbah
(2007)
- Rabe-Hesketh:2001a
-
Maximum likelihood estimation of generalized linear model with covariate measurement error
S. Rabe-Hesketh and A. Skrondal and A. Pickles
The Stata Journal
1
(2001)
- Fox:2004a
-
Modelling Response Error in School Effectiveness Research
J. Fox
Statistica Neerlandica
58
138-160
(2004)
- Fox:2004
-
Applications of Multilevel IRT Modeling
J. Fox
School Effectiveness and School Improvement
15
261-280
(2004)
- Fox:2003
-
Stochastic EM for Estimating the Parameters of a Multilevel IRT Model
J. Fox
British Journal of Mathematical and Statistical Psychology
56
65-81
(2003)
- Fox:2001
-
Bayesian Estimation of a Multilevel IRT Model using Gibbs Sampling
J. Fox and C. A. W. Glas
Psychometrika
66
269-286
(2001)
- Fox:2000
-
Bayesian Modeling of Measurement Error in Predictor Variables Using Item Response Theory
J. Fox and C. A. W. Glas
(2000)
- Jara:2007
-
A Dirichlet process mixture model for the analysis of correlated binary responses
A. Jara and M. J. Garcia-Zattera and E. Lesaffre
Computational Statistics & Data Analysis
51
5402-5415
(2007)
The multivariate probit model is a popular choice for modelling correlated binary responses. It assumes an underlying multivariate normal distribution dichotomized to yield a binary response vector. Other choices for the latent distribution have been suggested, but basically all models assume homogeneity in the correlation structure across the subjects. When interest lies in the association structure, relaxing this homogeneity assumption could be useful. The latent multivariate normal model is replaced by a location and association mixture model defined by a Dirichlet process. Attention is paid to the parameterization of the covariance matrix in order to make the Bayesian computations convenient. The approach is illustrated on a simulated data set and applied to oral health data from the Signal Tandmobiel® study to examine the hypothesis that caries is mainly a spatially local disease.
- King:2001
-
Analyzing incomplete political science data: An alternative algorithm for multiple imputation
G. King and J. Honaker and A. Joseph and K. Scheve
American Political Science Review
95
49-69
(2001)
- Skrondal:2003
-
Some applications of generalized linear latent and mixed models in epidemiology: Repeated measures, measurement error and multilevel modeling
A. Skrondal and S. Rabe-Hesketh
Norsk Epidemiologi
13
265-278
(2003)
- Pornel:2004
-
A new statistic to detect misfitting score vector
J. B. Pornel and L. S. Sotaridona and A. L. Vallejo
(2004)
- Ginkel:2007
-
Two-way imputation: A Bayesian method for estimating missing scores in tests and questionnaires, and an accurate approximation
J. R. van Ginkel and L. A. van der Ark and K. Sijtsma and J. K. Vermunt
Computational Statistics & Data Analysis
51
4013-4027
(2007)
- Rabe-Hesketh:2001
-
Parametrization of multivariate random effects models for categorical data
S. Rabe-Hesketh and A. Skrondal
Biometrics
57
1256-1264
(2001)
- Abswoude:2004a
-
Mokken scale analysis using hierarchical clustering procedures
A. A. H. van Abswoude and J. K. Vermunt and B. T. Hemker and L. A. van der Ark
Applied Psychological Measurement
28
332-354
(2004)
- Abswoude:2004
-
A comparative study of test data dimensionality assessment procedures under nonparametric IRT models
A. A. H. van Abswoude and L. A. van der Ark and K. Sijtsma
Applied Psychological Measurement
28
3-24
(2004)
- Ark:2001
-
Relationships and properties of polytomous item response theory models
L. A. van der Ark
Applied Psychological Measurement
25
273-282
(2001)
- Noortgate:2003
-
Cross-classification multilevel logistic models in psychometrics
W. V. den Noortgate and P. D. Boeck and M. Meulders
Journal of Educational and Behavioral Statistics
28
369-386
(2003)
- Rijmen:2005
-
A relation between a between-item multidimensional IRT model and the mixture-Rasch model
F. Rijmen and P. D. Boeck
Psychometrika
70
481-496
(2005)
- Martin:2006
-
IRT models for ability-based guessing
E. S. Martin and G. del Pino
Applied Psychological Measurement
30
183-203
(2006)
- Smits:2003
-
Estimation of the MIRID: A program and a SAS based approach
D. J. M. Smits and P. D. Boeck and N. Verhelst
Behavior Research Methods, Instruments, & Computers
35
537-549
(2003)
- Verguts:2001a
-
Some Mantel-Haenszel tests of Rasch model assumptions
T. Verguts and P. D. Boeck
British Journal of Mathematical and Statistical Psychology
54
21-37
(2001)
- Verguts:2000a
-
A note on the Martin-Löf test for unidimensionality
T. Verguts and P. D. Boeck
Methods of Psychological Research - Online
5
77-82
(2000)
- Tuerlinckx:2001
-
Non-modeled item interactions can lead to distorted discrimination parameters: A case study
F. Tuerlinckx and P. D. Boeck
Methods of Psychological Research - Online
6
159-174
(2001)
- Tuerlinckx:2006
-
Statistical inference in generalized linear mixed models: A review
F. Tuerlinckx and F. Rijmen and G. Verbeke and P. D. Boeck
British Journal of Mathematical & Statistical Psychology
59
225-255
(2006)
- Tuerlinckx:2005
-
Two interpretations of the discrimination parameter
F. Tuerlinckx and P. D. Boeck
Psychometrika
70
629-650
(2005)
In this paper we propose two interpretations for the discrimination parameter in the two-parameter logistic model (2PLM). The interpretations are based on the relation between the 2PLM and two stochastic models. In the first interpretation, the 2PLM is linked to a diffusion model so that the probability of absorption equals the 2PLM. The discrimination parameter is the distance between the two absorbing boundaries and therefore the amount of information that has to be collected before a response to an item can be given. For the second interpretation, the 2PLM is connected to a specific type of race model. In the race model, the discrimination parameter is inversely related to the dependency of the information used in the decision process. Extended versions of both models with person-to-person variability in the difficulty parameter are considered. When fitted to a data set, it is shown that a generalization of the race model that allows for dependency between choices and response times (RTs) is the best-fitting model.
- Verstralen:2000ab
-
A DOUBLE HAZARD MODEL FOR MENTAL SPEED
H. H. F. M. Verstralen and N. D. Verhelst and T. M. Bechger
(2000)
The administration of tests via the computer allows the registration of response times along with the actual response. This paper describes a model that combines these two kinds of data to estimate a subject latent variable usually called mental speed, but more appropriately called mental power. The model implies that the expected item score increases with invested time. Nevertheless, it allows for a decreasing expected item score with response time, which is sometimes found in experiments. This paradox is obtained by assuming that a subject not only stops working on a problem because of time pressure, but also when he has solved the problem. The model builds on a familiar framework of IRT models. An MML estimation procedure is developed, and model fit on the item level is evaluated using Lagrange multiplier tests.
- Verstralen:1998aa
-
A Latent IRT Model for Options of Multiple Choice Items
H. H. F. M. Verstralen
(1998)
A latent IRT model for the analysis of multiple choice questions is proposed. The incorrect options of an item are associated with a decreasing logistic function that models the probability of being judged correct. It is assumed that the correct option is always recognized as such. According to the model a subject selects randomly from the subset of options considered correct. Like its companion treated in Verstralen (1997) the model can be viewed as a generalization of Nedelsky's (1954) method to determine a pass/fail score. With this other model it has in common that the ML latent variable estimator gains some precision compared to binary scoring. Both models also share some other favorable psychometric properties.
- Verstralen:1998ab
-
A Logistic Latent Class Model for Multiple Choice Items
H. H. F. M. Verstralen
(1998)
A logistic latent class model for the analysis of options of a class of multiple choice items is presented. For each item a set of latent classes with a chain structure is assumed. The probability of latent class membership is modeled by a logistic function. The conditional probability of the observed response, the selection of an option, given the latent class membership is assumed to be constant. The model can be viewed as a generalization of Nedelsky's (1954) method to determine a pass/fail score. Apart from giving a more detailed model on the process of solving a multiple choice item an increase in the precision of latent variable estimates in comparison with binary scoring is achieved. The model is shown to possess some favorable psychometric properties.
- Rijn:2000aa
-
A Selection Procedure for Polytomous Items in Computerized Adaptive Testing
P. W. van Rijn and T. J. H. M. Eggen and B. T. Hemker and P. F. Sanders
(2000)
In the present study, a procedure which was developed to select dichotomous items in computerized adaptive testing was applied to polytomous items. The aim of this procedure is to select the item with maximum weighted information. In a simulation study, the item information function was integrated over a fixed interval of ability values and the item with the maximum area was selected. This maximum interval information item selection procedure was compared to a maximum point information item selection procedure. No substantial differences between the two item selection procedures were found when computerized adaptive tests were evaluated on bias and root mean square of the ability estimate.
- Bechger:2001aa
-
About the Cluster Kappa Coefficient
T. M. Bechger and B. T. Hemker and G. K. J. Maris
(2001)
The cluster kappa was proposed by Schouten (1982) as a measure of chance-corrected rater agreement suitable for studies where objects are rated on a categorical scale by two or more judges. We discuss a way to calculate the cluster kappa which is suited even if ratings are missing. Further, we demonstrate how the sampling error of the cluster kappa may be estimated.
- Ruitenburg:2006aa
-
ALGORITHMS FOR PARAMETER ESTIMATION IN THE RASCH MODEL
J. van Ruitenburg
(2006)
- Verschoor:2005aa
-
An Approximation of Cronbach's α and its Use in Test Assembly
A. J. Verschoor
(2005)
In this paper a new approximation of Cronbach's α is presented. It is especially suited in the context of test assembly. Using this approximation, two test assembly models are introduced. Being non-linear models, they are solved by Genetic Algorithms as the commonly used Linear Programming methods cannot be used here. A comparison is made with existing test assembly models.
- Maris:2004aa
-
AN INTRODUCTION TO THE DA-T GIBBS SAMPLER FOR THE TWO-PARAMETER LOGISTIC (2PL) MODEL AND ITS APPLICATION
G. Maris and T. M. Bechger
(2004)
- Maris:2003aa
-
Are attitude items monotone or single peaked? An analysis using Bayesian methods
G. Maris
(2003)
- Hickendorff:2005aa
-
Clustering Nominal Data with Equivalent Categories: a Simulation Study Comparing Restricted GROUPALS and Restricted Latent Class Analysis
M. Hickendorff
(2005)
- Bechger:2003ab
-
Combining classical test theory and item response theory
T. Bechger and G. Maris and A. Béguin and H. Verstralen
(2003)
- Straetmans:1998aa
-
Comparison of Test Administration Procedures for Placement Decisions in a Mathematics Course
G. J. J. M. Straetmans and T. J. H. M. Eggen
(1998)
In this study, three different test administration procedures for making placement decisions in adult education were compared: a paper-based test (PBT), a computer-based test (CBT), and a computerized adaptive test (CAT). All tests were prepared from an item response theory calibrated item bank. The subjects were 90 volunteer students from three adult education schools. They were randomly assigned to one of six experimental groups to take two tests which differed in mode of administration. The results indicate that test performance was not differentially affected by the mode of administration and that the CAT always yielded more accurate ability estimates than the two other test administration procedures. The CAT was also found to be capable of making placement decisions with a test that was on average 24% shorter.
- Straetmans:2003aa
-
Computerized Adaptive Testing: What It Is and How It Works
G. J. J. M. Straetmans and T. J. H. M. Eggen
(2003)
- Maris:2003ac
-
Concerning the identification of the 3PL model
G. Maris
(2003)
- Beguin:2001aa
-
Effect of Noncompensatory Multidimensionality on Separate and Concurrent estimation in IRT Observed Score Equating
A. A. Béguin and B. A. Hanson
(2001)
In this article, the results of a simulation study comparing the performance of separate and concurrent estimation of a unidimensional item response theory (IRT) model applied to multidimensional noncompensatory data are reported. Data were simulated according to a two-dimensional noncompensatory IRT model for both equivalent and nonequivalent groups designs. The criteria used were the accuracy of estimating a distribution of observed scores, and the accuracy of IRT observed score equating. In general, unidimensional concurrent estimation resulted in lower or equivalent total error than separate estimation, although there were a few cases where separate estimation resulted in slightly less error than concurrent estimation. Estimates from the correctly specified multidimensional model generally resulted in less error than estimates from the unidimensional model. The results of this study, along with results from a previous study where data were simulated using a compensatory multidimensional model, make clear that multidimensionality of the data affects the relative performance of separate and concurrent estimation, although the degree to which the unidimensional model produces biased results with multidimensional data depends on the type of multidimensionality present.
- Bechger:2000aa
-
Equivalent Linear Logistic Test Models
T. M. Bechger and H. H. F. M. Verstralen and N. D. Verhelst
(2000)
This paper is about the Linear Logistic Test Model (LLTM). We demonstrate that there are infinitely many equivalent ways to specify a model. An implication is that there may well be many ways to change the specification of a given LLTM and achieve the same improvement in model fit. To illustrate this phenomenon we analyze a real data set using a Lagrange multiplier test for the specification of the model.
- Maris:2003ab
-
Equivalent MIRID models
G. Maris and T. Bechger
(2003)
- Verhelst:2000aa
-
Estimating the Reliability of a Test from a Single Test Administration
N. D. Verhelst
(2000)
The article discusses methods of estimating the reliability of a test from a single test administration. In the first part a review of existing indices is given, supplemented with two heuristics to approximate Guttman's λ4 and a new similar coefficient. Special attention is given to the greatest lower bound, to its meaning as well as to the problems in computing it. In the second part the relation between Cronbach's α and the reliability is studied by means of a factorial model for the item scores. This part gives some useful formulae to appreciate the amount with which the reliability is underestimated when α is used as its estimand. In the last part, the sampling distribution of the indices is investigated by means of two simulation studies, showing that the indices exhibit severe bias, the direction of which depends partly on the factorial structure of the test. For three indices the bias is modeled. The model describes the bias accurately for all cases studied in the simulation studies. It is shown how this bias correction may be applied in the case of a single data set.
- Verstralen:2006aa
-
Explorations in recursive designs
H. Verstralen
(2006)
Starting from a set of basic designs, more complex designs are created by recursive application of the basic designs. Properties of these designs, and their effects on the accuracy of Rasch CML-parameter estimates are investigated.
- Maris:2005aa
-
FUZZY SET THEORY ⊆ PROBABILITY THEORY?
G. Maris
(2005)
- Bechger:2000ab
-
Identifiability of Non-Linear Logistic Test Models
T. M. Bechger and N. D. Verhelst and H. H. F. M. Verstralen
(2000)
The linear logistic test model (LLTM) specifies the item parameters as a weighted sum of basic parameters. The LLTM is a special case of a more general non-linear logistic test model (NLTM) where the weights are partially unknown. This paper is about the identifiability of the NLTM. Sufficient and necessary conditions for global identifiability are presented for a NLTM where the weights are linear functions, while conditions for local identifiability are shown to require less assumptions. It is also discussed how these conditions are checked using an algorithm due to Bekker, Merckens, and Wansbeek (1994). Several illustrations are given.
- Huitzing:2004aa
-
Infeasibility in Automated Test Assembly Models: A Comparison Study of Different Methods
H. A. Huitzing and B. P. Veldkamp and A. J. Verschoor
(2004)
Several techniques exist to automatically put together a test meeting a number of specifications. In an item bank, the items are stored with their characteristics. A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler. Test assembly problems are often formulated in terms of a model consisting of restrictions and an objective to be maximized or minimized. A problem arises when it is impossible to construct a test from the item pool that meets all specifications, that is, when the model is not feasible. Several methods exist to handle these infeasibility problems.
In this paper, test assembly models resulting from two practical testing programs were reconstructed to be infeasible. These models were analyzed using methods that either forced a solution (Goal programming, Multiple-Goal programming, Greedy Heuristic), that analyzed the causes (Relaxed and Ordered Deletion Algorithm, Integer Randomized Deletion Algorithm, Set Covering and Item Sampling), or that analyzed the causes and used this information to force a solution (Irreducible-Infeasible-Set Solver). Specialized methods like the Integer Randomized Deletion Algorithm, and the Irreducible-Infeasible-Set-Solver performed best. Recommendations about the use of different methods are given.
- Verstralen:2000aa
-
IRT models for subjective weights of options of multiple choice questions.
H. H. F. M. Verstralen and N. D. Verhelst
(2000)
From earlier investigations it was found that the information from Multiple Choice (MC) questions could be increased about fourfold by having the subject indicate the subset of options that he is unable to expose as false. In the present models this approach is generalized by having the subject distribute a number of 'taws' over the options, or draw a line after the options, such that the number of taws given to an option, or the line length, reflects its subjective degree of correctness. It appears that even with values of the relevant parameters that seem modest, the information relative to binary scoring still is in excess of two. This means that with less than half the test length the same accuracy or reliability can be obtained as with binary scoring. With a real data set we found a relative information greater than five.
If a few main fallacies can be reflected in the distractors of the items, the model can be applied to identify subjects with one of these fallacies.
- Verschoor:2004aa
-
IRT Test Assembly Using Genetic Algorithms
A. J. Verschoor
(2004)
This paper introduces a new class of optimisation methods in test assembly: Genetic Algorithms (GAs). In the first part an overview is given of the concepts and principles of GAs, in the second part they are applied to three commonly used test assembly models using Item Response Theory. Simulation studies are performed in order to find conditions under which GAs can be successfully used.
- Eggen:1998ab
-
Item Selection in Adaptive Testing with the Sequential Probability Ratio Test
T. J. H. M. Eggen
(1998)
Computerized adaptive tests (CATs) were originally developed to obtain an efficient estimate of an examinee's ability. For classification problems, applications of the Sequential Probability Ratio Test (Wald, 1947) have been shown to be a promising alternative for testing algorithms which are based on statistical estimation. However, the method of item selection currently being used in these algorithms, which use statistical testing to infer on the examinees, is either random or based on a criterion which is related to optimizing estimates of examinees (maximum (Fisher) information). In this study, an item selection method based on Kullback-Leibler information is presented, which is theoretically more suitable for statistical testing problems and which can improve the testing algorithm for classification problems.
Simulation studies were conducted for two- and three-way classification problems, in which item selection based on Fisher information and Kullback-Leibler information were compared. The results of these studies showed that the performance of the testing algorithms with Kullback-Leibler information-based item selection is sometimes better and never worse than that of algorithms with Fisher information-based item selection.
- Eggen:2004aa
-
Loss of Information in Estimating Item Parameters in Incomplete Designs
T. J. H. M. Eggen and N. D. Verhelst
(2004)
In this paper, the efficiency of conditional maximum likelihood (CML) and marginal maximum likelihood (MML) estimation of the item parameters of the Rasch model in incomplete designs is studied. The use of the concept of F-information (Eggen, 2000) is generalized to incomplete testing designs. The standardized determinant of the F-information matrix is used for a scalar measure of information in a set of item parameters. In this paper, the relation between the normalization of the Rasch model and this determinant is clarified. It is shown that comparing estimation methods with the defined information efficiency is independent of the chosen normalization.
In examples, information comparisons are conducted. It is found that for both CML and MML some information is lost in all incomplete designs compared to complete designs. A general trend is that with increasing test booklet length the efficiency of an incomplete to a complete design and also the efficiency of CML compared to MML is increasing. The main difference between CML and MML is seen in relation to the length of the test booklet. It will be demonstrated that with very small booklets, there is a substantial loss in information (about 35%) with CML estimation, while this loss is only about 10% in MML estimation. However, with increasing test length, the differences between CML and MML quickly disappear.
- Verhelst:1998aa
-
Modeling Sums of Binary Responses by the Partial Credit Model
N. D. Verhelst and H. H. F. M. Verstralen
(1998)
The Partial Credit Model (PCM) is sometimes interpreted as a model for stepwise solution of polytomously scored items, where the item parameters are interpreted as difficulties of the steps. It is argued that this interpretation is not justified. A model for stepwise solution is discussed. It is shown that the PCM is suited to
model sums of binary responses which are not supposed to be stochastically independent. As a practical result, a statistical test of stochastic independence in the Rasch model is derived.
- Hemker:2000aa
-
On Measurement Properties of Continuation Ratio Models
B. T. Hemker and L. A. van der Ark and K. Sijtsma
(2000)
Three classes of polytomous IRT models are distinguished. These classes are the adjacent
category models, the cumulative probability models, and the continuation ratio models. So far, the latter class has received relatively little attention. The class of continuation ratio models
includes logistic models, such as the sequential model (Tutz, 1990), and non-logistic models,
such as the acceleration model (Samejima, 1995) and the nonparametric sequential model (Hemker, 1996). Four measurement properties are discussed. These are monotone likelihood
ratio of the total score, stochastic ordering of the latent trait by the total score, stochastic
ordering of the total score by the latent trait, and invariant item ordering. These properties
have been investigated previously for the adjacent category models and the cumulative
probability models, and for the continuation ratio models this is done here. It is shown that
stochastic ordering of the total score by the latent trait is implied by all continuation ratio
models, while monotone likelihood ratio of the total score and stochastic ordering of the
latent trait by the total score are not implied by any of the continuation ratio models. Only the
sequential rating scale model implies the property of invariant item ordering. Also, we present a Venn diagram showing the relationships between all known polytomous IRT models from all three classes.
- Eggen:1998aa
-
On the Loss of Information in Conditional Maximum Likelihood Estimation of Item Parameters
T. J. H. M. Eggen
(1998)
In item response models of the Rasch type (Fischer & Molenaar, 1995), item
parameters are often estimated by the conditional maximum likelihood (CML)
method. This paper addresses the loss of information in CML estimation by using
the information concept of F-information (Liang, 1983). This concept makes it
possible to specify the conditions for no loss of information and to define a
quantification of information loss. For the dichotomous Rasch model, the
derivations will be given in detail to show the use of the F-information concept
for making efficiency comparisons for different estimation methods. It is shown
that by using CML for item parameter estimation, some information is almost
always lost. But compared to JML (joint maximum likelihood) as well as to MML
(marginal maximum likelihood) the loss is very small. The reported efficiency of
CML to JML and to MML in several comparisons is always larger than 93%, and
in tests with a length of 20 items or more, larger than 99%.
- Eggen:2004ab
-
Optimal Testing With Easy or Difficult Items in Computerized Adaptive Testing
T. J. H. M. Eggen and A. J. Verschoor
(2004)
Computerized adaptive tests (CATs) are individualized tests which, from a measurement point of view, are optimal for each individual, possibly under some practical conditions. In the
present study it is shown that maximum information item selection in CATs using an item bank which is calibrated with the one- or the two-parameter logistic model, results in each individual answering about 50% of the items correctly. Two item selection procedures giving easier (or more difficult) tests for students are presented and evaluated. Item selection on probability points of items yields good results only with the 1pl model and not with the 2pl model. An alternative selection procedure, based on maximum information at a shifted ability level, gives satisfactory results with both models.
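The idea of selecting items at a shifted ability level can be sketched for the one-parameter (Rasch) case. Under that model, item information P(1 - P) peaks where the item difficulty equals the ability, which is exactly where the success probability is 0.5. Shifting the ability used for selection by the logit of a target success probability therefore yields easier or harder items. The function names, the difficulty list, and the specific shift formula below are assumptions for illustration and are not necessarily the procedure evaluated in the study.

```python
import math

def rasch_p(theta, b):
    """Rasch (1PL) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_information(theta, b):
    """Item information under the Rasch model: P * (1 - P)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def shifted_theta(theta_hat, target_p):
    """Ability level at which an item answered with probability target_p
    at theta_hat would be maximally informative (assumed shift rule)."""
    return theta_hat - math.log(target_p / (1.0 - target_p))

def select_item(theta_hat, difficulties, target_p=0.5):
    """Index of the item with maximum information at the shifted ability."""
    theta_star = shifted_theta(theta_hat, target_p)
    return max(range(len(difficulties)),
               key=lambda i: rasch_information(theta_star, difficulties[i]))

# Hypothetical item difficulties.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
standard_choice = select_item(0.0, difficulties)               # target 50% correct
easier_choice = select_item(0.0, difficulties, target_p=0.7)   # target about 70% correct
```

With `target_p=0.5` the shift is zero and the rule reduces to ordinary maximum information selection; a target above 0.5 moves the selection toward easier items.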
- Eggen:2001aa
-
Overexposure and underexposure of items in computerized adaptive testing
T. J. H. M. Eggen
(2001)
Computerized adaptive tests (CATs) have been shown to be considerably more efficient than paper-and-pencil tests. This gain is realized by offering each candidate the most informative item from an available item bank on the basis of the results of items that have already been administered. The item selection methods that are used to compose an optimum test for each individual do, however, have a number of drawbacks. Though a CAT generally presents each candidate with a different test, it often occurs that some items from the item bank are administered very frequently while others are never or hardly ever used. These two problems, i.e., overexposure and underexposure of items, can be eliminated by adding further restrictions to the item selection methods. However, this exposure control will affect the efficiency of the CAT. This paper presents a solution for both problems. The functioning of these methods will be illustrated with the results of simulation research that has been carried out to develop adaptive tests.
- Roelofs:2001aa
-
Preferences for various learning environments: Teachers' and parents' perceptions
E. C. Roelofs and J. J. C. M. Visser
(2001)
In the last ten years, a number of innovations, mainly inspired by constructivist notions of
learning, have been introduced in various levels of the Dutch educational system. However,
constructivist learning environments are rarely implemented. Teachers tend to stick to
expository and structured learning environments. This consistent finding requires research in order to gain insight into teachers' preferences for learning environments and to determine the factors that support and impede the realization of these learning environments. Regarding the influence of social backgrounds on student learning, it is also important to take stock of parental views on learning environments.
This study is focused on teachers' preferences for learning environments, their reported
teaching behavior, and how these match with parents' preferences. Three parallel
questionnaires were developed for teachers (n=281), students (n=952), and parents (n=717), measuring preferences and behavior in different levels of education, for three types of
learning environments: direct instruction, discovery learning, and authentic pedagogy.
The results show that teachers often prefer direct instruction, and seldom promote discovery
learning. While teachers sometimes realize authentic pedagogy, constructive learning tasks
are seldom used. Teachers' reported practice and parents' preferences for their children appear
to correspond reasonably.
Results of multiple regression analyses show that the use of each of the three types of learning
environments has different predictors. For the use of discovery learning and authentic
pedagogy, confidence in students' regulative skills is an important predictor. In predicting the
use of direct instruction, the teacher's own conception of learning turns out to be an important
predictor.
- Bechger:2004aa
-
STRUCTURAL EQUATION MODELLING OF MULTIPLE FACET DATA: EXTENDING MODELS FOR MULTITRAIT-MULTIMETHOD DATA
T. M. Bechger and G. Maris
(2004)
This paper is about the structural equation modelling of quantitative measures that are obtained from a multiple facet design. A facet is simply a set consisting of a finite number of elements. It is assumed that measures are
obtained by combining each element of each facet. Methods and traits are two such facets, and a multitrait-multimethod study is a two-facet design. We extend models that were proposed for multitrait-multimethod data by
Wothke (1984, 1996) and Browne (1984, 1989, 1993), and demonstrate how they can be fitted using standard software for structural equation modelling. Each model is derived from the model for individual measurements in order to clarify the first principles underlying each model.
- Verhelst:2002aa
-
Testing the unidimensionality assumption of the Rasch model
N. Verhelst
(2002)
Statistical tests especially designed to test the unidimensionality axiom of the Rasch model are scarce. For two of them, the Martin-Löf test
(ML-test) and the splitter-item-technique, an extensive power analysis has been carried out, showing clearly the superiority of the ML-test. The disadvantage of the ML-test, however, is that its null distribution deviates strongly from the asymptotic chi-square distribution unless one has huge samples. A new test with one degree of freedom is proposed. Its power is
superior to that of the ML-test, and its null distribution converges rapidly to the chi-square.
- Verstralen:2001aa
-
The Combined Use of Classical Test Theory and Item Response Theory
H. Verstralen and T. Bechger and G. Maris
(2001)
The present paper is about a number of relations between concepts of models from classical test theory (CTT), such as reliability, and item response theory (IRT). It is demonstrated that the use of IRT models allows us to extend the range of applications of CTT, and investigate relations among concepts that are central in CTT such as reliability and item-test correlation.
- Bechger:2003ac
-
The componential Nedelsky model: A first exploration
T. Bechger and G. Maris
(2003)
- Bechger:2003aa
-
The Nedelsky model for multiple choice items
T. Bechger and G. Maris and H. Verstralen and N. Verhelst
(2003)
- Maris:2003ad
-
Two methods for the practical analysis of rating data
G. Maris and T. Bechger
(2003)
- Verguts:2001
-
Some Mantel-Haenszel tests of Rasch model assumptions
T. Verguts and P. De Boeck
British Journal of Mathematical and Statistical Psychology
54
21-37
(2001)
- Verguts:2000
-
A note on the Martin-Löf test for unidimensionality
T. Verguts and P. De Boeck
Methods of Psychological Research Online
5
(2000)
- Prieto:2003
-
Classical test theory versus Rasch analysis for quality of life questionnaire reduction
L. Prieto and J. Alonso and R. Lamarca
Health and Quality of Life Outcomes
1
(2003)
- Jiao:2004
-
Evaluating the dimensionality of the Michigan English Language Assessment Battery
H. Jiao
2
27-52
(2004)
- Rizopoulos:2005
-
Nonlinear effects in generalized latent variable models
D. Rizopoulos
(2005)
- Smith:2007
-
A Rasch and factor analysis of the Functional Assessment of Cancer Therapy-General (FACT-G)
A. B. Smith and P. Wright and P. J. Selby and G. Velikova
Health and Quality of Life Outcomes
5
(2007)
- Raiche:2005
-
Critical eigenvalue sizes in standardized residual principal components analysis
G. Raîche
Rasch Measurement Transactions
19
1012
(2005)
- Smith:2005
-
Rasch analysis of the dimensional structure of the hospital anxiety and depression scale
A. B. Smith and E. P. Wright and R. Rush and D. P. Stark and G. Velikova and P. J. Selby
Psycho-Oncology
(2005)
- Flieller:1994
-
Méthodes d'étude de l'adéquation au modèle logistique à un paramètre (modèle de Rasch)
A. Flieller
Mathématiques et Sciences Humaines
127
19-47
(1994)
- Orlando:2000
-
Critical issues to address when applying Item Response Theory (IRT) models
M. Orlando
(2000)
- Bock:1970
-
Fitting a response model for n dichotomously scored items
R. D. Bock and M. Lieberman
Psychometrika
35
179-197
(1970)
- Socan:2000
-
Assessment of reliability when test items are not essentially tau-equivalent
G. Socan
(2000)
- Junker:1996
-
Exploring monotonicity in polytomous item response data
B. W. Junker
(1996)
- Junker:2000a
-
Nonparametric IRT in Action: An overview of the special issue
B. W. Junker and K. Sijtsma
(2000)
- Linardakis:1996
-
An approach to multidimensional item response modeling
M. Linardakis and P. Dellaportas
(1996)
- Junker:2000
-
Monotonicity and conditional independence in models for student assessment and attitude measurement
B. W. Junker
(2000)
- Rudner:2001
-
Measurement Decision Theory
L. M. Rudner
(2001)
- Bentler:2004
-
Maximal reliability for unit-weighted composites
P. M. Bentler
(2004)
- Sijtsma:1994
-
A survey of theory and methods of invariant item ordering
K. Sijtsma and B. W. Junker
(1994)
- Mazor:1995
-
Using logistic regression and the Mantel-Haenszel with multiple ability estimates to detect Differential Item Functioning
K. M. Mazor and A. Kanjee and B. E. Clauser
Journal of Educational Measurement
32
131-144
(1995)
- Zwick:1990
-
When do item response function and Mantel-Haenszel definitions of Differential Item Functioning coincide?
R. Zwick
Journal of Educational Statistics
15
185-197
(1990)
- Shapiro:2000
-
The asymptotic bias of minimum trace factor analysis, with applications to the greatest lower bound to reliability
A. Shapiro and J. M. F. ten Berge
Psychometrika
65
413-425
(2000)
- Callender:1979
-
An empirical comparison of coefficient alpha, Guttman's lambda-2, and MSPLIT maximized split-half reliability estimates
J. C. Callender and H. G. Osburn
Journal of Educational Measurement
16
89
(1979)
- Adema:1989
-
Algorithms for computerized test construction using classical item parameters
J. J. Adema and W. J. van der Linden
Journal of Educational Statistics
14
279-290
(1989)
- Armstrong:1998
-
Optimization of classical reliability in test construction
R. D. Armstrong and D. H. Jones
Journal of Educational and Behavioral Statistics
23
1-17
(1998)
- Raiche:2002a
-
La simulation d'un test adaptatif basé sur le modèle de Rasch
G. Raîche
(2002)
- Kintsch:1999
-
The role of long-term memory in text comprehension
W. Kintsch and V. L. Patel and K. A. Ericsson
Psychologia
42
186-198
(1999)
- Hermann:1999
-
Assessing leadership style: A trait analysis
M. G. Hermann
(1999)
- Papadimitriou:1997
-
Latent semantic indexing: A probabilistic analysis
C. H. Papadimitriou and P. Raghavan and H. Tamaki
(1997)
- Deerwester:1990
-
Indexing by latent semantic analysis
S. Deerwester and S. T. Dumais and R. Harshman
Journal of the American Society for Information Science
41
391-407
(1990)
- Leeuw:2003
-
Principal component analysis with binary data. Applications to roll-call analysis
J. de Leeuw
(2003)
- Lawrence:2005
-
Probabilistic non-linear principal component analysis with Gaussian process latent variable models
N. Lawrence
Journal of Machine Learning Research
6
1783-1816
(2005)
- Kemkes:2006
-
Objective scoring for computing competition tasks
G. Kemkes and T. Vasiga and G. Cormack
(2006)
- Eggen:2005
-
Computerized adaptive testing
T. Eggen
(2005)
- Krzanowski:2006
-
Sensitivity in metric scaling and analysis of distance
W. J. Krzanowski
Biometrics
62
239-244
(2006)
- Hofmann:1999
-
Probabilistic latent semantic analysis
T. Hofmann
(1999)
- Farahat:2006
-
Improving probabilistic latent semantic analysis with principal component analysis
A. Farahat and F. Chen
(2006)
- Allegre:2003
-
Un système d'observation et d'analyse en direct de séances d'enseignement
E. Allègre and P. Dessus
(2003)
- Rehder:1998
-
Using latent semantic analysis to assess knowledge: Some technical considerations
B. Rehder and M. E. Schreiner and M. B. W. Wolfe and D. Laham
(1998)
- Wolfe:1998
-
Learning from text: Matching readers and texts by latent semantic analysis
M. B. W. Wolfe and M. E. Schreiner and B. Rehder and D. Laham
Discourse Processes
25
309-336
(1998)
- Foltz:1998
-
The measurement of textual coherence with latent semantic analysis
P. W. Foltz and W. Kintsch and T. K. Landauer
Discourse Processes
25
285-307
(1998)
- Landauer:1998
-
An introduction to latent semantic analysis
T. K. Landauer and P. W. Foltz and D. Laham
Discourse Processes
25
259-284
(1998)
- Huang:2003
-
Psychometric analyses based on evidence-centered design and cognitive science of learning to explore students' problem-solving in physics
C. Huang
(2003)
- Lazarevska:2005
-
The distinctive language of terrorists
E. Lazarevska and J. M. Sholl and M. Young
(2005)
- Zumbo:1999
-
A handbook on the theory and methods of Differential Item Functioning (DIF)
B. D. Zumbo
(1999)
- Agresti:2005
-
Bayesian inference for categorical data analysis
A. Agresti and D. Hitchcock
Statistical Methods and Applications (Journal of the Italian Statistical Society)
(2005)
- Davis:2002
-
Strategies for controlling item exposure in computerized adaptive testing with polytomously scored items
L. L. Davis
(2002)
- Landauer:1997
-
How Well Can Passage Meaning be Derived without Using Word Order? A Comparison of Latent Semantic Analysis and Humans
T. K. Landauer and D. Laham and B. Rehder and M. E. Schreiner
(1997)
- Landauer:2004
-
From paragraph to graph: latent semantic analysis for information visualization
T. K. Landauer and D. Laham and M. Derr
Proceedings of the National Academy of Sciences USA
101
5214-5219
(2004)
Most techniques for relating textual information rely on intellectually created links such as author-chosen keywords and titles, authority indexing terms, or bibliographic citations. Similarity of the semantic content of whole documents, rather than just titles, abstracts, or overlap of keywords, offers an attractive alternative. Latent semantic analysis provides an effective dimension reduction method for the purpose that reflects synonymy and the sense of arbitrary word combinations. However, latent semantic analysis correlations with human text-to-text similarity judgments are often empirically highest at approximately 300 dimensions. Thus, two- or three-dimensional visualizations are severely limited in what they can show, and the first and/or second automatically discovered principal component, or any three such for that matter, rarely capture all of the relations that might be of interest. It is our conjecture that linguistic meaning is intrinsically and irreducibly very high dimensional. Thus, some method to explore a high dimensional similarity space is needed. But the $2.7 \times 10^7$ projections and infinite rotations of, for example, a 300-dimensional pattern are impossible to examine. We suggest, however, that the use of a high dimensional dynamic viewer with an effective projection pursuit routine and user control, coupled with the exquisite abilities of the human visual system to extract information about objects and from moving patterns, can often succeed in discovering multiple revealing views that are missed by current computational algorithms. We show some examples of the use of latent semantic analysis to support such visualizations and offer views on future needs.
- Bianco:2005
-
Modélisation des processus de hiérarchisation et d'application de macrorègles et conception d'un prototype d'aide au résumé
M. Bianco and P. Dessus and B. Lemaire and S. Mandin and P. Mendelsohn
(2005)
- Laham:1997
-
Latent semantic analysis approaches to categorization
D. Laham
979
(1997)
- Bestgen:2002
-
L'analyse sémantique latente et l'identification des métaphores
Y. Bestgen and A. Cabiaux
(2002)
- Hernandez:2006
-
A Procedure for Estimating Intrasubject Behavior Consistency
J. M. Hernández and V. J. Rubio and J. Revuelta and J. Santacreu
Educational and Psychological Measurement
66
417-434
(2006)
Trait psychology implicitly assumes consistency of personal traits. Mischel, however, argued against the idea of a general consistency of human beings. The present article aims to design a statistical procedure based on an adaptation of the $\pi^*$ statistic to measure the degree of intraindividual consistency independently of the measure used. Three studies were carried out to test the suitability of the $\pi^*$ statistic and the proportion of subjects who act consistently. Results have shown the appropriateness of the proposed statistic and that the percentage of consistent individuals depends on whether test items can be assumed to be equivalent and on the number of response alternatives they contain. The results suggest that the percentage of consistent subjects is far from 100%, and this percentage decreases when items are equivalent. Moreover, the greater the number of response options, the lower the percentage of consistent individuals.
- Revuelta:2004
-
Analysis of distractor difficulty in Multiple-Choice items
J. Revuelta
Psychometrika
69
217-234
(2004)
Two psychometric models are presented for evaluating the difficulty of the distractors in multiple-choice items. They are based on the criterion of rising distractor selection ratios, which facilitates interpretation of the subject and item parameters. Statistical inferential tools are developed in a Bayesian framework: modal a posteriori estimation by application of an EM algorithm and model evaluation by monitoring posterior predictive replications of the data matrix. An educational example with real data is included to exemplify the application of the models and compare them with the nominal categories model.
- Wang:1998
-
An ANOVA-like Rasch analysis of differential item functioning
W. Wang
(1998)
- Blais:2003
-
Une étude de l'accord et de la fidélité inter juges comparant un modèle de la théorie de la généralisabilité et un modèle de la famille de Rasch
J. Blais and N. Loye
(2003)
- Raiche:2002
-
Objective measurement, Theory into practice
G. Raîche and J. Blais
6
(2002)
- Bailey:2001
-
Ideal point estimation with a small number of votes: A random-effects approach
M. Bailey
Political Analysis
9
192-210
(2001)
- Youness:2004
-
Contributions à une méthodologie de comparaison de partitions
G. Youness
(2004)
- Way:2006
-
Practical questions in introducing computerized adaptive testing for K-12 assessments
W. D. Way and L. L. Davis and S. Fitzpatrick
(2006)
- Schein:2003
-
A generalized linear model for principal component analysis of binary data
A. I. Schein and L. K. Saul and L. H. Ungar
(2003)
- Hardouin:2005
-
Construction d'échelles d'items unidimensionnelles en qualité de vie
J. Hardouin
(2005)
- Klein:2005
-
Graphical models for panel studies, illustrated on data from the Framingham Heart Study
J. P. Klein and N. Keiding and S. Kreiner
(2005)
- Partchev:2004
-
A visual guide to item response theory
I. Partchev
(2004)
- Gruijter:2005
-
Statistical test theory for education and psychology
D. N. M. de Gruijter and L. J. T. van der Kamp
(2005)
- Boeck:2004
-
Explanatory Item Response Models: a Generalized Linear and Nonlinear Approach
P. De Boeck and M. Wilson
(2004)
http://www.springer.com/west/home?SGWID=4-102-22-26922428-0&changeHeader=true&SHORTCUT=www.springer.com/978-0-387-40275-8
- Rijmen:2003
-
A nonlinear mixed model framework for item response theory
F. Rijmen and F. Tuerlinckx and P. De Boeck and P. Kuppens
Psychological Methods
8
185-205
(2003)
Mixed models take the dependency between observations based on the same cluster into account by introducing 1 or more random effects. Common item response theory (IRT) models introduce latent person variables to model the dependence between responses of the same participant. Assuming a distribution for the latent variables, these IRT models are formally equivalent with nonlinear mixed models. It is shown how a variety of IRT models can be formulated as particular instances of nonlinear mixed models. The unifying framework offers the advantage that relations between different IRT models become explicit and that it is rather straightforward to see how existing IRT models can be adapted and extended. The approach is illustrated with a self-report study on anger.
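As a hedged illustration of the equivalence described in this abstract, the Rasch model can be written as a random-intercept logistic model. The notation is assumed here rather than taken from the article: $Y_{pi}$ is the response of person $p$ to item $i$, $\theta_p$ the random person effect, $\beta_i$ the fixed item difficulty, and $\phi$ the normal density.

```latex
\begin{align}
  \operatorname{logit} P(Y_{pi} = 1 \mid \theta_p) &= \theta_p - \beta_i,
    \qquad \theta_p \sim N(0, \sigma^2), \\
  L(\beta, \sigma^2) &= \prod_{p} \int \prod_{i}
    \pi_{pi}(\theta)^{\,y_{pi}} \bigl(1 - \pi_{pi}(\theta)\bigr)^{1 - y_{pi}}
    \, \phi(\theta; 0, \sigma^2) \, d\theta,
\end{align}
```

where $\pi_{pi}(\theta) = \exp(\theta - \beta_i) / \bigl(1 + \exp(\theta - \beta_i)\bigr)$. Maximizing the marginal likelihood $L$ is the usual estimation route for a mixed model with one random intercept per person.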
- May:2006
-
A multilevel Bayesian item response theory method for scaling socioeconomic status in international studies of education
H. May
Journal of Educational and Behavioral Statistics
31
63-79
(2006)
A new method is presented and implemented for deriving a scale of socioeconomic status (SES) from international survey data using a multilevel Bayesian item response theory (IRT) model. The proposed model incorporates both international anchor items and nation-specific items and is able to (a) produce student family SES scores that are internationally comparable, (b) reduce the influence of irrelevant national differences in culture on the SES scores, and (c) effectively and efficiently deal with the problem of missing data in a manner similar to Rubin's (1987) multiple imputation approach. The results suggest that this model is superior to conventional models in terms of its fit to the data and its ability to use information collected via international surveys.
- Borsboom:2004fj
-
The concept of validity
D. Borsboom and G. J. Mellenbergh and J. van Heerden
Psychological Review
111
1061-1071
(2004)
http://users.fmg.uva.nl/dborsboom/papers.htm
This article advances a simple conception of test validity: A test is valid for measuring an attribute if (a) the attribute exists and (b) variations in the attribute causally produce variation in the measurement outcomes. This conception is shown to diverge from current validity theory in several respects. In particular, the emphasis in the proposed conception is on ontology, reference, and causality, whereas current validity theory focuses on epistemology, meaning, and correlation. It is argued that the proposed
conception is not only simpler but also theoretically superior to the position taken in the existing literature. Further, it has clear theoretical and practical implications for validation research. Most important, validation research must not be directed at the relation between the measured attribute and other attributes but at the processes that convey the effect of the measured attribute on the test scores.
- Bond:2003
-
Validity and assessment: a Rasch measurement perspective
T. G. Bond
Metodologia de las Ciencias del Comportamiento
5
179-194
(2003)
This paper argues that the Rasch model, unlike the other models generally referred to as IRT models, and those that fall into the tradition of True Score models, encompasses a set of rigorous prescriptions for what scientific measurement would be like if it were to be achieved in the social sciences. As a direct consequence, the Rasch measurement approach to the construction and monitoring of variables is sensitive to the issues raised in Messick's (1995) broader conception of construct validity. The theory / practice dialectic (Bond & Fox, 2001) ensures that validity is foremost in the mind of those developing measures and that genuine scientific measurement is foremost in the minds of those who seek valid outcomes from assessment. Failures of invariance, such as those referred to as DIF, should alert researchers to the need to modify assessment procedures or the substantive theory under investigation, or both.