American Educational Research Association: Journal of Educational and Behavioral Statistics: Table of Contents

Using the Information Metric to Analyze Clinical Rating Scales

Journal of Educational and Behavioral Statistics, Ahead of Print.
A rating scale is a set of categories designed to obtain information about a quantitative or a qualitative attribute. Item response theory (IRT) proposes that a probability function over a single latent variable represents the overall attribute evolution that the scale is designed to assess. Here we utilize an information theory approach to IRT to analyze rating scale data. The proposed IRT analyses, based on surprisal, offer new tools for assessing raters, rated items, and the whole rating scale. The information transformation from probability to surprisal is a new lens through which to view choice data and is an important augmentation of probability-based IRT. It also offers new graphical tools to measure the amount of information captured by an item in an additive metric, and to measure covariation among items using mutual information. The proposed methodology is illustrated using two scales from real clinical data, and the proposed approach is compared with analyses made with the commonly used parametric IRT graded response model. Practical implications of the proposed methodology are provided.
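
For readers unfamiliar with the quantities named above, the sketch below computes surprisal and mutual information from item-category frequencies. It illustrates only the standard information-theoretic definitions, not the article's surprisal-based IRT machinery, and the toy frequencies are invented for illustration.

```python
import numpy as np

# Toy joint frequencies for two 3-category items (rows: item A, cols: item B).
# These counts are illustrative only, not data from the article.
counts = np.array([[30., 10.,  5.],
                   [10., 40., 15.],
                   [ 5., 15., 70.]])
p_joint = counts / counts.sum()          # joint category probabilities
p_a = p_joint.sum(axis=1)                # marginal distribution of item A
p_b = p_joint.sum(axis=0)                # marginal distribution of item B

# Surprisal: the additive information metric -log2(p), measured in bits.
surprisal_a = -np.log2(p_a)

# Mutual information between the two items (in bits):
# I(A;B) = sum_{a,b} p(a,b) * log2( p(a,b) / (p(a) p(b)) )
ratio = p_joint / np.outer(p_a, p_b)
mask = p_joint > 0
mutual_info = np.sum(p_joint[mask] * np.log2(ratio[mask]))

print("Category surprisals for item A (bits):", np.round(surprisal_a, 3))
print("Mutual information I(A;B) (bits):", round(mutual_info, 3))
```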

Bayesian Variable Selection in Dynamic Item Response Theory Models

Journal of Educational and Behavioral Statistics, Ahead of Print.
The recent surge in computerized testing brings challenges in the analysis of testing data with classic item response theory (IRT) models. To handle individually varying and irregularly spaced longitudinal dichotomous responses, we adopt a dynamic IRT model framework and then extend the model to link with individual characteristics at a hierarchical level. Further, we develop an algorithm to select the individual characteristics that capture growth in ability under this multilevel dynamic IRT model; the Bayes factor of the proposed model with different covariates can be computed from a single Markov chain Monte Carlo output from the full model. In addition, we show model selection consistency under the modified Zellner–Siow prior and conduct simulations to illustrate this consistency in finite samples. Finally, we apply the proposed model and computational algorithms to a real educational testing dataset, EdSphere.
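
For context on the prior named above, the standard Zellner–Siow construction (on which the abstract's modified prior is presumably based) expresses a heavy-tailed Cauchy prior on regression coefficients as a scale mixture of g-priors; the display below is this textbook form, not the article's modification.

```latex
% Standard Zellner-Siow prior written as a mixture of g-priors
\beta \mid g, \sigma^2 \;\sim\; \mathcal{N}\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right),
\qquad
g \;\sim\; \text{Inverse-Gamma}\!\left(\tfrac{1}{2},\, \tfrac{n}{2}\right)
```

Integrating out g yields a multivariate Cauchy prior on the coefficients with scale matrix \sigma^2 (X^\top X / n)^{-1}.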

Estimating Causal Mediation Effects in Multiple-Mediator Analyses With Clustered Data

Journal of Educational and Behavioral Statistics, Ahead of Print.
Multiple-mediator analyses with clustered data are common in educational and behavioral sciences, but limited methods exist to assess the causal mediation effects via each of multiple mediators. In this study, we extend the multiply robust method to make inferences on the causal mediation effects for two mediators with clustered data. The developed method takes into account unmeasured cluster-level confounders and can incorporate machine learning methods to nonparametrically estimate nuisance models while allowing uncertainty quantification via asymptotic standard errors and confidence intervals. We conduct simulations to evaluate the developed method for inference of both the individual-average and cluster-average causal mediation effects with clustered data. We illustrate our method using data from the Education Longitudinal Study.

A Polytomous Extension of the Higher-Order, Hidden Markov Model With Covariates and Hierarchical Learning Trajectories

Journal of Educational and Behavioral Statistics, Ahead of Print.
In practice, constructed-response items are commonly used to diagnose students’ performance using polytomous scores. Existing longitudinal cognitive diagnosis models (CDMs) primarily rely on dichotomizing the data, making them unsuitable for polytomous scores. This article introduces a longitudinal CDM for handling polytomous responses over time. The proposed model expands the capabilities of learning models to handle polytomous data and account for hierarchies among attributes within the CDMs. For estimation, a Gibbs formulation was proposed to estimate the parameters of the measurement part, while a Metropolis-Hastings sampler was employed for the transition part. An empirical study was conducted to showcase the practical application and advantages of the proposed model. Additionally, two simulation studies demonstrated that parameters can be well recovered under various conditions.

New Iterative Algorithms for Estimation of Item Functioning

Journal of Educational and Behavioral Statistics, Ahead of Print.
This article explores innovations for parameter estimation in generalized linear and nonlinear models, which may be used in item response modeling to account for guessing/pretending or slipping/dissimulation and for the effect of covariates. We introduce a new implementation of the EM algorithm and propose a new algorithm based on the parametrized link function. The two novel iterative algorithms are compared to existing methods in a simulation study. Additionally, the study examines software implementation, including the specification of initial values for the numerical algorithms, as well as asymptotic properties and the estimation of standard errors. Overall, the newly proposed algorithm based on the parametrized link function outperforms other procedures, especially for small sample sizes. Moreover, the newly implemented EM algorithm provides additional information regarding respondents’ inclination to guess or pretend and slip or dissimulate when answering the item. The study also discusses applications of the methods in the context of detecting differential item functioning and addresses measurement error. Methods are offered in the difNLR package and in the interactive application of the ShinyItemAnalysis package; demonstration is provided using real data from psychological and educational assessments.
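
As background for the guessing/pretending and slipping/dissimulation terms, a four-parameter logistic item response function captures those behaviors through lower and upper asymptotes; the display below is the generic textbook form (covariate effects can enter through the item parameters), not the specific parametrization estimated by the new algorithms.

```latex
% Generic 4PL-type item response function:
% c_i is the lower asymptote (guessing/pretending), d_i the upper asymptote (1 - slipping/dissimulation)
P(Y_{pi} = 1 \mid \theta_p) \;=\; c_i \;+\; (d_i - c_i)\,
  \frac{\exp\{a_i(\theta_p - b_i)\}}{1 + \exp\{a_i(\theta_p - b_i)\}}
```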

Online Calibration for Multidimensional CAT With Polytomously Scored Items: A Neural Network–Based Approach

Journal of Educational and Behavioral Statistics, Ahead of Print.
Online calibration is a key technology for calibrating new items in computerized adaptive testing (CAT). As multidimensional polytomous data become popular, online calibration methods applicable to multidimensional CAT with polytomously scored items (P-MCAT) have been proposed. However, the existing methods are mainly based on marginal maximum likelihood estimation with an expectation-maximization algorithm (MMLE/EM), making it difficult to accurately estimate parameters in high-dimensional scenarios without sufficient calibration sample size or suitable initial values. To address these challenges, a neural network (NN)-based online calibration framework was put forward. The new NN-based methods differ profoundly from the traditional ones in that the parameter estimates of new items are obtained by learning the patterns between input and output data instead of finding solutions to the log-marginal likelihood. Moreover, an alternative solution was proposed for traditional methods to obtain appropriate initial values. Simulation studies were conducted to compare the NN- and MMLE/EM-based methods under various conditions, and to further explore the properties of the NN-based methods. Results showed that both the NN-based methods and the alternative solution showed their respective strengths in recovering the item parameters of new items, while the MMLE/EM-based methods struggled to converge when more than three dimensions were involved in the test.
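
To make the "learn the mapping from response patterns to item parameters" idea concrete, here is a toy supervised-regression sketch using scikit-learn. It is only a schematic of the general strategy on invented inputs; the network architecture, feature construction, and P-MCAT calibration design in the article are not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Invented training set: each row summarizes responses to a calibrated item
# (e.g., category proportions within coarse ability strata), and each target row
# holds that item's generating parameters. Real features would come from CAT data.
n_items, n_features, n_params = 2000, 12, 4
X_train = rng.uniform(0, 1, size=(n_items, n_features))
true_coef = rng.normal(size=(n_features, n_params))
y_train = X_train @ true_coef + rng.normal(scale=0.1, size=(n_items, n_params))

# A small multilayer perceptron learns the pattern -> parameter mapping directly,
# sidestepping optimization of a log-marginal likelihood for each new item.
net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# "Calibrate" a new item by feeding its response-pattern features through the net.
x_new = rng.uniform(0, 1, size=(1, n_features))
print("Predicted item parameters:", np.round(net.predict(x_new), 3))
```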

An Improved Satterthwaite (1941, 1946) Effective df Approximation

Journal of Educational and Behavioral Statistics, Ahead of Print.
This study introduces a correction to the approximation of effective df as proposed by Satterthwaite, specifically addressing scenarios where component df are small. The correction is grounded in analytical results concerning the moments of standard normal random variables. This modification is applicable to complex variance estimates that involve both small and large df, offering an enhanced approximation of the higher moments required by Satterthwaite’s framework. Additionally, this correction extends and partially validates the empirically derived adjustment by Johnson and Rust, as it is based on theoretical foundations rather than the simulations used to derive empirical transformation constants. Finally, the proposed adjustment also provides a correction to the estimate of the total variance in cases where missing data have been replaced by multiple imputations, such as in the case of plausible values in national and international large-scale assessments.
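
For reference, the classical Satterthwaite approximation that the article refines matches the first two moments of a linear combination of independent variance estimates to a scaled chi-square; the display below is the standard formula, not the corrected version proposed in the study.

```latex
% Classical Satterthwaite effective df for a combined variance estimate V = sum_i a_i s_i^2,
% where s_i^2 has nu_i degrees of freedom
\nu_{\text{eff}} \;\approx\;
\frac{\left(\sum_i a_i s_i^2\right)^{2}}
     {\sum_i \left(a_i s_i^2\right)^{2} / \nu_i}
```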

Using MLP-F in Three Different Aberrant Behaviors in Education

Journal of Educational and Behavioral Statistics, Ahead of Print.
Zhu et al. proposed a neural network–based person-fit method, the machine learning person-fit method (MLP-F), and found promising improvements over some traditional methods. MLP-F relies on constructing an appropriate neural network and uses mean squared error as the network’s loss function. The primary focus of this study is to explore the potential improvement in classification by replacing mean squared error with cross-entropy. Additionally, applying MLP-F requires a large number of output nodes when an exam involves numerous attributes. However, an excess of nodes in the output layer may diminish classification accuracy and escalate the demand for training data. This article introduces a novel neural network architecture designed to be more versatile and robust. The findings indicate that utilizing a cross-entropy loss function together with the new neural network architecture enhances the performance of MLP-F. Simulation studies considering various aberrant behaviors demonstrate that MLP-F is effective in identifying aberrant behaviors and particularly excels in shorter tests, showcasing its potential significance in classroom testing.
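
The loss-function substitution at the heart of the study is easy to state; the sketch below contrasts mean squared error with categorical cross-entropy on a toy classification output. It illustrates only the two loss functions being compared, not the MLP-F architecture, and the toy predictions are invented.

```python
import numpy as np

def mse_loss(y_true, y_prob):
    """Mean squared error between one-hot targets and predicted probabilities."""
    return np.mean((y_true - y_prob) ** 2)

def cross_entropy_loss(y_true, y_prob, eps=1e-12):
    """Categorical cross-entropy; penalizes confident wrong predictions more sharply."""
    return -np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1))

# Toy example: 3 examinees, binary classification (aberrant vs. normal), one-hot coded.
y_true = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])

print("MSE loss          :", round(mse_loss(y_true, y_prob), 4))
print("Cross-entropy loss:", round(cross_entropy_loss(y_true, y_prob), 4))
```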

A Quasi-Poisson Item Response Theory Model for Heterogeneous Dispersion in Count Data

Journal of Educational and Behavioral Statistics, Ahead of Print.
Item-level count data frequently arise in cognitive, educational, and psychological assessments. Correctly handling different dispersion levels in count data is crucial for accurate statistical inference. This research proposes a Quasi-Poisson item response theory model that accommodates overdispersion, underdispersion, and equidispersion in count data. By explicitly modeling the connection between the mean and variance parameters, it provides a method that is both computationally efficient and statistically robust. This semiparametric model specifies the first two conditional moments of the count variables and derives marginal moments to estimate the model parameters. Simulation studies demonstrate the Quasi-Poisson model’s efficacy in parameter recovery across different dispersion scenarios and its negligible computation time. Empirical data analysis further underscores the model’s superior fit and computational efficiency in a real-world setting.
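
The key modeling device, in a generic quasi-Poisson formulation, is a dispersion parameter that scales the Poisson variance; the display below shows this standard mean–variance specification (the article's IRT-specific parametrization of the mean is not reproduced here).

```latex
% Quasi-Poisson first two conditional moments for an item-level count Y_{pi}
\mathbb{E}(Y_{pi} \mid \theta_p) = \mu_{pi},
\qquad
\operatorname{Var}(Y_{pi} \mid \theta_p) = \phi\, \mu_{pi},
\qquad
\phi > 1 \text{ (over-)}, \; \phi = 1 \text{ (equi-)}, \; \phi < 1 \text{ (underdispersion)}
```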

A Hybrid EM Algorithm for Linear Two-Way Interactions With Missing Data

Journal of Educational and Behavioral Statistics, Ahead of Print.
We study an Expectation-Maximization (EM) algorithm for estimating product-term regression models with missing data. The study of such problems in the frequentist tradition has thus far been restricted to an EM algorithm using full numerical integration. However, under most missing data patterns, we show that this problem can be solved analytically, and numerical approximations are only needed under specific conditions. Thus, we propose a hybrid EM algorithm, which uses analytic solutions when available and approximate solutions only when needed. The theoretical framework of our algorithm is described herein, along with three empirical experiments using both simulated and real data. We demonstrate that our algorithm provides greater estimation accuracy, exhibits robustness to distributional violations, and confers higher power to detect interaction effects. We conclude with a discussion of extensions and topics of further research.
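
One reason analytic E-step solutions are often available is worth spelling out: when the partially missing covariates are treated as jointly normal given the observed data, the expectation of the product term reduces to closed-form conditional moments rather than a numerical integral. The identity below is the standard moment result assumed here; the article's full algorithm is of course more general.

```latex
% E-step expectation of a two-way product term, written via conditional moments;
% for jointly normal covariates both terms on the right have closed forms.
\mathbb{E}(X_1 X_2 \mid \text{obs}) \;=\;
\mathbb{E}(X_1 \mid \text{obs})\,\mathbb{E}(X_2 \mid \text{obs})
\;+\; \operatorname{Cov}(X_1, X_2 \mid \text{obs})
```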

Bayesian Diagnostic Classification Models for a Partially Known Q-Matrix

Journal of Educational and Behavioral Statistics, Ahead of Print.
This study proposes a Bayesian method for diagnostic classification models (DCMs) in a partially known Q-matrix setting that lies between exploratory and confirmatory DCMs. This Q-matrix setting is practical and useful because test experts have prior knowledge of the Q-matrix but cannot readily specify it completely. The proposed method employs Bayesian variable selection priors to simultaneously estimate the effects of active and nonactive attributes, and the simulations lead to appropriate attribute recovery rates. Furthermore, the proposed method recovers the attribute mastery of individuals at the same level of accuracy as for a fully known Q-matrix. In addition, the proposed method can be used to estimate the unknown part of the Q-matrix. A real data example indicates that the proposed Bayesian estimation method with the partially known Q-matrix fits the data better than one with a fully specified Q-matrix. Finally, extensions and future research directions are discussed.
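
As a generic illustration of Bayesian variable selection for uncertain Q-matrix entries (not necessarily the exact priors used in the article), a spike-and-slab formulation lets the data decide whether attribute k is active for item j, while entries known from expert knowledge are simply fixed at 0 or 1.

```latex
% Generic spike-and-slab selection for an uncertain Q-matrix entry q_{jk}
% and the corresponding attribute effect beta_{jk}
q_{jk} \sim \text{Bernoulli}(\pi_{jk}),
\qquad
\beta_{jk} \mid q_{jk} \;\sim\; (1 - q_{jk})\,\delta_0 \;+\; q_{jk}\,\mathcal{N}(0, \tau^2)
```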

Smoothing of Bivariate Test Score Distributions: Model Selection Targeting Test Score Equating

Journal of Educational and Behavioral Statistics, Ahead of Print.
Observed-score test equating is a vital part of every testing program, aiming to make test scores across test administrations comparable. Central to this process is the equating function, typically estimated by composing distribution functions of the scores to be equated. An integral part of this estimation is presmoothing, where statistical models are fit to observed score frequencies to mitigate sampling variability. This study evaluates the impact of commonly used model fit indices on bivariate presmoothing model-selection accuracy in both item response theory (IRT) and non-IRT settings. It also introduces a new model-selection criterion that directly targets the equating function, in contrast to existing methods. The study focuses on the non-equivalent groups with anchor test design, estimating bivariate score distributions based on real and simulated data. Results show that the choice of presmoothing model and model fit criterion influences the equated scores. In non-IRT contexts, a combination of the proposed model-selection criterion and the Bayesian information criterion exhibited superior performance, balancing bias and variance of the equated scores. For IRT models, high selection accuracy and minimal equating error were achieved across all scenarios.
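
The equating function being targeted is, in observed-score equating, typically the equipercentile transformation obtained by composing the two score distribution functions, which is why presmoothing those distributions matters; the display below gives this standard form (the article's new selection criterion itself is not reproduced).

```latex
% Equipercentile equating of a score x on form X to the scale of form Y,
% where F_X and F_Y are the (presmoothed, continuized) score distribution functions
\varphi(x) \;=\; F_Y^{-1}\!\big(F_X(x)\big)
```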

Using Ordering Theory to Learn Attribute Hierarchies From Examinees’ Attribute Profiles

Journal of Educational and Behavioral Statistics, Ahead of Print.
In cognitive diagnosis, attribute hierarchies are considered important structural features of cognitive diagnostic models, as they provide auxiliary information about the nature of attributes. In this article, the idea of ordering theory is applied to cognitive diagnosis, and a new approach to identifying attribute hierarchies based on the attribute correlation intensity matrix is proposed. This approach aims to identify the attribute hierarchy in data with small sample sizes while ensuring a high accuracy rate. The results of simulation studies and empirical data analysis show that the proposed approach can identify attribute hierarchies in diagnostic tests, especially with small samples, and thus merits wider use.
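
To make the ordering-theory idea concrete, the toy sketch below screens for pairwise prerequisite relations among attributes by counting "violations" (examinees who master B without A). It shows only the classical ordering-theory screening step, not the article's correlation-intensity-matrix approach, and the attribute profiles and tolerance threshold are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented attribute profiles: rows = examinees, columns = mastered (1) / not (0).
# A real analysis would use profiles estimated by a cognitive diagnosis model.
profiles = rng.integers(0, 2, size=(200, 4))
profiles[:, 1] = np.minimum(profiles[:, 1], profiles[:, 0])  # force A1 as a prerequisite of A2

n_attr = profiles.shape[1]
threshold = 0.05  # tolerated violation rate; an arbitrary choice for illustration

# Ordering theory: attribute a is taken as a prerequisite of attribute b when the
# proportion of examinees mastering b without mastering a falls below the threshold.
for a in range(n_attr):
    for b in range(n_attr):
        if a == b:
            continue
        violation_rate = np.mean((profiles[:, b] == 1) & (profiles[:, a] == 0))
        if violation_rate < threshold:
            print(f"A{a + 1} -> A{b + 1}  (violation rate {violation_rate:.3f})")
```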