SAGE Publications Inc: Applied Psychological Measurement: Table of Contents

Modeling Within- and Between-Person Differences in the Use of the Middle Category in Likert Scales

Applied Psychological Measurement, Ahead of Print.
When using Likert scales, the inclusion of a middle-category response option poses a challenge for the valid measurement of the psychological attribute of interest. While this middle category is often included to provide respondents with a neutral response option, respondents may in practice also select it when they do not want to or cannot give an informative response. If the response data are analyzed without considering these two possible uses of the middle response category, measurement may be confounded. In this paper, we propose a response-mixture IRTree model for the analysis of Likert-scale data. This model acknowledges that the middle response category can be selected either as a non-response option (and hence be uninformative for the attribute of interest) or to communicate a neutral position (and hence be informative), and that this choice depends on both person and item characteristics. For each observed middle-category response, the probability that it was intended to be informative is modeled, and both the attribute of substantive interest and a non-response tendency are estimated. The performance of the model is evaluated in a simulation study, and the procedure is applied to empirical data from personality psychology.
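
As a rough sketch of the two-process idea, the snippet below computes, for one observed middle-category response, the posterior probability that it was informative; the logistic links, parameter names, and values are illustrative assumptions, not the authors' model specification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_informative_middle(theta, eta, item_neutral_loc, item_nonresp_loc):
    """Posterior probability that an observed middle-category response was
    meant to be informative (a genuine neutral) rather than a hidden
    non-response.

    theta            : attribute of substantive interest
    eta              : person's non-response tendency
    item_neutral_loc : item location governing a genuinely neutral answer
    item_nonresp_loc : item location governing skipping via the middle category

    All parameter names and link functions are illustrative assumptions.
    """
    # Branch 1: respondent uses the middle category as a hidden non-response.
    p_nonresponse = sigmoid(eta - item_nonresp_loc)
    # Branch 2: respondent engages with the item and genuinely sits in the middle.
    p_neutral_given_engaged = sigmoid(-np.abs(theta - item_neutral_loc))
    p_informative = (1.0 - p_nonresponse) * p_neutral_given_engaged
    p_uninformative = p_nonresponse
    # Posterior probability that this middle response carries information about theta.
    return p_informative / (p_informative + p_uninformative)

print(p_informative_middle(theta=0.2, eta=-1.0, item_neutral_loc=0.0, item_nonresp_loc=0.5))
```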

Weighted Answer Similarity Analysis

Applied Psychological Measurement, Ahead of Print.
Romero et al. (2015; see also Wollack, 1997) developed the ω statistic as a method for detecting unusually similar answers between pairs of examinees. For each pair, the ω statistic considers whether the observed number of similar answers is significantly larger than the expected number of similar answers. However, one limitation of ω is that it does not account for the particular items on which similar answers are observed. Therefore, in this study, we propose a weighted version of the ω statistic that takes this information into account. We compare the performance of the new and existing statistics using detailed simulations in which several factors are manipulated. Results show that while both the new and existing statistics are able to control the Type I error rate, the new statistic is more powerful, on average.
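
A minimal sketch of the general form of such pairwise similarity statistics is shown below; the conditioning on the source's observed answers, the item weights, and the normal-approximation standardization are illustrative assumptions rather than the exact definition proposed in the paper.

```python
import numpy as np

def weighted_omega(resp_s, resp_c, option_probs, weights=None):
    """Standardized (weighted) count of matching answers for a pair of examinees.

    resp_s, resp_c : observed option choices (integer-coded) for source and copier
    option_probs   : array (n_items, n_options) of model-implied choice
                     probabilities for the copier; in practice a nominal-response
                     or similar IRT model would supply these
    weights        : per-item weights; uniform weights reduce to an unweighted
                     omega-type statistic
    """
    n_items = len(resp_s)
    if weights is None:
        weights = np.ones(n_items)
    weights = np.asarray(weights, dtype=float)
    match = (np.asarray(resp_s) == np.asarray(resp_c)).astype(float)
    # Probability that the copier matches the source on each item,
    # conditioning on the source's observed answers.
    p_match = option_probs[np.arange(n_items), resp_s]
    observed = np.sum(weights * match)
    expected = np.sum(weights * p_match)
    variance = np.sum(weights ** 2 * p_match * (1.0 - p_match))
    return (observed - expected) / np.sqrt(variance)

# Example with 3 items, 4 options each (hypothetical model probabilities).
probs = np.array([[0.10, 0.20, 0.30, 0.40],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.70, 0.10, 0.10, 0.10]])
print(weighted_omega([3, 0, 0], [3, 0, 0], probs, weights=[1.0, 2.0, 1.5]))
```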

Impact of Parameter Predictability and Joint Modeling of Response Accuracy and Response Time on Ability Estimates

Applied Psychological Measurement, Ahead of Print.
To maintain test quality, a large supply of items is typically desired. Automatic item generation can reduce cost and labor, especially if the generated items have predictable item parameters, possibly reducing or eliminating the need for empirical tryout. However, the effect of different levels of item parameter predictability on the accuracy of trait estimation using item response theory models is unclear. If predictability is lower, adding response time as a collateral source of information may mitigate the effect on trait estimation accuracy. The present study investigates the impact of varying item parameter predictability on trait estimation accuracy, along with the impact of adding response time as a collateral source of information. Results indicated that trait estimation accuracy using item family model-based item parameters differed only slightly from using known item parameters. Somewhat larger trait estimation errors resulted from using cognitive complexity features to predict item parameters. Further, adding response times to the model resulted in more accurate trait estimation for tests with lower item difficulty levels (e.g., achievement tests). Implications for item generation and for the response-processes aspect of validity are discussed.
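
For intuition, a minimal Python sketch of a van der Linden-style joint model is given below, showing how response times can lend collateral information to the trait estimate through a prior correlation between speed and ability; the model form, parameter values, and data are illustrative assumptions rather than the study's specification.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(params, responses, log_times, a, b, alpha, beta, rho=0.5):
    """MAP objective for one examinee under a simple joint model:
    a 2PL for accuracy, a lognormal model for response time (time intensity
    beta, time discrimination alpha), and a bivariate normal prior linking
    ability (theta) and speed (tau) with correlation rho. Response times
    inform theta only through this prior correlation, which is how the
    collateral information enters.
    """
    theta, tau = params
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))             # 2PL success probability
    ll_acc = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    mu = beta - tau                                         # expected log response time
    ll_rt = np.sum(np.log(alpha) - 0.5 * (alpha * (log_times - mu)) ** 2)
    # Bivariate standard-normal prior on (theta, tau) with correlation rho.
    quad = (theta ** 2 - 2 * rho * theta * tau + tau ** 2) / (1 - rho ** 2)
    return -(ll_acc + ll_rt - 0.5 * quad)

# Hypothetical item parameters and data for a 5-item test.
a = np.array([1.0, 1.2, 0.8, 1.5, 1.1]); b = np.array([-0.5, 0.0, 0.3, 0.8, -1.0])
alpha = np.full(5, 1.5); beta = np.array([3.8, 4.0, 4.2, 4.5, 3.6])
responses = np.array([1, 1, 0, 0, 1])
log_times = np.log([40.0, 55.0, 70.0, 90.0, 35.0])
fit = minimize(neg_log_posterior, x0=[0.0, 0.0], method="Nelder-Mead",
               args=(responses, log_times, a, b, alpha, beta))
print("theta, tau MAP:", fit.x)
```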

Few and Different: Detecting Examinees With Preknowledge Using Extended Isolation Forests

Applied Psychological Measurement, Ahead of Print.
Item preknowledge refers to the case where examinees have advance knowledge of test material prior to taking the examination. When examinees have item preknowledge, the scores that result from those item responses are not true reflections of the examinee’s proficiency. Further, this contamination of the data also affects the item parameter estimates and therefore the scores of all examinees, regardless of whether they had prior knowledge. To ensure the validity of test scores, it is essential to identify both issues: compromised items (CIs) and examinees with preknowledge (EWPs). In some cases, the CIs are known, and the task is reduced to determining the EWPs. However, due to the potential threat to validity, it is critical for high-stakes testing programs to have a process for routinely monitoring for evidence of EWPs, often when CIs are unknown. Further, even knowing that specific items may have been compromised does not guarantee that any examinees had prior access to those items, or that examinees who did have prior access know how to use that preknowledge effectively. Therefore, this paper attempts to use response behavior to identify item preknowledge without knowledge of which items may or may not have been compromised. While most research in this area has relied on traditional psychometric models, we investigate the utility of an unsupervised machine learning algorithm, the extended isolation forest (EIF), to detect EWPs. As in previous research, the response behavior being analyzed is response time (RT) and response accuracy (RA).
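
As a rough illustration of the workflow, the sketch below flags candidate EWPs from person-level RT/RA features, using scikit-learn's (axis-parallel) IsolationForest as a stand-in for the extended isolation forest; the features and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical person-level features derived from response accuracy (RA) and
# response time (RT): e.g., a scaled number-correct score and a scaled mean
# log response time. An extended isolation forest uses hyperplane splits
# rather than the axis-parallel splits used here.
rng = np.random.default_rng(0)
n_honest, n_ewp = 950, 50
honest = np.column_stack([rng.normal(0.0, 1.0, n_honest),     # typical RA
                          rng.normal(0.0, 1.0, n_honest)])    # typical log RT
ewp = np.column_stack([rng.normal(2.0, 0.5, n_ewp),           # unusually high RA
                       rng.normal(-2.0, 0.5, n_ewp)])         # unusually fast RT
features = np.vstack([honest, ewp])

forest = IsolationForest(n_estimators=200, contamination=0.05, random_state=0)
forest.fit(features)
anomaly_score = -forest.score_samples(features)   # larger = more isolated
flagged = np.argsort(anomaly_score)[::-1][:50]    # candidate EWPs for review
print(flagged[:10])
```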

Semi-Parametric Item Response Theory With O’Sullivan Splines for Item Responses and Response Time

Applied Psychological Measurement, Ahead of Print.
Response time (RT) has been an essential resource for supplementing the estimation accuracy of latent traits and item parameters in educational testing. Most item response theory (IRT) approaches are based on parametric RT models. However, since test takers may alter their behaviors during a test due to motivation or strategy shifts, fatigue, or other causes, parametric IRT models are unlikely to capture such subtle and nonlinear information. In this work, we propose a novel semi-parametric IRT model with O’Sullivan splines to accommodate the flexible mean RT shapes and explore the underlying nonlinear relationships between latent traits and RT. A simulation study was conducted to demonstrate the substantial improvement in parameter estimation achieved by the new model, as well as the detriment of using parametric models in terms of biases and measurement errors. Using this model, a dataset of mathematics test scores and RT from the Programme for International Student Assessment was analyzed to demonstrate the evident nonlinearity and to compare the proposed model with existing models in terms of model fitting. The findings presented in this study indicate the promising nature of the new approach, suggesting its potential as an additional psychometric tool to enhance test reliability and reduce measurement errors.
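
The flexible mean RT shape rests on a penalized spline basis; the sketch below only constructs a cubic B-spline basis over the latent trait with SciPy (the O’Sullivan penalty on the integrated squared second derivative is omitted), so it illustrates one ingredient of such a model rather than the model itself.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, knots, degree=3):
    """Evaluate a cubic B-spline basis at points x, with clamped boundary knots."""
    t = np.concatenate([[knots[0]] * degree, knots, [knots[-1]] * degree])
    n_basis = len(t) - degree - 1
    basis = np.empty((len(x), n_basis))
    for j in range(n_basis):
        coef = np.zeros(n_basis)
        coef[j] = 1.0
        basis[:, j] = BSpline(t, coef, degree, extrapolate=False)(x)
    return np.nan_to_num(basis)   # guard against out-of-range evaluations

theta = np.linspace(-2.9, 2.9, 11)               # trait values inside the knot range
B = bspline_basis(theta, knots=np.linspace(-3.0, 3.0, 8))
# The mean log response time could then be modeled as B @ spline_coefficients
# in place of a linear (parametric) function of theta.
print(B.shape)
```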

Compound Optimal Design for Online Item Calibration Under the Two-Parameter Logistic Model

Applied Psychological Measurement, Ahead of Print.
Under the theory of sequential design, a compound optimal design with two optimality criteria can be used to solve the problem of efficiently calibrating the item parameters of an item response theory model. To calibrate item parameters efficiently in computerized testing, a compound optimal design is proposed for the simultaneous estimation of item difficulty and discrimination parameters under the two-parameter logistic model; the design adaptively focuses on optimizing whichever parameter is more difficult to estimate. The compound optimal design uses an acceptance probability to provide ability design points that optimize the item difficulty and discrimination parameters, respectively. Simulation and real data analyses showed that the compound optimal design outperformed the D-optimal and random designs in terms of the recovery of both discrimination and difficulty parameters.
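
Such design criteria are built from the 2PL item information matrix, so a small sketch of that building block may help; the compound criterion shown here is a simple weighted combination of the asymptotic variances of the discrimination and difficulty estimates, standing in for (not reproducing) the paper's acceptance-probability mechanism.

```python
import numpy as np

def info_matrix_2pl(theta, a, b):
    """Fisher information for (a, b) of a 2PL item from one examinee at theta.
    With P = 1 / (1 + exp(-a*(theta - b))):
    I = P(1-P) * [[(theta-b)^2, -a(theta-b)], [-a(theta-b), a^2]].
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    d = theta - b
    return w * np.array([[d * d, -a * d],
                         [-a * d, a * a]])

def compound_criterion(design_thetas, a, b, weight=0.5):
    """Weighted combination (smaller is better) of the asymptotic variances of
    a-hat and b-hat for a set of calibration examinees' abilities.
    The fixed weight is an illustrative stand-in for the paper's adaptive
    acceptance-probability mechanism.
    """
    info = sum(info_matrix_2pl(t, a, b) for t in design_thetas)
    cov = np.linalg.inv(info)
    return weight * cov[0, 0] + (1.0 - weight) * cov[1, 1]

# Compare two candidate ability designs for calibrating an item with a=1.2, b=0.5.
spread = np.linspace(-2.0, 2.0, 30)    # abilities spread across the scale
narrow = np.linspace(0.3, 0.7, 30)     # abilities bunched near the difficulty
print(compound_criterion(spread, a=1.2, b=0.5),
      compound_criterion(narrow, a=1.2, b=0.5))
# Points near b pin down b well but carry little information about a, which is
# why a criterion that adaptively targets the harder parameter is useful.
```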

Comparing Approaches to Estimating Person Parameters for the MUPP Model

Applied Psychological Measurement, Ahead of Print.
This study compared maximum a posteriori (MAP), expected a posteriori (EAP), and Markov chain Monte Carlo (MCMC) approaches to computing person scores from the Multi-Unidimensional Pairwise Preference (MUPP) model. The MCMC approach used the No-U-Turn Sampler (NUTS). Results suggested that EAP with fully crossed quadrature and NUTS outperformed the other approaches when there were fewer dimensions. In addition, NUTS produced the most accurate estimates in conditions with more dimensions. The number of items per dimension had the largest effect on person parameter recovery.
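
For readers unfamiliar with the estimators being compared, the sketch below shows EAP scoring by quadrature for a unidimensional 2PL with hypothetical item parameters; the MUPP model itself is a multidimensional pairwise-preference model, so this only illustrates the quadrature machinery whose fully crossed grid grows exponentially with the number of dimensions (the motivation for NUTS).

```python
import numpy as np

def eap_estimate(responses, a, b, n_quad=61):
    """EAP trait estimate and posterior SD by quadrature for a unidimensional 2PL."""
    nodes = np.linspace(-4.0, 4.0, n_quad)                        # quadrature points
    prior = np.exp(-0.5 * nodes ** 2)                             # N(0,1) prior, unnormalized
    p = 1.0 / (1.0 + np.exp(-a * (nodes[:, None] - b)))           # n_quad x n_items
    like = np.prod(np.where(responses == 1, p, 1.0 - p), axis=1)  # likelihood at each node
    post = prior * like
    post /= post.sum()
    theta_eap = np.sum(nodes * post)                              # posterior mean
    se = np.sqrt(np.sum((nodes - theta_eap) ** 2 * post))         # posterior SD
    return theta_eap, se

# Hypothetical 5-item test.
a = np.array([1.0, 1.3, 0.9, 1.1, 1.4]); b = np.array([-1.0, -0.3, 0.0, 0.5, 1.2])
print(eap_estimate(np.array([1, 1, 1, 0, 0]), a, b))
```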

Application of Bayesian Decision Theory in Detecting Test Fraud

Applied Psychological Measurement, Ahead of Print.
This article suggests a new approach based on Bayesian decision theory (e.g., Cronbach & Gleser, 1965; Ferguson, 1967) for detection of test fraud. The approach leads to a simple decision rule that involves the computation of the posterior probability that an examinee committed test fraud given the data. The suggested approach was applied to a real data set that involved actual test fraud.
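
The core of such an approach is a Bayes-rule update followed by a loss-based threshold; the sketch below shows that step with hypothetical likelihoods, prior, and losses, and is not the article's specific likelihood model.

```python
import numpy as np

def posterior_fraud(loglik_fraud, loglik_clean, prior_fraud):
    """Posterior probability that an examinee committed test fraud, given
    log-likelihoods of the observed data under a 'fraud' model and a
    'no fraud' model. Both likelihood models are assumed to be supplied by
    the analyst; only the Bayes-rule step is shown here.
    """
    log_num = np.log(prior_fraud) + loglik_fraud
    log_den = np.logaddexp(log_num, np.log(1.0 - prior_fraud) + loglik_clean)
    return np.exp(log_num - log_den)

def flag(posterior, loss_false_positive=10.0, loss_false_negative=1.0):
    """Bayes decision rule: flag when the posterior exceeds the loss-ratio
    threshold, i.e., when the expected loss of flagging is smaller than the
    expected loss of not flagging. The loss values are purely illustrative.
    """
    threshold = loss_false_positive / (loss_false_positive + loss_false_negative)
    return posterior > threshold

p = posterior_fraud(loglik_fraud=-20.0, loglik_clean=-35.0, prior_fraud=0.01)
print(p, flag(p))
```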

An Experimental Design to Investigate Item Parameter Drift

Applied Psychological Measurement, Ahead of Print.
Methods for detecting item parameter drift may be inadequate when every exposed item is at risk for drift. To address this scenario, a strategy for detecting item parameter drift is proposed that uses only unexposed items deployed in a stratified random method within an experimental design. The proposed method is illustrated by investigating unexpected score increases on a high-stakes licensure exam. Results for this example were suggestive of item parameter drift but not significant at the .05 level.
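
A minimal sketch of the stratified random deployment idea is given below; the difficulty strata, the two administration windows, and the balanced split are assumptions made for illustration, not the study's actual design.

```python
import numpy as np

# Unexposed items are stratified (here, by difficulty) and assigned at random,
# within strata, to two administration windows; parameter estimates from the
# two windows can then be compared for evidence of drift.
rng = np.random.default_rng(1)
difficulty = rng.normal(0.0, 1.0, 60)                  # unexposed items' difficulties
strata = np.digitize(difficulty, bins=[-0.5, 0.5])     # easy / medium / hard
window = np.empty(60, dtype=int)
for s in np.unique(strata):
    idx = np.flatnonzero(strata == s)
    rng.shuffle(idx)
    window[idx[: len(idx) // 2]] = 0                   # window 0 (e.g., earlier)
    window[idx[len(idx) // 2:]] = 1                    # window 1 (e.g., later)
print(np.bincount(window))                             # roughly balanced assignment
```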

Adaptive Measurement of Change in the Context of Item Parameter Drift

Applied Psychological Measurement, Ahead of Print.
Adaptive measurement of change (AMC) uses computerized adaptive testing (CAT) to measure and test the significance of intraindividual change on one or more latent traits. Extant AMC research has assumed that item parameter values are constant across testing occasions. Yet item parameters might change over time, a phenomenon termed item parameter drift (IPD). The current study examined AMC’s performance in the context of IPD with unidimensional, dichotomous CATs across two testing occasions. A Monte Carlo simulation revealed that AMC false and true positive rates were primarily affected by changes in the difficulty parameter. False positive rates were related to the location of the drift items on the latent trait continuum, as the administration of more drift items spuriously increased the magnitude of estimated trait change. Moreover, true positive rates depended upon an interaction between the direction of difficulty parameter drift and the latent trait change trajectory. A follow-up simulation further showed that the number of items in the CAT with parameter drift affected AMC false and true positive rates, with these relationships moderated by IPD characteristics and the latent trait change trajectory. It is recommended that test administrators confirm the absence of IPD prior to using AMC to measure intraindividual change with educational and psychological tests.
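
AMC significance tests are commonly built from the change in trait estimates relative to their standard errors; the sketch below shows one such Z-type test (the cited study may use a different statistic), which helps clarify why drift in the occasion-two item parameters can inflate false positive rates.

```python
import numpy as np
from scipy.stats import norm

def amc_change_test(theta_1, se_1, theta_2, se_2, alpha=0.05):
    """Z-test for intraindividual change between two CAT administrations,
    based on the trait estimates and their standard errors. If the items
    administered at occasion two have drifted, theta_2 (and hence this test)
    is biased.
    """
    z = (theta_2 - theta_1) / np.sqrt(se_1 ** 2 + se_2 ** 2)
    p_value = 2.0 * norm.sf(abs(z))
    return z, p_value, p_value < alpha

print(amc_change_test(theta_1=-0.2, se_1=0.30, theta_2=0.5, se_2=0.28))
```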