Unsupervised [randomly responding] survey bot detection: In search of high classification accuracy.

Psychological Methods, Mar 10, 2025, No Pagination Specified; doi:10.1037/met0000746

While online survey data collection has become popular in the social sciences, there is a risk of data contamination by computer-generated random responses (i.e., bots). Bot prevalence poses a significant threat to data quality. If deterrence efforts fail or were not set up in advance, researchers can still attempt to detect bots already present in the data. In this research, we study a recently developed algorithm to detect survey bots. The algorithm requires neither a measurement model nor a sample of known humans and bots; thus, it is model agnostic and unsupervised. It involves a permutation test under the assumption that Likert-type items are exchangeable for bots, but not humans. While the algorithm maintains a desired sensitivity for detecting bots (e.g., 95%), its classification accuracy may depend on other inventory-specific or demographic factors. Generating hypothetical human responses from a well-known item response theory model, we use simulations to understand how classification accuracy is affected by item properties, the number of items, the number of latent factors, and factor correlations. In an additional study, we simulate bots to contaminate real human data from 35 publicly available data sets to understand the algorithm’s classification accuracy under a variety of real measurement instruments. Through this work, we identify conditions under which classification accuracy is around 95% or above, but also conditions under which accuracy is quite low. In brief, performance is better with more items, more categories per item, and a variety in the difficulty or means of the survey items. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
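
A minimal base-R sketch of the exchangeability idea behind such a permutation test follows. It is not the authors' algorithm or statistic: as an illustration, it correlates each respondent's responses with the sample item means (which humans, but not bots, are expected to track) and compares that correlation to its permutation distribution obtained by shuffling the respondent's answers across items. All data and names are hypothetical.

set.seed(1)
n <- 200; p <- 20
responses <- matrix(sample(1:5, n * p, replace = TRUE), n, p)  # toy 5-point data (all "bot-like")
item_means <- colMeans(responses)

looks_human <- function(x, item_means, n_perm = 999, alpha = .05) {
  # Under the bot null, the p item responses are exchangeable, so shuffling them
  # gives the null distribution of any statistic that depends on item identity.
  obs  <- cor(x, item_means)
  perm <- replicate(n_perm, cor(sample(x), item_means))
  p_value <- (1 + sum(perm >= obs)) / (n_perm + 1)
  p_value < alpha   # TRUE = responses track item means more than chance (human-like)
}

flags <- apply(responses, 1, looks_human, item_means = item_means)
mean(!flags)  # proportion classified as bot-like; near 1 here because the toy data are random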

A tutorial on estimating dynamic treatment regimes from observational longitudinal data using lavaan.

Psychological Methods, Mar 06, 2025, No Pagination Specified; doi:10.1037/met0000748

Psychological and behavioral scientists develop interventions toward addressing pressing societal challenges. But such endeavors are complicated by treatments that change over time as individuals’ needs and responses evolve. For instance, students initially in a multiyear mentoring program to improve future academic outcomes may not continue with the program after interim school engagement improves. Conventional interventions bound by rigid treatment assignments cannot adapt to such time-dependent heterogeneity, thus undermining the interventions’ practical relevance and leading to inefficient implementations. Dynamic treatment regimes (DTRs) are a class of interventions that are more tailored, relevant, and efficient than conventional interventions. DTRs, an established approach in the causal inference and personalized medicine literature, are designed to address the causal query: how can individual treatment assignments in successive time points be adapted, based on time-evolving responses, to optimize the intervention’s effectiveness? This tutorial offers an accessible introduction to DTRs using a simple example from the psychology literature. We describe how, using observational data from a single naturally occurring longitudinal study, to estimate the outcomes had different DTRs been counterfactually implemented. To improve accessibility, we implement the estimation procedure in lavaan, a freely available statistical software popular in psychology and social science research. We hope this tutorial guides researchers on framing, interpreting, and testing DTRs in their investigations. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
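
Because the tutorial works in lavaan, a schematic lavaan sketch of the general workflow may help readers picture it. The variable names (X = baseline covariate, A1/A2 = stage-1 and stage-2 treatment, R1 = interim response, Y = outcome) and the simple path model are illustrative assumptions, not the tutorial's actual DTR estimation procedure.

library(lavaan)

# toy observational longitudinal data with an adaptive stage-2 treatment (hypothetical names)
set.seed(1)
n  <- 300
X  <- rnorm(n)
A1 <- rbinom(n, 1, 0.5)
R1 <- 0.4 * A1 + 0.3 * X + rnorm(n)
A2 <- rbinom(n, 1, plogis(R1))   # stage-2 treatment depends on the interim response
Y  <- 0.3 * A1 + 0.5 * A2 + 0.4 * R1 + 0.2 * X + rnorm(n)
dtr_data <- data.frame(X, A1, R1, A2, Y)

model <- '
  R1 ~ a1*A1 + x1*X                    # interim response regressed on stage-1 treatment
  Y  ~ c1*A1 + c2*A2 + b1*R1 + x2*X    # outcome regressed on both stages and the response
'
fit <- sem(model, data = dtr_data)
summary(fit, ci = TRUE)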

Impact of temporal order selection on clustering intensive longitudinal data based on vector autoregressive models.

Psychological Methods, Mar 03, 2025, No Pagination Specified; doi:10.1037/met0000747

When multivariate intensive longitudinal data are collected from a sample of individuals, the model-based clustering (e.g., vector autoregressive [VAR] based) approach can be used to cluster the individuals based on the (dis)similarity of their person-specific dynamics of the studied processes. To implement such clustering procedures, one needs to set the temporal order to be identical for all individuals; however, between-individual differences in temporal order have been evident for psychological and behavioral processes. One existing method is to apply the most complex structure or the highest order (HO) for all processes, while the other is to use the most parsimonious structure or the lowest order (LO). To date, the impact of these methods has not been well studied. In our simulation study, we examined the performance of HO and LO in conjunction with Gaussian mixture model (GMM) and k-means algorithms when a two-step VAR-based clustering procedure is implemented across various data conditions. We found that (a) the LO outperformed the HO in cluster identification, (b) the HO was more favorable than the LO in estimation of cluster-specific dynamics, (c) the GMM generally outperformed the k-means, and (d) the LO in conjunction with the GMM produced the best cluster identification outcome. We demonstrated the uses of the VAR-based clustering technique using the data collected from the “How Nuts are the Dutch” project. We then discussed the results from all our analyses, limitations of our study, and directions for future research, and offered recommendations on the empirical use of model-based clustering techniques. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
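
A compact base-R sketch of the two-step logic may make the procedure concrete: fit a person-specific VAR(1) by least squares, then cluster the vectorized coefficient matrices, here with k-means. The fixed order of 1, the toy data, and the use of k-means only (the article also studies GMMs and higher orders) are simplifications of mine.

set.seed(1)
n_person <- 40; T_len <- 100; k_var <- 3

sim_person <- function(Phi) {            # simulate one person's multivariate time series
  y <- matrix(0, T_len, k_var)
  for (t in 2:T_len) y[t, ] <- Phi %*% y[t - 1, ] + rnorm(k_var, sd = .5)
  y
}
Phi_A <- diag(.5, k_var); Phi_B <- diag(.1, k_var)   # two true dynamic "types"
data_list <- c(replicate(n_person / 2, sim_person(Phi_A), simplify = FALSE),
               replicate(n_person / 2, sim_person(Phi_B), simplify = FALSE))

fit_var1 <- function(y) {                # step 1: person-specific VAR(1) via OLS
  Y <- y[-1, ]; X <- y[-nrow(y), ]
  coef(lm(Y ~ X))[-1, ]                  # k_var x k_var transition matrix
}
features <- t(sapply(data_list, function(y) as.vector(fit_var1(y))))

cl <- kmeans(features, centers = 2, nstart = 25)     # step 2: cluster the dynamics
table(cl$cluster, rep(c("A", "B"), each = n_person / 2))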

Experiments in daily life: When causal within-person effects do (not) translate into between-person differences.

Psychological Methods, Mar 03, 2025, No Pagination Specified; doi:10.1037/met0000741

Intensive longitudinal designs allow researchers to study the dynamics of psychological processes in daily life. Yet, because these methods are usually observational, they do not allow strong causal inferences. A promising solution is to incorporate (micro-)randomized interventions within intensive longitudinal designs to uncover within-person (Wp) causal effects. However, it remains unclear whether (or how) the resulting Wp causal effects translate into between-person (Bp) differences in outcomes. In this work, we show analytically and using simulated data that Wp causal effects translate into Bp differences if there are no counteracting forces that modulate this cross-level translation. Three possible counteracting forces that we consider here are (a) contextual effects, (b) correlated random effects, and (c) cross-level interactions. We illustrate these principles using empirical data from a 10-day microrandomized mindfulness intervention study (n = 91), in which participants were randomized to complete a treatment or control task at each occasion. We conclude by providing recommendations regarding the design of microrandomized experiments in intensive longitudinal designs, as well as the statistical analyses of data resulting from these designs. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
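
For readers who want to see the within-/between-person distinction in model form, here is a hedged lme4 sketch with hypothetical variable names: the occasion-level treatment indicator is split into a person-mean part (between-person) and a person-mean-centered part (within-person), so a discrepancy between the two coefficients signals something like a contextual effect. This is a generic multilevel illustration, not the authors' analysis of the mindfulness study.

library(lme4)

set.seed(1)
n_id <- 91; n_occ <- 20
d <- expand.grid(id = 1:n_id, occ = 1:n_occ)
u0 <- rnorm(n_id, sd = .5); u1 <- rnorm(n_id, sd = .2)    # random intercepts and slopes
d$treat <- rbinom(nrow(d), 1, 0.5)                         # micro-randomized treatment
d$y <- (0.3 + u1[d$id]) * d$treat + u0[d$id] + rnorm(nrow(d))

d$treat_pm <- ave(d$treat, d$id)        # between-person component (person mean)
d$treat_c  <- d$treat - d$treat_pm      # within-person component (centered)

fit <- lmer(y ~ treat_c + treat_pm + (1 + treat_c | id), data = d)
summary(fit)   # a gap between the treat_c and treat_pm coefficients flags a contextual effect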

Unidim: An index of scale homogeneity and unidimensionality.

Psychological Methods, Mar 03, 2025, No Pagination Specified; doi:10.1037/met0000729

How to evaluate how well a psychological scale measures just one construct is a recurring problem in assessment. We introduce an index, u, of the unidimensionality and homogeneity of a scale. u is just the product of two other indices: τ (a measure of τ equivalence) and ρc (a measure of congeneric fit). By combining these two indices into one, we provide a simple index of the unidimensionality and homogeneity of a scale. We evaluate u through simulations and with real data sets. Simulations of u across one-factor scales ranging from three to 24 items with various levels of factor homogeneity show that τ and, therefore, u are sensitive to the degree of factor homogeneity. Additional tests with multifactorial scales representing 9, 18, 27, and 36 items with a hierarchical factor structure varying in a general factor loading show that ρc and, therefore, u are sensitive to the general factor saturation of a test. We also demonstrate the performance of u on 45 different publicly available personality and ability measures. Comparisons with traditional measures (i.e., ωh, α, ωt, comparative fit index, and explained common variance) show that u has greater sensitivity to unidimensional structure and less sensitivity to the number of items in a scale. u is easily calculated with open source statistical packages and is relatively robust to sample sizes ranging from 100 to 5,000. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Yes stormtrooper, these are the droids you are looking for: Identifying and preliminarily evaluating bot and fraud detection strategies in online psychological research.

Psychological Methods, Mar 03, 2025, No Pagination Specified; doi:10.1037/met0000724

Bots (i.e., automated software programs that perform various tasks) and fraudulent responders pose a growing and costly threat to psychological research and to data integrity. However, few studies have been published on this topic. In this study, we (a) describe our experience with bots and fraudulent responders using a case study, (b) present various bot and fraud detection tactics (BFDTs) and identify the number of suspected bot and fraudulent respondents removed, (c) propose a consensus confidence system for eliminating bots and fraudulent responders to determine the number of BFDTs researchers should use, and (d) examine the initial effectiveness of dynamic versus static BFDT protocols. This study is part of a larger 14-day experience sampling method study with trauma-exposed sexual minority cisgender women and transgender and/or nonbinary people. Faced with several bot and fraudulent responder infiltrations during data collection, we developed an evolving BFDT protocol to eliminate bots and fraudulent responders. Throughout this study, we received 24,053 responses on our baseline survey. After applying our BFDT protocols, we eliminated 99.75% of respondents that were likely bots or fraudulent responders. Some BFDTs seemed to be more effective and afford higher confidence than others, dynamic protocols seemed to be more effective than static protocols, and bots and fraudulent responders introduced significant bias in the results. This study advances online psychological research by curating one of the largest samples of bot and fraudulent respondents and pilot testing the largest number of BFDTs to date. Recommendations for future research are provided. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Network science in psychology.

Psychological Methods, Mar 03, 2025, No Pagination Specified; doi:10.1037/met0000745

Social network analysis can answer research questions such as why or how individuals interact or form relationships and how those relationships impact other outcomes. Despite the breadth of methods available to address psychological research questions, social network analysis is not yet a standard practice. To promote the use of social network analysis in psychological research, we present an overview of network methods, situating each method within the context of research studies and questions in psychology. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Iterated community detection in psychological networks.

Psychological Methods, Mar 03, 2025, No Pagination Specified; doi:10.1037/met0000744

Psychological network models often feature communities: subsets of nodes that are more densely connected to themselves than to other nodes. The Spinglass algorithm is a popular method of detecting communities within a network, but it is a nondeterministic algorithm, meaning that the results can vary from one iteration to the next. There is no established method for determining the optimal solution or for evaluating instability across iterations in the emerging discipline of network psychometrics. We addressed this need by introducing and evaluating iterated community detection: Spinglass (IComDetSpin), a method for aggregating across multiple Spinglass iterations to identify the most frequent solution and quantify and visualize the instability of the solution across iterations. In two simulation studies, we evaluated (a) the performance of IComDetSpin in identifying the true community structure and (b) information about the fuzziness of community boundaries; information that is not available with a single iteration of Spinglass. In Study 1, IComDetSpin outperformed single-iteration Spinglass in identifying the true number of communities and performed comparably to Walktrap. In Study 2, we extended our evaluation to networks estimated from simulated data and found that both IComDetSpin and Exploratory Graph Analysis (a well-established community detection method in network psychometrics) performed well and that IComDetSpin outperformed Exploratory Graph Analysis when correlations between communities were high and number of nodes per community was lower (5 vs. 10). Overall, IComDetSpin improved the performance of Spinglass and provided unique information about the stability of community detection results and fuzziness in community structure. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
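
The aggregation idea can be sketched in a few lines with igraph: run cluster_spinglass repeatedly on the same weighted network and tabulate how often each solution appears. The toy two-block network and the simple tabulation below are illustrative only and are not the IComDetSpin implementation.

library(igraph)

set.seed(1)
blocks <- rep(1:2, each = 5)                              # 10 nodes, 2 true communities
W <- outer(blocks, blocks, function(a, b) ifelse(a == b, .4, .1))
diag(W) <- 0
g <- graph_from_adjacency_matrix(W, mode = "undirected", weighted = TRUE)

runs   <- lapply(1:100, function(i) membership(cluster_spinglass(g)))
n_comm <- sapply(runs, function(m) length(unique(m)))
prop.table(table(n_comm))   # frequency of each community count; the modal count is the aggregate solution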

Erroneous generalization—Exploring random error variance in reliability generalizations of psychological measurements.

Psychological Methods, Feb 27, 2025, No Pagination Specified; doi:10.1037/met0000740

Reliability generalization (RG) studies frequently interpret meta-analytic heterogeneity in score reliability as evidence of differences in an instrument’s measurement quality across administrations. However, such interpretations ignore the fact that, under classical test theory, score reliability depends on two parameters: true score variance and error score variance. True score variance refers to the actual variation in the trait we aim to measure, while error score variance refers to nonsystematic variation arising in the observed, manifest variable. If the error score variance remains constant, variations in true score variance can result in heterogeneity in reliability coefficients. While this argument is not new, we argue that current approaches to addressing this issue in the RG literature are insufficient. Instead, we propose enriching an RG study with Boot-Err: explicitly modeling the error score variance using bootstrapping and meta-analytic techniques. Through a comprehensive simulation scheme, we demonstrate that score reliability can vary while the measuring quality remains unaffected. The simulation also illustrates how explicitly modeling error score variances may improve inferences concerning random measurement error and under which conditions such enhancements occur. Furthermore, using openly available direct replication data, we show how explicitly modeling error score variance allows for an assessment of the extent to which measurement quality can be described as identical across administration sites. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
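
To make the "model the error variance directly" idea concrete, here is a minimal base-R bootstrap of an error score variance, using the classical-test-theory identity Var(E) = Var(X) * (1 - reliability) with coefficient alpha as the reliability estimate. The toy data and this particular construction are illustrative assumptions, not necessarily the article's Boot-Err procedure.

set.seed(1)
n <- 250; p <- 8
true_score <- rnorm(n)
items <- sapply(1:p, function(j) 0.7 * true_score + rnorm(n, sd = .7))   # toy congeneric items

coef_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}
error_var <- function(x) var(rowSums(x)) * (1 - coef_alpha(x))   # Var(E) = Var(X)(1 - reliability)

boot_ev <- replicate(2000, error_var(items[sample(n, replace = TRUE), ]))
c(estimate = error_var(items), se = sd(boot_ev), quantile(boot_ev, c(.025, .975)))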

Improving the probability of reaching correct conclusions about congruence hypotheses: Integrating statistical equivalence testing into response surface analysis.

Psychological Methods, Feb 24, 2025, No Pagination Specified; doi:10.1037/met0000743

Many psychological theories imply that the degree of congruence between two variables (e.g., self-rated and objectively measured intelligence) is related to some psychological outcome (e.g., life satisfaction). Such congruence hypotheses can be tested with response surface analysis (RSA), in which a second-order polynomial regression model is estimated and suitably interpreted. Whereas several strategies exist for this interpretation, they all contain rationales that diminish the probability of drawing correct conclusions. For example, a frequently applied strategy involves calculating six auxiliary parameters from the estimated regression weights and accepting the congruence hypothesis if they satisfy certain conditions. In testing the conditions, a nonsignificant null-hypothesis test of some parameters is taken as evidence that the parameter is zero. This interpretation is formally inadmissible and adversely affects the probability of making correct decisions about the congruence hypothesis. We address this limitation of the six-parameter strategy and other RSA strategies by proposing that statistical equivalence testing (SET) be integrated into RSA. We compare the existing and new RSA strategies with a simulation study and find that the SET strategies are sensible alternatives to the existing strategies. We provide code templates for implementing the SET strategies. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
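
As a sketch of what an equivalence decision for a single auxiliary RSA parameter could look like, the function below runs two one-sided tests (TOST) against a user-chosen equivalence bound, given the parameter estimate and its standard error. The bound, the normal/t reference, and the example numbers are illustrative; the article's SET strategies specify how the bounds and parameters are chosen in context.

# TOST: is the parameter inside (-delta, +delta), i.e., statistically equivalent to zero?
tost_equivalence <- function(est, se, delta, df = Inf, alpha = .05) {
  t_lower <- (est + delta) / se     # test against the lower bound -delta
  t_upper <- (est - delta) / se     # test against the upper bound +delta
  p_lower <- 1 - pt(t_lower, df)    # reject H0: est <= -delta
  p_upper <- pt(t_upper, df)        # reject H0: est >= +delta
  p_tost  <- max(p_lower, p_upper)  # equivalence requires both rejections
  c(p_tost = p_tost, equivalent = p_tost < alpha)
}

tost_equivalence(est = 0.03, se = 0.05, delta = 0.15)   # made-up numbers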

Is a less wrong model always more useful? Methodological considerations for using ant colony optimization in measure development.

Psychological Methods, Feb 20, 2025, No Pagination Specified; doi:10.1037/met0000734

With the advancement of artificial intelligence (AI), many AI-derived techniques have been adapted into psychological and behavioral science research, including measure development, which is a key task for psychometricians and methodologists. Ant colony optimization (ACO) is an AI-derived metaheuristic algorithm that has been integrated into the structural equation modeling framework to search for optimal (or near optimal) solutions. ACO-driven measurement modeling is an emerging method for constructing scales, but psychological researchers generally lack the necessary understanding of ACO-optimized models and outcome solutions. This article aims to investigate whether ACO solutions are indeed optimal and whether the optimized measurement models of ACO are always more psychologically useful compared to conventional ones built by human psychometricians. To work toward these goals, we highlight five essential methodological considerations for using ACO in measure development: (a) pursuing a local or global optimum, (b) avoiding a subjective optimum, (c) optimizing content validity, (d) bridging the gap between theory and model, and (e) recognizing limitations of unidirectionality. A joint data set containing item-level data from German (n = 297) and the United States (n = 334) samples was employed, and seven illustrative ACO analyses with various configurations were conducted to illustrate or facilitate the discussions of these considerations. We conclude that measurement solutions from the current ACO have not yet become optimal or close to optimal, and the optimized measurement models of ACO are not necessarily more psychologically useful than conventional ones. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Evaluating statistical fit of confirmatory bifactor models: Updated recommendations and a review of current practice.

Psychological Methods, Feb 20, 2025, No Pagination Specified; doi:10.1037/met0000730

Confirmatory bifactor models have become very popular in psychological applications, but they are increasingly criticized for statistical pitfalls such as tendency to overfit, tendency to produce anomalous results, instability of solutions, and underidentification problems. In part to combat this state of affairs, many different reliability and dimensionality measures have been proposed to help researchers evaluate the quality of the obtained bifactor solution. However, in empirical practice, the evaluation of bifactor models is largely based on structural equation model fit indices. Other critical indicators of solution quality, such as patterns of general and group factor loadings, whether all estimates are interpretable, and values of reliability coefficients, are often not taken into account. In addition, in the methodological literature, some confusion exists about the appropriate interpretation and application of some bifactor reliability coefficients. In this article, we accomplish several goals. First, we review reliability coefficients for bifactor models and their correct interpretations, and we provide expectations for their values. Second, to help steer researchers away from structural equation model fit indices and to improve current practice, we provide a checklist for evaluating the statistical fit of bifactor models. Third, we evaluate the state of current practice by examining 96 empirical articles employing confirmatory bifactor models across different areas of psychology. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
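
As a reminder of what the main bifactor reliability coefficients quantify, the snippet below computes omega-hierarchical and omega-total from a made-up, completely standardized, orthogonal bifactor loading pattern using the standard sum-score formulas. In practice these coefficients come from a fitted model (e.g., psych::omega); the loadings here are invented for illustration.

# omega_h = (sum of general loadings)^2 / variance of the sum score
# omega_t = variance explained by all common factors / variance of the sum score
gen  <- c(.6, .6, .5, .5, .6, .5)                      # general-factor loadings (made up)
grp  <- list(g1 = c(.4, .4, .3, 0, 0, 0),              # group-factor loadings (made up)
             g2 = c(0, 0, 0, .4, .3, .4))
uniq <- 1 - gen^2 - Reduce(`+`, lapply(grp, function(l) l^2))   # item uniquenesses

var_total <- sum(gen)^2 + sum(sapply(grp, function(l) sum(l)^2)) + sum(uniq)
omega_h   <- sum(gen)^2 / var_total
omega_t   <- (var_total - sum(uniq)) / var_total
c(omega_h = omega_h, omega_total = omega_t)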

Robust Bayesian meta-regression: Model-averaged moderation analysis in the presence of publication bias.

Psychological Methods, Feb 17, 2025, No Pagination Specified; doi:10.1037/met0000737

Meta-regression is an essential meta-analytic tool for investigating sources of heterogeneity and assessing the impact of moderators. However, existing methods for meta-regression have limitations, such as inadequate consideration of model uncertainty and poor performance under publication bias. To overcome these limitations, we extend robust Bayesian meta-analysis (RoBMA) to meta-regression (RoBMA-regression). RoBMA-regression allows for moderator analyses while simultaneously taking into account the uncertainty about the presence and impact of other factors (i.e., the main effect, heterogeneity, publication bias, and other potential moderators). The methodology presents a coherent way of assessing the evidence for and against the presence of both continuous and categorical moderators. We further employ a Savage–Dickey density ratio test to quantify the evidence for and against the presence of the effect at different levels of categorical moderators. We illustrate RoBMA-regression in an empirical example and demonstrate its performance in a simulation study. We implemented the methodology in the RoBMA R package. Overall, RoBMA-regression presents researchers with a powerful and flexible tool for conducting robust and informative meta-regression analyses. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Information theory, machine learning, and Bayesian networks in the analysis of dichotomous and Likert responses for questionnaire psychometric validation.

Psychological Methods, Feb 17, 2025, No Pagination Specified; doi:10.1037/met0000713

Questionnaire validation is indispensable in psychology and medicine and is essential for understanding differences across diverse populations in the measured construct. While traditional latent factor models have long dominated psychometric validation, recent advancements have introduced alternative methodologies, such as the “network framework.” This study presents a pioneering approach integrating information theory, machine learning (ML), and Bayesian networks (BNs) into questionnaire validation. Our proposed framework considers psychological constructs as complex, causally interacting systems, bridging theories and empirical hypotheses. We emphasize the crucial link between questionnaire items and theoretical frameworks, validated through the known-groups method for effective differentiation of clinical and nonclinical groups. Information theory measures such as Jensen–Shannon divergence distance and ML for item selection enhance discriminative power while contextually reducing respondent burden. BNs are employed to uncover conditional dependences between items, illuminating the intricate systems underlying psychological constructs. Through this integrated framework encompassing item selection, theory formulation, and construct validation stages, we empirically validate our method on two simulated data sets—one with dichotomous and the other with Likert-scale data—and a real data set. Our approach demonstrates effectiveness in standard questionnaire research and validation practices, providing insights into criterion validity, content validity, and construct validity of the instrument. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Meta-analyzing nonpreregistered and preregistered studies.

Psychological Methods, Feb 17, 2025, No Pagination Specified; doi:10.1037/met0000719

Preregistration is gaining ground in psychology, and a consequence of this is that preregistered studies are more often included in meta-analyses. Preregistered studies likely mitigate the effect of publication bias in a meta-analysis, because preregistered studies can be located in the registries they were registered in even if they do not get published. However, current meta-analysis methods do not take into account that preregistered studies are less susceptible to publication bias. Traditional methods treat all studies as equivalent, even though meta-analytic conclusions can be improved by taking advantage of preregistered studies. The goal of this article is to introduce the hybrid extended meta-analysis (HYEMA) method that takes into account whether a study is preregistered or not and corrects for publication bias in only the nonpreregistered studies. The proposed method is applied to two meta-analyses on prominent effects in the psychological literature: the red-romance hypothesis and money priming. Applying HYEMA to these meta-analyses shows that the average effect size estimate is substantially closer to zero than the estimate of the random-effects meta-analysis model. Two simulation studies tailored to the two applications are also presented to illustrate the method’s superior performance compared to the random-effects meta-analysis model and precision-effect test and precision-effect estimate with standard error when publication bias is present. Hence, I recommend applying HYEMA as a sensitivity analysis when a meta-analysis includes a mix of preregistered and nonpreregistered studies. R code as well as a web application (https://rcmvanaert.shinyapps.io/HYEMA) have been developed and are described in the article to facilitate application of the method. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Efficient design of cluster randomized trials and individually randomized group treatment trials.

Psychological Methods, Feb 13, 2025, No Pagination Specified; doi:10.1037/met0000727

For cluster randomized trials and individually randomized group treatment trials that compare two treatments on a continuous outcome, designs are presented that minimize the number of subjects or the amount of research budget, when aiming for a desired power level. These designs optimize the treatment-to-control allocation ratio of study participants but also optimize the choice between the number of clusters/groups versus the number of persons per cluster/group. Given that optimal designs require prior knowledge of parameters from the analysis model, which are often unknown during the design stage—especially outcome variances—maximin designs are introduced. These designs ensure a prespecified power level for plausible ranges of the unknown parameters and maximize power for the worst-case values of these parameters. The present study not only reviews but also extends the existing literature by deriving optimal and maximin designs when the number of clusters/groups is fixed because of practical constraints. How to calculate sample sizes in such practical designs and how much budget may be saved are illustrated for an empirical example. To facilitate sample size calculation for each of the variants of the maximin designs considered, an easy-to-use interactive R Shiny app has been developed and made available. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
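
For orientation, the textbook starting point that optimal and maximin designs improve on is the design-effect calculation below: size an individually randomized trial, then inflate by 1 + (m - 1) * ICC for clusters of size m. This is the standard formula, not the article's derivations; the input values are made up.

crt_sample_size <- function(delta, sd, icc, m, power = .80, alpha = .05) {
  z     <- qnorm(1 - alpha / 2) + qnorm(power)
  n_ind <- 2 * (z * sd / delta)^2        # per-arm n ignoring clustering
  deff  <- 1 + (m - 1) * icc             # design effect for cluster size m
  n_clu <- ceiling(n_ind * deff / m)     # clusters per arm
  c(clusters_per_arm = n_clu, persons_per_arm = n_clu * m, design_effect = deff)
}

crt_sample_size(delta = 0.3, sd = 1, icc = 0.05, m = 20)   # example inputs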

Bayesian inference for evidence accumulation models with regressors.

Psychological Methods, Feb 13, 2025, No Pagination Specified; doi:10.1037/met0000669

Evidence accumulation models (EAMs) are an important class of cognitive models used to analyze both response time and response choice data recorded from decision-making tasks. Developments in estimation procedures have helped EAMs become important both in basic scientific applications and solution-focused applied work. Hierarchical Bayesian estimation frameworks for the linear ballistic accumulator (LBA) model and the diffusion decision model (DDM) have been widely used, but still suffer from some key limitations, particularly for large sample sizes, for models with many parameters, and when linking decision-relevant covariates to model parameters. We extend upon previous work with methods for estimating the LBA and DDM in hierarchical Bayesian frameworks that include random effects that are correlated between people and include regression-model links between decision-relevant covariates and model parameters. Our methods work equally well in cases where the covariates are measured once per person (e.g., personality traits or psychological tests) or once per decision (e.g., neural or physiological data). We provide methods for exact Bayesian inference, using particle-based Markov chain Monte-Carlo, and also approximate methods based on variational Bayesian (VB) inference. The VB methods are sufficiently fast and efficient that they can address large-scale estimation problems, such as with very large data sets. We evaluate the performance of these methods in applications to data from three existing experiments. Detailed algorithmic implementations and code are freely available for all methods. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

How many factors to retain in exploratory factor analysis? A critical overview of factor retention methods.

Psychological Methods, Feb 13, 2025, No Pagination Specified; doi:10.1037/met0000733

Determining the number of factors is a decisive, yet very difficult decision a researcher faces when conducting an exploratory factor analysis (EFA). Over the last decades, numerous so-called factor retention criteria have been developed to infer the latent dimensionality from empirical data. While some tutorials and review articles on EFA exist which give recommendations on how to determine the number of latent factors, there is no comprehensive overview that categorizes the existing approaches and integrates the results of existing simulation studies evaluating the various methods in different data conditions. With this article, we want to provide such an overview enabling (applied) researchers to make an informed decision when choosing a factor retention criterion. Summarizing the most important results from recent simulation studies, we provide guidance when to rely on which method and call for a more thoughtful handling of overly simple heuristics. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
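
One of the retention criteria that such overviews typically cover is Horn's parallel analysis, which compares observed eigenvalues with those from random data. A quick sketch with the psych package and toy two-factor data (everything below is illustrative):

library(psych)

set.seed(1)
n  <- 400
F1 <- rnorm(n); F2 <- rnorm(n)
X  <- cbind(sapply(1:4, function(i) .7 * F1 + rnorm(n, sd = .7)),
            sapply(1:4, function(i) .7 * F2 + rnorm(n, sd = .7)))

fa.parallel(X, fa = "fa")   # suggests how many factors to retain (here, typically 2)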

Fit indices are insensitive to multiple minor violations of perfect simple structure in confirmatory factor analysis.

Psychological Methods, Feb 13, 2025, No Pagination Specified; doi:10.1037/met0000718

Classic confirmatory factor analysis (CFA) models are theoretically superior to exploratory factor analysis (EFA) models because they specify that each indicator only measures one factor. In contrast, in EFA, all loadings are permitted to be nonzero. In this article, we show that when fit to EFA structures and other models with many cross-loadings, classic CFA models often produce excellent fit. A key requirement for breaking this pattern is to have highly variable ratios of main loadings to corresponding cross-loadings in the true data-generating structure—and strongest misfit results when cross-loadings are of mixed sign. We show mathematically that EFA structures that are rotatable to a CFA representation are those where the main loadings and the cross-loadings are proportional for each group of indicators. With the help of a ShinyApp, we show that unless these proportionality constraints are violated severely in the true data structure, CFA models will fit well to most true models containing many cross-loadings by commonly accepted fit index cutoffs. We also show that fit indices are nonmonotone functions of the number of positive cross-loadings, and the relationship becomes monotone only when cross-loadings are of mixed sign. Overall, our findings indicate that good fit of a CFA model rules out that the true model is an EFA model with highly variable ratios of main and cross-loadings, but does not rule out most other plausible EFA structures. We discuss the implications of these findings. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
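
The core finding is easy to reproduce in lavaan: simulate data from a population model in which every item has a same-sign cross-loading proportional to its main loading, fit a simple-structure CFA, and look at the fit indices. The population values below are invented for illustration.

library(lavaan)

pop_model <- '
  f1 =~ .7*x1 + .7*x2 + .7*x3 + .3*x4 + .3*x5 + .3*x6
  f2 =~ .3*x1 + .3*x2 + .3*x3 + .7*x4 + .7*x5 + .7*x6
  f1 ~~ .3*f2
'
set.seed(1)
dat <- simulateData(pop_model, sample.nobs = 1000)

cfa_model <- '
  f1 =~ x1 + x2 + x3
  f2 =~ x4 + x5 + x6
'
fit <- cfa(cfa_model, data = dat)
fitMeasures(fit, c("cfi", "rmsea", "srmr"))   # typically "acceptable" despite the many cross-loadings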

Relative importance analysis in multiple mediator models.

Psychological Methods, Feb 13, 2025, No Pagination Specified; doi:10.1037/met0000725

Mediation analysis is widely used in psychological research to identify the relationship between independent and dependent variables through mediators. Assessing the relative importance of mediators in parallel mediator models can help researchers better understand mediation effects and guide interventions. The traditional coefficient-based measures of indirect effect merely focus on the partial effect of each mediator, which may lead to undesirable results when assessing importance. This study develops a new method of measuring the importance of multiple mediators. Three R² measures of indirect effect proposed by MacKinnon (2008) are extended to parallel mediator models. Dominance analysis, a popular method of evaluating relative importance, is applied to decompose the R² indirect effect and attribute it to each mediator. This offers new measures of indirect effect in terms of relative importance. Both frequentist and Bayesian methods are used to make statistical inference for the dominance measures. Simulation studies investigate the performance of the dominance measures and their inference. A real data example illustrates how the relative importance can be assessed in multiple mediator models. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Reassessing the fitting propensity of factor models.

Psychological Methods, Feb 10, 2025, No Pagination Specified; doi:10.1037/met0000735

Model complexity is a critical consideration when evaluating a statistical model. To quantify complexity, one can examine fitting propensity (FP), or the ability of the model to fit well to diverse patterns of data. The scant foundational research on FP has focused primarily on proof of concept rather than practical application. To address this oversight, the present work joins a recently published study in examining the FP of models that are commonly applied in factor analysis. We begin with a historical account of statistical model evaluation, which refutes the notion that complexity can be fully understood by counting the number of free parameters in the model. We then present three sets of analytic examples to better understand the FP of exploratory and confirmatory factor analysis models that are widely used in applied research. We characterize our findings relative to previously disseminated claims about factor model FP. Finally, we provide some recommendations for future research on FP in latent variable modeling. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Reliability in unidimensional ordinal data: A comparison of continuous and ordinal estimators.

Psychological Methods, Feb 10, 2025, No Pagination Specified; doi:10.1037/met0000739

This study challenges three common methodological beliefs and practices. The first question examines whether ordinal reliability estimators are more accurate than continuous estimators for unidimensional data with uncorrelated errors. Continuous estimators (e.g., coefficient alpha) can be applied to both continuous and ordinal data, while ordinal estimators (e.g., ordinal alpha and categorical omega) are specific to ordinal data. Although ordinal estimators are often argued to have conceptual advantages, comprehensive investigations into their accuracy are limited. The second question explores the relationship between skewness and kurtosis in ordinal data. Previous simulation studies have primarily examined cases where skewness and kurtosis change in the same direction, leaving gaps in understanding their independent effects. The third question addresses item response theory (IRT) models: Should the scaling constant always be fixed at the same value (e.g., 1.7)? To answer these questions, this study conducted a Monte Carlo simulation comparing four continuous estimators and eight ordinal estimators. The results indicated that most estimators achieved acceptable levels of accuracy. On average, ordinal estimators were slightly less accurate than continuous estimators, though the difference was smaller than what most users would consider practically significant (e.g., less than 0.01). However, ordinal alpha stood out as a notable exception, severely overestimating reliability across various conditions. Regarding the scaling constant in IRT models, the results indicated that its optimal value varied depending on the data type (e.g., dichotomous vs. polytomous). In some cases, values below 1.7 were optimal, while in others, values above 1.8 were optimal. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Missing not at random intensive longitudinal data with dynamic structural equation models.

Psychological Methods, Feb 10, 2025, No Pagination Specified; doi:10.1037/met0000742

Intensive longitudinal designs are increasingly popular for assessing moment-to-moment changes in mood, affect, and interpersonal or health behavior. Compliance in these studies is never perfect given the high frequency of data collection, so missing data are unavoidable. Nonetheless, there is relatively little existing research on missing data within dynamic structural equation models, a recently proposed framework for modeling intensive longitudinal data. The few studies that exist tend to focus on methods appropriate for data that are missing at random (MAR). However, missing not at random (MNAR) data are prevalent, particularly when the interest is a sensitive outcome related to mental health, substance use, or sexual behavior. As a motivating example, a study on people with binge eating disorder that has large amounts of missingness in a self-report item related to overeating is considered. Missingness may be high because participants felt shame reporting this behavior, which is a clear case of MNAR and for which methods like multiple imputation and full-information maximum likelihood are less effective. To improve handling of MNAR intensive longitudinal data, embedding a Diggle–Kenward-type MNAR model within a dynamic structural equation model is proposed. This approach is straightforward to apply in popular software like Mplus and only requires a few extra lines of code relative to models that assume MAR. Results from the proposed approach are contrasted with results from models that assume MAR, and a simulation study is conducted to study performance of the proposed model with continuous or binary outcomes. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

The relationship between the phi coefficient and the unidimensionality index H: Improving psychological scaling from the ground up.

Psychological Methods, Feb 10, 2025, No Pagination Specified; doi:10.1037/met0000736

To study the dimensional structure of psychological phenomena, a precise definition of unidimensionality is essential. Most definitions of unidimensionality rely on factor analysis. However, the reliability of factor analysis depends on the input data, which primarily consists of Pearson correlations. A significant issue with Pearson correlations is that they are almost guaranteed to underestimate unidimensionality, rendering them unsuitable for evaluating the unidimensionality of a scale. This article formally demonstrates that the simple unidimensionality index H is always at least as high as, or higher than, the Pearson correlation for dichotomous and polytomous items (φ). Leveraging this inequality, a case is presented where five dichotomous items are perfectly unidimensional, yet factor analysis based on φ incorrectly suggests a two-dimensional solution. To illustrate that this issue extends beyond theoretical scenarios, an analysis of real data from a statistics exam (N = 133) is conducted, revealing the same problem. An in-depth analysis of the exam data shows that violations of unidimensionality are systematic and should not be dismissed as mere noise. Inconsistent answering patterns can indicate whether a participant blundered, cheated, or has conceptual misunderstandings, information typically overlooked by traditional scaling procedures based on correlations. The conclusion is that psychologists should consider unidimensionality not as a peripheral concern but as the foundation for any serious scaling attempt. The index H could play a crucial role in establishing this foundation. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
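
For two dichotomous items with joint endorsement proportion p11 and marginal proportions p1. and p.1, the phi coefficient and a scalability-type H coefficient can be written as below. Reading the article's H as this covariance-over-maximum-covariance coefficient is my assumption for illustration; under it the inequality is immediate, because the maximum attainable covariance given the marginals is never larger than the product of the standard deviations.

\varphi_{ij} = \frac{p_{11} - p_{1\cdot}\,p_{\cdot 1}}
                    {\sqrt{p_{1\cdot}(1 - p_{1\cdot})\,p_{\cdot 1}(1 - p_{\cdot 1})}},
\qquad
H_{ij} = \frac{\operatorname{cov}(X_i, X_j)}{\operatorname{cov}_{\max}(X_i, X_j)}
       = \frac{\varphi_{ij}}{\varphi_{\max, ij}} \;\ge\; \varphi_{ij}
\quad \text{for } \varphi_{ij} > 0 .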

A peculiarity in psychological measurement practices.

Psychological Methods, Feb 10, 2025, No Pagination Specified; doi:10.1037/met0000731

This essay discusses a peculiarity in institutionalized psychological measurement practices. Namely, an inherent contradiction between guidelines for how scales/tests are developed and how those scales/tests are typically analyzed. Best practices for developing scales/tests emphasize developing individual items or subsets of items to capture unique aspects of constructs, such that the full construct is captured across the test. Analysis approaches, typically factor analysis or related reflective models, assume that no individual item (nor a subset of items) captures unique, construct-relevant variance. This contradiction has important implications for the use of factor analysis to support measurement claims. The implications and other critiques of factor analysis are discussed. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Comparison of two independent populations of compositional data with positive correlations among components using a nested Dirichlet distribution.

Psychological Methods, Jan 16, 2025, No Pagination Specified; doi:10.1037/met0000702

Compositional data are multivariate data made up of components that sum to a fixed value. Often the data are presented as proportions of a whole, where the value of each component is constrained to be between 0 and 1 and the sum of the components is 1. There are many applications in psychology and other disciplines that yield compositional data sets including Morris water maze experiments, psychological well-being scores, analysis of daily physical activity times, and components of household expenditures. Statistical methods exist for compositional data and typically consist of two approaches. The first is to use transformation strategies, such as log ratios, which can lead to results that are challenging to interpret. The second involves using an appropriate distribution, such as the Dirichlet distribution, that captures the key characteristics of compositional data, and allows for ready interpretation of downstream analysis. Unfortunately, the Dirichlet distribution has constraints on variance and correlation that render it inappropriate for some applications. As a result, practicing researchers will often resort to standard two-sample t test or analysis of variance models for each variable in the composition to detect differences in means. We show that a recently published method using the Dirichlet distribution can drastically inflate Type I error rates, and we introduce a global two-sample test to detect differences in mean proportion of components for two independent groups where both groups are from either a Dirichlet or a more flexible nested Dirichlet distribution. We also derive confidence interval formulas for individual components for post hoc testing and further interpretation of results. We illustrate the utility of our methods using a recent Morris water maze experiment and human activity data. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Dynamic factor analysis with multivariate time series of multiple individuals: An error-corrected estimation method.

Psychological Methods, Jan 09, 2025, No Pagination Specified; doi:10.1037/met0000722

Intensive longitudinal data, increasingly common in social and behavioral sciences, often consist of multivariate time series from multiple individuals. Dynamic factor analysis, combining factor analysis and time series analysis, has been used to uncover individual-specific processes from single-individual time series. However, integrating these processes across individuals is challenging due to estimation errors in individual-specific parameter estimates. We propose a method that integrates individual-specific processes while accommodating the corresponding estimation error. This method is computationally efficient and robust against model specification errors and nonnormal data. We compare our method with a Naive approach that ignores estimation error using both empirical and simulated data. The two methods produced similar estimates for fixed effect parameters, but the proposed method produced more satisfactory estimates for random effects than the Naive method. The relative advantage of the proposed method was more substantial for short to moderately long time series (T = 56–200). (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Dynamic structural equation modeling with floor effects.

Psychological Methods, Jan 06, 2025, No Pagination Specified; doi:10.1037/met0000720

Intensive longitudinal data analysis, commonly used in psychological studies, often concerns outcomes that have strong floor effects, that is, a large percentage at its lowest value. Ignoring a strong floor effect, using regular analysis with modeling assumptions suitable for a continuous-normal outcome, is likely to give misleading results. This article suggests that two-part modeling may provide a solution. It can avoid potential biasing effects due to ignoring the floor effect. It can also provide a more detailed description of the relationships between the outcome and covariates allowing different covariate effects for being at the floor or not and the value above the floor. A smoking cessation example is analyzed to demonstrate available analysis techniques. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
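
A minimal cross-sectional illustration of the two-part idea follows: one part models the probability of being above the floor, the other models the value given that it is above the floor. The toy data and the simple glm/lm pairing are assumptions for illustration; the article's dynamic structural equation version extends this logic to intensive longitudinal outcomes.

set.seed(1)
n <- 500
x <- rnorm(n)
above <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))                      # who is above the floor?
y <- ifelse(above == 1, exp(0.2 + 0.5 * x + rnorm(n, sd = .5)), 0) # floor-heavy outcome

part1 <- glm(I(y > 0) ~ x, family = binomial)   # part 1: floor vs. above floor
part2 <- lm(log(y) ~ x, subset = y > 0)         # part 2: amount, given above the floor
summary(part1); summary(part2)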

A causal research pipeline and tutorial for psychologists and social scientists.

Psychological Methods, Jan 06, 2025, No Pagination Specified; doi:10.1037/met0000673

Causality is a fundamental part of the scientific endeavor to understand the world. Unfortunately, causality is still taboo in much of psychology and social science. Motivated by a growing number of recommendations for the importance of adopting causal approaches to research, we reformulate the typical approach to research in psychology to harmonize inevitably causal theories with the rest of the research pipeline. We present a new process which begins with the incorporation of techniques from the confluence of causal discovery and machine learning for the development, validation, and transparent formal specification of theories. We then present methods for reducing the complexity of the fully specified theoretical model into the fundamental submodel relevant to a given target hypothesis. From here, we establish whether or not the quantity of interest is estimable from the data, and if so, propose the use of semi-parametric machine learning methods for the estimation of causal effects. The overall goal is the presentation of a new research pipeline which can (a) facilitate scientific inquiry compatible with the desire to test causal theories, (b) encourage transparent representation of our theories as unambiguous mathematical objects, (c) tie our statistical models to specific attributes of the theory, thus reducing under-specification problems frequently resulting from the theory-to-model gap, and (d) yield results and estimates which are causally meaningful and reproducible. The process is demonstrated through didactic examples with real-world data, and we conclude with a summary and discussion of limitations. (PsycInfo Database Record (c) 2025 APA, all rights reserved)

Assessing heterogeneous causal effects across clusters in partially nested designs.

Psychological Methods, Dec 30, 2024, No Pagination Specified; doi:10.1037/met0000723

Intervention studies in psychology often have a partially nested design (PND): after individuals are assigned to study arms, individuals in a treatment arm are subsequently assigned to clusters (e.g., therapists/therapy groups) to receive treatment, whereas individuals in a control arm are unclustered. Given the presence of clustering in the treatment arm, it can be of interest to examine the heterogeneity of treatment effects across the clusters; but this is challenging in PNDs. First, in defining a causal effect of treatment for a specific cluster, it is unclear how the treatment and control outcomes should be compared, as the clustering is absent in the control arm. Although it may be tempting to compare outcomes between a specific cluster and the entire control arm, this crude comparison may not represent a causal effect even in PNDs with randomized treatment assignments, as the cluster assignment may be nonrandomized (elaborated in this study). In this study, we develop methods to define, identify, and estimate the causal effects of treatment across specific clusters in a PND where the treatment and/or cluster assignment may be nonrandomized. Using the principal stratification approach and potential outcomes framework, we define causal estimands for the cluster-specific treatment effects in two scenarios: (a) no-interference and (b) within-cluster interference. We identify the effects under the principal ignorability assumption. For estimation, we provide a multiply-robust method that can protect against misspecification in a nuisance model and can incorporate machine learning methods in the nuisance model estimation. We evaluate the estimators’ performance through simulations and illustrate the application using an empirical PND example. (PsycInfo Database Record (c) 2024 APA, all rights reserved)

Dynamic fit index cutoffs for treating Likert items as continuous.

Psychological Methods, Dec 30, 2024, No Pagination Specified; doi:10.1037/met0000683

Recent reviews report that about 80% of empirical factor analyses are applied to Likert-type responses and that it is exceedingly common to treat Likert-type item responses as continuous. However, traditional model fit index cutoffs like the root-mean-square error of approximation ≤ .06 or comparative fit index ≥ .95 were derived to have 90+% sensitivity to misspecification with continuous responses. A disconnect therefore emerges whereby traditional methodological guidelines assume continuous responses whereas empirical data often contain Likert-type responses. We provide an illustrative simulation study to show that this disconnect is not innocuous—the sensitivity of traditional cutoffs to misspecification is close to 100% with continuous responses but can fall considerably if 5-point Likert responses are treated as continuous in some conditions. In other conditions, the reverse may occur, and traditional cutoffs may be too strict. Generally, applying traditional cutoffs to Likert-type responses can adversely impact conclusions about fit adequacy. This article aims to address this prevalent issue by extending the dynamic fit index (DFI) framework to accommodate Likert-type responses. DFI is a simulation-based method that was initially intended to address changes in cutoff sensitivity to misspecification because of model characteristics (e.g., number of items, strength of loadings). Here, we propose extending DFI so that it also accounts for data characteristics (e.g., number of Likert scale points, response distribution). Two simulations are included to demonstrate that—with 5-point Likert-type responses—the proposed method (a) improves upon traditional cutoffs, (b) improves upon DFI cutoffs based on multivariate normality, and (c) consistently maintains 90+% sensitivity to misspecification. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
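
A condensed sketch of the simulation logic behind such data-specific cutoffs: simulate many data sets from a stand-in for the fitted model, discretize them to five ordered categories, refit the analysis model treating the categories as continuous, and take a percentile of the resulting fit-index distribution. The DFI method additionally simulates from deliberately misspecified models to calibrate sensitivity; the discretization rule, population model, and replication count below are simplifications of mine.

library(lavaan)

likertize <- function(x, k = 5) as.numeric(cut(x, breaks = k))    # crude 5-point discretization

cfa_model <- 'f =~ x1 + x2 + x3 + x4 + x5 + x6'
pop_model <- 'f =~ .7*x1 + .7*x2 + .7*x3 + .7*x4 + .7*x5 + .7*x6' # stand-in for the fitted model

set.seed(1)
cfi_rep <- replicate(200, {
  d <- as.data.frame(lapply(simulateData(pop_model, sample.nobs = 400), likertize))
  fitMeasures(cfa(cfa_model, data = d), "cfi")
})
quantile(cfi_rep, .05)   # a candidate CFI cutoff tailored to these data characteristics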

Conditional process analysis for two-instance repeated-measures designs.

Psychological Methods, Dec 30, 2024, No Pagination Specified; doi:10.1037/met0000715

Models where some part of a mediation is moderated (conditional process models) are commonly used in psychology research, allowing for better understanding of when the process by which a focal predictor affects an outcome through a mediator depends on moderating variables. Methodological developments in conditional process analysis have focused on between-subject designs. However, two-instance repeated-measures designs, in which each subject is measured twice (once in each of two instances), are also very common. Research on how to statistically test mediation, moderation, and conditional process models in these designs has lagged behind. Judd et al. (2001) introduced a piecewise method for testing mediation, which Montoya and Hayes (2017) then translated to a path-analytic approach, quantifying the indirect effect. Moderation analysis in these designs has been described by Judd et al. (2001, 1996), and Montoya (2018). The generalization to conditional process analysis remains incomplete. I propose a general conditional process model for two-instance repeated-measures designs with one moderator and one mediator. Simplifications of this general model correspond to more commonly used moderated mediation models, such as first-stage and second-stage conditional process analysis. An applied example shows both how to conduct the analysis using MEMORE, a free and easy-to-use macro for SPSS and SAS, and how to interpret the results of such an analysis. Alternative methods for evaluating moderated mediation in two-instance repeated-measures designs using multilevel approaches are also discussed. (PsycInfo Database Record (c) 2024 APA, all rights reserved)

Regularizing threshold priors with sparse response patterns in Bayesian factor analysis with categorical indicators.

Psychological Methods, Dec 30, 2024, No Pagination Specified; doi:10.1037/met0000682

Using instruments comprising ordered responses to items is ubiquitous for studying many constructs of interest. However, using such an item response format may lead to items with response categories that are infrequently endorsed or not endorsed at all. In maximum likelihood estimation, this results in nonexistent estimates for thresholds. This work focuses on a Bayesian estimation approach to counter this issue. The issue then changes from the existence of an estimate to how to effectively construct threshold priors. The proposed prior specification reconceptualizes the threshold prior as a prior on the probability of each response category, which is an easier metric to manipulate while maintaining the necessary ordering constraints on the thresholds. The resulting induced prior is easier to communicate, and we demonstrate comparable statistical efficiency with existing threshold priors. Evidence is provided using a simulated data set, a Monte Carlo simulation study, and an example multigroup item-factor model analysis. All analyses demonstrate how at least a relatively informative threshold prior is necessary to avoid inefficient posterior sampling and increase confidence in the coverage rates of posterior credible intervals. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
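
The induced-prior idea can be previewed in a few lines: draw category probabilities from a prior on the probability scale (a Dirichlet is used here purely as an example), convert their cumulative sums to ordered probit thresholds with the inverse normal, and inspect the implied threshold prior. The Dirichlet form and concentration values are my assumptions, not necessarily the specification proposed in the article.

rdirichlet1 <- function(alpha) { g <- rgamma(length(alpha), alpha, 1); g / sum(g) }

set.seed(1)
alpha <- c(2, 2, 2, 2, 2)                        # prior concentrations for 5 response categories
draws <- t(replicate(5000, {
  p <- rdirichlet1(alpha)                        # category probabilities
  qnorm(cumsum(p)[1:4])                          # induced, automatically ordered thresholds
}))
colnames(draws) <- paste0("tau", 1:4)
apply(draws, 2, quantile, probs = c(.05, .50, .95))   # summary of the induced threshold prior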

Bayesian (non)linear random effects mediation models: Evaluating the impact of omitting confounders.

Psychological Methods, Dec 30, 2024, No Pagination Specified; doi:10.1037/met0000721

Often in educational and psychological studies, researchers are interested in understanding the mediation mechanism of longitudinal (repeated measures) variables. Almost all longitudinal mediation models in the literature stem from the structural equation modeling framework and hence cannot directly estimate intrinsically nonlinear functions (e.g., exponential, linear–linear piecewise function with an unknown changepoint) without using reparameterizations. The current study aims to develop a framework of Bayesian (non)linear random effects mediation models, B(N)REMM, to directly model intrinsically linear and nonlinear functions. Specifically, we developed two distinct longitudinal mediation models where all variables under consideration were longitudinal and followed either a linear trend (L-BREMM) or a segmented trend captured by linear–linear piecewise functions with unknown random changepoints (P-BREMM). Additionally, no research has assessed the impact of omitting confounder(s) when modeling mediation effects for intrinsically nonlinear functions. We used an empirical data example from the Early Childhood Longitudinal Study—Kindergarten Cohort to contrast the fit of two models where one included the confounder and the other omitted it. The empirical example illustrated the need to study the impacts of model misspecification with respect to omitting confounder(s). We further explored this issue and its effect on model estimation for both L-BREMM and P-BREMM via Monte Carlo simulation studies under a variety of data conditions. The simulation study results showed that omitting confounder(s) negatively impacted parameter recovery for both L-BREMM and P-BREMM but only affected model convergence for P-BREMM. We provide R scripts to estimate both L-BREMM and P-BREMM to aid the dissemination of these models. (PsycInfo Database Record (c) 2024 APA, all rights reserved)