
Subgroup discovery in structural equation models.

Psychological Methods, Vol 29(6), Dec 2024, 1025-1045; doi:10.1037/met0000524

Structural equation modeling is one of the most popular statistical frameworks in the social and behavioral sciences. Often, detection of groups with distinct sets of parameters in structural equation models (SEM) is of key importance for applied researchers, for example, when investigating differential item functioning for a mental ability test or examining children with exceptional educational trajectories. In the present article, we present a new approach, termed SubgroupSEM, which combines subgroup discovery (a well-established toolkit of supervised learning algorithms and techniques from the field of computer science) with structural equation models. We provide an overview and comparison of three approaches to modeling and detecting heterogeneous groups in structural equation models, namely, finite mixture models, SEM trees, and SubgroupSEM. We provide a step-by-step guide to applying subgroup discovery techniques for structural equation models, followed by a detailed and illustrated presentation of pruning strategies and four subgroup discovery algorithms. Finally, the SubgroupSEM approach is illustrated on two real data examples, examining measurement invariance of a mental ability test and investigating interesting subgroups for the mediated relationship between predictors of educational outcomes and the trajectories of math competencies in 5th grade children. The illustrative examples are accompanied by examples of the R package subgroupsem, a viable implementation of our approach for applied researchers. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
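
The subgroupsem package itself is not shown in the abstract, so the sketch below is only a hedged illustration, assuming the lavaan package and its built-in HolzingerSwineford1939 data, of the kind of single candidate-subgroup evaluation that a subgroup discovery search over covariate-defined groups would repeat many times.

```r
# A minimal sketch of one candidate-subgroup evaluation, assuming the lavaan
# package; this is NOT the subgroupsem package, only the kind of multi-group
# SEM comparison a subgroup discovery search would repeat across many
# covariate-defined candidate subgroups.
library(lavaan)

model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6'

# "school" stands in for a binary covariate that defines a candidate subgroup
free   <- cfa(model, data = HolzingerSwineford1939, group = "school")
constr <- cfa(model, data = HolzingerSwineford1939, group = "school",
              group.equal = "loadings")

# A large chi-square difference flags the split as "interesting" (measurement
# noninvariance), the sort of quality measure a subgroup discovery algorithm
# could optimize over candidate splits.
lavTestLRT(free, constr)
```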

Distributional causal effects: Beyond an "averagarian" view of intervention effects.

Psychological Methods, Vol 29(6), Dec 2024, 1046-1061; doi:10.1037/met0000533

The usefulness of mean aggregates in the analysis of intervention effectiveness is a matter of considerable debate in the psychological, educational, and social sciences. In addition to studying "average treatment effects," the evaluation of "distributional treatment effects" (i.e., effects that go beyond means) has been suggested to obtain a broader picture of how an intervention affects the study outcome. We continue this discussion by considering distributional causal effects. We present formal definitions of causal effects that go beyond means and utilize a distributional regression framework known as generalized additive models for location, scale, and shape (GAMLSS). GAMLSS allows one to characterize an intervention effect in its totality through simultaneously modeling means, variances, skewnesses, kurtoses, as well as ceiling and floor effects of outcome distributions. Based on data from a large-scale randomized controlled trial, we use GAMLSS to evaluate the impact of a teacher classroom management program on student academic performance. Results suggest the teacher classroom management training increased mean academic competence as well as the chance to obtain the maximum score on the academic competence scale. These effects would have been completely overlooked in a traditional evaluation of mean aggregates. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
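
As a concrete illustration of the distributional-regression idea, the hedged sketch below uses the gamlss R package on simulated data (not the article's trial data) to model both the mean and the scale of an outcome as functions of treatment.

```r
# Minimal GAMLSS sketch on simulated data (not the article's trial data),
# assuming the gamlss package is installed.
library(gamlss)

set.seed(1)
n <- 400
treat <- rbinom(n, 1, 0.5)
# treatment shifts the mean upward and shrinks the spread of the outcome
y <- rnorm(n, mean = 50 + 3 * treat, sd = 10 - 3 * treat)
d <- data.frame(y, treat)

# mu (mean) and sigma (scale, log link) are modeled simultaneously;
# nu.formula/tau.formula could add skewness and kurtosis submodels.
fit <- gamlss(y ~ treat, sigma.formula = ~ treat, family = NO, data = d)
summary(fit)
```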

Improving hierarchical models of individual differences: An extension of Goldberg's bass-ackward method.

Psychological Methods, Vol 29(6), Dec 2024, 1062-1073; doi:10.1037/met0000546

Goldberg's (2006) bass-ackward approach to elucidating the hierarchical structure of individual differences data has been used widely to improve our understanding of the relationships among constructs of varying levels of granularity. The traditional approach has been to extract a single component or factor on the first level of the hierarchy, two on the second level, and so on, treating the correlations between adjoining levels akin to path coefficients in a hierarchical structure. This article proposes three modifications to the traditional approach with a particular focus on examining associations among all levels of the hierarchy: (a) identify and remove redundant elements that perpetuate through multiple levels of the hierarchy; (b) (optionally) identify and remove artefactual elements; and (c) plot the strongest correlations among the remaining elements to identify their hierarchical associations. Together these steps can offer a simpler and more complete picture of the underlying hierarchical structure among a set of observed variables. The rationale for each step is described, illustrated in a hypothetical example and three basic simulations, and then applied in real data. The results are compared with the traditional bass-ackward approach together with agglomerative hierarchical cluster analysis, and a basic tutorial with code is provided to apply the extended bass-ackward approach in other data. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
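
The sketch below, assuming the psych and MASS packages and simulated indicators, reproduces only the traditional bass-ackward step of correlating varimax component scores on adjoining levels; the article's extensions (removing redundant or artefactual elements) are not implemented here.

```r
# Traditional bass-ackward step on simulated data, assuming the psych and MASS
# packages; the article's proposed extensions are not implemented in this sketch.
library(psych)

set.seed(1)
f <- MASS::mvrnorm(500, rep(0, 4), 0.4 + diag(4) * 0.6)   # four correlated facets
X <- f[, rep(1:4, each = 3)] + matrix(rnorm(500 * 12, sd = 0.8), 500)

# extract 1, 2, 3, 4 varimax-rotated components and keep their scores
scores <- lapply(1:4, function(k)
  principal(X, nfactors = k, rotate = "varimax")$scores)

# correlations between adjoining levels, read as path coefficients in the hierarchy
round(cor(scores[[2]], scores[[3]]), 2)
```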

Spurious inference in consensus emergence modeling due to the distinguishability problem.

Psychological Methods, Vol 29(6), Dec 2024, 1074-1083; doi:10.1037/met0000511

Researchers use consensus emergence models (CEMs) to detect when the scores of group members become similar over time. The purpose of this article is to review how CEMs often lead to spurious conclusions of consensus emergence due to the problem of distinguishability, or the notion that different data-generating mechanisms sometimes give rise to similar observed data. As a result, CEMs often cannot distinguish between observations generated from true consensus processes versus those generated by stochastic fluctuations. It will be shown that a distinct set of mechanisms, none of which exhibit true consensus, nonetheless yield spurious inferences of consensus emergence when CEMs are fitted to the observed data. This problem is demonstrated via examples and Monte Carlo simulations. Recommendations for future work are provided. (PsycInfo Database Record (c) 2024 APA, all rights reserved)

Comparing random effects models, ordinary least squares, or fixed effects with cluster robust standard errors for cross-classified data.

Psychological Methods, Vol 29(6), Dec 2024, 1084-1099; doi:10.1037/met0000538

Cross-classified random effects modeling (CCREM) is a common approach for analyzing cross-classified data in psychology, education research, and other fields. However, when the focus of a study is on the regression coefficients at Level 1 rather than on the random effects, ordinary least squares regression with cluster robust variance estimators (OLS-CRVE) or fixed effects regression with CRVE (FE-CRVE) could be appropriate approaches. These alternative methods are potentially advantageous because they rely on weaker assumptions than those required by CCREM. We conducted a Monte Carlo simulation study to compare the performance of CCREM, OLS-CRVE, and FE-CRVE under a range of conditions, including conditions where homoscedasticity and exogeneity assumptions held and conditions where they were violated, as well as conditions with unmodeled random slopes. We found that CCREM outperformed the alternative approaches when its assumptions were all met. However, when homoscedasticity assumptions were violated, OLS-CRVE and FE-CRVE provided similar or better performance than CCREM. When the exogeneity assumption was violated, only FE-CRVE provided adequate performance. Further, OLS-CRVE and FE-CRVE provided more accurate inferences than CCREM in the presence of unmodeled random slopes. Thus, we recommend two-way FE-CRVE as a good alternative to CCREM, particularly if the homoscedasticity or exogeneity assumptions of the CCREM might be in doubt. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
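
The hedged sketch below fits the three estimators side by side on simulated cross-classified data, assuming the lme4, sandwich, and lmtest packages; it is far simpler than the article's simulation design.

```r
# CCREM, OLS-CRVE, and FE-CRVE side by side on simulated cross-classified data;
# a sketch assuming lme4, sandwich, and lmtest.
library(lme4); library(sandwich); library(lmtest)

set.seed(1)
n <- 2000
school <- factor(sample(1:40, n, replace = TRUE))
nbhd   <- factor(sample(1:40, n, replace = TRUE))
u_s <- rnorm(40, sd = 0.5); u_n <- rnorm(40, sd = 0.5)
x <- rnorm(n)
y <- 0.3 * x + u_s[school] + u_n[nbhd] + rnorm(n)
d <- data.frame(y, x, school, nbhd)

# CCREM: random intercepts for both classifications
ccrem <- lmer(y ~ x + (1 | school) + (1 | nbhd), data = d)
summary(ccrem)$coefficients

# OLS with cluster-robust SEs (clustered on one factor here; the article's
# two-way CRVE accounts for both classifications)
ols <- lm(y ~ x, data = d)
coeftest(ols, vcov = vcovCL(ols, cluster = ~ school))

# Fixed effects for both factors, again with CRVE
fe <- lm(y ~ x + school + nbhd, data = d)
coeftest(fe, vcov = vcovCL(fe, cluster = ~ school))["x", ]
```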

Reliable network inference from unreliable data: A tutorial on latent network modeling using STRAND.

Psychological Methods, Vol 29(6), Dec 2024, 1100-1122; doi:10.1037/met0000519

Social network analysis provides an important framework for studying the causes, consequences, and structure of social ties. However, standard self-report measures (for example, as collected through the popular "name-generator" method) do not provide an impartial representation of such ties, be they transfers, interactions, or social relationships. At best, they represent perceptions filtered through the cognitive biases of respondents. Individuals may, for example, report transfers that did not really occur, or forget to mention transfers that really did. The propensity to make such reporting inaccuracies is both an individual-level and item-level characteristic, variable across members of any given group. Past research has highlighted that many network-level properties are highly sensitive to such reporting inaccuracies. However, there remains a dearth of easily deployed statistical tools that account for such biases. To address this issue, we provide a latent network model that allows researchers to jointly estimate parameters measuring both reporting biases and a latent, underlying social network. Building upon past research, we conduct several simulation experiments in which network data are subject to various reporting biases, and find that these reporting biases strongly impact fundamental network properties. These impacts are not adequately remedied using the most frequently deployed approaches for network reconstruction in the social sciences (i.e., treating either the union or the intersection of double-sampled data as the true network), but are appropriately resolved through the use of our latent network models. To make implementation of our models easier for end-users, we provide a fully documented R package, STRAND, and include a tutorial illustrating its functionality when applied to empirical food/money sharing data from a rural Colombian population. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
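
The STRAND API is not described in the abstract, so the sketch below uses only base R and simulated reports to show the point about union and intersection reconstructions: both misstate network density once reports contain false positives and false negatives.

```r
# Base-R sketch of the reporting-error problem (STRAND itself is not used here):
# reconstructing a network as the union or intersection of double-sampled,
# error-prone reports misstates its density.
set.seed(1)
n <- 60
true_net <- matrix(rbinom(n * n, 1, 0.10), n, n)
diag(true_net) <- 0

report <- function(net, fpr = 0.02, fnr = 0.30) {
  # each directed tie is reported subject to false-positive and false-negative error
  ifelse(net == 1,
         rbinom(length(net), 1, 1 - fnr),
         rbinom(length(net), 1, fpr))
}

r1 <- report(true_net); r2 <- report(true_net)   # double-sampled reports
density <- function(m) mean(m[row(m) != col(m)])

c(truth        = density(true_net),
  union        = density(1 * (r1 | r2)),
  intersection = density(1 * (r1 & r2)))
```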

Correcting bias in extreme groups design using a missing data approach.

Psychological Methods, Vol 29(6), Dec 2024, 1123-1131; doi:10.1037/met0000508

Extreme groups design (EGD) refers to the use of a screening variable to inform further data collection, such that only participants with the lowest and highest scores are recruited in subsequent stages of the study. It is an effective way to improve the power of a study under a limited budget, but produces biased standardized estimates. We demonstrate that the bias in EGD results from its inherent missing at random mechanism, which can be corrected using modern missing data techniques such as full information maximum likelihood (FIML). Further, we provide a tutorial on computing correlations in EGD data with FIML using R. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
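
A minimal version of the FIML correction, assuming the lavaan package and simulated data: the middle scorers' outcome is set to missing, the complete-case correlation is inflated, and a FIML-based correlation lands near the population value.

```r
# Minimal sketch of the missing-data view of EGD, assuming the lavaan package;
# the data are simulated, not the article's.
library(lavaan)

set.seed(1)
n <- 1000
x <- rnorm(n)               # screening variable, observed for everyone
y <- 0.5 * x + rnorm(n)     # outcome, collected only in the second stage
d <- data.frame(x, y)

# extreme groups design: only the bottom and top quartiles of x provide y
keep <- x <= quantile(x, 0.25) | x >= quantile(x, 0.75)
d$y[!keep] <- NA

cor(d$x, d$y, use = "complete.obs")   # complete-case r, inflated by the design
lavCor(d, missing = "ml")             # FIML estimate, close to the population r of about .45
```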

Causal relationships in longitudinal observational data: An integrative modeling approach.

Psychological Methods, Vol 29(6), Dec 2024, 1132-1147; doi:10.1037/met0000648

Much research in psychology relies on data from observational studies that traditionally do not allow for causal interpretation. However, a range of approaches in statistics and computational sciences have been developed to infer causality from correlational data. Based on conceptual and theoretical considerations on the integration of interventional and time-restrainment notions of causality, we set out to design and empirically test a new approach to identify potential causal factors in longitudinal correlational data. A principled and representative set of simulations and an illustrative application to identify early-life determinants of cognitive development in a large cohort study are presented. The simulation results illustrate the potential but also the limitations for discovering causal factors in observational data. In the illustrative application, plausible candidates for early-life determinants of cognitive abilities in 5-year-old children were identified. Based on these results, we discuss the possibilities of using exploratory causal discovery in psychological research but also highlight its limits and potential misuses and misinterpretations. (PsycInfo Database Record (c) 2024 APA, all rights reserved)

Using natural language processing and machine learning to replace human content coders.

Psychological Methods, Vol 29(6), Dec 2024, 1148-1163; doi:10.1037/met0000518

Content analysis is a common and flexible technique to quantify and make sense of qualitative data in psychological research. However, the practical implementation of content analysis is extremely labor-intensive and subject to human coder errors. Applying natural language processing (NLP) techniques can help address these limitations. We explain and illustrate these techniques to psychological researchers. For this purpose, we first present a study exploring the creation of psychometrically meaningful predictions of human content codes. Using an existing database of human content codes, we build an NLP algorithm to validly predict those codes, at generally acceptable standards. We then conduct a Monte Carlo simulation to model how four dataset characteristics (i.e., sample size, unlabeled proportion of cases, classification base rate, and human coder reliability) influence content classification performance. The simulation indicated that the influence of sample size and unlabeled proportion on model classification performance tended to be curvilinear. In addition, base rate and human coder reliability had a strong effect on classification performance. Finally, using these results, we offer practical recommendations to psychologists on the necessary dataset characteristics to achieve valid prediction of content codes to guide researchers on the use of NLP models to replace human coders in content analysis research. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
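
The sketch below simulates a small document-term matrix and noisy human codes, then fits a logistic model as a stand-in for the article's NLP classifier; in practice the term counts would come from tokenizing real open-ended responses, and the article's algorithm and data are not reproduced here.

```r
# Base-R stand-in for predicting human content codes from text features; the
# document-term matrix and the "human" codes are simulated, not the article's data.
set.seed(1)
n <- 400
dtm <- data.frame(support  = rpois(n, 1),   # hypothetical keyword counts
                  conflict = rpois(n, 1),
                  neutral  = rpois(n, 2))
# human codes generated (with noise) from two of the terms
code <- rbinom(n, 1, plogis(1.5 * dtm$support - 1.5 * dtm$conflict))

train <- 1:300; test <- 301:400
fit <- glm(code ~ support + conflict + neutral, family = binomial,
           data = cbind(code = code, dtm), subset = train)

pred <- as.numeric(predict(fit, newdata = dtm[test, ], type = "response") > 0.5)
mean(pred == code[test])   # agreement between machine and "human" codes on held-out cases
```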

Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome's distribution and metric properties.

Psychological Methods, Vol 29(6), Dec 2024, 1164-1179; doi:10.1037/met0000532

Studies of interaction effects are of great interest because they identify crucial interplay between predictors in explaining outcomes. Previous work has considered several potential sources of statistical bias and substantive misinterpretation in the study of interactions, but less attention has been devoted to the role of the outcome variable in such research. Here, we consider bias and false discovery associated with estimates of interaction parameters as a function of the distributional and metric properties of the outcome variable. We begin by illustrating that, for a variety of noncontinuously distributed outcomes (i.e., binary and count outcomes), attempts to use the linear model for recovery lead to catastrophic levels of bias and false discovery. Next, focusing on transformations of normally distributed variables (i.e., censoring and noninterval scaling), we show that linear models again produce spurious interaction effects. We provide explanations offering geometric and algebraic intuition as to why interactions are a challenge for these incorrectly specified models. In light of these findings, we make two specific recommendations. First, a careful consideration of the outcome's distributional properties should be a standard component of interaction studies. Second, researchers should approach research focusing on interactions with heightened levels of scrutiny. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
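
A compact base-R simulation of the abstract's central claim: data are generated from a logistic model with no interaction, yet the misspecified linear model reports one, while a correctly specified logistic regression does not. The numbers are illustrative only.

```r
# Spurious interaction under misspecification: the data-generating model is
# logistic and purely additive on the logit scale (no interaction term).
set.seed(1)
n  <- 5000
x1 <- rnorm(n); x2 <- rnorm(n)
p  <- plogis(-1 + 1.2 * x1 + 1.2 * x2)
y  <- rbinom(n, 1, p)

# misspecified linear (probability) model: a "significant" interaction appears
summary(lm(y ~ x1 * x2))$coefficients["x1:x2", ]

# correctly specified logistic model: the interaction estimate is near zero
summary(glm(y ~ x1 * x2, family = binomial))$coefficients["x1:x2", ]
```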

Regression with reduced rank predictor matrices: A model of trade-offs.

Psychological Methods, Vol 29(6), Dec 2024, 1180-1187; doi:10.1037/met0000512

A regression model of predictor trade-offs is described. Each regression parameter equals the expected change in Y obtained by trading 1 point from one predictor to a second predictor. The model applies to predictor variables that sum to a constant T for all observations; for example, proportions summing to T = 1.0 or percentages summing to T = 100 for each observation. If predictor variables sum to a constant T for all observations and if a least squares solution exists, the predicted values for the criterion variable Y will be uniquely determined, but there will be an infinite set of linear regression weights and the familiar interpretation of regression weights does not apply. However, the regression weights are determined up to an additive constant and thus differences in regression weights βv − βv* are uniquely determined, readily estimable, and interpretable. βv − βv* is the expected increase in Y given a transfer of 1 point from variable v* to variable v. The model is applied to multiple-choice test items that have four response categories, one correct and three incorrect. Results indicate that the expected outcome depends, not just on the student's number of correct answers, but also on how the student's incorrect responses are distributed over the three incorrect response types. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
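
A small base-R illustration of the trade-off interpretation: with four response-category proportions summing to 1, the full regression is aliased, but dropping one category yields coefficients that estimate βv − βv*, the expected change in Y from trading 1 point of the omitted category for category v. The data here are simulated, not the article's test data.

```r
# Trade-off regression sketch with simulated proportions (not the article's data).
set.seed(1)
n <- 300
p <- t(apply(matrix(rexp(4 * n), n), 1, function(z) z / sum(z)))  # rows sum to 1
colnames(p) <- c("correct", "wrong_a", "wrong_b", "wrong_c")

# true expected outcome: 10*correct + 2*wrong_a + 0*wrong_b - 2*wrong_c
y <- 10 * p[, "correct"] + 2 * p[, "wrong_a"] - 2 * p[, "wrong_c"] + rnorm(n)
d <- data.frame(y, p)

coef(lm(y ~ correct + wrong_a + wrong_b + wrong_c, data = d))  # one weight is aliased (NA)

# drop wrong_c: each remaining weight estimates beta_v - beta_wrong_c,
# i.e., roughly 12, 4, and 2 for correct, wrong_a, and wrong_b
coef(lm(y ~ correct + wrong_a + wrong_b, data = d))
```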

Comparison of noncentral t and distribution-free methods when using sequential procedures to control the width of a confidence interval for a standardized mean difference.

Psychological Methods, Vol 29(6), Dec 2024, 1188-1208; doi:10.1037/met0000671

A sequential stopping rule (SSR) can generate a confidence interval (CI) for a standardized mean difference d that has an exact standardized width, ω. Two methods were tested using a broad range of ω and standardized effect sizes δ. A noncentral t (NCt) CI used with normally distributed data had coverages that were nominal at narrow widths but were slightly inflated at wider widths. A distribution-free (Dist-Free) method used with normally distributed data exhibited superior coverage and stopped on average at the expected sample sizes. When used with moderately to severely skewed lognormal distributions, the coverage was too low at large effect sizes even with a very narrow width where Dist-Free was expected to perform well, and the mean stopping sample sizes were absurdly elevated (thousands per group). SSR procedures negatively biased both the raw difference and the "unbiased" Hedges' g in the stopping sample with all methods and distributions. The d was the less biased estimator of δ when the distribution was normal. The poor coverage with a lognormal distribution resulted from a large positive bias in d that increased as a function of both ω and δ. Coverage and point estimation were little improved by using g instead of d. Increased stopping time resulted from the way an estimate of the variance is calculated when it encounters occasional extreme scores generated from the skewed distribution. The Dist-Free SSR method was superior when the distribution was normal or only slightly skewed but is not recommended with moderately skewed distributions. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
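
The base-R sketch below is a bare-bones SSR: sampling continues in batches until an approximate large-sample CI for d (not the noncentral-t or distribution-free interval studied in the article) reaches the target width ω. It illustrates the stopping mechanism only.

```r
# Bare-bones sequential stopping rule: add observations until an approximate
# large-sample CI for d reaches the target standardized width omega.
set.seed(42)
omega <- 0.6   # target full CI width
delta <- 0.5   # true standardized effect

ci_width_d <- function(d, n1, n2) {
  se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
  2 * qnorm(0.975) * se
}

n <- 10
x <- rnorm(n, 0); y <- rnorm(n, delta)
repeat {
  sp <- sqrt(((n - 1) * var(x) + (n - 1) * var(y)) / (2 * n - 2))  # pooled SD
  d  <- (mean(y) - mean(x)) / sp
  if (ci_width_d(d, n, n) <= omega) break
  x <- c(x, rnorm(5, 0)); y <- c(y, rnorm(5, delta)); n <- n + 5   # next batch
}
c(n_per_group = n, d_at_stop = round(d, 3))
```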

One-tailed tests: Let's do this (responsibly).

Psychological Methods, Vol 29(6), Dec 2024, 1209-1218; doi:10.1037/met0000610

When preregistered, one-tailed tests control false-positive results at the same rate as two-tailed tests. They are also more powerful, provided the researcher correctly identified the direction of the effect. So it is surprising that they are not more common in psychology. Here I make an argument in favor of one-tailed tests and address common mistaken objections that researchers may have to using them. The arguments presented here only apply in situations where the test is clearly preregistered. If power is truly as urgent an issue as statistics reformers suggest, then the deliberate and thoughtful use of preregistered one-tailed tests ought to be not only permitted, but encouraged in cases where researchers desire greater power. One-tailed tests are especially well suited for applied questions, replications of previously documented effects, or situations where directionally unexpected effects would be meaningless. Preregistered one-tailed tests can sensibly align the researcher's stated theory with their tested hypothesis, bring a coherence to the practice of null hypothesis statistical testing, and produce generally more persuasive results. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
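
Two base-R lines make the power argument concrete, and a preregistered directional prediction maps onto the alternative argument of t.test(); the effect size and sample size below are illustrative assumptions, not values from the article.

```r
# Power gain from a directional test (illustrative numbers, base R only)
power.t.test(n = 50, delta = 0.4, sd = 1, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")$power
power.t.test(n = 50, delta = 0.4, sd = 1, sig.level = 0.05,
             type = "two.sample", alternative = "one.sided")$power

# A preregistered directional prediction ("treatment > control") maps onto:
set.seed(1)
treatment <- rnorm(50, mean = 0.4); control <- rnorm(50, mean = 0)
t.test(treatment, control, alternative = "greater")
```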

Evaluating classification performance: Receiver operating characteristic and expected utility.

Psychological Methods, Vol 29(5), Oct 2024, 827-843; doi:10.1037/met0000515

One primary advantage of receiver operating characteristic (ROC) analysis is considered to be its ability to quantify classification performance independently of factors such as prior probabilities and utilities of classification outcomes. This article argues the opposite. When evaluating classification performance, ROC analysis should consider prior probabilities and utilities. By developing expected utility lines (EU lines), this article shows the connection between a classifier's ROC curve and expected utility of classification. In particular, EU lines can be used to estimate expected utilities when classifiers operate at any ROC point for any given prior probabilities and utilities. EU lines are useful across all situations: whether one examines a single classifier or compares multiple classifiers, whether one compares classifiers' potential to maximize expected utilities or classifiers' actual expected utilities, and whether the ROC curves are full or partial, continuous or discrete. The connection between ROC and expected utility analyses reveals the common objective underlying these two methods: to maximize expected utility of classification. Particularly, ROC analysis is useful in choosing an optimal classifier and its optimal operating point to maximize expected utility. Yet, choosing a classifier and its operating point (i.e., changing conditional probabilities) is not the only way to increase expected utility. Inspired by parameters involved in estimating expected utility, this article also discusses other approaches to increase expected utility beyond ROC analysis. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
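
A base-R sketch of the expected-utility idea with simulated classifier scores and assumed priors and utilities: expected utility is computed at every ROC operating point and the threshold that maximizes it is selected. This illustrates the objective, not the article's EU-line construction itself.

```r
# Expected utility across ROC operating points, with simulated scores and
# assumed prior/utility values.
set.seed(1)
scores <- c(rnorm(500, mean = 1), rnorm(500, mean = 0))  # positives, then negatives
truth  <- rep(c(1, 0), each = 500)

thr <- sort(unique(scores))
tpr <- sapply(thr, function(t) mean(scores[truth == 1] >= t))
fpr <- sapply(thr, function(t) mean(scores[truth == 0] >= t))

p_pos <- 0.2                                 # assumed prior probability of a positive
u <- c(TP = 1, FN = -3, FP = -0.5, TN = 0)   # assumed utilities of the four outcomes

eu <- p_pos       * (tpr * u["TP"] + (1 - tpr) * u["FN"]) +
      (1 - p_pos) * (fpr * u["FP"] + (1 - fpr) * u["TN"])

thr[which.max(eu)]   # operating point (threshold) with the highest expected utility
max(eu)
```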

Sample size planning for replication studies: The devil is in the design.

Psychological Methods, Vol 29(5), Oct 2024, 844-867; doi:10.1037/met0000520

Replication is central to scientific progress. Because of widely reported replication failures, replication has received increased attention in psychology, sociology, education, management, and related fields in recent years. Replication studies have generally been assessed dichotomously, designated either a "success" or "failure" based entirely on the outcome of a null hypothesis significance test (i.e., p < .05 or p > .05, respectively). However, alternative definitions of success depend on researchers' goals for the replication. Previous work on alternative definitions for success has focused on the analysis phase of replication. However, the design of the replication is also important, as emphasized with the adage, "an ounce of prevention is better than a pound of cure." One critical component of design often ignored or oversimplified in replication studies is sample size planning; indeed, the details here are crucial. Sample size planning for replication studies should correspond to the method by which success will be evaluated. Researchers have received little guidance, some of which is misguided, on sample size planning for replication goals other than the aforementioned dichotomous null hypothesis significance testing approach. In this article, we describe four different replication goals. Then, we formalize sample size planning methods for each of the four goals. This article aims to provide clarity on the procedures for sample size planning for each goal, with examples and syntax provided to show how each procedure can be used in practice. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
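
For the simplest of the goals discussed (obtaining a significant result in the replication at the original effect size), planning reduces to a standard power analysis; the base-R call below is a hedged placeholder for the article's procedures and syntax, which cover the further goals as well, and d_orig is an assumed value.

```r
# Sample size for one replication goal only (statistical significance when the
# original standardized effect size is taken at face value); the article's
# procedures for the other goals are not reproduced here.
d_orig <- 0.4   # assumed original-study effect size
power.t.test(delta = d_orig, sd = 1, power = 0.90, sig.level = 0.05,
             type = "two.sample")$n   # required n per group
```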

Selecting scaling indicators in structural equation models (SEMs).

Psychological Methods, Vol 29(5), Oct 2024, 868-889; doi:10.1037/met0000530

It is common practice for psychologists to specify models with latent variables to represent concepts that are difficult to directly measure. Each latent variable needs a scale, and the most popular method of scaling as well as the default in most structural equation modeling (SEM) software uses a scaling or reference indicator. Much of the time, the choice of which indicator to use for this purpose receives little attention, and many analysts use the first indicator without considering whether there are better choices. When all indicators of the latent variable have essentially the same properties, then the choice matters less. But when this is not true, we could benefit from scaling indicator guidelines. Our article first demonstrates why latent variables need a scale. We then propose a set of criteria and accompanying diagnostic tools that can assist researchers in making informed decisions about scaling indicators. The criteria for a good scaling indicator include high face validity, high correlation with the latent variable, factor complexity of one, no correlated errors, no direct effects with other indicators, a minimal number of significant overidentification equation tests and modification indices, and invariance across groups and time. We demonstrate these criteria and diagnostics using two empirical examples and provide guidance on navigating conflicting results among criteria. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
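
The lavaan sketch below, using that package's built-in HolzingerSwineford1939 data, shows the mechanics the article's criteria feed into: swapping the scaling indicator leaves model fit unchanged but rescales the latent variable, which is why the choice deserves the proposed diagnostics.

```r
# Changing the scaling indicator in lavaan (built-in example data); overall fit
# is identical, but the latent variable's scale, and hence its parameter
# estimates, depend on the indicator chosen.
library(lavaan)

m_x1 <- 'visual =~ x1 + x2 + x3'        # default: x1 is the scaling indicator
m_x2 <- 'visual =~ NA*x1 + 1*x2 + x3'   # free x1's loading, fix x2's to 1

f_x1 <- cfa(m_x1, data = HolzingerSwineford1939)
f_x2 <- cfa(m_x2, data = HolzingerSwineford1939)

fitMeasures(f_x1, c("chisq", "df"))     # same fit either way
fitMeasures(f_x2, c("chisq", "df"))
parameterEstimates(f_x1)[1:3, ]         # loadings expressed on x1's scale
parameterEstimates(f_x2)[1:3, ]         # loadings expressed on x2's scale
```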

Extending the actor-partner interdependence model to accommodate multivariate dyadic data using latent variables.

Psychological Methods, Vol 29(5), Oct 2024, 890-918; doi:10.1037/met0000531

This study extends the traditional Actor-Partner Interdependence model (APIM; Kenny, 1996) to incorporate dyadic data with multiple indicators reflecting latent constructs. Although the APIM has been widely used to model interdependence in dyads, the method and its applications have largely been limited to single sets of manifest variables. This article presents three extensions of the APIM that can be applied to multivariate dyadic data: a manifest APIM linking multiple indicators as manifest variables, a composite-score APIM relating univariate sums of multiple variables, and a latent APIM connecting underlying constructs of multiple indicators. The properties of the three methods in analyzing data with various dyadic patterns are investigated through a simulation study. It is found that the latent APIM adequately estimates dyadic relationships and holds reasonable power when measurement reliability is not too low, whereas the manifest APIM yields poor power and high Type I error rates in general. The composite-score APIM, even though it is found to be a better alternative to the manifest APIM, fails to correctly reflect latent dyadic interdependence, raising inferential concerns. We illustrate the APIM extensions for multivariate dyadic data analysis with an example study on relationship commitment and happiness among married couples in Wisconsin. In cases where the measures are reliable reflections of psychological constructs, we suggest using the latent APIM for examining research hypotheses that discuss implications beyond observed variables. We conclude by stressing the importance of carefully examining measurement models when designing and conducting dyadic data analyses. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
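
The sketch below fits only the simplest of the three variants, a manifest APIM, in lavaan on simulated couple data; the latent APIM recommended in the abstract would add a measurement model for each construct and is not shown.

```r
# Manifest APIM on simulated couple data, assuming lavaan; the latent APIM
# would add measurement models for each construct.
library(lavaan)

set.seed(1)
n  <- 300
x1 <- rnorm(n); x2 <- 0.4 * x1 + rnorm(n)      # partners' predictors, correlated
y1 <- 0.5 * x1 + 0.2 * x2 + rnorm(n)           # actor effect 0.5, partner effect 0.2
y2 <- 0.5 * x2 + 0.2 * x1 + rnorm(n)
dyads <- data.frame(x1, x2, y1, y2)

apim <- '
  y1 ~ a1 * x1 + p12 * x2    # actor and partner effects for member 1
  y2 ~ a2 * x2 + p21 * x1    # actor and partner effects for member 2
  x1 ~~ x2                   # predictor interdependence
  y1 ~~ y2                   # residual interdependence
'
fit <- sem(apim, data = dyads)
summary(fit)
```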

Estimating and investigating multiple constructs multiple indicators social relations models with and without roles within the traditional structural equation modeling framework: A tutorial.

Psychological Methods, Vol 29(5), Oct 2024, 919-946; doi:10.1037/met0000534

The present contribution provides a tutorial for the estimation of the social relations model (SRM) by means of structural equation modeling (SEM). In the overarching SEM framework, the SRM without roles (with interchangeable dyads) is derived as a more restrictive form of the SRM with roles (with noninterchangeable dyads). Starting with the simplest type of the SRM for one latent construct assessed by one manifest round-robin indicator, we show how the model can be extended to multiple constructs each measured by multiple indicators. We illustrate a multiple constructs multiple indicators SEM SRM both with and without roles with simulated data and explain the parameter interpretations. We present how testing the substantial model assumptions can be disentangled from testing the interchangeability of dyads. Additionally, we point out modeling strategies for cases in which only some members of a group can be differentiated with regard to their roles (i.e., only some group members are noninterchangeable). In the online supplemental materials, we provide concrete examples of specific modeling problems and their implementation in statistical software (Mplus, lavaan, and OpenMx). Advantages, caveats, possible extensions, and limitations in comparison with alternative modeling options are discussed. (PsycInfo Database Record (c) 2024 APA, all rights reserved)

Data-driven covariate selection for confounding adjustment by focusing on the stability of the effect estimator.

Psychological Methods, Vol 29(5), Oct 2024, 947-966; doi:10.1037/met0000564

Valid inference of cause-and-effect relations in observational studies necessitates adjusting for common causes of the focal predictor (i.e., treatment) and the outcome. When such common causes, henceforth termed confounders, remain unadjusted for, they generate spurious correlations that lead to biased causal effect estimates. But routine adjustment for all available covariates, when only a subset are truly confounders, is known to yield potentially inefficient and unstable estimators. In this article, we introduce a data-driven confounder selection strategy that focuses on stable estimation of the treatment effect. The approach exploits the causal knowledge that after adjusting for confounders to eliminate all confounding biases, adding any remaining non-confounding covariates associated with only treatment or outcome, but not both, should not systematically change the effect estimator. The strategy proceeds in two steps. First, we prioritize covariates for adjustment by probing how strongly each covariate is associated with treatment and outcome. Next, we gauge the stability of the effect estimator by evaluating its trajectory adjusting for different covariate subsets. The smallest subset that yields a stable effect estimate is then selected. Thus, the strategy offers direct insight into the (in)sensitivity of the effect estimator to the chosen covariates for adjustment. The ability to correctly select confounders and yield valid causal inferences following data-driven covariate selection is evaluated empirically using extensive simulation studies. Furthermore, we compare the introduced method empirically with routine variable selection methods. Finally, we demonstrate the procedure using two publicly available real-world datasets. A step-by-step practical guide with user-friendly R functions is included. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
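
The base-R sketch below mimics the core idea on simulated data: covariates are added in a chosen order and the treatment coefficient's trajectory is inspected, stabilizing once the true confounders are included and not moving when an outcome-only covariate is added. The prioritization and stopping rules of the actual procedure are not implemented here.

```r
# Trajectory of the treatment effect estimate as covariates are added, on
# simulated data; c1 and c2 are true confounders, z affects only the outcome.
# This mimics the stability idea but not the article's full procedure.
set.seed(1)
n  <- 2000
c1 <- rnorm(n); c2 <- rnorm(n); z <- rnorm(n)
a  <- rbinom(n, 1, plogis(0.8 * c1 + 0.8 * c2))      # treatment depends on confounders
y  <- 0.5 * a + c1 + c2 + z + rnorm(n)               # outcome
d  <- data.frame(y, a, c1, c2, z)

sets <- list(NULL, "c1", c("c1", "c2"), c("c1", "c2", "z"))
est  <- sapply(sets, function(v)
  coef(lm(reformulate(c("a", v), response = "y"), data = d))["a"])
round(est, 2)   # drifts, then stabilizes near 0.5 once both confounders are adjusted for
```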

Updated guidelines on selecting an intraclass correlation coefficient for interrater reliability, with applications to incomplete observational designs.

Psychological Methods, Vol 29(5), Oct 2024, 967-979; doi:10.1037/met0000516

Several intraclass correlation coefficients (ICCs) are available to assess the interrater reliability (IRR) of observational measurements. Selecting an ICC is complicated, and existing guidelines have three major limitations. First, they do not discuss incomplete designs, in which raters partially vary across subjects. Second, they provide no coherent perspective on the error variance in an ICC, clouding the choice between the available coefficients. Third, the distinction between fixed or random raters is often misunderstood. Based on generalizability theory (GT), we provide updated guidelines on selecting an ICC for IRR, which are applicable to both complete and incomplete observational designs. We challenge conventional wisdom about ICCs for IRR by claiming that raters should seldom (if ever) be considered fixed. Also, we clarify how to interpret ICCs in the case of unbalanced and incomplete designs. We explain four choices a researcher needs to make when selecting an ICC for IRR, and guide researchers through these choices by means of a flowchart, which we apply to three empirical examples from clinical and developmental domains. In the Discussion, we provide guidance in reporting, interpreting, and estimating ICCs, and propose future directions for research into the ICCs for IRR. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
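
A minimal variance-components computation in R, assuming the lme4 package and a complete two-way (subjects by raters) design with simulated scores; incomplete designs, which the guidelines also cover, change how the components are estimated and combined, and other error-variance choices lead to different ICCs.

```r
# Variance components and one two-way ICC for a complete subjects-by-raters
# design, assuming lme4 and simulated scores; incomplete designs need more care.
library(lme4)

set.seed(1)
n_subj <- 60; n_rater <- 5
d <- expand.grid(subject = factor(1:n_subj), rater = factor(1:n_rater))
subj_eff  <- rnorm(n_subj,  sd = 1.0)
rater_eff <- rnorm(n_rater, sd = 0.5)
d$score <- 3 + subj_eff[d$subject] + rater_eff[d$rater] + rnorm(nrow(d), sd = 0.7)

fit <- lmer(score ~ 1 + (1 | subject) + (1 | rater), data = d)
vc  <- as.data.frame(VarCorr(fit))
v   <- setNames(vc$vcov, vc$grp)

# one common single-rater, "agreement"-type coefficient
v["subject"] / (v["subject"] + v["rater"] + v["Residual"])
```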