Educational and Psychological Measurement

  ISSN: 0013-1644

  e-ISSN: 1552-3888

  United States

Publisher:  SAGE Publications Inc.

Subject areas:
Applied Psychology, Developmental and Educational Psychology, Psychology (miscellaneous), Applied Mathematics, Education

Impact analysis

Journal information

 

Educational and Psychological Measurement publishes refereed scholarly work from all academic disciplines interested in the study of measurement theory, problems, and issues. Theoretical articles address new developments and techniques, and applied articles deal strictly with innovative applications.

Representative articles

A Revised Definition for Suppressor Variables: A Guide to Their Identification and Interpretation
Vol. 34, No. 1, pp. 35-46, 1974
Anthony J. Conger
In the two-predictor situation it is shown that traditional and negative suppressors increase the predictive value of a standard predictor beyond that suggested by the predictor's zero order validity. This effect of suppression is used to provide a revised definition of suppression and completely accounts for traditional and negative suppression. The revised definition, in conjunction with a two factor model, is shown to lead to a previously undetected type of suppression (reciprocal suppression) which occurs when predictors with positive zero order validities are negatively correlated with one another. In terms of the definition and parameters of the model, limits are determined in which the types of suppression can occur. Furthermore, it is shown how suppressors can be identified in multiple regression equations and a procedure is given for interpreting whether the variables are contributing directly (by predicting relevant variance in the criterion) or indirectly (by removing irrelevant variance in another predictor) or both.
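Conger's suppression effect is easy to demonstrate numerically: a variable with near-zero validity for the criterion can still raise R² by soaking up a predictor's irrelevant variance. A minimal sketch in Python (all variable names and data here are simulated for illustration, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: x1 predicts the criterion y, while the suppressor x2
# is (nearly) uncorrelated with y but shares x1's irrelevant variance.
noise = rng.normal(size=n)               # irrelevant variance carried by x1
signal = rng.normal(size=n)              # criterion-relevant variance
x1 = signal + noise                      # standard predictor
x2 = noise + 0.1 * rng.normal(size=n)    # suppressor: taps x1's noise
y = signal + rng.normal(size=n)          # criterion

def r_squared(predictors, y):
    """Squared multiple correlation from an OLS fit (with intercept)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_alone = r_squared([x1], y)
r2_with_suppressor = r_squared([x1, x2], y)
print(f"R^2 with x1 alone:          {r2_alone:.3f}")
print(f"R^2 with x1 and suppressor: {r2_with_suppressor:.3f}")
# The suppressor raises R^2 even though its own zero-order validity is ~0.
```

With these population values, R² roughly doubles once the suppressor removes the irrelevant variance in x1, which is exactly the "indirect contribution" the article's interpretation procedure is designed to detect.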
Assessing the Reliability of Beck Depression Inventory Scores: Reliability Generalization across Studies
Vol. 60, No. 2, pp. 201-223, 2000
Ping Yin, Xitao Fan
The reliability estimates for Beck Depression Inventory (BDI) scores across studies were accumulated and summarized in a meta-analysis. Only 7.5% of the articles reviewed reported meaningful reliability estimates, indicating that the logic of “test score reliability” generally has not prevailed in clinical psychology regarding applications of the BDI. Analyses revealed that, for the BDI, the measurement error due to time sampling, as captured by the test-retest reliability estimate, is considerably larger than the measurement error due to item heterogeneity and content sampling, as captured by the internal consistency reliability estimate. Also, reliability estimates involving substance addicts were consistently lower than those involving normal subjects, possibly due to restriction-of-range problems. Correlation analyses revealed that standard errors of measurement (SEMs) were not correlated with reliability estimates but were substantially related to standard deviations of BDI scores, suggesting that SEMs should be considered in addition to reliability estimates when interpreting individual BDI scores.
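The SEM pattern reported above follows from the standard classical-test-theory formula, SEM = SD·√(1 − reliability): equal reliabilities with different score spreads produce different SEMs. A small illustration (the numbers are hypothetical, not BDI values from the study):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Same reliability, different spread -> different SEMs.
wide = sem(8.0, 0.85)     # wider score distribution
narrow = sem(4.0, 0.85)   # narrower score distribution
print(f"{wide:.2f} vs {narrow:.2f}")
```

This is why SEMs can be uncorrelated with reliability estimates yet track standard deviations closely, as the meta-analysis found.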
The Balanced Inventory of Desirable Responding (BIDR)
Vol. 67, No. 3, pp. 525-544, 2007
Andrew Li, Jessica Bagger
The Balanced Inventory of Desirable Responding (BIDR) is one of the most widely used social desirability scales. The authors conducted a reliability generalization study to examine the typical reliability coefficients of BIDR scores and explored factors that explained the variability of reliability estimates across studies. The results indicated that the overall BIDR scale produced scores that were adequately reliable. The mean score reliability estimates for the two subscales, Self-Deception Enhancement and Impression Management, were not satisfactory. In addition, although a number of study characteristics were statistically significantly related to reliability estimates, they accounted for only a small portion of the overall variability in reliability estimates. These findings and their implications are discussed.
My Current Thoughts on Coefficient Alpha and Successor Procedures
Vol. 64, No. 3, pp. 391-418, 2004
Lee J. Cronbach, Richard J. Shavelson
In 1997, noting that the 50th anniversary of the publication of “Coefficient Alpha and the Internal Structure of Tests” was fast approaching, Lee Cronbach planned what have become the notes published here. His aim was to point out the ways in which his views on coefficient alpha had evolved, doubting now that the coefficient was the best way of judging the reliability of an instrument to which it was applied. Tracing in these notes, in vintage Cronbach style, his thinking before, during, and after the publication of the alpha paper, his “current thoughts” on coefficient alpha are that alpha covers only a small perspective of the range of measurement uses for which reliability information is needed and that it should be viewed within a much larger system of reliability analysis, generalizability theory.
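Whatever its limitations, coefficient alpha itself is simple to compute from a persons-by-items score matrix via the standard formula α = k/(k−1) · (1 − Σσ²_item / σ²_total). A short sketch (the score matrix is toy data, not from the paper):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_persons, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Five respondents, four items (hypothetical Likert-type responses):
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(round(cronbach_alpha(scores), 3))  # -> 0.958
```

Alpha summarizes internal consistency for one fixed set of items administered once; generalizability theory, which Cronbach points to here, decomposes error over multiple facets (items, occasions, raters) instead of this single one.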
The Anger Expression (AX) Scale: Correlations with the State-Trait Personality Inventory and Subscale Intercorrelations
Vol. 49, No. 2, pp. 447-455, 1989
Steven Collins, B. Jo Hailey
Undergraduate students (N = 502) completed the State-Trait Personality Inventory (STPI; Spielberger et al., 1979), and the Anger Expression Scale (AX; Spielberger et al., 1986), composed of three subscales, Anger Out (AX/Out), Anger In (AX/In), and Anger Control (AX/Con), and a total score of Anger Experienced (AX/EX). Correlations between each of the scales and subscales of the personality inventories were calculated and were compared with one another. As predicted, correlations between AX/EX and STPI subscales related to anger and anxiety were significantly higher than that between AX/EX and the STPI Curiosity subscale. Furthermore, the STPI Trait Anger Temperament subscale was more highly correlated with AX/Out than were any of the other personality subscales except STPI Trait Anger. Correlations between subscales of the AX were compared with those obtained by Spielberger et al. (1985). Whereas Spielberger et al. (1985) made a case for the independence of AX/Out and AX/In by presenting correlations of essentially zero for both male and female subjects, the present study showed a significant positive correlation between the two subscales for men, casting some doubt on the independence of the subscales.
Testing the Difference Between Two Alpha Coefficients With Small Samples of Subjects and Raters
Vol. 66, No. 4, pp. 589-600, 2006
Leonard S. Feldt, Seonghoon Kim
Researchers sometimes need a statistical test of the hypothesis that two values of Cronbach's alpha reliability coefficient are equal. The situation may involve scores from two different measures administered to independent random samples or from the same measure administered to random samples from two different populations. Feldt derived a test that functions well with large or moderate numbers of subjects. However, he validated this test only when the number of parts (k) of the measurement was fairly large, as it would be if the parts were individual test items. He did not consider instances in which the parts were raters, and hence k would be as small as 2 or 3. In this article, the Feldt test is investigated for such situations. It is found to function quite well in its control of Type I error.
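One classical large-sample form of Feldt's independent-samples test refers the ratio W = (1 − α̂₁)/(1 − α̂₂) to an F distribution with (N₁ − 1, N₂ − 1) degrees of freedom. A hedged sketch of that form (sample alphas and sizes are hypothetical; the small-rater refinements examined in the article are not reproduced here):

```python
from scipy.stats import f

def feldt_test(alpha1, n1, alpha2, n2):
    """Approximate Feldt test of H0: alpha1 == alpha2, independent samples.

    Refers W = (1 - alpha1) / (1 - alpha2) to F(n1 - 1, n2 - 1).
    This is the classic large-k approximation, not the small-rater
    version studied in the article.
    """
    w = (1 - alpha1) / (1 - alpha2)
    df1, df2 = n1 - 1, n2 - 1
    p = 2 * min(f.cdf(w, df1, df2), f.sf(w, df1, df2))  # two-sided p
    return w, p

w, p = feldt_test(0.85, 120, 0.75, 110)
print(f"W = {w:.3f}, p = {p:.4f}")
```

With these illustrative values the ratio is well below 1, so the hypothesis of equal alphas would be rejected at conventional levels.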
Little Jiffy, Mark IV
Vol. 34, No. 1, pp. 111-117, 1974
Henry F. Kaiser, John R. Rice
Reliability and Model Fit
Vol. 76, No. 6, pp. 976-985, 2016
Leanne Stanley, Michael C. Edwards
The purpose of this article is to highlight the distinction between the reliability of test scores and the fit of psychometric measurement models, reminding readers why it is important to consider both when evaluating whether test scores are valid for a proposed interpretation and/or use. It is often the case that an investigator judges both the reliability of scores and the fit of a corresponding measurement model to be either acceptable or unacceptable for a given situation, but these are not the only possible outcomes. This article focuses on situations in which model fit is deemed acceptable, but reliability is not. Data were simulated based on the item characteristics of the PROMIS (Patient Reported Outcomes Measurement Information System) anxiety item bank and analyzed using methods from classical test theory, factor analysis, and item response theory. Analytic techniques from different psychometric traditions were used to illustrate that reliability and model fit are distinct, and that disagreement among indices of reliability and model fit may provide important information bearing on a particular validity argument, independent of the data analytic techniques chosen for a particular research application. We conclude by discussing the important information gleaned from the assessment of reliability and model fit.
The Effect of Estimation Methods on SEM Fit Indices
Vol. 80, No. 3, pp. 421-445, 2020
Dexin Shi, Alberto Maydeu‐Olivares
We examined the effect of estimation methods, maximum likelihood (ML), unweighted least squares (ULS), and diagonally weighted least squares (DWLS), on three population SEM (structural equation modeling) fit indices: the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root mean square residual (SRMR). We considered different types and levels of misspecification in factor analysis models: misspecified dimensionality, omitting cross-loadings, and ignoring residual correlations. Estimation methods had substantial impacts on the RMSEA and CFI so that different cutoff values need to be employed for different estimators. In contrast, SRMR is robust to the method used to estimate the model parameters. The same criterion can be applied at the population level when using the SRMR to evaluate model fit, regardless of the choice of estimation method.
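The SRMR the authors find robust is essentially a root mean square of residuals between the sample and model-implied correlation matrices. One common form, sketched below (conventions on exactly which elements are averaged differ slightly across software, and the matrices here are hypothetical):

```python
import numpy as np

def srmr(sample_corr, implied_corr):
    """SRMR over the non-redundant elements of two correlation matrices.

    One common form: square root of the mean squared residual across the
    lower-triangular elements (diagonal included). Treat as a sketch;
    software packages differ in the exact element set used.
    """
    S = np.asarray(sample_corr, dtype=float)
    P = np.asarray(implied_corr, dtype=float)
    idx = np.tril_indices_from(S)      # non-redundant elements
    resid = S[idx] - P[idx]
    return np.sqrt(np.mean(resid ** 2))

S = np.array([[1.0, 0.50, 0.40],
              [0.50, 1.0, 0.30],
              [0.40, 0.30, 1.0]])
# Hypothetical one-factor implied correlations (loadings 0.8, 0.6, 0.5):
P = np.array([[1.0, 0.48, 0.40],
              [0.48, 1.0, 0.30],
              [0.40, 0.30, 1.0]])
print(round(srmr(S, P), 4))
```

Because the residuals are already on the correlation metric, the index does not depend on how the model parameters were estimated, which is consistent with the robustness to estimation method reported above.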
A Monte Carlo Comparison Study of the Power of the Analysis of Covariance, Simple Difference, and Residual Change Scores in Testing Two-Wave Data
Vol. 73, No. 1, pp. 47-62, 2013
Yasemin Kisbu‐Sakarya, David P. MacKinnon, Leona S. Aiken
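The ANCOVA-versus-difference-score comparison can be sketched as a tiny Monte Carlo under randomization (all parameter values below are hypothetical, and a normal critical value of 1.96 stands in for the exact t cutoff):

```python
import numpy as np

rng = np.random.default_rng(42)

def t_stat_group(y, g, covariate=None):
    """t statistic for the group coefficient in an OLS fit of y on g
    (plus an optional covariate), with an intercept."""
    cols = [np.ones_like(y), g]
    if covariate is not None:
        cols.append(covariate)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    s2 = resid @ resid / df
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def power(n=60, effect=0.5, stability=0.6, reps=2000, crit=1.96):
    """Empirical rejection rates for ANCOVA and the difference-score test
    in a randomized pretest-posttest design (hypothetical parameters)."""
    hits = np.zeros(2)
    for _ in range(reps):
        g = np.repeat([0.0, 1.0], n // 2)            # randomized groups
        pre = rng.normal(size=n)
        post = stability * pre + effect * g + rng.normal(size=n)
        hits[0] += abs(t_stat_group(post, g, covariate=pre)) > crit  # ANCOVA
        hits[1] += abs(t_stat_group(post - pre, g)) > crit  # difference score
    return hits / reps

p_ancova, p_diff = power()
print(f"ANCOVA power: {p_ancova:.2f}, difference-score power: {p_diff:.2f}")
```

Under randomization with imperfect stability, ANCOVA's advantage shows up directly: subtracting the full pretest over-corrects when the pretest-posttest slope is below 1, inflating the error variance of the difference score.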
This study compares the analysis of covariance (ANCOVA), difference score, and residual change score methods in testing the group effect for pretest–posttest data in terms of statistical power and Type I error rates using a Monte Carlo simulation. Previous research has mathematically shown the effect of stability of individual scores from pretest to posttest, reliability, and nonrandomization (i.e., pretest imbalance) on the performance of the ANCOVA, difference score, and residual change score methods. However, related power issues have not been adequately addressed. The authors examined the impact of stability of measurement over time, reliability of covariate and criterion, nonrandomization, sample size, and treatment effect size on statistical power of the three methods. Across conditions, ANCOVA and residual change score methods had similar power rates. When reliability was less than perfect, ANCOVA had more power than the difference score method when there was an increase from pretest to posttest and a positive baseline imbalance (i.e., treatment group had higher pretest scores than the control group), or when there was a decrease from pretest to posttest and a negative baseline imbalance, and vice versa. In case of perfect reliability, the statistical power of ANCOVA did not differ from the difference score method. For the difference score method, when reliability was low, there was no effect of stability on power, whereas when reliability was high or perfect, power increased as stability increased for medium and large effect sizes. Difference scores may be preferred over ANCOVA under certain circumstances.