
Applied Psychological Measurement
SSCI-ISI SCOPUS (1977-2023)
ISSN: 0146-6216 (print)
ISSN: 1552-3497 (online)
United States
Publisher: SAGE Publications Inc.
Notable articles
The CES-D scale is a short self-report scale designed to measure depressive symptomatology in the general population. The items of the scale are symptoms associated with depression which have been used in previously validated longer scales. The new scale was tested in household interview surveys and in psychiatric settings. It was found to have very high internal consistency and adequate test-retest repeatability. Validity was established by patterns of correlations with other self-report measures, by correlations with clinical ratings of depression, and by relationships with other variables which support its construct validity. Reliability, validity, and factor structure were similar across a wide variety of demographic characteristics in the general population samples tested. The scale should be a useful tool for epidemiologic studies of depression.
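The internal-consistency figure the abstract refers to is conventionally Cronbach's α. A minimal sketch of its computation on simulated Likert-type data (the `cronbach_alpha` helper and the toy responses are illustrative, not real CES-D data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy 0-3 responses driven by a common trait (hypothetical data)
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
scores = np.clip(np.round(trait + rng.normal(scale=0.8, size=(200, 4)) + 1.5), 0, 3)
print(round(cronbach_alpha(scores), 3))
```

Because the simulated items share a common trait, α comes out well above zero; with parallel items it reaches its maximum of 1.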
A structural equation model is described that permits estimation of the reliability index and coefficient of a composite test for congeneric measures. The method is also helpful in exploring the factorial structure of an item set, and its use in scale reliability estimation and development is illustrated. The model-based estimator of composite reliability it yields does not possess the general underestimation property of Cronbach's coefficient α.
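Under the congeneric model the abstract describes, the composite reliability coefficient is the ratio of true-score variance to total composite variance, computed from the factor loadings λ and error variances θ. A sketch with illustrative parameter values (not estimates from any real data set):

```python
import numpy as np

def composite_reliability(loadings, error_vars):
    """omega = (sum lambda)^2 / ((sum lambda)^2 + sum theta)
    for a unidimensional congeneric measurement model."""
    loadings = np.asarray(loadings, dtype=float)
    error_vars = np.asarray(error_vars, dtype=float)
    true_var = loadings.sum() ** 2
    return true_var / (true_var + error_vars.sum())

# Illustrative loadings and error variances for a 4-item composite
print(round(composite_reliability([0.7, 0.8, 0.6, 0.75],
                                  [0.51, 0.36, 0.64, 0.44]), 3))  # → 0.806
```

In practice the loadings and error variances are estimated from the structural equation model rather than supplied by hand.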
This paper describes an attempt to construct a measuring instrument for loneliness that meets the criteria of a Rasch scale. Rasch (1960, 1966) proposed a latent trait model for the unidimensional scaling of dichotomous items that does not suffer from the inadequacies of classical approaches. The resulting Rasch scale of this study, which is based on data from 1,201 employed, disabled, and jobless adults, consists of five positive and six negative items. The positive items assess feelings of belongingness, whereas the negative items apply to three separate aspects of missing relationships. The techniques for testing the assumptions underlying the Rasch model are compared with their counterparts from classical test theory, and the implications for the methodology of scale construction are discussed.
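The Rasch model scales dichotomous items with a single person parameter θ and an item difficulty β, so that P(X = 1 | θ, β) = exp(θ − β) / (1 + exp(θ − β)). A minimal sketch of this response function (the parameter values are hypothetical):

```python
import numpy as np

def rasch_prob(theta, beta):
    """Rasch model: P(X = 1 | theta, beta) = 1 / (1 + exp(-(theta - beta)))."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

# A person at theta = 0 answering items of increasing difficulty
betas = np.array([-1.0, 0.0, 1.0])
print(np.round(rasch_prob(0.0, betas), 3))  # endorsement probability falls as difficulty rises
```

At θ = β the probability is exactly .5, which is what makes the difficulty parameter directly interpretable on the person scale.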
Unidimensional item response theory (IRT) has become widely used in the analysis and equating of educational achievement tests. If an IRT model is true, item responses must be locally independent when the trait is held constant. This paper presents several measures of local dependence that are used in conjunction with the three-parameter logistic model in the analysis of unidimensional and two-dimensional simulated data and in the analysis of three mathematics achievement tests at Grades 3 and 6. The measures of local dependence (called Q2 and Q3) were useful for identifying subsets of items that were influenced by the same factors (simulated data) or that had similar content (real data). Item pairs with high Q2 or Q3 values tended to have similar item parameters, but most items with similar item parameters did not have high Q2 or Q3 values. Sets of locally dependent items tended to be difficult and discriminating if the items involved an accumulation of the skills involved in the easier items in the rest of the test. Locally dependent items that were independent of the other items in the test did not have unusually high or low difficulties or discriminations. Substantial unsystematic errors of equating were found from the equating of tests involving collections of different dimensions, but substantial systematic errors of equating were only found when the two tests measured quite different dimensions that were presumably taught sequentially.
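The Q3 measure is the correlation between the residuals of an item pair after the model-expected probabilities are subtracted. A sketch on simulated data in which two of three items share an extra nuisance factor, so their residuals stay correlated (the `q3_matrix` helper and the simulation design are illustrative):

```python
import numpy as np

def q3_matrix(responses, expected):
    """Q3: inter-item correlations of residuals (observed - model-expected).
    responses and expected broadcast to (n_persons, n_items)."""
    resid = np.asarray(responses, float) - np.asarray(expected, float)
    return np.corrcoef(resid, rowvar=False)

rng = np.random.default_rng(1)
theta = rng.normal(size=500)
expected = 1 / (1 + np.exp(-theta[:, None]))      # same model curve for all 3 items
nuisance = rng.normal(size=500)                    # extra shared factor for items 0 and 1
true_p = 1 / (1 + np.exp(-(theta[:, None] + np.array([1, 1, 0]) * nuisance[:, None])))
x = (rng.uniform(size=(500, 3)) < true_p).astype(int)

q3 = q3_matrix(x, expected)
print(np.round(q3[0, 1], 3), np.round(q3[0, 2], 3))  # locally dependent pair (0,1) vs clean pair (0,2)
```

The pair sharing the nuisance factor shows a clearly elevated Q3, while the independent pair hovers near zero, which is the pattern the paper uses to flag locally dependent item subsets.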
New goodness-of-fit indices are introduced for dichotomous item response theory (IRT) models. These indices are based on the likelihoods of number-correct scores derived from the IRT model, and they provide a direct comparison of the modeled and observed frequencies for correct and incorrect responses for each number-correct score. The behavior of Pearson's X2 (S-X2) and the likelihood ratio G2 (S-G2) was assessed in a simulation study and compared with two fit indices similar to those currently in use (Q1-X2 and Q1-G2). The simulations included three conditions in which the simulating and fitting models were identical and three conditions involving model misspecification. S-X2 performed well, with Type I error rates close to the expected .05 and .01 levels. Performance of this index improved with increased test length. S-G2 tended to reject the null hypothesis too often, as did Q1-X2 and Q1-G2. The power of S-X2 appeared to be similar for all test lengths, but varied depending on the type of model misspecification.
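The number-correct score likelihoods these indices are built on can be computed at a fixed trait level by convolving the item response probabilities (the Lord–Wingersky recursion); the full indices then aggregate these over the trait distribution. A sketch of the recursion with illustrative item probabilities:

```python
import numpy as np

def score_distribution(item_probs):
    """P(number-correct = s) at a fixed trait level: a running
    convolution of independent Bernoulli items (Lord-Wingersky)."""
    dist = np.array([1.0])
    for p in item_probs:
        dist = np.convolve(dist, [1.0 - p, p])
    return dist

# Three items of varying difficulty at one trait level (illustrative values)
print(np.round(score_distribution([0.8, 0.5, 0.3]), 3))  # probabilities for scores 0..3, summing to 1
```

Each convolution step adds one item, so the result after k items is the exact distribution over scores 0..k, with no enumeration of the 2^k response patterns.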
A model is proposed that combines the theoretical strength of the Rasch model with the heuristic power of latent class analysis. It assumes that the Rasch model holds for all persons within a latent class, but it allows for different sets of item parameters between the latent classes. An estimation algorithm is outlined that gives conditional maximum likelihood estimates of item parameters for each class. No a priori assumption about the item order in the latent classes or the class sizes is required. Application of the model is illustrated, both for simulated data and for real data.
The assumption of local independence is central to all item response theory (IRT) models. Violations can lead to inflated estimates of reliability and problems with construct validity. For the most widely used fit statistic Q3, there are currently no well-documented suggestions of the critical values which should be used to indicate local dependence (LD), and for this reason, a variety of arbitrary rules of thumb are used. In this study, an empirical data example and Monte Carlo simulation were used to investigate the different factors that can influence the null distribution of residual correlations, with the objective of proposing guidelines that researchers and practitioners can follow when making decisions about LD during scale development and validation. A parametric bootstrap procedure should be implemented in each separate situation to obtain the critical value of LD applicable to the data set, and example critical values are provided for a number of data structure situations. The results show that for the Q3 fit statistic, no single critical value is appropriate for all situations, as the percentiles in the empirical null distribution are influenced by the number of items, the sample size, and the number of response categories. Furthermore, the results show that LD should be considered relative to the average observed residual correlation, rather than to a uniform value, as this results in more stable percentiles for the null distribution of an adjusted fit statistic.
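The bootstrap idea can be sketched for the Rasch case: repeatedly simulate data from the fitted model, compute in each replicate the largest Q3 value relative to the average residual correlation, and take an upper percentile of those maxima as the data-specific critical value. A simplified sketch with hypothetical item parameters (a real application would re-estimate the model in every replicate rather than reuse the true parameters):

```python
import numpy as np

rng = np.random.default_rng(42)
n_persons, betas = 300, np.array([-1.0, -0.3, 0.3, 1.0])  # assumed Rasch difficulties

def simulate(thetas, betas, rng):
    """Draw Rasch responses and return them with the model-expected probabilities."""
    p = 1 / (1 + np.exp(-(thetas[:, None] - betas[None, :])))
    return (rng.uniform(size=p.shape) < p).astype(int), p

def max_q3_above_mean(x, p):
    """Largest Q3 residual correlation relative to the average off-diagonal Q3."""
    resid = x - p
    r = np.corrcoef(resid, rowvar=False)
    off = r[np.triu_indices_from(r, k=1)]
    return (off - off.mean()).max()

thetas = rng.normal(size=n_persons)   # person parameters treated as fixed here
stats = []
for _ in range(200):                  # modest number of replicates for illustration
    x, p = simulate(thetas, betas, rng)
    stats.append(max_q3_above_mean(x, p))
critical = np.percentile(stats, 95)   # data-specific 5% critical value for LD
print(round(float(critical), 3))
```

Because the null distribution depends on the number of items, the sample size, and the response format, this percentile has to be recomputed for each data set rather than taken from a fixed rule of thumb.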
The commonly used form of rwg(J) can display irregular behavior, so four variants of this index were examined. An alternative index, r*wg(J), is recommended. This index is an inverse linear function of the ratio of the average obtained variance to the variance of uniformly distributed random error. r*wg(J) is superficially similar to Cronbach's α, but careful examination confirms that r*wg(J) is an index of agreement, not reliability. Based on an examination of the small-sample behavior of rwg and r*wg(J), sample sizes of 10 or more raters are recommended.
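The recommended index follows directly from its definition: one minus the ratio of the mean observed item variance to the uniform-null variance, which for A response options is (A² − 1)/12. A sketch with illustrative ratings (the `rwg_star_j` helper name and data are assumptions for this example):

```python
import numpy as np

def rwg_star_j(ratings, n_options):
    """r*wg(J): 1 - (mean observed item variance) / (uniform-null variance).
    ratings: (n_raters, n_items); n_options: number of response categories A."""
    ratings = np.asarray(ratings, dtype=float)
    sigma2_eu = (n_options ** 2 - 1) / 12.0   # variance of a discrete uniform on 1..A
    mean_var = ratings.var(axis=0, ddof=1).mean()
    return 1.0 - mean_var / sigma2_eu

# 10 raters (the recommended minimum) rating 4 items on a 5-point scale
rng = np.random.default_rng(7)
ratings = np.clip(np.round(rng.normal(loc=4.0, scale=0.7, size=(10, 4))), 1, 5)
print(round(rwg_star_j(ratings, 5), 3))
```

Perfect agreement (zero observed variance) gives exactly 1, and purely uniform random responding gives an expected value near 0, which is what makes it an agreement index rather than a reliability coefficient.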