Assessment of fit of item response theory models used in large-scale educational survey assessments

Large-Scale Assessments in Education - Volume 4, Issue 1 - 2016
Peter W. van Rijn (1), Sandip Sinharay (2), Shelby J. Haberman (3), M. Johnson (4)
(1) ETS Global, Amsterdam, Netherlands
(2) Pacific Metrics Corporation, Monterey, CA, USA
(3) ETS, Princeton, NJ, USA
(4) Columbia University, New York, USA

References

Adams, R. J., Wilson, M. R., & Wang, W. C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.

Allen, N. A., Donoghue, J. R., & Schoeps, T. L. (2001). The NAEP 1998 technical report (NCES 2001-452). Washington, DC: United States Department of Education, Institute of Education Sciences, Department of Education, Office for Educational Research and Improvement.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Beaton, A. E. (1987). Implementing the new design: The NAEP 1983–84 technical report (Tech. Rep. No. 15-TR-20). Princeton, NJ: ETS.

Beaton, A. E. (2003). A procedure for testing the fit of IRT models for special populations: Draft. Unpublished manuscript.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading: Addison-Wesley.

Bock, R. D., & Haberman, S. J. (2009). Confidence bands for examining goodness-of-fit of estimated item response functions. Paper presented at the annual meeting of the Psychometric Society, Cambridge, UK.

Box, G. E. P., & Draper, N. R. (1987). Empirical model-building and response surfaces. New York, NY: Wiley.

Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.

Camilli, G. (1992). A conceptual analysis of differential item functioning in terms of a multidimensional item response model. Applied Psychological Measurement, 16, 129–147.

Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.

Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185.

Dresher, A. R., & Thind, S. K. (2007). Examination of item fit for individual jurisdictions in NAEP. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

du Toit, M. (2003). IRT from SSI. Lincolnwood, IL: Scientific Software International.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman and Hall.

Gilula, Z., & Haberman, S. J. (1994). Models for analyzing categorical panel data. Journal of the American Statistical Association, 89, 645–656.

Gilula, Z., & Haberman, S. J. (1995). Prediction functions for categorical panel data. The Annals of Statistics, 23, 1130–1142.

Haberman, S. J. (2009). Use of generalized residuals to examine goodness of fit of item response models (ETS Research Report RR-09-15). Princeton: ETS.

Haberman, S. J. (2013). A general program for item-response analysis that employs the stabilized Newton-Raphson algorithm (ETS Research Report RR-13-32). Princeton: ETS.

Haberman, S. J., & Sinharay, S. (2013). Generalized residuals for general models for contingency tables with application to item response theory. Journal of the American Statistical Association, 108, 1435–1444.

Haberman, S. J., Sinharay, S., & Chon, K. H. (2013). Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions. Psychometrika, 78, 417–440.

Kirsch, I. S. (2001). The International Adult Literacy Survey (IALS): Understanding what was measured (ETS Research Report RR-01-25). Princeton: ETS.

Li, J. (2005). The effect of accommodations for students with disabilities: An item fit analysis. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.

Martin, M. O., & Kelly, D. L. (1996). Third international mathematics and science study technical report volume 1: Design and development. Chestnut Hill: Boston College.

Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of Educational Statistics, 17, 131–154.

Mullis, I., Martin, M., & Gonzalez, E. (2003). PIRLS 2001 international report: IEA’s study of reading literacy achievement in primary schools. Chestnut Hill: Boston College.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

National Center for Education Statistics. (2009). The nation’s report card: Mathematics 2009 (Tech. Rep. No. NCES 2010451). Washington, DC: Institute of Education Sciences, U.S. Department of Education.

National Center for Education Statistics. (2011). The nation’s report card: Science 2009 (Tech. Rep. No. NCES 2011451). Washington, DC: Institute of Education Sciences, U.S. Department of Education.

Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.

Pellegrino, J. W., Jones, L. R., & Mitchell, K. J. (1999). Grading the nation’s report card: Evaluating NAEP and transforming the assessment of educational progress. Washington, DC: National Academy Press.

Perie, M., Grigg, W., & Donahue, P. (2005). The nation’s report card: Reading 2005 (Tech. Rep. No. NCES 2006451). Washington, DC: U.S. Government Printing Office: U.S. Department of Education, National Center for Education Statistics.

Rampey, B. D., Dion, G. S., & Donahue, P. L. (2009). NAEP 2008 trends in academic progress (Tech. Rep. No. NCES 2009479). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.

Rogers, A., Gregory, K., Davis, S., & Kulick, E. (2006). Users guide to NAEP model-based p-value programs. Unpublished manuscript. Princeton: ETS.

Sinharay, S. (2006). Bayesian item fit analysis for unidimensional item response theory models. British Journal of Mathematical and Statistical Psychology, 59, 429–449.

Sinharay, S., Guo, Z., von Davier, M., & Veldkamp, B. P. (2010). Assessing fit of latent regression models. IERI Monograph Series, 3, 35–55.

Sinharay, S., & Haberman, S. J. (2014). How often is the misfit of item response theory models practically significant? Educational Measurement: Issues and Practice, 33(1), 23–35.

Sinharay, S., Haberman, S. J., & Jia, H. (2011). Fit of item response theory models: A survey of data from several operational tests (ETS Research Report No. RR-11-29). Princeton: ETS.

Von Davier, M., & Sinharay, S. (2014). Analytics in international large-scale assessments: item response theory and population models. In L. Rutkowski, M. Von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: background, technical issues, and methods of data analysis (pp. 155–174). Boca Raton: CRC.