Using country-specific Q-matrices for cognitive diagnostic assessments with international large-scale data

Jolien Delafontaine1, Changsheng Chen1, Jung Yeon Park1, Wim Van Den Noortgate2
1Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
2Imec Research Group ITEC, KU Leuven, Kortrijk, Belgium

Abstract

In cognitive diagnosis assessment (CDA), the impact of misspecified item-attribute relations (the “Q-matrix”) designed by subject-matter experts has been a major challenge for real-world applications. This study examined parameter estimation of CDA models under the expert-designed Q-matrix and two refined Q-matrices using international large-scale data. Specifically, the G-DINA model was used to analyze Grade 8 TIMSS data for five selected countries separately, and the need for a country-specific refined Q-matrix was investigated. The results suggested that the two refined Q-matrices fitted the data better than the expert-designed Q-matrix, and that the stepwise validation method outperformed the nonparametric classification method, yielding substantively different classifications of students into attribute mastery patterns and different item parameter estimates. The results confirmed that country-specific Q-matrices based on the G-DINA model led to a better fit than a universal expert-designed Q-matrix.
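To illustrate the kind of classification the abstract refers to, the sketch below shows how a Q-matrix and item parameters map a response vector to an attribute mastery pattern under the basic DINA model (a special case of G-DINA). This is a minimal illustrative example, not the authors' analysis: the Q-matrix, slip, and guess values are hypothetical, and classification is done by a simple maximum-likelihood search over all mastery patterns.

```python
from itertools import product
import numpy as np

def dina_classify(responses, Q, slip, guess):
    """Return the attribute mastery pattern maximizing the DINA likelihood.

    Illustrative sketch only: real G-DINA analyses (e.g. with the GDINA R
    package) estimate item parameters from data rather than fixing them.
    """
    n_items, n_attr = Q.shape
    best_pattern, best_loglik = None, -np.inf
    for alpha in product([0, 1], repeat=n_attr):
        alpha = np.array(alpha)
        # eta_j = 1 iff the examinee masters every attribute item j requires
        eta = np.all(alpha >= Q, axis=1).astype(int)
        # DINA success probability: P(X_j = 1) = (1 - s_j) if eta_j = 1, else g_j
        p = np.where(eta == 1, 1 - slip, guess)
        loglik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
        if loglik > best_loglik:
            best_pattern, best_loglik = tuple(alpha), loglik
    return best_pattern

# Toy Q-matrix: 4 items measuring 2 attributes (hypothetical values)
Q = np.array([[1, 0], [0, 1], [1, 1], [1, 0]])
slip = np.full(4, 0.1)   # slip rate s_j per item
guess = np.full(4, 0.2)  # guess rate g_j per item

# An examinee who solves items 1, 3, 4 but misses item 2 is most
# plausibly a master of attribute 1 only.
print(dina_classify(np.array([1, 0, 1, 1]), Q, slip, guess))  # → (1, 0)
```

A misspecified Q-matrix changes `eta` for some items and can therefore shift examinees into different mastery patterns, which is why Q-matrix refinement matters for the classification results the abstract describes.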
