Novel item selection strategies for cognitive diagnostic computerized adaptive testing: A heuristic search framework

Xi Cao1, Ying Lin2, Dong Liu3,4, Fudan Zheng5, Henry Been-Lirn Duh1,6
1Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
2Department of Psychology, Sun Yat-Sen University, Guangzhou, China
3College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
4Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China
5School of Computer Engineering, Guangzhou City University of Technology, Guangzhou, China
6PolyU-NVIDIA Joint Research Centre, Hong Kong Polytechnic University, Hong Kong, China

Abstract

Cognitive diagnostic computerized adaptive testing (CD-CAT) has gained increasing attention in personalized measurement because, by progressively administering items tailored to each examinee, it classifies an individual's mastery status on fine-grained attributes more accurately and efficiently. Selecting the next item on the basis of previous responses is crucial to the success of CD-CAT. Previous item selection strategies for CD-CAT have mostly followed a greedy or semi-greedy approach, which makes it difficult to balance diagnostic performance against item bank utilization. To address this issue, this study takes a graph perspective and transforms the item selection problem in CD-CAT into a path-searching problem, in which nodes represent individual items and paths represent possible test constructions. A heuristic function is defined to predict the prospect of a path, that is, how well the corresponding test can diagnose the current examinee. Two search mechanisms with different biases towards item exposure control are proposed to approximate the optimal path, the one with the best prospect. The first unused item on the resulting path is selected as the next item. Together, these components form a novel CD-CAT item selection framework based on heuristic search. Simulation studies are conducted under a variety of conditions covering bank designs, bank quality, and testing scenarios. Compared with different types of classic CD-CAT item selection strategies, the proposed framework enhances bank utilization at a smaller cost to diagnostic performance.
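
The path-search view of item selection can be illustrated with a minimal Python sketch. It is not the study's implementation: the abstract does not specify the heuristic function or the two search mechanisms, so a plain beam search stands in for them here. The function names (`beam_search_next_item`, `prospect`), the toy attribute sets, the discrimination values, the exposure rates, and the exposure penalty weight are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

Path = tuple[int, ...]  # an ordered sequence of item ids, i.e. one candidate test


def beam_search_next_item(
    bank: Sequence[int],
    administered: Sequence[int],
    test_length: int,
    heuristic: Callable[[Path], float],
    beam_width: int = 3,
) -> int:
    """Extend the administered items into full-length paths with beam search
    and return the first unused item on the best path found."""
    start: Path = tuple(administered)
    beam = [start]
    # All paths in the beam share the same length, so checking one is enough.
    while len(beam[0]) < test_length:
        candidates = [
            path + (item,)
            for path in beam
            for item in bank
            if item not in path
        ]
        if not candidates:  # the bank is exhausted before the test is full
            break
        # Keep only the most promising partial paths.
        candidates.sort(key=heuristic, reverse=True)
        beam = candidates[:beam_width]
    best = beam[0]
    if len(best) == len(start):
        raise ValueError("no unused item is available for selection")
    return best[len(start)]


# Hypothetical toy bank: attributes measured by each item, a discrimination
# value per item, and the item's current exposure rate.
ATTRS = {0: {"A1"}, 1: {"A2"}, 2: {"A1", "A2"}, 3: {"A3"}}
DISC = {0: 0.9, 1: 0.6, 2: 0.7, 3: 0.4}
EXPOSURE = {0: 0.8, 1: 0.1, 2: 0.2, 3: 0.0}
PENALTY = 1.0  # assumed weight of the exposure penalty


def prospect(path: Path) -> float:
    """Assumed heuristic: reward attribute coverage and discrimination,
    penalize items that are already heavily exposed."""
    covered = set().union(*(ATTRS[i] for i in path))
    quality = sum(DISC[i] for i in path)
    exposure = sum(EXPOSURE[i] for i in path)
    return len(covered) + quality - PENALTY * exposure


bank = [0, 1, 2, 3]
first_item = beam_search_next_item(bank, [], 2, prospect, beam_width=2)
second_item = beam_search_next_item(bank, [first_item], 2, prospect, beam_width=2)
print(first_item, second_item)
```

In this toy setting the exposure penalty steers the search away from item 0 even though it has the highest discrimination. In an operational CD-CAT the heuristic would instead depend on the examinee's responses so far, which this sketch omits.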
