Empirical evaluation of fully Bayesian information criteria for mixture IRT models using NUTS

Behaviormetrika - Volume 50 - Pages 93-120 - 2022
Rehab AlHakmani1, Yanyan Sheng2
1Emirates College for Advanced Education, Abu Dhabi, UAE
2The University of Chicago, Chicago, USA

Abstract

This study evaluates three fully Bayesian information criteria, namely LOO, WAIC, and WBIC, in terms of their accuracy in determining the number of latent classes of a mixture IRT model, relative to the conventional IRT model, when the models are estimated via non-random-walk MCMC algorithms. It further compares their performance with that of conventional information criteria, including AIC, BIC, CAIC, SABIC, and DIC. Monte Carlo simulations were carried out to evaluate these criteria under different conditions. The results indicate that AIC, BIC, and the related CAIC and SABIC tend to select the simpler model and are not recommended when the actual data involve multiple latent classes. Among the three fully Bayesian measures, WBIC can be used to detect the number of latent classes for tests with at least 30 items, whereas WAIC and LOO should be used together with their effective number of parameters when choosing the correct number of latent classes.
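As a rough illustration of the quantities compared in the abstract, the sketch below computes four of the conventional criteria from a maximized log-likelihood, and WAIC together with its effective number of parameters from a matrix of pointwise log-likelihoods. It uses the standard textbook definitions and is not necessarily the computation used in the study. The function names `conventional_criteria`, `log_mean_exp`, and `waic` are hypothetical, and the input layout (posterior draws in rows, observations in columns) is an assumption. DIC, LOO, and WBIC are left out because they need additional machinery, such as importance-sampling weights or draws from a tempered posterior.

```python
import math
from statistics import variance


def conventional_criteria(log_lik, k, n):
    """Penalized-deviance criteria (standard definitions).

    log_lik: maximized log-likelihood of the fitted model
    k:       number of free parameters
    n:       sample size (number of examinees)
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1),
        "SABIC": deviance + k * math.log((n + 2) / 24),
    }


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(pointwise_log_lik):
    """WAIC on the deviance scale plus its effective number of parameters.

    pointwise_log_lik[s][i] is assumed to hold the log-likelihood of
    observation i evaluated at posterior draw s (at least two draws).
    """
    n_obs = len(pointwise_log_lik[0])
    # Regroup so each inner list holds all draws for one observation.
    columns = [[row[i] for row in pointwise_log_lik] for i in range(n_obs)]
    # Log pointwise predictive density.
    lppd = sum(log_mean_exp(col) for col in columns)
    # Effective number of parameters: summed posterior sample variances.
    p_waic = sum(variance(col) for col in columns)
    return {"waic": -2.0 * (lppd - p_waic), "p_waic": p_waic, "lppd": lppd}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(conventional_criteria(log_lik=-100.0, k=5, n=100))
    draws = [[-1.2, -0.7, -2.1], [-1.0, -0.9, -1.8], [-1.1, -0.8, -2.4]]
    print(waic(draws))
```

Lower values indicate the preferred model for every criterion here. Reporting `p_waic` next to `waic` follows the abstract's suggestion that WAIC be read together with its effective number of parameters rather than on its own.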
