Empirical evaluation of fully Bayesian information criteria for mixture IRT models using NUTS
Abstract
This study evaluates three fully Bayesian information criteria, namely LOO, WAIC, and WBIC, in terms of their accuracy in determining the number of latent classes of a mixture IRT model, as compared with the conventional IRT model, when the models are estimated via non-random walk MCMC algorithms. It further compares their performance with that of conventional information criteria, including AIC, BIC, CAIC, SABIC, and DIC. Monte Carlo simulations were carried out to evaluate these criteria under different conditions. The results indicate that AIC, BIC, and the related CAIC and SABIC tend to select the simpler model and are not recommended when the actual data involve multiple latent classes. Among the three fully Bayesian measures, WBIC can be used to detect the number of latent classes for tests with at least 30 items, whereas WAIC and LOO should be used together with their effective number of parameters when choosing the correct number of latent classes.