Conditioning factors of test-taking engagement in PIAAC: an exploratory IRT modelling approach considering person and item characteristics

Large-Scale Assessments in Education - 2017

Frank Goldhammer¹, Thomas Martens², Oliver Lüdtke³

¹German Institute for International Educational Research (DIPF)/Centre for International Student Assessment (ZIB), Schloßstr. 29, 60486, Frankfurt/Main, Germany

²Medical School Hamburg, Am Kaiserkai 1, 20457, Hamburg, Germany

³IPN-Leibniz Institute for Science and Mathematics Education/Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118, Kiel, Germany

References

Asseburg, R., & Frey, A. (2013). Too hard, too easy, or just right? The relationship between effort or boredom and ability-difficulty fit. Psychological Test and Assessment Modeling, 55(1), 92–104.

Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72(2), 141. https://doi.org/10.1007/s11336-005-1376-9 .

Bates, D., Maechler, M., Bolker, B. M., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01 .

Braun, H., Kirsch, I., & Yamamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th-grade NAEP Reading assessment. Teachers College Record, 113(11), 2309–2344.

Brown, A. R., & Finney, S. J. (2011). Low-stakes testing and psychological reactance: Using the Hong Psychological Reactance Scale to better understand compliant and non-compliant examinees. International Journal of Testing, 11(3), 248–270. https://doi.org/10.1080/15305058.2011.570884 .

Cole, J. S., Bergin, D. A., & Whittaker, T. A. (2008). Predicting student achievement for low stakes tests with effort and task value. Contemporary Educational Psychology, 33(4), 609–624. https://doi.org/10.1016/j.cedpsych.2007.10.002 .

Cronbach, L. J. (1970). Essentials of psychological testing. New York: Harper & Row.

de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.

De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., et al. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39(12), 1–28. https://doi.org/10.18637/jss.v039.i12 .

Debeer, D., Buchholz, J., Hartig, J., & Janssen, R. (2014). Student, school, and country differences in sustained test-taking effort in the 2009 PISA reading assessment. Journal of Educational and Behavioral Statistics, 39(6), 502–523. https://doi.org/10.3102/1076998614558485 .

DeMars, C. E., Bashkov, B. M., & Socha, A. B. (2013). The role of gender in test-taking motivation under low-stakes conditions. Research and Practice in Assessment, 8, 69–82.

Doran, H., Bates, D., Bliese, P., & Dowling, M. (2007). Estimating the multilevel Rasch model: With the lme4 package. Journal of Statistical Software, 20, 1–18. https://doi.org/10.18637/jss.v020.i02 .

Douglas, J., & Cohen, A. (2001). Nonparametric item response function estimation for assessing parametric model fit. Applied Psychological Measurement, 25(3), 234–243. https://doi.org/10.1177/01466210122032046 .

Eccles (Parsons), J. S., Adler, T. F., Futterman, R., Goff, S. B., Kaczala, C. M., Meece, J. L., et al. (1983). Expectancies, values, and academic behaviors. In J. T. Spence (Ed.), Achievement and achievement motives: Psychological and sociological approaches (pp. 75–146). San Francisco: W. H. Freeman.

Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual Review of Psychology, 53(1), 109–132. https://doi.org/10.1146/annurev.psych.53.100901.135153 .

Finn, B. (2015). Measuring motivation in low-stakes assessments. ETS Research Report Series, 2015(2), 1–17. http://doi.org/10.1002/ets2.12067 .

Fox, J.-P., & Marianti, S. (2017). Person-fit statistics for joint models for accuracy and speed. Journal of Educational Measurement, 54(2), 243–262. https://doi.org/10.1111/jedm.12143 .

Goldhammer, F., Martens, T., Christoph, G., & Lüdtke, O. (2016). Test-taking engagement in PIAAC (OECD Education Working Papers, No. 133). Paris: OECD Publishing.

Goldhammer, F., Naumann, J., Stelter, A., Tóth, K., Rölke, H., & Klieme, E. (2014). The time on task effect in reading and problem solving is moderated by task difficulty and skill: Insights from a computer-based large-scale assessment. Journal of Educational Psychology, 106, 608–626. https://doi.org/10.1037/a0034716 .

Gollwitzer, P. M. (1996). The volitional benefits of planning. In P. M. Gollwitzer & J. A. Bargh (Eds.), The psychology of action: Linking cognition and motivation to behavior (pp. 287–312). New York, London: The Guilford Press.

Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17–27. https://doi.org/10.1111/j.1745-3992.2004.tb00149.x .

Holman, R., & Glas, C. A. W. (2005). Modelling non-ignorable missing-data mechanisms with item response theory models. British Journal of Mathematical and Statistical Psychology, 58(1), 1–17. https://doi.org/10.1348/000711005x47168 .

Jakewerth, P. M., Stancavage, B. S., & Reed, E. D. (1999). An investigation of why students do not respond to questions. Palo Alto, CA.

Kiefer, T., Robitzsch, A., & Wu, M. (2016). TAM: Test analysis modules. R package version 1.99–6. Retrieved from http://CRAN.R-project.org/package=TAM .

Köhler, C., Pohl, S., & Carstensen, C. (2015). Investigating mechanisms for missing responses in competence tests. Psychological Test and Assessment Modeling, 57(4), 499–522.

Kong, X. J., Wise, S. L., & Bhola, D. S. (2007). Setting the response time threshold parameter to differentiate solution behavior from rapid-guessing behavior. Educational and Psychological Measurement, 67(4), 606–619. https://doi.org/10.1177/0013164406294779 .

Kuhl, J. (2000). A functional-design approach to motivation and self-regulation: The dynamics of personality systems interactions. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 111–169). San Diego: Academic Press.

Lau, A. R., Swerdzewski, P. J., Jones, A. T., Anderson, R. D., & Markle, R. E. (2009). Proctors matter: strategies for increasing examinee effort on general education program assessments. The Journal of General Education, 58, 196–217. https://doi.org/10.1353/jge.0.0045 .

Lee, Y.-H., & Jia, Y. (2014). Using response time to investigate students’ test-taking behaviors in a NAEP computer-based study. Large-Scale Assessments in Education, 2(1), 1–24. https://doi.org/10.1186/s40536-014-0008-1 .

Ma, L., Wise, S. L., Thum, Y. M., & Kingsbury, G. (2011). Detecting response time threshold under the computer adaptive testing environment. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans.

Marsh, H. W., & Craven, R. G. (2006). Reciprocal effects of self-concept and performance from a multidimensional perspective: Beyond seductive pleasure and unidimensional perspectives. Perspectives on Psychological Science, 1(2), 133–163. https://doi.org/10.1111/j.1745-6916.2006.00010.x .

Meyer, J. P. (2010). A mixture Rasch model with item response time components. Applied Psychological Measurement, 34(7), 521–538. https://doi.org/10.1177/0146621609355451 .

Mislevy, R. J., & Wu, P.-K. (1996). Missing responses and IRT ability estimation: Omits, choice, time limits, and adaptive testing (Vol. RR96-30). Princeton: Educational Testing Service.

OECD. (2013a). OECD skills outlook 2013: First results from the survey of adult skills. Paris: OECD Publishing.

OECD. (2013b). Technical report of the survey of adult skills (PIAAC). Paris: OECD Publishing.

Penk, C., Pöhlmann, C., & Roppelt, A. (2014). The role of test-taking motivation for students’ performance in low-stakes assessments: an investigation of school-track-specific differences. Large-Scale Assessments in Education, 2(1), 5. https://doi.org/10.1186/s40536-014-0005-4 .

Pohl, S., Gräfe, L., & Rose, N. (2013). Dealing with omitted and not-reached items in competence tests: Evaluating approaches accounting for missing responses in item response theory models. Educational and Psychological Measurement. https://doi.org/10.1177/0013164413504926 .

Rios, J. A., Guo, H., Mao, L., & Liu, O. L. (2017). Evaluating the impact of careless responding on aggregated-scores: to filter unmotivated examinees or not? International Journal of Testing, 17, 74–104. http://doi.org/10.1080/15305058.2016.1231193 .

Rost, J. (2004). Lehrbuch Testtheorie—Testkonstruktion [Textbook Test theory—Test construction] (2nd ed.). Bern: Huber.

Schnipke, D. L., & Scrams, D. J. (1997). Modeling item response times with a two-state mixture model: A new method of measuring speededness. Journal of Educational Measurement, 34(3), 213–232. https://doi.org/10.1111/j.1745-3984.1997.tb00516.x .

Setzer, J. C., Wise, S. L., van den Heuvel, J. R., & Ling, G. (2013). An investigation of examinee test-taking effort on a large-scale assessment. Applied Measurement in Education, 26(1), 34–49. https://doi.org/10.1080/08957347.2013.739453 .

Stocking, M. L., Eignor, D. R., & Cook, L. L. (1988). Factors affecting the sample invariant properties of linear and curvilinear observed- and true-score equating procedures. ETS Research Report Series, 1988(2), i–71. http://doi.org/10.1002/j.2330-8516.1988.tb00297.x .

Sundre, D. L., & Kitsantas, A. (2004). An exploration of the psychology of the examinee: Can examinee self-regulation and test-taking motivation predict consequential and non-consequential test performance? Contemporary Educational Psychology, 29(1), 6–26. https://doi.org/10.1016/S0361-476X(02)00063-2 .

R Core Team. (2016). R: A language and environment for statistical computing (Version 3.1.3). Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/ .

Trautwein, U., Marsh, H. W., Nagengast, B., Lüdtke, O., Nagy, G., & Jonkmann, K. (2012). Probing for the multiplicative term in modern expectancy-value theory: A latent interaction modeling study. Journal of Educational Psychology, 104(3), 763. https://doi.org/10.1037/a0027470 .

van der Linden, W. J., & Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365–384. https://doi.org/10.1007/s11336-007-9046-8 .

Wise, S. L. (2006). An investigation of the differential effort received by items on a low-stakes computer-based test. Applied Measurement in Education, 19(2), 95–114. https://doi.org/10.1207/s15324818ame1902_2 .

Wise, S. L. (2009). Strategies for managing the problem of unmotivated examinees in low-stakes testing programs. The Journal of General Education, 58(3), 152–166.

Wise, S. L. (2015). Effort analysis: Individual score validation of achievement test data. Applied Measurement in Education, 28(3), 237–252. https://doi.org/10.1080/08957347.2015.1042155 .

Wise, S. L. (2017). Rapid-guessing behavior: Its identification, interpretation, and implications. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12165 .

Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10, 1–17. https://doi.org/10.1207/s15326977ea1001_1 .

Wise, S. L., & DeMars, C. E. (2006). An application of item response time: The effort-moderated IRT model. Journal of Educational Measurement, 43(1), 19–38. https://doi.org/10.1111/j.1745-3984.2006.00002.x .

Wise, S. L., & Gao, L. (2017). A general approach to measuring test-taking effort on computer-based tests. Applied Measurement in Education, 30, 343–354. http://doi.org/10.1080/08957347.2017.1353992 .

Wise, S. L., & Kong, X. J. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163–183. https://doi.org/10.1207/s15324818ame1802_2 .

Wise, S. L., & Ma, L. (2012). Setting response time thresholds for a CAT item pool: The normative threshold method. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.

Wolf, L. F., Smith, J. K., & Birnbaum, M. E. (1995). Consequence of performance, test motivation, and mentally taxing items. Applied Measurement in Education, 8(4), 341–351. https://doi.org/10.1207/s15324818ame0804_4 .

Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.

Yamamoto, K., & Everson, H. (1997). Modeling the effects of test length and test time on parameter estimation using the HYBRID model. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 89–98). Münster: Waxmann.
