Comparing different response time threshold setting methods to detect low effort on a large-scale assessment
Abstract
Keywords