Assessment by Comparative Judgement: An Application to Secondary Statistics and English in New Zealand

New Zealand Journal of Educational Studies - Volume 55, Issue 1 - Pages 49-71 - 2020
Neil Marshall¹, Kirsten Shaw¹, Jodie Hunter², Ian Jones³
¹ New Zealand Qualifications Authority, Wellington, New Zealand
² Institute of Education, Massey University, Wellington, New Zealand
³ Mathematics Education Centre, Loughborough University, Loughborough, UK

Abstract

There is growing interest in using comparative judgement to assess student work as an alternative to traditional marking. Comparative judgement requires no rubrics; instead, it is grounded in experts making pairwise judgements about the relative ‘quality’ of students’ work according to a high-level criterion. The resulting decision data are fitted to a statistical model to produce a score for each student. Benefits cited for comparative judgement over traditional methods include increased reliability, validity and efficiency of assessment processes. We investigated whether such claims apply to summative statistics and English assessments in New Zealand. Experts comparatively judged students’ responses to two national assessment tasks, and the reliability and validity of the outcomes were explored using standard techniques. We present evidence that the comparative judgement process efficiently produced reliable and valid assessment outcomes. We consider the limitations of the study and make suggestions for further research and potential applications.
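The abstract does not say which statistical model the decision data were fitted to, or which reliability index was computed. As an illustration only, the Python sketch below assumes the Bradley-Terry model, a logistic relative of Thurstone's model of comparative judgement that is commonly used for this purpose, and scale separation reliability (SSR), an index widely reported in comparative judgement studies. The function names `fit_bradley_terry` and `scale_separation_reliability` and the toy judgements are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def fit_bradley_terry(judgements, n_iter=1000, tol=1e-10):
    """Hypothetical helper: Bradley-Terry scores from (winner, loser) pairs.

    Model (assumed): P(i beats j) = exp(theta_i) / (exp(theta_i) + exp(theta_j)).
    Fitted with minorise-maximise (MM) updates on the strengths exp(theta).
    Assumes the 'beats' relation is strongly connected (for any split of the
    scripts into two groups, some script in each group beats one in the
    other); otherwise the maximum-likelihood estimate does not exist.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)
    n_pair = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in judgements:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1

    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iter):
        new = {}
        for i in scripts:
            denom = 0.0
            for pair, n in n_pair.items():
                if i in pair:
                    (j,) = pair - {i}  # the opponent in this pair
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom
        # Fix the arbitrary scale: mean log-strength of 0, so scores sum to 0.
        log_mean = sum(math.log(v) for v in new.values()) / len(new)
        new = {s: v / math.exp(log_mean) for s, v in new.items()}
        converged = max(abs(new[s] - strength[s]) for s in scripts) < tol
        strength = new
        if converged:
            break
    return {s: math.log(v) for s, v in strength.items()}


def scale_separation_reliability(judgements, theta):
    """Hypothetical helper: SSR = (observed variance - mean squared SE) / observed variance.

    Each script's standard error comes from the Fisher information of its own
    comparisons, ignoring the sum-to-zero constraint (a common approximation).
    """
    info = defaultdict(float)
    for a, b in judgements:
        p = 1.0 / (1.0 + math.exp(theta[b] - theta[a]))  # P(a beats b)
        info[a] += p * (1.0 - p)
        info[b] += p * (1.0 - p)
    scores = list(theta.values())
    mean = sum(scores) / len(scores)
    obs_var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    mse = sum(1.0 / info[s] for s in theta) / len(theta)
    return (obs_var - mse) / obs_var


# Toy data: each tuple is one expert decision, winner first.
judgements = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "B"),
    ("B", "A"), ("B", "C"), ("B", "D"),
    ("C", "D"), ("C", "A"),
    ("D", "C"), ("D", "B"),
]
theta = fit_bradley_terry(judgements)
for script, score in sorted(theta.items(), key=lambda kv: -kv[1]):
    print(f"{script}: {score:+.2f}")
print(f"SSR = {scale_separation_reliability(judgements, theta):.2f}")
```

MM updates are used here because each step cannot decrease the likelihood and no step size needs tuning. With only eleven toy judgements the SSR will be far too low to be useful (it can even be negative), since the standard errors shrink only as each script accumulates more comparisons; real comparative judgement sessions typically collect many judgements per script.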
