Investigating the characteristics of language test specifications and item writer guidelines, and their effect on item development: a mixed-method case study
Abstract
This study discusses the characteristics of test specifications (specs) and item writer guidelines (IWGs), their role in developing items for English as a Second Language (ESL) reading tests, and the use of the Common European Framework of Reference (CEFR) for specs development. This mixed-method study analyzed specs, IWGs, tests, and item statistics from the Pearson Test of English General. In addition, interviews and focus groups were conducted with the developers of the specs and IWGs and with item writers. The findings show that there is no single way of conceptualizing specs and IWGs, and that translating the CEFR reading descriptors into specs is a challenging task. However, results from the judgmental study and the item statistics suggest that the investigated specs and IWGs facilitated the development of good-quality items at a certain difficulty level. The study reveals the potential role of specs and IWGs in establishing test validity. It contributes to the under-researched area of specs and IWGs by showing the type of information required for effective item writing and ways of enhancing the validity and reliability of tests. Practical and theoretical suggestions, together with directions for future research, are also identified.
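The abstract does not specify which item statistics were examined. As a point of reference only, the minimal sketch below (not taken from the study) computes two classical item statistics commonly used to judge item quality and difficulty: facility (the proportion of test takers answering an item correctly) and corrected item-total discrimination. The response matrix is invented and the NumPy-based computation is an illustrative assumption, not the authors' procedure.

```python
# Illustrative sketch of classical item statistics (facility and discrimination).
# The response data below are invented for demonstration purposes.
import numpy as np

# rows = test takers, columns = dichotomously scored items (1 = correct, 0 = incorrect)
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])

# Facility (item difficulty): proportion of test takers answering each item correctly
facility = responses.mean(axis=0)

# Corrected item-total discrimination: correlation of each item's scores with the
# total score computed from the remaining items
totals = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, i], totals - responses[:, i])[0, 1]
    for i in range(responses.shape[1])
])

print("facility:", np.round(facility, 2))
print("discrimination:", np.round(discrimination, 2))
```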