Automated essay scoring: Psychometric guidelines and practices

Assessing Writing, Volume 18, Pages 25-39, 2013
Chaitanya Ramineni¹, David M. Williamson¹
¹ Educational Testing Service, Rosedale Rd, Princeton, NJ 08541, USA
