Validating the Interpretations and Uses of Test Scores

Journal of Educational Measurement - Volume 50, Issue 1 - Pages 1-73 - 2013
Michael T. Kane, Educational Testing Service

Abstract

To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument‐based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score‐based interpretations and uses. Validation then can be thought of as an evaluation of the coherence and completeness of this interpretation/use argument and of the plausibility of its inferences and assumptions. In outlining the argument‐based approach to validation, this paper makes eight general points. First, it is the proposed score interpretations and uses that are validated and not the test or the test scores. Second, the validity of a proposed interpretation or use depends on how well the evidence supports the claims being made. Third, more‐ambitious claims require more support than less‐ambitious claims. Fourth, more‐ambitious claims (e.g., construct interpretations) tend to be more useful than less‐ambitious claims, but they are also harder to validate. Fifth, interpretations and uses can change over time in response to new needs and new understandings leading to changes in the evidence needed for validation. Sixth, the evaluation of score uses requires an evaluation of the consequences of the proposed uses; negative consequences can render a score use unacceptable. Seventh, the rejection of a score use does not necessarily invalidate a prior, underlying score interpretation. Eighth, the validation of the score interpretation on which a score use is based does not validate the score use.
