False positives vs. false negatives: public opinion on the cost ratio in criminal justice risk assessment

Byunggu Kang1, Sishi Wu1
1School of Criminal Justice, University at Albany, SUNY, Albany, USA

Abstract

We examine public attitudes toward false positives and false negatives in criminal justice risk assessment, and how people's preferences vary across offense types and stages of the criminal justice process. We use data from a factorial survey experiment conducted with a sample of 575 Americans. Respondents were randomly assigned to vignette conditions that varied the criminal justice stage and the offense severity, and were asked to choose a cost ratio. Although respondents prefer cost ratios that tolerate more false positives than false negatives, the degree to which they accept false positives is lower than the cost ratios embedded in existing risk assessment instruments. Offense severity affects people's acceptance of false positives, and numeracy influences their choice of cost ratio. To our knowledge, this is the first study to investigate public opinion on the cost ratio in risk assessments. We suggest that public opinion on the cost ratio can offer an alternative way to identify an ideal cost ratio.
