Quality control questions on Amazon’s Mechanical Turk (MTurk): A randomized trial of impact on the USAUDIT, PHQ-9, and GAD-7
Abstract
Crowdsourced psychological and other biobehavioral research using platforms like Amazon’s Mechanical Turk (MTurk) is increasingly common – but has proliferated more rapidly than studies to establish data quality best practices. Thus, this study investigated whether outcome scores for three common screening tools would be significantly different among MTurk workers who were subject to different sets of quality control checks. We conducted a single-stage, randomized controlled trial with equal allocation to each of four study arms: Arm 1 (Control Arm), Arm 2 (Bot/VPN Check), Arm 3 (Truthfulness/Attention Check), and Arm 4 (Stringent Arm – All Checks). Data collection was completed in Qualtrics, to which participants were referred from MTurk. Subjects (