Conducting behavioral research on Amazon’s Mechanical Turk

Behavior Research Methods (Springer Science and Business Media LLC), Volume 44, Issue 1, pp. 1–23, 2012
Winter Mason¹, Siddharth Suri¹
¹Yahoo! Research, New York, USA

Abstract

Keywords


References

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. Quarterly Journal of Economics, 84, 488–500.

Alonso, O., & Mizzaro, S. (2009). Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment. In S. Geva, J. Kamps, C. Peters, T. Sakai, A. Trotman, & E. Voorhees (Eds.), Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation (pp. 15–16). Amsterdam: IR Publications.

Andrews, D., Nonnecke, B., & Preece, J. (2003). Electronic survey methodology: A case study in reaching hard-to-involve Internet users. International Journal of Human–Computer Interaction, 16, 185–210.

Barchard, K. A., & Williams, J. (2008). Practical advice for conducting ethical online experiments and questionnaires for United States psychologists. Behavior Research Methods, 40, 1111–1128.

Baron, J., & Hershey, J. (1988). Outcome bias in decision evaluation. Journal of Personality and Social Psychology, 54, 569–579.

Berk, R. A. (1983). An introduction to sample selection bias in sociological data. American Sociological Review, 48, 386–398.

Birnbaum, M. H. (Ed.). (2000). Psychological experiments on the Internet. San Diego, CA: Academic Press.

Birnbaum, M. H. (2004). Human research and data collection via the Internet. Annual Review of Psychology, 55, 803–832.

Buhrmester, M. D., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.

Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19, 7–42.

Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329, 1194–1197.

Chilton, L. B., Horton, J. J., Miller, R. C., & Azenkot, S. (2010). Task search in a human computation market. In Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 1–9). New York: ACM.

Cooper, R., DeJong, D. V., Forsythe, R., & Ross, T. W. (1996). Cooperation without reputation: Experimental evidence from prisoner’s dilemma games. Games and Economic Behavior, 12, 187–218.

Couper, M. P. (2000). Web surveys: A review of issues and approaches. Public Opinion Quarterly, 64, 464–494.

Couper, M. P., & Miller, P. V. (2008). Web survey methods. Public Opinion Quarterly, 72, 831–835.

Dixon, W. J. (1953). Processing data for outliers. Biometrics, 9, 74–89.

Eriksson, K., & Simpson, B. (2010). Emotional reactions to losing explain gender differences in entering a risky lottery. Judgment and Decision Making, 5, 159–163.

Fehr, E., & Gächter, S. (2000). Cooperation and punishment in public goods experiments. American Economic Review, 90, 980–994.

Felstiner, A. L. (2010). Working the crowd: Employment and labor law in the crowdsourcing industry. Retrieved from http://works.bepress.com/alek_felstiner/1/

Frick, A., Bächtiger, M.-T., & Reips, U.-D. (2001). Financial incentives, personal information, and drop out in online studies. In U.-D. Reips & M. Bosnjak (Eds.), Dimensions of Internet science (pp. 209–219). Lengerich: Pabst Science.

Göritz, A. S. (2006). Incentives in Web studies: Methodological issues and a review. International Journal of Internet Science, 1, 58–70.

Göritz, A. S. (2008). The long-term effect of material incentives on participation in online panels. Field Methods, 20, 211–225.

Göritz, A. S., & Stieger, S. (2008). The high-hurdle technique put to the test: Failure to find evidence that increasing loading times enhances data quality in Web-based studies. Behavior Research Methods, 40, 322–327.

Göritz, A. S., Wolff, H. G., & Goldstein, D. G. (2008). Individual payments as a longer-term incentive in online panels. Behavior Research Methods, 40, 1144–1149.

Gosling, S. D., & Johnson, J. A. (Eds.). (2010). Advanced methods for conducting online behavioral research. Washington, DC: American Psychological Association.

Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–161.

Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14, 399–425.

Hossain, T., & Morgan, J. (2006). Plus shipping and handling: Revenue (non)equivalence in field experiments on eBay. Advances in Economic Analysis & Policy, 6(2), Article 3.

Howe, J. (2006). The rise of crowdsourcing. Wired Magazine, 14, 1–4.

Huang, E., Zhang, H., Parkes, D. C., Gajos, K. Z., & Chen, Y. (2010). Toward automatic task design: A progress report. In Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 77–85). New York: ACM.

Ipeirotis, P. G. (2010a). Analyzing the Amazon Mechanical Turk marketplace. ACM XRDS, 17, 16–21.

Ipeirotis, P. G. (2010b). Demographics of Mechanical Turk (Tech. Rep. No. CeDER-10-01). New York: New York University. Retrieved from http://hdl.handle.net/2451/29585

Ipeirotis, P. G., Provost, F., & Wang, J. (2010). Quality management on Amazon Mechanical Turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 64–67). New York: ACM.

Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. In M. Czerwinski & A. Lund (Eds.), Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems (pp. 453–456). New York: ACM.

Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., & Couper, M. (2004). Psychological research online: Opportunities and challenges. American Psychologist, 59, 105–117.

Little, G., Chilton, L. B., Goldman, M., & Miller, R. C. (2010). Exploring iterative and parallel human computation processes. In Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 68–76). New York: ACM.

Mao, A., Parkes, D. C., Procaccia, A. D., & Zhang, H. (2011). Human computation and multiagent systems: An algorithmic perspective. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence. San Francisco, CA.

Marge, M., Banerjee, S., & Rudnicky, A. I. (2010). Using the Amazon Mechanical Turk for transcription of spoken language. In J. Hansen (Ed.), Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5270–5273). IEEE.

Mason, W. A., & Watts, D. J. (2009). Financial incentives and the performance of crowds. In Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 77–85). New York: ACM.

McCreadie, R. M. C., Macdonald, C., & Ounis, I. (2010). Crowdsourcing a news query classification dataset. In M. Lease, V. Carvalho, & E. Yilmaz (Eds.), Proceedings of the ACM SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation (CSE 2010) (pp. 31–38). Geneva, Switzerland.

Musch, J., & Klauer, K. C. (2002). Psychological experimenting on the World Wide Web: Investigating content effects in syllogistic reasoning. In B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 181–212). Göttingen: Hogrefe.

Nosek, B. A. (2007). Implicit–explicit relations. Current Directions in Psychological Science, 16, 65–69.

Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419.

Pontin, J. (2007, March 25). Artificial intelligence, with help from the humans. The New York Times.

Rand, D. G. (2012). The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments. Journal of Theoretical Biology, 299, 172–179.

Reiley, D. (1999). Using field experiments to test equivalence between auction formats: Magic on the Internet. American Economic Review, 89, 1063–1080.

Reips, U. D. (2000). The Web experiment method: Advantages, disadvantages and solutions. In M. H. Birnbaum (Ed.), Psychological experiments on the Internet (pp. 89–114). San Diego: Academic Press.

Reips, U. D. (2001). The Web experimental psychology lab: Five years of data collection on the Internet. Behavior Research Methods, Instruments, & Computers, 33, 201–211.

Reips, U. D. (2002). Standards for Internet-based experimenting. Experimental Psychology, 49, 243–256.

Reips, U. D., & Birnbaum, M. H. (2011). Behavioral research and data collection via the Internet. In R. W. Proctor & K.-P. L. Vu (Eds.), The handbook of human factors in web design (pp. 563–585). Mahwah, NJ: Erlbaum.

Ross, J., Irani, L., Silberman, M. S., Zaldivar, A., & Tomlinson, B. (2010). Who are the crowdworkers? Shifting demographics in Amazon Mechanical Turk. In K. Edwards & T. Rodden (Eds.), Proceedings of the ACM Conference on Human Factors in Computing Systems (pp. 2863–2872). New York: ACM.

Ryan, K. J., Brady, J., Cooke, R., Height, D., Jonsen, A., King, P., et al. (1979). The Belmont report: Ethical principles and guidelines for the protection of human subjects of research. Washington, DC: National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research.

Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856.

Schmidt, W. C. (2007). Technical considerations when implementing online research. In A. Joinson, K. McKenna, T. Postmes, & U.-D. Reips (Eds.), The Oxford handbook of internet psychology (pp. 461–472). Oxford: Oxford University Press.

Shariff, A. F., & Norenzayan, A. (2007). God is watching you: Priming God concepts increases prosocial behavior in an anonymous economic game. Psychological Science, 18, 803–809.

Sheng, V. S., Provost, F., & Ipeirotis, P. G. (2008). Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 614–622). New York: ACM.

Smith, M., & Leigh, B. (1997). Virtual subjects: Using the Internet as an alternative source of subjects and research environment. Behavior Research Methods, Instruments, & Computers, 29, 496–505.

Snow, R., O’Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In M. Lapata & H. T. Ng (Eds.), Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 254–263). Stroudsburg, PA: Association for Computational Linguistics.

Suri, S., & Watts, D. J. (2011). Cooperation and contagion in Web-based, networked public goods experiments. PLoS One, 6(3), e16836.

Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453–458.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293–315.

Urbano, J., Morato, J., Marrero, M., & Martín, D. (2010). Crowdsourcing preference judgments for evaluation of music similarity tasks. In M. Lease, V. Carvalho, & E. Yilmaz (Eds.), Proceedings of the ACM SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation (CSE 2010) (pp. 9–16). Geneva, Switzerland.

Voracek, M., Stieger, S., & Gindl, A. (2001). Online replication of evolutionary psychology evidence: Sex differences in sexual jealousy in imagined scenarios of mate’s sexual versus emotional infidelity. In U.-D. Reips & M. Bosnjak (Eds.), Dimensions of Internet science (pp. 91–112). Lengerich: Pabst Science.

Zhu, D., & Carterette, B. (2010). An analysis of assessor behavior in crowdsourced preference judgments. In M. Lease, V. Carvalho, & E. Yilmaz (Eds.), Proceedings of the ACM SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation (CSE 2010) (pp. 21–26). Geneva, Switzerland.