Sequence labeling with multiple annotators

Machine Learning - Volume 95, Issue 2 - Pages 165-181 - 2014
Filipe Rodrigues (1), Francisco C. Pereira (2), Bernardete Ribeiro (1)
(1) Centre for Informatics and Systems of the University of Coimbra (CISUC), Department of Informatics Engineering, University of Coimbra, 3030-290, Coimbra, Portugal
(2) Singapore-MIT Alliance for Research and Technology (SMART), 1 CREATE Way, Singapore, 138602, Singapore

Abstract

Keywords


References

Allen, J., & Salzberg, S. (2005). JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics, 21(18), 3596–3603.

Allen, J., Pertea, M., & Salzberg, S. (2004). Computational gene prediction using multiple sources of evidence. Genome Research, 14(1), 142–148.

Bellare, K., & McCallum, A. (2007). Learning extractors from unlabeled text using relevant databases. In Sixth international workshop on information integration on the web.

Callison-Burch, C., & Dredze, M. (2010). Creating speech and language data with Amazon’s Mechanical Turk. In Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk (pp. 1–12).

Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society. Series C. Applied Statistics, 28(1), 20–28.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

Donmez, P., & Carbonell, J. (2008). Proactive learning: cost-sensitive active learning with multiple imperfect oracles. In Proceedings of the 17th ACM conference on information and knowledge management (pp. 619–628).

Donmez, P., Schneider, J., & Carbonell, J. (2010). A probabilistic framework to learn from multiple annotators with time-varying accuracy. In Proceedings of the SIAM international conference on data mining (pp. 826–837).

Dredze, M., Talukdar, P., & Crammer, K. (2009). Sequence learning from data with multiple labels. In ECML-PKDD 2009 workshop on learning from multi-label data.

Fernandes, E., & Brefeld, U. (2011). Learning from partially annotated sequences. In Proceedings of the 2011 European conference on machine learning and knowledge discovery in databases (pp. 407–422).

Groot, P., Birlutiu, A., & Heskes, T. (2011). Learning from multiple annotators with Gaussian processes. In Proceedings of the 21st international conference on artificial neural networks (Vol. 6792, pp. 159–164).

Howe, J. (2008). Crowdsourcing: why the power of the crowd is driving the future of business (1st edn.). New York: Crown Publishing Group.

Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th international conference on machine learning (pp. 282–289).

Laws, F., Scheible, C., & Schütze, H. (2011). Active learning with Amazon Mechanical Turk. In Proceedings of the conference on empirical methods in natural language processing. Stroudsburg: Association for Computational Linguistics (pp. 1546–1556).

Lawson, N., Eustice, K., Perkowitz, M., & Yetisgen-Yildiz, M. (2010). Annotating large email datasets for named entity recognition with Mechanical Turk. In Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk. Stroudsburg: Association for Computational Linguistics (pp. 71–79).

Liu, D., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45, 503–528.

Novotney, S., & Callison-Burch, C. (2010). Cheap, fast and good enough: automatic speech recognition with non-expert transcription. In Human language technologies, HLT ’10. Stroudsburg: Association for Computational Linguistics (pp. 207–215).

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE (pp. 257–286).

Ramshaw, L., & Marcus, M. (1995). Text chunking using transformation-based learning. In Proceedings of the third workshop on very large corpora. Stroudsburg: Association for Computational Linguistics (pp. 82–94).

Raykar, V., Yu, S., Zhao, L., Jerebko, A., Florin, C., Valadez, G., Bogoni, L., & Moy, L. (2009). Supervised learning from multiple experts: whom to trust when everyone lies a bit. In Proceedings of the 26th international conference on machine learning (pp. 889–896).

Raykar, V., Yu, S., Zhao, L., Valadez, G., Florin, C., Bogoni, L., & Moy, L. (2010). Learning from crowds. Journal of Machine Learning Research, 1297–1322.

Sang, E., & Meulder, F. D. (2003). Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the 7th conference on natural language learning at HLT-NAACL (Vol. 4, pp. 142–147).

Sheng, V., Provost, F., & Ipeirotis, P. (2008). Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 614–622).

Smyth, P., Fayyad, U., Burl, M., Perona, P., & Baldi, P. (1995). Inferring ground truth from subjective labelling of Venus images. Advances in Neural Information Processing Systems, 1085–1092.

Snow, R., O’Connor, B., Jurafsky, D., & Ng, A. (2008). Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks. In Proceedings of the conference on empirical methods in natural language processing (pp. 254–263).

Surowiecki, J. (2004). The wisdom of crowds: why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. New York: Doubleday.

Sutton, C., & McCallum, A. (2006). Introduction to conditional random fields for relational learning. Cambridge: MIT Press.

Voyer, R., Nygaard, V., Fitzgerald, W., & Copperman, H. (2010). A hybrid model for annotating named entity training corpora. In Proceedings of the fourth linguistic annotation workshop. Stroudsburg: Association for Computational Linguistics (pp. 243–246).

Wu, O., Hu, W., & Gao, J. (2011). Learning to rank under multiple annotators. In Proceedings of the 22nd international joint conference on artificial intelligence (pp. 1571–1576).

Yan, Y., Rosales, R., Fung, G., Schmidt, M., Valadez, G., Bogoni, L., Moy, L., & Dy, J. (2010). Modeling annotator expertise: learning when everybody knows a bit of something. Journal of Machine Learning Research, 9, 932–939.

Yan, Y., Rosales, R., Fung, G., & Dy, J. (2011). Active learning from crowds. In Proceedings of the 28th international conference on machine learning (pp. 1161–1168).