Supervised clustering for automated document classification and prioritization: a case study using toxicological abstracts
Tóm tắt
Từ khóa
Tài liệu tham khảo
Albalate A, Suchindranath A, Suendermann D, Minker W (2010) A semi-supervised cluster-and-label approach for utterance classification. In: Workshop proceedings of the 6th international conference on intelligent environments, pp 61–70
Aphinyanaphongs Y, Tsamardinos I, Statnikov A, Hardin D, Aliferis CF (2005) Text categorization models for high-quality article retrieval in internal medicine. J Am Med Inform Assoc 12:207–216
Bekhuis Tanja, Demner-Fushman Dina (2012) Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers. Artif Intell Med 55(3):197–207
Bishop CM (2006) Pattern Recognition and Machine Learning., vol 1. New York, Springer
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–39
Chapelle O, Scholkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
Cohen AM, Hersh WR, Peterson K, Yen P-Y (2006) Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc 13:206–219
Cohen AM, Ambert K, McDonagh M (2012) Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Med Inform Decis Mak 12(1):33
Dasarathy BV (1991) Nearest neighbour (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, Los Alamitos
Devarajan K (2008) Nonnegative matrix factorization: an analytical and interpretive tool in computational biology. PLoS Comput Biol 4:e1000029
Dietterich TG (2000) Ensemble methods in machine learning. International workshop on multiple classifier systems. Springer, Berlin
Frunza O, Inkpen D, Matwin S, Klement W, O’blenis P (2011) Exploiting the systematic review protocol for classification of medical abstracts. Artif Intell Med 51:17–25
Goutte C, Gaussier E (2005) A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: Losada DE, Fernández-Luna JM (eds) Proceedings of advances in information retrieval: 27th European conference on IR research. Springer, Santiago de Compostela, pp 345–359
Harris ZS (1954) Distributional structure. WORD 10:146–162
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and Prediction. Springer, New York
Haynes RB, Wilczynski N, McKibbon KA, Walker CJ, Sinclair JC (1994) Developing optimal search strategies for detecting clinically sound studies in MEDLINE. J Am Med Inform Assoc 1:447–458
Ingersoll GS, Morton TS, Farris AL (2013) Taming text: how to find, organize, and manipulate it. Manning Publications Co., Greenwich
Jonnalagadda S, Petitti D (2013) A new iterative method to reduce workload in systematic review process. Int J Comput Biol Drug Des 6:5–17
Larsen RJ, Marx ML (2001) An introduction to mathematical statistics and its applications. Prentice Hall, Upper Saddle River, NJ
Le QV, Mikolov T (2014) Distributed representations of sentences and documents. In: Proceedings of the 31st international conference on machine learning, Bejing, pp 1188–1196
Li B, Yu S, Lu Q (2003) An improved k-nearest neighbor algorithm for text categorization. In: Proceedings of the 20th international conference on computer processing of oriental languages, Shenyang
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S (2015) Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 4:5
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Python Software Foundation. Python language reference (version 2.7)
Shemilt I et al (2014) Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews. Res Synth Methods 5(1):31–49
US EPA (2015) IRIS toxicological review of Dibutyl phthalate (Dbp) (preliminary assessment materials). US Environmental Protection Agency, Washington, DC, EPA/635/R-13/302
Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH (2010) Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinform 11:55
Webb AR (2002) Statistical pattern recognition. Wiley, New York
Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning. Morgan and Claypool Publishers, Los Altos