Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec
Abstract
Keywords
References
Amin, 2014, Customer churn prediction in telecommunication industry: With and without counter-example, 134
Aung, 2009, Random forest classifier for multi-category classification of web pages, 372
Bíró, 2008, Latent Dirichlet allocation in web spam filtering, 29
Blei, 2003, Latent Dirichlet allocation, J. Mach. Learn. Res., 3, 993
Blum, 1998, Combining labeled and unlabeled data with co-training, 92
Bouguelia, 2013, A stream-based semi-supervised active learning approach for document classification, 611
Chapelle, 2010
Druck, 2007, Semi-supervised classification with hybrid generative/discriminative methods, 280
Glorot, 2011, Domain adaptation for large-scale sentiment classification: A deep learning approach, 513
Go, 2009, Twitter sentiment classification using distant supervision, CS224N Project Report, Stanford, 1, 12
Harish, 2010, Representation and classification of text documents: a brief review, IJCA, Special Issue on RTIPPR, 110
Khan, 2010, A review of machine learning algorithms for text-documents classification, J. Adv. Inf. Technol., 1, 4
Kim, 2006, Some effective techniques for naive Bayes text classification, IEEE Trans. Knowl. Data Eng., 18, 1457, 10.1109/TKDE.2006.180
Lau, 2016, An empirical evaluation of doc2vec with practical insights into document embedding generation, arXiv:1607.05368
Le, 2014, Distributed representations of sentences and documents, 14, 1188
Mikolov, 2013, Efficient estimation of word representations in vector space, arXiv:1301.3781
Nigam, 2000, Analyzing the effectiveness and applicability of co-training, 86
Nigam, 2000, Text classification from labeled and unlabeled documents using EM, Mach. Learn., 39, 103, 10.1023/A:1007692713085
Pang, 2002, Thumbs up?: sentiment classification using machine learning techniques, 79
Qiu, 2014, Collapsed Gibbs sampling for latent Dirichlet allocation on Spark, J. Mach. Learn. Res., 36, 17
Ranjan, 2017, Document classification using LSTM neural network, J. Data Mining Manage., 2
Robertson, 2004, Understanding inverse document frequency: on theoretical arguments for IDF, J. Document., 60, 503, 10.1108/00220410410560582
Rosenberg, 2005, Semi-supervised self-training of object detection models, 29
Sabbah, 2017, Modified frequency-based term weighting schemes for text classification, Appl. Soft Comput., 58, 193, 10.1016/j.asoc.2017.04.069
Tong, 2001, Support vector machine active learning with applications to text classification, J. Mach. Learn. Res., 2, 45
Wang, 2012, Semi-supervised latent Dirichlet allocation and its application for document classification, 306
Xing, 2014, Document classification with distributions of word vectors, 1
Xu, 2016, Bayesian naïve Bayes classifiers to text classification, J. Inf. Sci.
Yun-tao, 2005, An improved TF-IDF approach for text classification, J. Zhejiang Univ. Sci. A, 6, 49, 10.1631/jzus.2005.A0049
Zhang, 2011, A comparative study of TF*IDF, LSI and multi-words for text classification, Expert. Syst. Appl., 38, 2758, 10.1016/j.eswa.2010.08.066
Zhu, 2005