Classification of heterogeneous text data for robust domain-specific language modeling
Abstract
Keywords
References
Juhár J, Staš J, Hládek D: Recent progress in development of language model for Slovak large vocabulary continuous speech recognition. In New Technologies - Trends, Innovations and Research. Edited by: Volosencu C. InTech Open Access, Rijeka; 2012:261-276.
Juhár J, Trnka M, Darjaa S, Hládek D, Sabo R, Pleva M, Rusko M: Recent advances in the Slovak dictation system for the judicial domain. In Proceedings of the 6th Language and Technology Conference on HLT. Poznań, LTC; 2013:555-560.
Huang A: Similarity measures for text document clustering. In Proceedings of the 6th New Zealand Computer Science Research Student Conference. Christchurch, NZCSRSC; 2008:49-56.
Yue L, Xiao S, Lv X, Wang T: Topic detection based on keyword. In Proceedings of 2011 International Conference on Mechatronic Science, Electric Engineering and Computer. Jilin, MEC; 2011:464-467.
Manning CD, Raghavan P, Schütze H: Introduction to Information Retrieval. Cambridge: Cambridge University Press; 2009.
Peng F, Schuurmans D, Wang S: Augmenting naïve Bayes classifiers with statistical language models. Inf. Retr. 2004, 7(3–4):317-345.
Tan S: An effective refinement strategy for KNN text classifier. Expert Syst. Appl 2006, 30(2):290-298. 10.1016/j.eswa.2005.07.019
Remeikis N, Skučas I, Melninkaitė V: Text categorization using neural networks initialized with decision trees. Informatica 2004, 15(4):551-564.
Joachims T: Text categorization with support vector machines: learning with many relevant features. In Proceedings of the 10th European Conference on ML. Chemnitz, ECML; 1998:137-142.
Zhang W, Yoshida T, Tang X: Text classification using semi-supervised clustering. In Proceedings of the 2nd International Conference on Business Intelligence and Financial Engineering. Beijing, BIFE; 2009:197-200.
Darjaa S, Cerňak M, Trnka M, Rusko M: Effective triphone mapping for acoustic modeling in speech recognition. In Proceedings of INTERSPEECH 2011. Florence, INTERSPEECH; 2011:1717-1720.
Pleva M, Juhár J: Building of broadcast news database for evaluation of the automated subtitling service. Communications 2013, 15(2A):124-128.
Hládek D, Juhár J, Staš J: The Slovak morphological classifier. In Proceedings of the 54th International Symposium ELMAR 2012. Zadar, ELMAR; 2012:195-198.
Garabík R: Slovak morphology analyzer based on Levenshtein edit operations. In Proceedings of the 1st Workshop on Intelligent and Knowledge Oriented Technologies. Bratislava, WIKT; 2006:2-5.
Hládek D, Juhár J, Ološtiak M, Staš J: Automatic extraction of multiword units from Slovak text corpora. In Proceedings of the 7th International Conference on Natural Language Processing, Corpus Linguistics and E-learning. Bratislava, SLOVKO; 2013:228-237.
Reed JW, Jiao Y, Potok TE, Klump BA, Elmore MT, Hurson AR: TF-ICF: a new term weighting scheme for clustering dynamic data sets. In Proceedings of the 5th International Conference on Machine Learning and Applications. Orlando, ICMLA; 2006:258-263.
Zlacký D, Staš J, Juhár J, Čižmár A: Term weighting schemes for Slovak text document clustering. J. Electr. Electron. Eng 2013, 6:163-166.
Jin R, Faloutsos C, Hauptmann AG: Meta-scoring: automatically evaluating term weighting schemes in IR without precision-recall. In Proceedings of the 24th Annual International ACM Conference on Research and Development in Information Retrieval. New Orleans, USA, SIGIR ACM, New York; 2001:83-89.
Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M: Okapi at TREC-3. In Proceedings of the 3rd Text Retrieval Conference. Gaithersburg, TREC-3; 1996:109-126.
Whissell JS, Clarke ChLA: Improving document clustering using Okapi BM25 feature weighting. Inf. Retr 2011, 14(5):466-487. 10.1007/s10791-011-9163-y
Singhal A: AT&T at TREC-6. In Proceedings of the 6th Text Retrieval Conference. Gaithersburg, TREC-6; 1998:215-226.
Lee S, Song J, Kim Y: An empirical comparison of four text mining methods. J. Comp. Inf. Sys 2010, 51(1):1-10.
Cha SH: Comprehensive survey on distance/similarity measures between probability density functions. Intl. J. Math. Model. Methods Appl. Sci 2007, 1(4):300-307.
Rosin PL: Edges: saliency measures and automatic thresholding. Technical Note No. I.95.58: Institute for Remote Sensing Applications 1995.
Lee A, Kawahara T: Recent development of open-source speech recognition engine Julius. In Proceedings of the 2009 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Sapporo, APSIPA ASC; 2009:131-137.
Stolcke A, Zheng J, Wang W, Abrash V: SRILM at sixteen: update and outlook. In Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop. Waikoloa, ASRU; 2011:5 pages.