Classification of heterogeneous text data for robust domain-specific language modeling

EURASIP Journal on Audio, Speech, and Music Processing - Volume 2014, Issue 1 - 2014

Ján Staš¹, Jozef Juhár¹, Daniel Hládek¹

¹Department of Electronics and Multimedia Communications, Technical University of Košice, Park Komenského 13, 041 20, Košice, Slovakia

References

Juhár J, Staš J, Hládek D: Recent progress in development language model for Slovak large vocabulary continuous speech recognition. In New Technologies - Trends, Innovations and Research. Edited by: C Volosencu. InTech Open Access, Rijeka; 2012:261-276.

Juhár J, Trnka M, Darjaa S, Hládek D, Sabo R, Pleva M, Rusko M: Recent advances in the Slovak dictation system for the judicial domain. In Proceedings of the 6th Language and Technology Conference on HLT. Poznań, LTC; 2013:555-560.

Huang A: Similarity measures for text document clustering. In Proceedings of the 6th New Zealand Computer Science Research Student Conference. Christchurch, NZCSRSC; 2008:49-56.

Yue L, Xiao S, Lv X, Wang T: Topic detection based on keyword. In Proceedings of 2011 International Conference on Mechatronic Science, Electric Engineering and Computer. Jilin, MEC; 2011:464-467.

Manning CD, Raghavan P, Schütze H: Introduction to Information Retrieval. Cambridge: Cambridge University Press; 2009.

Peng F, Schuurmans D, Wang S: Augmenting naïve Bayes classifiers with statistical language models. Inf. Retr. 2004, 7(3–4):317-345.

Tan S: An effective refinement strategy for KNN text classifier. Expert Syst. Appl 2006, 30(2):290-298. 10.1016/j.eswa.2005.07.019

Remeikis N, Skučas I, Melninkaité V: Text categorization using neural networks initialized with decision trees. Informatica 2004, 15(4):551-564.

Joachims T: Text categorization with support vector machines: learning with many relevant features. In Proceedings of the 10th European Conference on ML. Chemnitz, ECML; 1998:137-142.

Zhang W, Yoshida T, Tang X: Text classification using semi-supervised clustering. In Proceedings of the 2nd International Conference on Business Intelligence and Financial Engineering. Beijing, BIFE; 2009:197-200.

Darjaa S, Cerňak M, Trnka M, Rusko M: Effective triphone mapping for acoustic modeling in speech recognition. In Proceedings of INTERSPEECH 2011. Florence, INTERSPEECH; 2011:1717-1720.

Pleva M, Juhár J: Building of broadcast news database for evaluation of the automated subtitling service. Communications 2013, 15(2A):124-128.

Hládek D, Juhár J, Staš J: The Slovak morphological classifier. In Proceedings of the 54th International Symposium ELMAR 2012. Zadar, ELMAR; 2012:195-198.

Garabík R: Slovak morphology analyzer based on Levenshtein edit operations. In Proceedings of the 1st Workshop on Intelligent and Knowledge Oriented Technologies. Bratislava, WIKT; 2006:2-5.

Hládek D, Juhár J, Ološtiak M, Staš J: Automatic extraction of multiword units from Slovak text corpora. In Proceedings of the 7th International Conference on Natural Language Processing, Corpus Linguistics and E-learning. Bratislava, SLOVKO; 2013:228-237.

Reed JW, Jiao Y, Potok TE, Klump BA, Elmore MT, Hurson AR: TF-ICF: a new term weighting scheme for clustering dynamic data sets. In Proceedings of the 5th International Conference on Machine Learning and Applications. Orlando, ICMLA; 2006:258-263.

Zlacký D, Staš J, Juhár J, Čižmár A: Term weighting schemes for Slovak text document clustering. J. Electr. Electron. Eng 2013, 6:163-166.

Jin R, Faloutsos C, Hauptmann AG: Meta-scoring: automatically evaluating term weighting schemes in IR without precision-recall. In Proceedings of the 24th Annual International ACM Conference on Research and Development in Information Retrieval. New Orleans, USA, SIGIR ACM, New York; 2001:83-89.

Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M: Okapi at TREC-3. In Proceedings of the 3rd Text Retrieval Conference. Gaithersburg, TREC-3; 1996:109-126.

Whissell JS, Clarke ChLA: Improving document clustering using Okapi BM25 feature weighting. Inf. Retr 2011, 14(5):466-487. 10.1007/s10791-011-9163-y

Singhal A: AT&T at TREC-6. In Proceedings of the 6th Text Retrieval Conference. Gaithersburg, TREC-6; 1998:215-226.

Lee S, Song J, Kim Y: An empirical comparison of four text mining methods. J. Comp. Inf. Sys 2010, 51(1):1-10.

Cha SH: Comprehensive survey on distance/similarity measures between probability density functions. Intl. J. Math. Model. Methods Appl. Sci 2007, 1(4):300-307.

Rosin PL: Edges: saliency measures and automatic thresholding. Technical Note No. I.95.58: Institute for Remote Sensing Applications 1995.

Lee A, Kawahara T: Recent development of open-source speech recognition engine Julius. In Proceedings of the 2009 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Sapporo, APSIPA ASC; 2009:131-137.

Stolcke A, Zheng J, Wang W, Abrash V: SRILM at sixteen: update and outlook. In Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop. Waikoloa, ASRU; 2011:5 pages.
