Text Classification Algorithms: A Survey

Information (Switzerland), Vol. 10, No. 4, Page 150
Kamran Kowsari 1,2, Kiana Jafari Meimandi 1, Mojtaba Heidarysafa 1, Sanjana Mendu 1, Laura E. Barnes 1,3,2, Donald E. Brown 1,3
1 Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA 22904, USA
2 Sensing Systems for Health Lab, University of Virginia, Charlottesville, VA 22911, USA
3 School of Data Science, University of Virginia, Charlottesville, VA 22904, USA

Abstract

In recent years, there has been an exponential growth in the number of complex documents and texts, which requires a deeper understanding of machine learning methods to accurately classify texts in many applications. Many machine learning approaches have achieved remarkable results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification remains a challenge for researchers. This paper presents a brief overview of text classification algorithms, covering text feature extraction, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their applications to real-world problems are discussed.
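The stages the abstract enumerates — feature extraction, classification, and evaluation — can be sketched in a few lines. This is a minimal illustration only: scikit-learn, the toy corpus, and its two hypothetical labels (0 = sports, 1 = tech) are assumptions for the example, not part of the survey itself.

```python
# Minimal text-classification pipeline: TF-IDF feature extraction feeding a
# naive Bayes classifier, the classical baseline this survey reviews.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (hypothetical labels: 0 = sports, 1 = tech).
train_texts = [
    "the team won the final match",
    "a stunning goal in the last minute",
    "new gpu accelerates deep learning training",
    "the processor ships with eight cores",
]
train_labels = [0, 0, 1, 1]

# TF-IDF turns raw text into weighted term vectors; naive Bayes classifies them.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["the team scored a goal",
                             "gpu cores for deep learning"])
print(predictions.tolist())  # → [0, 1]
```

In practice the survey's remaining stages slot into the same pipeline: a dimensionality reduction step (e.g. PCA or an autoencoder) between vectorizer and classifier, and a held-out evaluation (accuracy, F1, ROC AUC) after fitting.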

Tài liệu tham khảo

Jiang, 2018, Text classification based on deep belief network and softmax regression, Neural Comput. Appl., 29, 61, 10.1007/s00521-016-2401-x

Kowsari, K., Brown, D.E., Heidarysafa, M., Jafari Meimandi, K., Gerber, M.S., and Barnes, L.E. (2017, January 18–21). HDLTex: Hierarchical Deep Learning for Text Classification. Machine Learning and Applications (ICMLA). Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico.

McCallum, A., and Nigam, K. (1998, January 26–27). A comparison of event models for naive bayes text classification. Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, USA.

Kowsari, K., Heidarysafa, M., Brown, D.E., Jafari Meimandi, K., and Barnes, L.E. (2018, January 9–11). RMDL: Random Multimodel Deep Learning for Classification. Proceedings of the 2018 International Conference on Information System and Data Mining, Lakeland, FL, USA.

Heidarysafa, 2018, An Improvement of Data Classification Using Random Multimodel Deep Learning (RMDL), IJMLC, 8, 298

Lai, S., Xu, L., Liu, K., and Zhao, J. (2015, January 25–30). Recurrent Convolutional Neural Networks for Text Classification. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA.

Aggarwal, C.C., and Zhai, C. (2012). A survey of text classification algorithms. Mining Text Data, Springer.

Aggarwal, C.C., and Zhai, C.X. (2012). Mining Text Data, Springer.

Salton, 1988, Term-weighting approaches in automatic text retrieval, Inf. Process. Manag., 24, 513, 10.1016/0306-4573(88)90021-0

Goldberg, Y., and Levy, O. (2014). Word2vec explained: Deriving mikolov et al.’s negative-sampling word-embedding method. arXiv.

Pennington, J., Socher, R., and Manning, C.D. (2014, January 25–29). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.

Mamitsuka, N.A.H. (1998). Query learning strategies using boosting and bagging. Machine Learning: Proceedings of the Fifteenth International Conference (ICML’98), Morgan Kaufmann Pub.

Kim, Y.H., Hahn, S.Y., and Zhang, B.T. (2000, January 24–28). Text filtering by boosting naive Bayes classifiers. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece.

Schapire, 2000, BoosTexter: A boosting-based system for text categorization, Mach. Learn., 39, 135, 10.1023/A:1007649029923

Harrell, F.E. (2001). Ordinal logistic regression. Regression Modeling Strategies, Springer.

Hosmer, D.W., Lemeshow, S., and Sturdivant, R.X. (2013). Applied Logistic Regression, John Wiley & Sons.

Dou, J., Yamagishi, H., Zhu, Z., Yunus, A.P., and Chen, C.W. (2018). TXT-tool 1.081-6.1 A Comparative Study of the Binary Logistic Regression (BLR) and Artificial Neural Network (ANN) Models for GIS-Based Spatial Predicting Landslides at a Regional Scale. Landslide Dynamics: ISDR-ICL Landslide Interactive Teaching Tools, Springer.

Chen, 2017, A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility, Catena, 151, 147, 10.1016/j.catena.2016.11.032

Larson, 2010, Introduction to information retrieval, J. Am. Soc. Inf. Sci. Technol., 61, 852, 10.1002/asi.21234

Li, 2001, Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method, Bioinformatics, 17, 1131, 10.1093/bioinformatics/17.12.1131

Manevitz, 2001, One-class SVMs for document classification, J. Mach. Learn. Res., 2, 139

Han, E.H.S., and Karypis, G. (2000). Centroid-based document classification: Analysis and experimental results. European Conference on Principles of Data Mining and Knowledge Discovery, Springer.

Xu, 2012, An Improved Random Forest Classifier for Text Categorization, JCP, 7, 2913

Lafferty, J., McCallum, A., and Pereira, F.C. (July, January 28). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001), Williamstown, MA, USA.

Shen, 2007, Document Summarization Using Conditional Random Fields, IJCAI, 7, 2862

Zhang, 2008, Automatic keyword extraction from documents using conditional random fields, J. Comput. Inf. Syst., 4, 1169

LeCun, 2015, Deep learning, Nature, 521, 436, 10.1038/nature14539

Huang, 2005, Using AUC and accuracy in evaluating learning algorithms, IEEE Trans. Knowl. Data Eng., 17, 299, 10.1109/TKDE.2005.50

Lock, 2002, Acute mesenteric ischemia: Classification, evaluation and therapy, Acta Gastro-Enterol. Belg., 65, 220

Matthews, 1975, Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochim. Biophys. Acta (BBA)-Protein Struct., 405, 442, 10.1016/0005-2795(75)90109-9

Hanley, 1982, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 29, 10.1148/radiology.143.1.7063747

Pencina, 2008, Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond, Stat. Med., 27, 157, 10.1002/sim.2929

Jacobs, P.S. (2014). Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval, Psychology Press.

Croft, W.B., Metzler, D., and Strohman, T. (2010). Search Engines: Information Retrieval in Practice, Addison-Wesley Reading.

Yammahi, M., Kowsari, K., Shen, C., and Berkovich, S. (2014, January 4–6). An efficient technique for searching very large files with fuzzy criteria using the pigeonhole principle. Proceedings of the 2014 Fifth International Conference on Computing for Geospatial Research and Application, Washington, DC, USA.

Chu, Z., Gianvecchio, S., Wang, H., and Jajodia, S. (2010, January 6–10). Who is tweeting on Twitter: Human, bot, or cyborg?. Proceedings of the 26th Annual Computer Security Applications Conference, Austin, TX, USA.

Gordon, 1983, An operational classification of disease prevention, Public Health Rep., 98, 107

Nobles, A.L., Glenn, J.J., Kowsari, K., Teachman, B.A., and Barnes, L.E. (2018, January 21–26). Identification of Imminent Suicide Risk Among Young Adults using Text Messages. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada.

Gupta, 2015, Text Document Tokenization for Word Frequency Count using Rapid Miner (Taking Resume as an Example), Int. J. Comput. Appl., 975, 8887

Verma, 2014, Tokenization and filtering process in RapidMiner, Int. J. Appl. Inf. Syst., 7, 16

Aggarwal, C.C. (2018). Machine Learning for Text, Springer.

Saif, H., Fernández, M., He, Y., and Alani, H. (2014, January 26–31). On stopwords, filtering and data sparsity for sentiment analysis of twitter. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland.

Gupta, 2009, A survey of text mining techniques and applications, J. Emerg. Technol. Web Intell., 1, 60

Dalal, 2011, Automatic text classification: A technical review, Int. J. Comput. Appl., 28, 37

Whitney, 2010, Abbreviations for names of rock-forming minerals, Am. Mineral., 95, 185, 10.2138/am.2010.3371

Helm, A. (2003). Recovery and reclamation: A pilgrimage in understanding who and what we are. Psychiatric and Mental Health Nursing: The Craft of Caring, Routledge.

Dhuliawala, S., Kanojia, D., and Bhattacharyya, P. (2016, January 23–28). SlangNet: A WordNet like resource for English Slang. Proceedings of the LREC, Portorož, Slovenia.

Pahwa, 2018, Sentiment Analysis-Strategy for Text Pre-Processing, Int. J. Comput. Appl., 180, 15

Mawardi, V.C., Susanto, N., and Naga, D.S. (2018). Spelling Correction for Text Documents in Bahasa Indonesia Using Finite State Automata and Levinshtein Distance Method. EDP Sci., 164.

Dziadek, 2017, Improving Terminology Mapping in Clinical Text with Context-Sensitive Spelling Correction, Informatics for Health: Connected Citizen-Led Wellness and Population Health, Volume 235, 241

Mawardi, 2018, Fast and Accurate Spelling Correction Using Trie and Bigram, TELKOMNIKA (Telecommun. Comput. Electron. Control), 16, 827, 10.12928/telkomnika.v16i1.6890

Spirovski, K., Stevanoska, E., Kulakov, A., Popeska, Z., and Velinov, G. (2018, January 25–27). Comparison of different model’s performances in task of document classification. Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Novi Sad, Serbia.

Singh, 2016, Text stemming: Approaches, applications, and challenges, ACM Compu. Surv. (CSUR), 49, 45

Sampson, G. (2005). The’Language Instinct’Debate: Revised Edition, A&C Black.

Plisson, J., Lavrac, N., and Mladenić, D. (2004, January 13–14). A rule based approach to word lemmatization. Proceedings of the 7th International MultiConference Information Society IS 2004, Ljubljana, Slovenia.

Korenius, T., Laurikkala, J., Järvelin, K., and Juhola, M. (2004, January 8–13). Stemming and lemmatization in the clustering of finnish text documents. Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, Washington, DC, USA.

Caropreso, M.F., and Matwin, S. (2006). Beyond the bag of words: A text representation for sentence selection. Conference of the Canadian Society for Computational Studies of Intelligence, Springer.

Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., and Chanona-Hernández, L. (2012). Syntactic dependency-based n-grams as classification features. Mexican International Conference on Artificial Intelligence, Springer.

1972, A statistical interpretation of term specificity and its application in retrieval, J. Doc., 28, 11, 10.1108/eb026526

Tokunaga, 1994, Text categorization based on weighted inverse document frequency, Inf. Process. Soc. Jpn. SIGNL, 94, 33

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv.

Mikolov, 2013, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst., 26, 3111

Rong, X. (2014). word2vec parameter learning explained. arXiv.

Maaten, 2008, Visualizing data using t-SNE, J. Mach. Learn. Res., 9, 2579

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv.

Melamud, O., Goldberger, J., and Dagan, I. (2016, January 11–12). context2vec: Learning generic context embedding with bidirectional lstm. Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany.

Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv.

Abdi, 2010, Principal component analysis, Wiley Interdiscip. Rev. Comput. Stat., 2, 433, 10.1002/wics.101

Jolliffe, 2016, Principal component analysis: A review and recent developments, Philos. Trans. R. Soc. A, 374, 20150202, 10.1098/rsta.2015.0202

Ng, 2015, Principal components analysis. Generative Algorithms, Regularization and Model Selection, CS, 229, 71

Cao, 2003, A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine, Neurocomputing, 55, 321, 10.1016/S0925-2312(03)00433-8

1984, Réseaux de neurones à synapses modifiables: Décodage de messages sensoriels composites par une apprentissage non supervisé et permanent, CR Acad. Sci. Paris, 299, 525

Jutten, 1991, Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture, Signal Process., 24, 1, 10.1016/0165-1684(91)90079-X

Hoyer, 2001, Topographic independent component analysis, Neural Comput., 13, 1527, 10.1162/089976601750264992

Oja, 2000, Independent component analysis: algorithms and applications, Neural Netw., 13, 411, 10.1016/S0893-6080(00)00026-5

Sugiyama, 2007, Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis, J. Mach. Learn. Res., 8, 1027

Balakrishnama, 1998, Linear discriminant analysis-a brief tutorial, Inst. Signal Inf. Process., 18, 1

Sugiyama, M. (2006, January 25–29). Local fisher discriminant analysis for supervised dimensionality reduction. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA.

Pauca, V.P., Shahnaz, F., Berry, M.W., and Plemmons, R.J. (2004, January 22–24). Text mining using non-negative matrix factorizations. Proceedings of the 2004 SIAM International Conference on Data Mining, Lake Buena Vista, FL, USA.

Tsuge, S., Shishibori, M., Kuroiwa, S., and Kita, K. (2001, January 7–10). Dimensionality reduction using non-negative matrix factorization for information retrieval. Proceedings of the 2001 IEEE International Conference on Systems, Man, and Cybernetics, Tucson, AZ, USA.

Kullback, 1951, On information and sufficiency, Ann. Math. Stat., 22, 79, 10.1214/aoms/1177729694

Johnson, D., and Sinanovic, S. (2019, April 23). Symmetrizing the Kullback-Leibler DistanceIEEE Trans. Available online: https://scholarship.rice.edu/bitstream/handle/1911/19969/Joh2001Mar1Symmetrizi.PDF?sequence=1.

Bingham, E., and Mannila, H. (2001, January 26–29). Random projection in dimensionality reduction: Applications to image and text data. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.

Chakrabarti, 2003, Fast and accurate text classification via multiple linear discriminant projections, VLDB J., 12, 170, 10.1007/s00778-003-0098-9

Rahimi, 2009, Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning, Adv. Neural Inf. Process. Syst., 21, 1313

Morokoff, 1995, Quasi-monte carlo integration, J. Comput. Phys., 122, 218, 10.1006/jcph.1995.1209

Johnson, 1986, Extensions of lipschitz maps into Banach spaces, Isr. J. Math., 54, 129, 10.1007/BF02764938

Dasgupta, 2003, An elementary proof of a theorem of Johnson and Lindenstrauss, Random Struct. Algorithms, 22, 60, 10.1002/rsa.10073

Vempala, S.S. (2005). The Random Projection Method, American Mathematical Society.

Mao, X., and Yuan, C. (2016). Stochastic Differential Equations with Markovian Switching, World Scientific.

Goodfellow, I., Bengio, Y., Courville, A., and Bengio, Y. (2016). Deep Learning, MIT Press.

Wang, W., Huang, Y., Wang, Y., and Wang, L. (2014, January 23–28). Generalized autoencoder: A neural network framework for dimensionality reduction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA.

Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1985). Learning Internal Representations by Error Propagation, California University San Diego, Institute for Cognitive Science. Technical Report.

Liang, 2017, Text feature extraction based on deep learning: A review, EURASIP J. Wirel. Commun. Netw., 2017, 211, 10.1186/s13638-017-0993-1

Baldi, P. (2011, January 2). Autoencoders, unsupervised learning, and deep architectures. Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, Bellevue, WA, USA.

AP, 2014, An autoencoder approach to learning bilingual word representations, Adv. Neural Inf. Process. Syst., 27, 1853

Masci, J., Meier, U., Cireşan, D., and Schmidhuber, J. (2011). Stacked convolutional auto-encoders for hierarchical feature extraction. International Conference on Artificial Neural Networks, Springer.

Chen, K., Seuret, M., Liwicki, M., Hennebert, J., and Ingold, R. (2015, January 23–26). Page segmentation of historical document images with convolutional autoencoders. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia.

Geng, 2015, High-resolution SAR image classification via deep convolutional autoencoders, IEEE Geosci. Remote Sens. Lett., 12, 2351, 10.1109/LGRS.2015.2478256

Sutskever, 2014, Sequence to sequence learning with neural networks, Adv. Neural Inf. Process. Syst., 27, 3104

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv.

Hinton, 2002, Stochastic neighbor embedding, Adv. Neural Inf. Process. Syst., 15, 857

Joyce, J.M. (2011). Kullback-leibler divergence. International Encyclopedia of Statistical Science, Springer.

Rocchio, J.J. (1971). Relevance feedback in information retrieval. The SMART Retrieval System: Experiments in Automatic Document Processing, Englewood Cliffs.

Partalas, I., Kosmopoulos, A., Baskiotis, N., Artieres, T., Paliouras, G., Gaussier, E., Androutsopoulos, I., Amini, M.R., and Galinari, P. (2015). LSHTC: A benchmark for large-scale text classification. arXiv.

Sowmya, B., and Srinivasa, K. (2016, January 6–8). Large scale multi-label text classification of a hierarchical data set using Rocchio algorithm. Proceedings of the 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Bangalore, India.

Korde, 2012, Text classification and classifiers: A survey, Int. J. Artif. Intell. Appl., 3, 85

Selvi, S.T., Karthikeyan, P., Vincent, A., Abinaya, V., Neeraja, G., and Deepika, R. (2017, January 19–21). Text categorization using Rocchio algorithm and random forest algorithm. Proceedings of the 2016 Eighth International Conference on Advanced Computing (ICoAC), Chennai, India.

Albitar, S., Espinasse, B., and Fournier, S. (2012, January 10–12). Towards a Supervised Rocchio-based Semantic Classification of Web Pages. Proceedings of the KES, San Sebastian, Spain.

Farzi, 2016, Estimation of organic facies using ensemble methods in comparison with conventional intelligent approaches: A case study of the South Pars Gas Field, Persian Gulf, Iran, Model. Earth Syst. Environ., 2, 105, 10.1007/s40808-016-0165-z

Bauer, 1999, An empirical comparison of voting classification algorithms: Bagging, boosting, and variants, Mach. Learn., 36, 105, 10.1023/A:1007515423169

Schapire, 1990, The strength of weak learnability, Mach. Learn., 5, 197, 10.1007/BF00116037

Freund, Y. (1992, January 27–29). An improved boosting algorithm and its implications on learning complexity. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA.

Bloehdorn, S., and Hotho, A. (2004). Boosting for text classification with semantic features. International Workshop on Knowledge Discovery on the Web, Springer.

Freund, Y., Kearns, M., Mansour, Y., Ron, D., Rubinfeld, R., and Schapire, R.E. (1995, January 23–25). Efficient algorithms for learning to play repeated games against computationally bounded adversaries. Proceedings of the 36th Annual Symposium on Foundations of Computer Science, Milwaukee, WI, USA.

Breiman, 1996, Bagging predictors, Mach. Learn., 24, 123, 10.1007/BF00058655

Geurts, P. (2000). Some enhancements of decision tree bagging. European Conference on Principles of Data Mining and Knowledge Discovery, Springer.

Cox, D.R. (2018). Analysis of Binary Data, Routledge.

Fan, 2008, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res., 9, 1871

Genkin, 2007, Large-scale Bayesian logistic regression for text categorization, Technometrics, 49, 291, 10.1198/004017007000000245

Juan, 2002, On the use of Bernoulli mixture models for text classification, Pattern Recogn., 35, 2705, 10.1016/S0031-3203(01)00242-4

Cheng, 2009, Combining instance-based learning and logistic regression for multilabel classification, Mach. Learn., 76, 211, 10.1007/s10994-009-5127-5

Krishnapuram, 2005, Sparse multinomial logistic regression: Fast algorithms and generalization bounds, IEEE Trans. Pattern Anal. Mach. Intell., 27, 957, 10.1109/TPAMI.2005.127

Huang, K. (2015). Unconstrained Smartphone Sensing and Empirical Study for Sleep Monitoring and Self-Management. [Ph.D. Thesis, University of Massachusetts Lowell].

Guerin, A. (2016). Using Demographic Variables and In-College Attributes to Predict Course-Level Retention for Community College Spanish Students, Northcentral University.

Kaufmann, S. (1969). CUBA: Artificial Conviviality and User-Behaviour Analysis in Web-Feeds. [PhD Thesis, Universität Hamburg].

Porter, 1980, An algorithm for suffix stripping, Program, 14, 130, 10.1108/eb046814

Pearson, 1925, Bayes’ theorem, examined in the light of experimental sampling, Biometrika, 17, 388, 10.1093/biomet/17.3-4.388

Hill, 1968, Posterior distribution of percentiles: Bayes’ theorem for sampling from a population, J. Am. Stat. Assoc., 63, 677, 10.1080/01621459.1968.11009286

Qu, Z., Song, X., Zheng, S., Wang, X., Song, X., and Li, Z. (2018, January 15–17). Improved Bayes Method Based on TF-IDF Feature and Grade Factor Feature for Chinese Information Classification. Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), Shanghai, China.

Kim, 2006, Some effective techniques for naive bayes text classification, IEEE Trans. Knowl. Data Eng., 18, 1457, 10.1109/TKDE.2006.180

Frank, E., and Bouckaert, R.R. (2006). Naive bayes for text classification with unbalanced classes. European Conference on Principles of Data Mining and Knowledge Discovery, Springer.

Liu, 2009, Imbalanced text classification: A term weighting approach, Expert Syst. Appl., 36, 690, 10.1016/j.eswa.2007.10.042

Soheily-Khah, S., Marteau, P.F., and Béchet, N. (2017). Intrusion detection in network systems through hybrid supervised and unsupervised mining process-a detailed case study on the ISCX benchmark data set. HAL.

Wang, 2012, Nonparametric bayesian estimation of periodic light curves, Astrophys. J., 756, 67, 10.1088/0004-637X/756/1/67

Ranjan, M.N.M., Ghorpade, Y.R., Kanthale, G.R., Ghorpade, A.R., and Dubey, A.S. (2017). Document Classification Using LSTM Neural Network. J. Data Min. Manag., 2, Available online: http://matjournals.in/index.php/JoDMM/article/view/1534.

Jiang, 2012, An improved K-nearest-neighbor algorithm for text categorization, Expert Syst. Appl., 39, 1503, 10.1016/j.eswa.2011.08.040

Han, E.H.S., Karypis, G., and Kumar, V. (2001). Text categorization using weight adjusted k-nearest neighbor classification. Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer.

Salton, G. (1989). Automatic Text Processing: The Transformation, Analysis, and Retrieval of, Addison-Wesley.

Sahgal, D., and Ramesh, A. (2002, January 1–3). On Road Vehicle Detection Using Gabor Wavelet Features with Various Classification Techniques. Proceedings of the 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628), Santorini, Greece.

Patel, D., and Srivastava, T. Ant Colony Optimization Model for Discrete Tomography Problems. Proceedings of the Third International Conference on Soft Computing for Problem Solving.

Sahgal, D., and Parida, M. (2014). Object Recognition Using Gabor Wavelet Features with Various Classification Techniques. Proceedings of the Third International Conference on Soft Computing for Problem Solving, Springer.

Sanjay, 2018, Comparing Existing Methods for Predicting the Detection of Possibilities of Blood Cancer by Analyzing Health Data, Int. J. Innov. Res. Sci. Technol., 4, 10

Vapnik, 1964, A class of algorithms for pattern recognition learning, Avtomat. Telemekh, 25, 937

Boser, B.E., Guyon, I.M., and Vapnik, V.N. (1992, January 27–29). A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA.

Bo, 2006, SVM Multi-Class Classification, J. Data Acquis. Process., 3, 017

Mohri, M., Rostamizadeh, A., and Talwalkar, A. (2012). Foundations of Machine Learning, MIT Press.

Chen, 2016, Turning from TF-IDF to TF-IGM for term weighting in text classification, Expert Syst. Appl., 66, 245, 10.1016/j.eswa.2016.09.009

Weston, J., and Watkins, C. (1998). Multi-Class Support Vector Machines, Department of Computer Science, Royal Holloway, University of London. Technical Report CSD-TR-98-04.

Zhang, 2008, Text classification based on multi-word with support vector machine, Knowl.-Based Syst., 21, 879, 10.1016/j.knosys.2008.03.044

Lodhi, 2002, Text classification using string kernels, J. Mach. Learn. Res., 2, 419

Leslie, 2002, The spectrum kernel: A string kernel for SVM protein classification, Biocomputing, 7, 566

Eskin, 2002, Mismatch string kernels for SVM protein classification, Adv. Neural Inf. Process. Syst., 15, 1417

Singh, R., Kowsari, K., Lanchantin, J., Wang, B., and Qi, Y. (2017). GaKCo: A Fast and Scalable Algorithm for Calculating Gapped k-mer string Kernel using Counting. bioRxiv.

Sun, A., and Lim, E.P. (December, January 29). Hierarchical text classification and evaluation. Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), San Jose, CA, USA.

Sebastiani, 2002, Machine learning in automated text categorization, ACM Comput. Surv. (CSUR), 34, 1, 10.1145/505282.505283

Maron, 1998, A framework for multiple-instance learning, Adv. Neural Inf. Process. Syst., 10, 570

Andrews, 2002, Support vector machines for multiple-instance learning, Adv. Neural Inf. Process. Syst., 15, 577

Karamizadeh, S., Abdullah, S.M., Halimi, M., Shayan, J., and Javad Rajabi, M. (2014, January 2–4). Advantage and drawback of support vector machine functionality. Proceedings of the 2014 International Conference on Computer, Communications and Control Technology (I4CT), Langkawi, Malaysia.

Guo, G. (2014). Soft biometrics from face images using support vector machines. Support Vector Machines Applications, Springer.

Morgan, 1963, Problems in the analysis of survey data, and a proposal, J. Am. Stat. Assoc., 58, 415, 10.1080/01621459.1963.10500855

Safavian, 1991, A survey of decision tree classifier methodology, IEEE Trans. Syst. Man Cybern., 21, 660, 10.1109/21.97458

Magerman, D.M. (1995, January 26–30). Statistical decision-tree models for parsing. Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, Cambridge, MA, USA.

Quinlan, 1986, Induction of decision trees, Mach. Learn., 1, 81, 10.1007/BF00116251

1991, A distance-based attribute selection measure for decision tree induction, Mach. Learn., 6, 81, 10.1023/A:1022694001379

Giovanelli, C., Liu, X., Sierla, S., Vyatkin, V., and Ichise, R. (November, January 29). Towards an aggregator that exploits big data to bid on frequency containment reserve market. Proceedings of the 43rd Annual Conference of the IEEE Industrial Electronics Society (IECON 2017), Beijing, China.

Quinlan, 1987, Simplifying decision trees, Int. J. Man-Mach. Stud., 27, 221, 10.1016/S0020-7373(87)80053-6

Jasim, D.S. (2019, April 23). Data Mining Approach and Its Application to Dresses Sales Recommendation. Available online: https://www.researchgate.net/profile/Dalia_Jasim/publication/293464737_main_steps_for_doing_data_mining_project_using_weka/links/56b8782008ae44bb330d2583/main-steps-for-doing-data-mining-project-using-weka.pdf.

Ho, T.K. (August, January Canada). Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC.

Breiman, L. (1999). Random Forests, University of California. UC Berkeley TR567.

Wu, 2004, Probability estimates for multi-class classification by pairwise coupling, J. Mach. Learn. Res., 5, 975

Bansal, H., Shrivastava, G., Nhu, N., and Stanciu, L. (2018). Social Network Analytics for Contemporary Business Organizations, IGI Global.

Sutton, 2012, An introduction to conditional random fields, Found. Trends® Mach. Learn., 4, 267, 10.1561/2200000013

Vail, D.L., Veloso, M.M., and Lafferty, J.D. (2007, January 14–18). Conditional random fields for activity recognition. Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, Honolulu, HI, USA.

Chen, 2017, Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN, Expert Syst. Appl., 72, 221, 10.1016/j.eswa.2016.10.065

Sutton, C., and McCallum, A. (2006). An introduction to conditional random fields for relational learning. Introduction to Statistical Relational Learning, MIT Press.

Tseng, H., Chang, P., Andrew, G., Jurafsky, D., and Manning, C. (2005, January 14–15). A conditional random field word segmenter for sighan bakeoff 2005. Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, Jeju Island, Korea.

Nair, V., and Hinton, G.E. (2010, January 21–24). Rectified linear units improve restricted boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel.

Sutskever, I., Martens, J., and Hinton, G.E. (July, January 28). Generating text with recurrent neural networks. Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA.

Mandic, D.P., and Chambers, J.A. (2001). Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability, Wiley Online Library.

Bengio, 1994, Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Netw., 5, 157, 10.1109/72.279181

Hochreiter, 1997, Long short-term memory, Neural Comput., 9, 1735, 10.1162/neco.1997.9.8.1735

Graves, 2005, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Netw., 18, 602, 10.1016/j.neunet.2005.06.042

Pascanu, 2013, On the difficulty of training recurrent neural networks, ICML, 28, 1310

Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv.

Jaderberg, 2016, Reading text in the wild with convolutional neural networks, Int. J. Comput. Vis., 116, 1, 10.1007/s11263-015-0823-z

LeCun, 1998, Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278, 10.1109/5.726791

Scherer, D., Müller, A., and Behnke, S. (2010, January 15–18). Evaluation of pooling operations in convolutional architectures for object recognition. Proceedings of the Artificial Neural Networks–ICANN 2010, Thessaloniki, Greece.

Johnson, R., and Zhang, T. (2014). Effective use of word order for text categorization with convolutional neural networks. arXiv.

Hinton, 2002, Training products of experts by minimizing contrastive divergence, Neural Comput., 14, 1771, 10.1162/089976602760128018

Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Comput., 18, 1527, 10.1162/neco.2006.18.7.1527

Mohamed, 2012, Acoustic modeling using deep belief networks, IEEE Trans. Audio Speech Lang. Process., 20, 14, 10.1109/TASL.2011.2109382

Yang, Z., Yang, D., Dyer, C., He, X., Smola, A.J., and Hovy, E.H. (2016, June 12–17). Hierarchical Attention Networks for Document Classification. Proceedings of the HLT-NAACL, San Diego, CA, USA.

Seo, P.H., Lin, Z., Cohen, S., Shen, X., and Han, B. (2016). Hierarchical attention networks. arXiv.

Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT’2010, Physica-Verlag.

Tieleman, 2012, Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA Neural Netw. Mach. Learn., 4, 26

Kingma, D., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv.

Duchi, 2011, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., 12, 2121

Zeiler, M.D. (2012). ADADELTA: An adaptive learning rate method. arXiv.

Wang, B., Xu, J., Li, J., Hu, C., and Pan, J.S. (2017, January 3–5). Scene text recognition algorithm based on faster RCNN. Proceedings of the 2017 First International Conference on Electronics Instrumentation & Information Systems (EIIS), Harbin, China.

Zhou, C., Sun, C., Liu, Z., and Lau, F. (2015). A C-LSTM neural network for text classification. arXiv.

Shwartz-Ziv, R., and Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv.

Gray, A., and MacDonell, S. (2019, April 23). Alternatives to Regression Models for Estimating Software Projects. Available online: https://www.researchgate.net/publication/2747623_Alternatives_to_Regression_Models_for_Estimating_Software_Projects.

Shrikumar, A., Greenside, P., and Kundaje, A. (2017). Learning important features through propagating activation differences. arXiv.

Anthes, 2013, Deep learning comes of age, Commun. ACM, 56, 13, 10.1145/2461256.2461262

Lampinen, A.K., and McClelland, J.L. (2017). One-shot and few-shot learning of word embeddings. arXiv.

Severyn, A., and Moschitti, A. (2015, August 9–13). Learning to rank short text pairs with convolutional deep neural networks. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile.

Gowda, H.S., Suhil, M., Guru, D., and Raju, L.N. (2016). Semi-supervised text categorization using recursive K-means clustering. International Conference on Recent Trends in Image Processing and Pattern Recognition, Springer.

Kowsari, K. (2014). Investigation of Fuzzyfind Searching with Golay Code Transformations. [Ph.D. Thesis, Department of Computer Science, The George Washington University].

Kowsari, K., Yammahi, M., Bari, N., Vichr, R., Alsaby, F., and Berkovich, S.Y. (2015). Construction of fuzzyfind dictionary using golay coding transformation for searching applications. arXiv.

Chapelle, O., and Zien, A. (2005, January 6–8). Semi-Supervised Classification by Low Density Separation. Proceedings of the AISTATS, The Savannah Hotel, Barbados.

Nigam, K., McCallum, A., and Mitchell, T. (2006). Semi-supervised text classification using EM. Semi-Supervised Learning, MIT Press.

Shi, L., Mihalcea, R., and Tian, M. (2010, October 9–11). Cross language text classification by model translation and semi-supervised learning. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA, USA.

Zhou, 2014, Fuzzy deep belief networks for semi-supervised sentiment classification, Neurocomputing, 131, 312, 10.1016/j.neucom.2013.10.011

Yang, 1999, An evaluation of statistical approaches to text categorization, Inf. Retr., 1, 69, 10.1023/A:1009982220290

Lever, 2016, Points of significance: Classification evaluation, Nat. Methods, 13, 603, 10.1038/nmeth.3945

Manning, C.D., Raghavan, P., and Schütze, H. (2008). Matrix decompositions and latent semantic indexing. Introduction to Information Retrieval, Cambridge University Press.

Tsoumakas, G., Katakis, I., and Vlahavas, I. (2009). Mining multi-label data. Data Mining and Knowledge Discovery Handbook, Springer.

Yonelinas, 2007, Receiver operating characteristics (ROCs) in recognition memory: A review, Psychol. Bull., 133, 800, 10.1037/0033-2909.133.5.800

Japkowicz, 2002, The class imbalance problem: A systematic study, Intell. Data Anal., 6, 429, 10.3233/IDA-2002-6504

Bradley, 1997, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recogn., 30, 1145, 10.1016/S0031-3203(96)00142-2

Hand, 2001, A simple generalisation of the area under the ROC curve for multiple class classification problems, Mach. Learn., 45, 171, 10.1023/A:1010920819831

Wu, 2008, Interpreting tf-idf term weights as making relevance decisions, ACM Trans. Inf. Syst. (TOIS), 26, 13, 10.1145/1361684.1361686

Rezaeinia, S.M., Ghodsi, A., and Rahmani, R. (2017). Improving the Accuracy of Pre-trained Word Embeddings for Sentiment Analysis. arXiv.

Sharma, 2007, Fast principal component analysis using fixed-point algorithm, Pattern Recogn. Lett., 28, 1151, 10.1016/j.patrec.2007.01.012

Putthividhya, D.P., and Hu, J. (2011, July 27–31). Bootstrapped named entity recognition for product attribute extraction. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK.

Banerjee, M. (2011). A Utility-Aware Privacy Preserving Framework for Distributed Data Mining with Worst Case Privacy Guarantee, University of Maryland.

Chen, J., Yan, S., and Wong, K.C. (2018). Verbal aggression detection on Twitter comments: Convolutional neural network for short-text sentiment analysis. Neural Comput. Appl., 1–10.

Zhang, 2015, Character-level convolutional networks for text classification, Adv. Neural Inf. Process. Syst., 28, 649

Schütze, H., Manning, C.D., and Raghavan, P. (2008). Introduction to Information Retrieval, Cambridge University Press.

Hoogeveen, 2018, Web forum retrieval and text analytics: A survey, Found. Trends® Inf. Retr., 12, 1, 10.1561/1500000062

Dwivedi, S.K., and Arya, C. (2016, March 4–5). Automatic Text Classification in Information retrieval: A Survey. Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, Udaipur, India.

Jones, 1971, Automatic keyword classification for information retrieval, Libr. Q., 41, 338, 10.1086/619985

O’Riordan, C., and Sorensen, H. (1997, January 28–31). Information filtering and retrieval: An overview. Proceedings of the 16th Annual International Conference of the IEEE, Atlanta, GA, USA.

Buckley, C. (1985). Implementation of the SMART Information Retrieval System, Cornell University. Technical Report.

Pang, 2008, Opinion mining and sentiment analysis, Found. Trends® Inf. Retr., 2, 1, 10.1561/1500000011

Liu, B., and Zhang, L. (2012). A survey of opinion mining and sentiment analysis. Mining Text Data, Springer.

Pang, 2002, Thumbs up?: Sentiment classification using machine learning techniques, ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, 79, 10.3115/1118693.1118704

Aggarwal, C.C. (2016). Content-based recommender systems. Recommender Systems, Springer.

Pazzani, M.J., and Billsus, D. (2007). Content-based recommendation systems. The Adaptive Web, Springer.

Sumathy, 2013, Text mining: Concepts, applications, tools and issues—An overview, Int. J. Comput. Appl., 80, 29

Heidarysafa, M., Kowsari, K., Barnes, L.E., and Brown, D.E. (2018, December 17–20). Analysis of Railway Accidents’ Narratives Using Deep Learning. Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA.

Mani, I. (1999). Advances in Automatic Text Summarization, MIT Press.

Cao, Z., Li, W., Li, S., and Wei, F. (2017, February 4–9). Improving Multi-Document Summarization via Text Classification. Proceedings of the AAAI, San Francisco, CA, USA.

March, 2011, Combining Bayesian text classification and shrinkage to automate healthcare coding: A data quality analysis, J. Data Inf. Qual. (JDIQ), 2, 13

Zhang, 2018, Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record, IEEE Access, 6, 65333, 10.1109/ACCESS.2018.2875677

Trieschnigg, 2009, MeSH Up: Effective MeSH text classification for improved document retrieval, Bioinformatics, 25, 1412, 10.1093/bioinformatics/btp249

Ofoghi, B., and Verspoor, K. (2017). Textual Emotion Classification: An Interoperability Study on Cross-Genre data sets. Australasian Joint Conference on Artificial Intelligence, Springer.

Pennebaker, J., Booth, R., Boyd, R., and Francis, M. (2015). Linguistic Inquiry and Word Count: LIWC2015, Pennebaker Conglomerates. Available online: www.LIWC.net.

Paul, 2017, Social Monitoring for Public Health, Synth. Lect. Inf. Concepts Retr. Serv., 9, 1

Yu, B., and Kwok, L. (2011, July 24–28). Classifying business marketing messages on Facebook. Proceedings of the Association for Computing Machinery Special Interest Group on Information Retrieval, Beijing, China.

Kang, 2018, Opinion mining using ensemble text hidden Markov models for text classification, Expert Syst. Appl., 94, 218, 10.1016/j.eswa.2017.07.019

Turtle, 1995, Text retrieval in the legal world, Artif. Intell. Law, 3, 5, 10.1007/BF00877694

Bergman, P., and Berman, S.J. (2016). Represent Yourself in Court: How to Prepare &amp; Try a Winning Case, Nolo.