Multi-task learning in under-resourced Dravidian languages

Journal of Data, Information and Management - Volume 4 - Pages 137-165 - 2022
Adeep Hande1, Siddhanth U. Hegde2, Bharathi Raja Chakravarthi3
1Indian Institute of Information Technology Tiruchirappalli, Tiruchirappalli, India
2University Visvesvaraya College of Engineering, Bangalore University, Bangalore, India
3Insight SFI Research Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland

Abstract

It is challenging to obtain extensive annotated data for under-resourced languages, so we investigate whether it is beneficial to train models using multi-task learning. Sentiment analysis and offensive language identification share similar discourse properties, and our selection of these tasks is motivated by the lack of large labelled datasets for user-generated code-mixed text. This paper works with code-mixed YouTube comments in Tamil, Malayalam, and Kannada. Our framework is applicable to other sequence classification problems irrespective of dataset size. Experiments show that our multi-task learning models achieve strong results compared to single-task learning while reducing the time and space required to train separate models for each task. Analysis of the fine-tuned models indicates that multi-task learning is preferable to single-task learning, yielding a higher weighted F1 score on all three languages. We apply two multi-task learning approaches to three Dravidian languages: Kannada, Malayalam, and Tamil. The highest scores on Kannada and Malayalam were achieved by mBERT trained with cross-entropy loss and hard parameter sharing, while the best scores on Tamil were achieved by DistilBERT trained with cross-entropy loss and soft parameter sharing. For the tasks of sentiment analysis and offensive language identification, the best-performing models scored weighted F1-scores of (66.8%, 90.5%) for Kannada, (59%, 70%) for Malayalam, and (62.1%, 75.3%) for Tamil, respectively.
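The best-performing configuration on Kannada and Malayalam combines mBERT, cross-entropy loss, and hard parameter sharing, i.e. a single shared encoder feeding two task-specific classification heads whose losses are summed during joint training. The following is a minimal sketch of that idea in PyTorch with Hugging Face Transformers, not the paper's released implementation: the class name MultiTaskBERT, the label counts, and the example comment are illustrative assumptions.

```python
# Minimal sketch of hard parameter sharing for sentiment analysis and
# offensive language identification, assuming PyTorch + Transformers.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_SENTIMENT_CLASSES = 5   # hypothetical label counts, not the paper's exact values
NUM_OFFENSIVE_CLASSES = 6

class MultiTaskBERT(nn.Module):
    """Shared mBERT encoder with one classification head per task."""
    def __init__(self, model_name="bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)   # hard-shared parameters
        hidden = self.encoder.config.hidden_size
        self.sentiment_head = nn.Linear(hidden, NUM_SENTIMENT_CLASSES)  # task-specific
        self.offensive_head = nn.Linear(hidden, NUM_OFFENSIVE_CLASSES)  # task-specific

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]          # [CLS] token representation
        return self.sentiment_head(cls), self.offensive_head(cls)

# One joint training step: cross-entropy on both heads, summed into a single loss.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = MultiTaskBERT()
criterion = nn.CrossEntropyLoss()

batch = tokenizer(["idhu semma padam"], return_tensors="pt", padding=True)  # illustrative code-mixed comment
sentiment_labels = torch.tensor([0])
offensive_labels = torch.tensor([0])

sent_logits, off_logits = model(batch["input_ids"], batch["attention_mask"])
loss = criterion(sent_logits, sentiment_labels) + criterion(off_logits, offensive_labels)
loss.backward()
```

In this hard-sharing setup only the two linear heads are task-specific; soft parameter sharing, as used for Tamil, would instead keep a separate encoder per task and tie their parameters through a regularisation term rather than by weight reuse.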
