Quran reciter identification using NASNetLarge
Neural Computing and Applications - Pages 1-15 - 2024
Abstract
Speaker identification offers significant advantages for human-computer interaction. In recent years, many researchers have contributed to this field and have successfully built deep learning models for automatic speaker identification systems. However, most speech signal processing work remains limited to English-only applications, despite the many challenges posed by Arabic speech, particularly recitation of the Holy Quran, the holy book of Islam. In this context, this study proposes a model for identifying Quran reciters using a dataset of 11,000 audio samples extracted from 20 Quran reciters. So that visual representations of the audio samples can be fed into pre-trained models, the samples are converted from their original audio representation to an image representation using Mel-Frequency Cepstral Coefficients (MFCC). Six pre-trained deep learning models are evaluated separately within the proposed model. Results on the test dataset show that NASNetLarge achieved the highest accuracy, 98.50%, among the pre-trained models used in this study.
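The audio-to-image step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the frame length, hop size, filter count, number of coefficients, min-max scaling, nearest-neighbour resize, and the 331 x 331 target size (the default input size of NASNetLarge in Keras) are all assumptions chosen for illustration, and every function name is hypothetical. A synthetic tone stands in for a recitation clip.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mfcc(signal, sample_rate, n_fft=512, hop=256, n_filters=40, n_coeffs=20):
    """Return an (n_coeffs, n_frames) MFCC matrix (assumed parameters)."""
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum -> mel filterbank energies -> log compression.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    # Unnormalised DCT-II along the filter axis, keeping the first n_coeffs.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * k[:, None])
    return (log_mel @ basis.T).T


def to_image(features):
    """Min-max scale to 0..255 and copy the result into three channels."""
    lo, hi = features.min(), features.max()
    scaled = (features - lo) / (hi - lo + 1e-12)
    gray = np.round(scaled * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)


def resize_nearest(image, height, width):
    """Nearest-neighbour resize (an assumed, deliberately simple choice)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]


# Synthetic one-second 440 Hz tone in place of a real audio sample.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2.0 * np.pi * 440.0 * t)

coeffs = mfcc(signal, sample_rate)
image = resize_nearest(to_image(coeffs), 331, 331)
print(coeffs.shape, image.shape, image.dtype)
```

The resulting three-channel array has the layout that ImageNet-style pre-trained networks expect. In practice, a dedicated audio library would usually replace the hand-written MFCC code.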
Keywords
#speaker identification #Holy Quran #deep learning models #audio #Islam