Gender and age-evolution detection based on audio forensic analysis using light deep neural network
Abstract
Forensic audio analysis is a cornerstone of many crime investigations. In forensic evidence, the audio file of a human voice is analyzed to extract much information beyond the content of the speech, such as the speaker's identity, emotions, gender, origin, etc. Accurately assigning individuals to groups based on their age-development stage and gender is often used in early investigations to differentiate them and to determine the legal rights and responsibilities associated with them. This work introduces a light CNN model with a new architecture that detects a human being's age-evolution stage (child or adult) and, for adults, the gender (male or female) from the individual's voice characteristics, offering a balance between computational efficiency and model accuracy. The temporal information in the audio file is first prepared by scaling and normalization. This information is then exploited to extract and track the unique and salient audio features that make up the feature-map pattern for each target class through a series of convolutional layers, each followed by a max-pooling layer. Finally, the decision is made from these feature maps by a few fully connected layers. Successful and promising results are achieved in terms of accuracy and loss, reaching 0.99 and 0.017, respectively, on the rich VoxCeleb2 dataset. The proposed model underscores the importance of leveraging light DNNs for gender and age-evolution detection, offering a robust and ethically sound solution for real-world audio forensic applications such as speaker identification, victim profiling, deception detection, and more, contributing to the advancement of audio forensic analysis.
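As a rough illustration of the pipeline summarized above (scaled and normalized audio input, a few convolution and max-pooling blocks that build class-specific feature maps, and fully connected layers that make the decision), a minimal sketch in Keras is shown below. The layer counts, filter sizes, input length, sampling rate, and the three-class output head (child, adult male, adult female) are assumptions made for the sketch, not the authors' published configuration.

```python
# Minimal sketch of a light 1-D CNN of the kind described in the abstract.
# Assumed setup: 1 s of 16 kHz audio -> conv + max-pooling blocks -> dense
# classifier over 3 classes (child, adult male, adult female).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 3          # assumed: child, adult male, adult female
INPUT_SAMPLES = 16000    # assumed: 1 second of audio at 16 kHz


def build_light_cnn(input_samples: int = INPUT_SAMPLES,
                    num_classes: int = NUM_CLASSES) -> tf.keras.Model:
    """Small 1-D CNN: conv/max-pool feature extractor + dense classifier."""
    inputs = layers.Input(shape=(input_samples, 1))
    # Scale 16-bit PCM samples into [-1, 1] (a simple stand-in for the
    # scaling/normalization step mentioned in the abstract).
    x = layers.Rescaling(1.0 / 32768.0)(inputs)
    # A few light convolutional blocks, each followed by max pooling.
    for filters in (16, 32, 64):
        x = layers.Conv1D(filters, kernel_size=9, padding="same",
                          activation="relu")(x)
        x = layers.MaxPooling1D(pool_size=4)(x)
    # Collapse the time axis and classify with fully connected layers.
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


if __name__ == "__main__":
    model = build_light_cnn()
    model.summary()
    # Dummy batch just to confirm the forward pass runs end to end.
    dummy = np.random.randn(4, INPUT_SAMPLES, 1).astype("float32")
    print(model.predict(dummy, verbose=0).shape)  # (4, 3)
```

In practice such a model would be trained on fixed-length voice segments with integer labels for the three classes; the small filter counts are what keeps the network "light" relative to large pretrained speaker-recognition backbones.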
References
Ahmad, J., Fiaz, M., Kwon, S. I., Sodanil, M., Vo, B., & Baik, S. W. (2016). Gender identification using MFCC for telephone applications—a comparative study. arXiv preprint arXiv:1601.01577.
Alnuaim, A. A., Zakariah, M., Shashidhar, C., Hatamleh, W. A., Tarazi, H., Shukla, P. K., & Ratna, R. (2022). Speaker gender recognition based on deep neural networks and ResNet50. Wireless Communications and Mobile Computing. Hindawi.
Becker, S., Ackermann, M., Lapuschkin, S., Müller, K. R., & Samek, W. (2018). Interpreting and explaining deep neural networks for classification of audio signals. arXiv preprint arXiv:1807.03418.
Choi, J., Kim, S., Park, W., Yong, S., & Nam, S. (2020). Children’s song dataset for singing voice research. 21st International Society for Music Information Retrieval Conference (ISMIR).
Chung, J. S., Nagrani, A., & Zisserman, A. (2018). VoxCeleb2: Deep speaker recognition. Interspeech.
Ertam, F. (2019). An effective gender recognition approach using voice data via deeper LSTM networks. Applied Acoustics, 156, 351–358.
Goyal, S., Patage, V. V., & Tiwari, S. (2020). Gender and age group predictions from speech features using multi-layer perceptron model. 2020 IEEE 17th India Council international conference (INDICON) (pp. 1–6). IEEE.
Gupta, P., Goel, S., & Purwar, A. (2018). A stacked technique for gender recognition through voice. 2018 Eleventh international conference on contemporary computing (IC3) (pp. 1–3). IEEE.
Gupta, Y., Gangwar, K., Singhal, M., & Hemavathi, D. (2022). Gender and age recognition using audio data—artificial neural networks. Soft Computing for Security Applications, 1397, 449–470.
Lee, Y. O., Jo, J., & Hwang, J. (2017). Application of deep neural network and generative adversarial network to industrial maintenance: A case study of induction motor fault detection. Proceedings of 2017 IEEE international conference on Big Data (Big Data), Boston, MA, USA (pp. 3248–3253).
Livieris, I. E., Pintelas, E., & Pintelas, P. (2019). Gender recognition by voice using an improved self-labeled algorithm. Machine Learning and Knowledge Extraction, 1(1), 492–503.
Markitantov, M., & Verkholyak, O. (2019). Automatic recognition of speaker age and gender based on deep neural networks. International conference on speech and computer (pp. 327–336). Springer.
Mavaddati, S. (2018). Voice-based age and gender recognition using training generative sparse model. International Journal of Engineering, 31(9), 1529–1535.
Nasef, M. M., Sauber, A. M., & Nabil, M. M. (2021). Voice gender recognition under unconstrained environments using self-attention. Applied Acoustics, 175, 107823.
Pahwa, A., & Aggarwal, G. (2016). Speech feature extraction for gender recognition. International Journal of Image, Graphics and Signal Processing, 9(3), 17–25.
Priya, E., Reshma, P. S., & Sashaank, S. (2022). Temporal and spectral features based gender recognition from audio signals. 2022 International conference on communication, computing and internet of things (IC3IoT) (pp. 1–5). IEEE.
Qawaqneh, Z., Mallouh, A. A., & Barkana, B. D. (2017). Deep neural network framework and transformed MFCCs for speaker’s age and gender classification. Knowledge-Based Systems, 115, 5–14.
Ramdinmawii, E., & Mittal, V. K. (2016). Gender identification from speech signal by examining the speech production characteristics. International conference on statistical process control and operations management (ICSPCom) (pp. 244–249).
Sharma, G., & Mala, S. (2020). Framework for gender recognition using voice. 2020 10th international conference on cloud computing, data science & engineering (Confluence) (pp. 32–37). IEEE.
Shergill, J. S., Pravin, C., & Ojha, V. (2021). Accent and gender recognition from English language speech and audio using signal processing and deep learning. International conference on Hybrid Intelligent Systems (HIS 2020) (pp. 62–72). Springer.
Susithra, N., Rajalakshmi, K., Ashwath, P., Ajay, B., Rohit, D., & Stewaugh, S. (2022). Speech based emotion recognition and gender identification using FNN and CNN models. 2022 3rd international conference for emerging technology (INCET) (pp. 1–6).
Wang, Z. (2017). Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks. 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5150–5154).
Yasmin, G., Das, A. K., Nayak, J., Vimal, S., & Dutta, S. (2022). A rough set theory and deep learning-based predictive system for gender recognition using audio speech. In A. Di Nola & R. Cerulli (Eds), Soft Computing (pp. 1–24). Springer.
Yusnita, M. A., Hafiz, A. M., Fadzilah, M. N., Zulhanip, A. Z., & Idris, M. (2017). Automatic gender recognition using linear prediction coefficients and artificial neural network on speech signal. 2017 7th IEEE international conference on control system, computing and Engineering (ICCSCE).
Zjalic, J. (2020). Digital audio forensics fundamentals: From capture to courtroom (1st ed.). Focal Press.
Zvarevashe, K., & Olugbara, O. O. (2018). Gender voice recognition using random forest recursive feature elimination with gradient boosting machines. 2018 international conference on advances in big data, computing and data communication systems (icABCD 2018) (pp. 1–6). IEEE.