Clean speech/speech with background music classification using HNGD spectrum
Abstract
Keywords
References
Anand Joseph, M., Guruprasad, S., & Yegnanarayana, B. (2006). Extracting formants from short segments of speech using group delay functions.
Bayya, Y., & Gowda, D. N. (2013). Spectro-temporal analysis of speech signals using zero-time windowing and group delay function. Speech Communication, 55(6), 782–795.
Beyerlein, P., Aubert, X., Haeb-Umbach, R., Harris, M., Klakow, D., Wendemuth, A., et al. (2002). Large vocabulary continuous speech recognition of broadcast news: The Philips/RWTH approach. Speech Communication, 37(1), 109–131.
Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distribution. Bulletin of the Calcutta Mathematical Society, 35, 99–109.
Castán, D., Ortega, A., Miguel, A., & Lleida, E. (2014). Audio segmentation-by-classification approach based on factor analysis in broadcast news domain. EURASIP Journal on Audio, Speech, and Music Processing, 2014(1), 1–13.
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Gauvain, J., Lamel, L., & Adda, G. (2000). Transcribing broadcast news for audio and video indexing. Communications of the ACM, 43(2), 64–70.
Gauvain, J.-L., Lamel, L., & Adda, G. (2002). The LIMSI broadcast news transcription system. Speech Communication, 37(1), 89–108.
Jiang, D.-N., Lu, L., Zhang, H.-J., Tao, J.-H., & Cai, L.-H. (2002). Music type classification by spectral contrast feature. In Proceedings 2002 IEEE international conference on multimedia and expo, 2002 (ICME’02) (Vol. 1, pp. 113–116). IEEE.
Khonglah, B. K., & Prasanna, S. M. (2016). Speech/music classification using speech-specific features. Digital Signal Processing, 48, 71–83.
Murthy, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16, 1602–1613.
Nguyen, L., Matsoukas, S., Davenport, J., Kubala, F., Schwartz, R., & Makhoul, J. (2002). Progress in transcription of broadcast news using BYBLOS. Speech Communication, 38(1–2), 213–230.
Oppenheim, A. V., & Schafer, R. W. (1975). Digital signal processing. New Delhi: Prentice-Hall.
Prasad, R., & Yegnanarayana, B. (2013). Acoustic segmentation of speech using zero time liftering (ZTL) (pp. 2292–2296).
Renals, S., Abberley, D., Kirby, D., & Robinson, T. (2000). Indexing and retrieval of broadcast news. Speech Communication, 32(1), 5–20.
Sarma, B. D., Prasanna, S. M., & Sarmah, P. (2017). Consonant-vowel unit recognition using dominant aperiodic and transition region detection. Speech Communication, 92, 77–89.
Scheirer, E., & Slaney, M. (1997). Construction and evaluation of a robust multifeature speech/music discriminator. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 1331–1334.
Sell, G., & Clark, P. (2014). Music tonality features for speech/music discrimination. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2489–2493). IEEE.
Siegler, M. A., Jain, U., Raj, B., & Stern, R. M. (1997). Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of DARPA Speech Recognition Workshop (pp. 97–99).
Srinivas, K. S., & Prahallad, K. (2012). An FIR implementation of zero frequency filtering of speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 20(9), 2613–2617.
Tzanetakis, G., & Cook, P. (2000). Sound analysis using MPEG compressed audio. In Proceedings IEEE international conference on acoustics, speech, and signal processing, 2000 (ICASSP’00) (Vol. 2, pp. II761–II764).
Vavrek, J., Vozáriková, E., Pleva, M., & Juhár, J. (2012). Broadcast news audio classification using SVM binary trees. In 2012 35th international conference on telecommunications and signal processing (TSP) (pp. 469–473). IEEE.
Wegmann, S., Zhan, P., & Gillick, L. (1999). Progress in broadcast news transcription at dragon systems. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, 33–36.
Woodland, P. (2002). The development of the HTK broadcast news transcription system: An overview. Speech Communication, 37(1–2), 47–67.
Yegnanarayana, B. (1978). Formant extraction from linear-prediction phase spectra. The Journal of the Acoustical Society of America, 63(5), 1638–1640.
Yegnanarayana, B., & Murthy, H. A. (1992). Significance of group delay functions in spectrum estimation. IEEE Transactions on Signal Processing, 40(9), 2281–2289.