Multi-coder vector quantizer for transparent coding of wideband speech ISF parameters

Merouane Bouzid¹, Nacèra Meziane², Salah-Eddine Cheraitia¹
¹ Speech Communication and Signal Processing Lab., Electrical Engineering Faculty, University of Sciences and Technology Houari Boumediene (USTHB), Algiers, Algeria
² Instrumentation Lab., Electrical Engineering Faculty, University of Sciences and Technology Houari Boumediene (USTHB), Algiers, Algeria

Abstract

Modern low bit-rate speech coders require efficient coding of the linear predictive coding (LPC) coefficients. Immittance Spectral Frequencies (ISF) and Line Spectral Frequencies (LSF) are currently the most efficient transmission parameters for LPC coefficients in wideband speech coding. In this paper, we propose a new hybrid coding scheme with multi-coder vector quantization for efficient coding of ISF parameters of the wideband speech coder AMR-WB. The coding system was designed based on four structured quantizers under noiseless channel conditions: the split vector quantizer (SVQ), the switched split vector quantizer (SSVQ), the multi-stage vector quantizer (MSVQ), and the multi switched split vector quantizer (MSSVQ). Simulation results show that our proposed AMR-WB ISF coding scheme outperforms conventional wideband ISF quantizers at lower bit rates.
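The split vector quantizer (SVQ) named above is the simplest of the four building blocks, so a small illustration may help. The following is a minimal sketch of generic SVQ encoding and decoding, not the authors' implementation: the function names, the two-way split, and the toy codebooks are all hypothetical, and a real ISF codebook would be trained on speech data rather than written by hand.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest_index(x: Vector, codebook: Codebook) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    def sq_dist(c: Vector) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def svq_encode(x: Vector, split_sizes: Sequence[int],
               codebooks: Sequence[Codebook]) -> List[int]:
    """Split x into consecutive sub-vectors and quantize each one
    independently with its own codebook (hypothetical helper)."""
    indices: List[int] = []
    start = 0
    for size, codebook in zip(split_sizes, codebooks):
        indices.append(nearest_index(x[start:start + size], codebook))
        start += size
    return indices


def svq_decode(indices: Sequence[int],
               codebooks: Sequence[Codebook]) -> List[float]:
    """Rebuild the quantized vector by concatenating the selected codewords."""
    out: List[float] = []
    for i, codebook in zip(indices, codebooks):
        out.extend(codebook[i])
    return out


# Toy data (assumption): a 4-dimensional vector split into two 2-dimensional
# parts, each with a hand-written 2-entry codebook (1 bit per part).
toy_codebooks = [
    [(0.1, 0.2), (0.3, 0.4)],
    [(0.6, 0.7), (0.8, 0.9)],
]
toy_vector = (0.28, 0.41, 0.61, 0.72)

toy_indices = svq_encode(toy_vector, [2, 2], toy_codebooks)
toy_reconstruction = svq_decode(toy_indices, toy_codebooks)
```

Splitting keeps the search and storage cost manageable because each sub-vector is searched in a small codebook instead of one very large joint codebook. The price is that dependencies between sub-vectors are ignored.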
