Dual estimation based vocal tract shape computation

International Journal of Speech Technology - Volume 22 - Pages 575-584 - 2018
Subhasmita Sahoo¹, Aurobinda Routray¹
¹ Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India

Abstract

This paper presents a new method for estimating the vocal tract shape directly from the speech signal. The method computes the cross-sectional areas of the uniform-length cylindrical tubes that make up the vocal tract. These areas are calculated from the reflection coefficients at the tube junctions, whose values depend on the areas of the adjoining tubes. A new state space representation of the speech production system is formulated in which the reflection coefficients are parameters. The state space model is built from state equations of the glottal flow signal and of the vocal tract, derived from the Liljencrants–Fant model and the concatenated tube model respectively. A dual extended Kalman filtering algorithm is used to estimate the unknown parameters of the system. The estimated reflection coefficients are then used to compute the cross-sectional areas of the vocal tract. The proposed technique is compared with an existing shape estimation method proposed by Wakita. For both synthesized and natural speech signals, the performance of the proposed method is found to be comparable to that of the existing one. In addition, the Kalman filter algorithm used in the proposed method allows the measurement noise covariance to be tuned according to the noise level in the speech. As a result, the proposed method is observed to be more robust to noise than the existing technique.
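
The relation between junction reflection coefficients and tube areas mentioned above can be illustrated with a minimal Python sketch. It assumes the convention r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) at the junction between tubes k and k+1, and an arbitrary reference area for the first tube. The abstract fixes neither the sign convention nor the normalization, so both are assumptions, and the function names are hypothetical. This is not the paper's estimation algorithm, only the final area computation under those assumptions.

```python
from typing import List, Sequence


def areas_from_reflection(refl: Sequence[float], first_area: float = 1.0) -> List[float]:
    """Recover relative tube areas from junction reflection coefficients.

    Assumed convention (not taken from the paper):
        r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k)
    which rearranges to
        A_{k+1} = A_k * (1 + r_k) / (1 - r_k).
    Only area ratios are determined, so first_area is an arbitrary reference.
    """
    areas = [first_area]
    for r in refl:
        if not -1.0 < r < 1.0:
            raise ValueError("reflection coefficients must lie strictly in (-1, 1)")
        areas.append(areas[-1] * (1.0 + r) / (1.0 - r))
    return areas


def reflection_from_areas(areas: Sequence[float]) -> List[float]:
    """Inverse mapping under the same assumed convention."""
    return [(b - a) / (b + a) for a, b in zip(areas, areas[1:])]


if __name__ == "__main__":
    coeffs = [0.5, -0.5]
    print(areas_from_reflection(coeffs))
```

Under the opposite sign convention the ratio (1 + r_k) / (1 - r_k) is simply inverted, so the convention should be checked against the model in use before comparing area functions.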

References

Bar-Shalom, Y., Li, X. R., & Kirubarajan, T. (2004). Estimation with applications to tracking and navigation: Theory algorithms and software. New York: Wiley.
Fant, G., Liljencrants, J., & Lin, Q. (1985). A four-parameter model of glottal flow. STL-QPSR, 4(1985), 1–13.
Haykin, S. S., et al. (2001). Kalman filtering and neural networks. Hoboken: Wiley.
Hu, Y. (2007). Subjective evaluation and comparison of speech enhancement algorithms. Speech Communication, 49, 588–601.
Hwang, I., Balakrishnan, H., & Tomlin, C. (2006). State estimation for hybrid systems: Applications to aircraft tracking. IEE Proceedings Control Theory and Applications, 153(5), 556.
Mathur, S., Story, B. H., & Rodríguez, J. J. (2006). Vocal-tract modeling: Fractional elongation of segment lengths in a waveguide model with half-sample delays. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1754–1762.
Mullen, J., Howard, D. M., & Murphy, D. T. (2007). Real-time dynamic articulations in the 2-D waveguide mesh vocal tract model. IEEE Transactions on Audio, Speech, and Language Processing, 15(2), 577–585.
Plumpe, M. D., Quatieri, T. F., Reynolds, D., et al. (1999). Modeling of the glottal flow derivative waveform with application to speaker identification. IEEE Transactions on Speech and Audio Processing, 7(5), 569–586.
Quatieri, T. F. (2006). Discrete-time speech signal processing: Principles and practice. Delhi: Pearson Education India.
Routray, A., Pradhan, A. K., & Rao, K. P. (2002). A novel Kalman filter for frequency estimation of distorted signals in power systems. IEEE Transactions on Instrumentation and Measurement, 51(3), 469–479.
Sahoo, S., & Routray, A. (2016). A novel method of glottal inverse filtering. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(7), 1230–1241.
Schroeder, M. R. (1967). Determination of the geometry of the human vocal tract by acoustic measurements. The Journal of the Acoustical Society of America, 41(4B), 1002–1010.
Schroeter, J., & Sondhi, M. M. (1994). Techniques for estimating vocal-tract shapes from the speech signal. IEEE Transactions on Speech and Audio Processing, 2(1), 133–150.
Skordilis, Z. I., Toutios, A., Töger, J., & Narayanan, S. (2017). Estimation of vocal tract area function from volumetric Magnetic Resonance Imaging. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 924–928). IEEE.
Sondhi, M. M., & Gopinath, B. (1971). Determination of vocal-tract shape from impulse response at the lips. The Journal of the Acoustical Society of America, 49(6B), 1867–1873.
Sorensen, T., Toutios, A., Goldstein, L., & Narayanan, S. S. (2016). Characterizing vocal tract dynamics with real-time MRI. In 15th Conference on Laboratory Phonology, Ithaca, NY.
Story, B. H., Titze, I. R., & Hoffman, E. A. (1996). Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America, 100(1), 537–554.
Wakita, H. (1973). Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. IEEE Transactions on Audio and Electroacoustics, 21(5), 417–427.