Phoneme analysis based on quantitative and qualitative entropy measurement

Computer Speech & Language - Volume 22 - Pages 313-329 - 2008
Jari Turunen, Tarmo Lipping
Tampere University of Technology, Pori Pohjoisranta 11, P.O. Box 300, FIN-28601 Pori, Finland
