Articulatory Event Detectors

Pleiades Publishing Ltd - Vol. 66 - pp. 67-80 - 2020
V. N. Sorokin

Abstract

Articulatory event detectors, i.e., detectors of transitions from one articulatory state to another, are formed based on analysis of spectral–temporal inhomogeneities in a speech signal. A triad such as /pause–fricative–vowel/ is segmented and recognized in the space of the principal components of three spectra: the response spectrum of the pause–fricative transition detector, the spectrum of the fricative at its peak energy, and the response spectrum of the fricative–vowel transition detector at that detector's peak. Relative to manual marking, the root-mean-square error is, on average, about 12 ms for the fricative onset and about 5 ms for the moment of the fricative–vowel transition. Recognition errors for triads with the same fricative and different following vowels, and for triads differing only in the presence or absence of vocal excitation, amounted to a few percent.
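The two quantities named in the abstract, a principal-component feature space built from three spectra and a root-mean-square boundary error against manual marking, can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the plain concatenation of the three spectra into one vector, and the use of an SVD-based PCA are all assumptions made for illustration only.

```python
import numpy as np


def boundary_rmse_ms(detected_ms, manual_ms):
    """Root-mean-square error (ms) between detected and manually marked boundaries."""
    detected = np.asarray(detected_ms, dtype=float)
    manual = np.asarray(manual_ms, dtype=float)
    return float(np.sqrt(np.mean((detected - manual) ** 2)))


def triad_features(onset_spec, peak_spec, transition_spec):
    """Concatenate the three spectra of each triad into one feature vector.

    Assumption: each argument is an array of shape (n_triads, n_bands).
    How the spectra are actually combined in the paper is not stated
    in the abstract; concatenation is a hypothetical choice.
    """
    return np.concatenate([onset_spec, peak_spec, transition_spec], axis=-1)


def fit_pca(features, n_components):
    """Return the feature mean and the leading principal axes (one per row)."""
    mean = features.mean(axis=0)
    # Rows of vt are orthonormal principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, axes):
    """Project feature vectors onto the principal axes."""
    return (features - mean) @ axes.T
```

A triad would then be classified in the projected space, for example by distance to class means; the abstract does not specify the classifier, so that step is left out here.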
