Analysis of English Nonsense Syllable Recognition in Noise

Phonetica - Volume 60, Issue 2 - Pages 129-157 - 2003
José R. Benkí¹
¹Department of Linguistics, University of Michigan, Ann Arbor, Mich. 48109-1285, USA.

Abstract

English nonsense consonant-vowel-consonant syllables were presented for recognition at four different signal-to-noise ratios. Information-theoretic methods were used to analyze the response data according to segment type and phonological feature. The results are consistent with previous studies showing that the consonantal contrast of voicing is more robust than place of articulation, that word-initial consonants are more robust than word-final consonants, and that vowel height is more robust than vowel backing. Asymmetrical confusions were also analyzed and indicate a bias toward front vowels over back vowels. The results are interpreted as parts of phonetic explanations for synchronic and diachronic phonological patterns.
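As an illustration of the kind of information-theoretic analysis the abstract refers to, the sketch below computes transmitted information (mutual information between stimulus and response, in bits) from a confusion matrix, first over whole segments and then after pooling segments by a phonological feature. It is a minimal sketch, not the paper's actual procedure: the function names, the four-consonant inventory, and all counts are hypothetical and chosen only to show the computation.

```python
import math
from collections import defaultdict


def transmitted_information(counts):
    """Mutual information (bits) between stimulus and response.

    counts[s][r] is the number of times stimulus s drew response r.
    """
    total = sum(sum(row.values()) for row in counts.values())
    stim_totals = {s: sum(row.values()) for s, row in counts.items()}
    resp_totals = defaultdict(int)
    for row in counts.values():
        for r, n in row.items():
            resp_totals[r] += n

    bits = 0.0
    for s, row in counts.items():
        for r, n in row.items():
            if n:  # empty cells contribute nothing
                joint = n / total
                bits += joint * math.log2(
                    n * total / (stim_totals[s] * resp_totals[r])
                )
    return bits


def collapse_by_feature(counts, feature):
    """Pool stimuli and responses that share a feature value."""
    pooled = defaultdict(lambda: defaultdict(int))
    for s, row in counts.items():
        for r, n in row.items():
            pooled[feature[s]][feature[r]] += n
    return pooled


# Hypothetical confusion counts (rows: stimuli, columns: responses).
# These numbers are invented for illustration only.
counts = {
    "p": {"p": 60, "b": 5, "t": 30, "d": 5},
    "b": {"p": 5, "b": 60, "t": 5, "d": 30},
    "t": {"p": 30, "b": 5, "t": 60, "d": 5},
    "d": {"p": 5, "b": 30, "t": 5, "d": 60},
}
voicing = {"p": "voiceless", "b": "voiced", "t": "voiceless", "d": "voiced"}
place = {"p": "labial", "b": "labial", "t": "alveolar", "d": "alveolar"}

segment_bits = transmitted_information(counts)
voicing_bits = transmitted_information(collapse_by_feature(counts, voicing))
place_bits = transmitted_information(collapse_by_feature(counts, place))

print(f"segments: {segment_bits:.3f} bits")
print(f"voicing:  {voicing_bits:.3f} bits")
print(f"place:    {place_bits:.3f} bits")
```

In this toy matrix most errors preserve voicing and change place, so more information is transmitted for voicing than for place. That is the sense in which one contrast can be called more robust than another under noise.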
