YIN, a fundamental frequency estimator for speech and music

Journal of the Acoustical Society of America - Volume 111, Issue 4 - Pages 1917-1930 - 2002
Alain de Cheveigné (1), Hideki Kawahara (2, 3)
(1) Ircam-CNRS, 1 place Igor Stravinsky, 75004 Paris, France
(2) Equipe Perception et cognition musicales
(3) Wakayama University

Abstract

An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
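
The abstract does not spell out the modifications made to the autocorrelation method. As a rough illustration only, the sketch below follows the commonly described outline of the approach: a squared-difference function over lags, normalization by its cumulative mean, an absolute threshold to pick the first acceptable dip, and parabolic interpolation around that dip. The function names, the 50 Hz lower bound, and the 0.1 threshold are assumptions made for this example; this is not the authors' implementation.

```python
import math


def difference_function(frame, max_lag):
    """d(tau) = sum_j (x[j] - x[j + tau])^2 over a fixed-length window."""
    window = len(frame) - max_lag
    return [
        sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
        for tau in range(max_lag + 1)
    ]


def cumulative_mean_normalize(diff):
    """Divide d(tau) by its mean over lags 1..tau; the value at lag 0 is set to 1."""
    normalized = [1.0]
    running_sum = 0.0
    for tau in range(1, len(diff)):
        running_sum += diff[tau]
        normalized.append(diff[tau] * tau / running_sum if running_sum > 0 else 1.0)
    return normalized


def estimate_f0(frame, sample_rate, min_f0=50.0, threshold=0.1):
    """Estimate F0 in Hz for one frame (min_f0 and threshold are assumed defaults)."""
    max_lag = int(sample_rate / min_f0)
    if len(frame) <= max_lag:
        raise ValueError("frame must be longer than the largest lag searched")
    norm = cumulative_mean_normalize(difference_function(frame, max_lag))

    # Take the first lag whose normalized difference falls below the threshold,
    # then walk down to the bottom of that dip.
    tau = next((t for t in range(1, max_lag + 1) if norm[t] < threshold), None)
    if tau is None:
        # No dip is deep enough: fall back to the global minimum.
        tau = min(range(1, max_lag + 1), key=lambda t: norm[t])
    else:
        while tau < max_lag and norm[tau + 1] < norm[tau]:
            tau += 1

    # Refine the period estimate with a parabola through the three nearest points.
    period = float(tau)
    if 1 <= tau < max_lag:
        a, b, c = norm[tau - 1], norm[tau], norm[tau + 1]
        curvature = a - 2.0 * b + c
        if curvature > 0:
            period = tau + 0.5 * (a - c) / curvature
    return sample_rate / period


if __name__ == "__main__":
    # Usage: a synthetic 220 Hz sine sampled at 16 kHz.
    sr = 16000
    tone = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(1024)]
    print(round(estimate_f0(tone, sr), 2))
```

The search runs from the shortest lag upward and imposes no minimum lag, which mirrors the abstract's point that there is no upper limit on the frequency search range. The lower bound on F0 only fixes how many lags are computed.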
