YIN, a fundamental frequency estimator for speech and music
Abstract
An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
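The abstract does not spell out which modifications are applied to the autocorrelation method. As a rough illustration only, the following Python sketch assumes a commonly described sequence of steps: a squared-difference function in place of the raw autocorrelation, a cumulative-mean normalization, an absolute threshold, and parabolic interpolation around the chosen lag. The function names, the 50 Hz lower bound, and the 0.1 threshold are illustrative assumptions, not values taken from this text, and the sketch is not the authors' implementation.

```python
import math


def difference_function(x, max_lag):
    """Squared difference between the signal and a copy shifted by each lag."""
    # Use the same number of terms for every lag so values are comparable.
    w = len(x) - max_lag
    return [
        sum((x[j] - x[j + tau]) ** 2 for j in range(w))
        for tau in range(max_lag + 1)
    ]


def cumulative_mean_normalize(d):
    """Divide d(tau) by its running mean over lags 1..tau; lag 0 is set to 1."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out


def estimate_f0(x, sample_rate, min_f0=50.0, threshold=0.1):
    """Estimate F0 in Hz for one frame (min_f0 and threshold are assumed values)."""
    max_lag = int(sample_rate / min_f0)
    if len(x) <= max_lag + 1:
        raise ValueError("frame too short for the requested min_f0")
    d = cumulative_mean_normalize(difference_function(x, max_lag))

    # Take the first lag that dips below the threshold, then follow the dip
    # down to its local minimum.
    tau = None
    for t in range(2, max_lag):
        if d[t] < threshold:
            while t + 1 < max_lag and d[t + 1] < d[t]:
                t += 1
            tau = t
            break
    if tau is None:
        # No dip below the threshold: fall back to the global minimum.
        tau = min(range(2, max_lag), key=lambda t: d[t])

    # Parabolic interpolation through the three points around the minimum
    # refines the period estimate to a fraction of a sample.
    a, b, c = d[tau - 1], d[tau], d[tau + 1]
    denom = a - 2 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (tau + offset)
```

Calling `estimate_f0` on a frame of a pure sine, for example 800 samples of a 220 Hz tone sampled at 8000 Hz, should return a value close to 220. Because the search runs over lags rather than frequency bins, the only range parameter in this sketch is the lower bound `min_f0`, which is in keeping with the abstract's remark that there is no upper limit on the search range.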