Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition

Shang-Ming Lee1, Shi-Hau Fang1, Jeih-weih Hung1, Lin-Shan Lee1
1Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan

Abstract

Although Mel-frequency cepstral coefficients (MFCC) have been shown to perform very well under most conditions, only limited effort has gone into optimizing the shape of the filters in the filter-bank of the conventional MFCC approach. This paper presents a new feature extraction approach that designs the shapes of the filters in the filter-bank. In this approach, the filter-bank coefficients are data-driven, obtained by applying principal component analysis (PCA) to the FFT spectrum of the training data. Experimental results show that the method is robust in noisy environments and that its gains combine well with those of other noise-handling techniques.
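
The idea of deriving filter weights from PCA of FFT spectra can be illustrated with a minimal sketch. The abstract does not specify the exact procedure (for example, whether PCA is applied per frequency band or over the whole spectrum), so the code below is an assumption: it runs PCA over the full magnitude spectrum and takes the leading principal components as filter weights. The function names `pca_filter_bank` and `filter_bank_outputs`, the frame length, the number of filters, and the synthetic input are all hypothetical and are not taken from the paper.

```python
import numpy as np


def pca_filter_bank(spectra, n_filters):
    """Return (n_filters, n_bins) weights from the leading principal components.

    spectra: array of shape (n_frames, n_bins), e.g. FFT magnitude spectra.
    Assumption: PCA over the whole spectrum; the paper may differ.
    """
    spectra = np.asarray(spectra, dtype=float)
    # Center each frequency bin across the training frames.
    centered = spectra - spectra.mean(axis=0)
    # Sample covariance between frequency bins, shape (n_bins, n_bins).
    cov = centered.T @ centered / (spectra.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues, one per filter.
    order = np.argsort(eigvals)[::-1][:n_filters]
    return eigvecs[:, order].T


def filter_bank_outputs(spectra, filters):
    """Project each spectrum onto every filter: (n_frames, n_filters)."""
    return np.asarray(spectra, dtype=float) @ filters.T


# Synthetic stand-in for windowed training frames (not real speech data).
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 512))
spectra = np.abs(np.fft.rfft(frames, axis=1))  # 257 bins per frame

filters = pca_filter_bank(spectra, n_filters=8)
outputs = filter_bank_outputs(spectra, filters)
```

In an MFCC-style front end, outputs like these would take the place of the triangular mel filter outputs ahead of the logarithm and DCT steps. Principal components can carry negative weights, so the handling of the logarithm is left open in this sketch.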

Keywords

#Mel frequency cepstral coefficient #Feature extraction #Speech recognition #Shape #Filters #Principal component analysis #Additive noise #Working environment noise #Noise shaping #Cepstral analysis
