Distributed speech recognition with codec parameters

B. Raj (1), J. Migdal (2), R. Singh (3)
(1) Mitsubishi Electric Research Laboratories, Inc., Cambridge, MA, USA
(2) Massachusetts Institute of Technology, Cambridge, MA, USA
(3) Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

Communication devices that perform distributed speech recognition (DSR) currently transmit standardized coded parameters of the speech signal. On a remote server, recognition features are extracted from speech reconstructed from these parameters. Because reconstruction losses degrade recognition performance, proposals are under consideration to standardize DSR codecs that derive recognition features on the device and transmit them for direct use in recognition. However, such a codec must be embedded on the transmitting device alongside its existing standard codec. Performing recognition directly on codec bitstreams avoids these complications: no additional feature-extraction mechanism is required on the device, and no reconstruction losses are incurred on the server. We propose a method based on linear discriminant analysis (LDA) for extracting optimal feature sets from codec bitstreams, and demonstrate that the features so derived improve recognition performance for the LPC, GSM, and CELP codecs. For GSM and CELP, we show that performance is comparable to that obtained with uncoded speech and with standard DSR-codec features.
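
To make the LDA step concrete, the following is a minimal sketch of a generic multi-class LDA projection, not the authors' implementation. The abstract does not specify which codec parameters are used, how classes are defined, or the output dimensionality, so the synthetic six-dimensional "codec parameter" vectors, the three class labels, and the function name `lda_projection` are all assumptions made for illustration.

```python
import numpy as np


def lda_projection(X, y, n_components):
    """Return a (d, n_components) LDA projection matrix.

    X: (n, d) array of parameter vectors (hypothetical codec parameters).
    y: (n,) array of class labels (hypothetical, e.g. phonetic classes).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)

    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        centered = Xc - mc
        Sw += centered.T @ centered
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)

    # Directions that maximize between-class relative to within-class
    # scatter are the leading eigenvectors of pinv(Sw) @ Sb.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1][:n_components]
    return evecs[:, order].real


# Synthetic stand-in data: 3 classes, 50 vectors each, 6 dimensions.
rng = np.random.default_rng(0)
means = np.array(
    [[0, 0, 0, 0, 0, 0], [3, 0, 1, 0, 0, 0], [0, 3, 0, 1, 0, 0]],
    dtype=float,
)
X = np.vstack([rng.normal(m, 1.0, size=(50, 6)) for m in means])
y = np.repeat([0, 1, 2], 50)

W = lda_projection(X, y, n_components=2)  # 6-dim parameters -> 2-dim features
Z = X @ W                                 # projected feature vectors
```

With C classes, LDA yields at most C - 1 useful directions, which is why the sketch projects three classes down to two dimensions.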

Keywords

#Speech recognition #Codecs #Feature extraction #GSM #Propagation losses #Degradation #Performance loss #Proposals #Code standards #Linear predictive coding
