IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

Governing body: N/A

Featured papers

Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition
- Pages 49-52
Shang-Ming Lee, Shi-Hau Fang, Jeih-weih Hung, Lin-Shan Lee
Although Mel-frequency cepstral coefficients (MFCC) have been proven to perform very well under most conditions, some limited efforts have been made in optimizing the shape of the filters in the filter-bank in the conventional MFCC approach. This paper presents a new feature extraction approach that designs the shapes of the filters in the filter-bank. In this new approach, the filter-bank coeffic...
#Mel frequency cepstral coefficient #Feature extraction #Speech recognition #Shape #Filters #Principal component analysis #Additive noise #Working environment noise #Noise shaping #Cepstral analysis
Out-of-vocabulary word modeling using multiple lexical fillers
- Pages 226-229
G. Boulianne, P. Dumouchel
In large vocabulary speech recognition, out-of-vocabulary words are an important cause of errors. We describe a lexical filler model that can be used in a single pass recognition system to detect out-of-vocabulary words and reduce the error rate. When rescoring word graphs with better acoustic models, word fillers cause a combinatorial explosion. We introduce a new technique, using several thousan...
#Vocabulary #Speech recognition #Dictionaries #Acoustic signal detection #Explosions #Robustness #Natural languages #Error analysis #Degradation #Character recognition
Statistical learning of language pronunciation structure
- Pages 339-342
F. Korkmazskiy
This paper presents a new approach to rule based pronunciation generation. The system presented can automatically learn a new language pronunciation structure and use this knowledge for pronunciation generation for an arbitrary context sensitive language. Unlike conventional text-to-speech systems which are based on the cost expensive human expert knowledge about a specific language, this system c...
#Statistical learning #Speech synthesis #Humans #Dictionaries #Databases #Natural languages #Speech recognition #Multimedia communication #Costs #Decision trees
Adaptive training for robust ASR
- Pages 15-20
M.J.F. Gales
Adaptive training is a powerful training technique for building speech recognition systems on nonhomogeneous data. The aim is to remove unwanted variability, such as changes in speaker, channel or acoustic environment, from desired changes, the acoustic differences between words. During training, two sets of models are generated: a canonical model set for the desired "true" variability of the spee...
#Robustness #Automatic speech recognition #Loudspeakers #Speech recognition #Training data #Target recognition #Feature extraction #Acoustical engineering #Data engineering #Power engineering and energy
Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation
- Pages 119-122
T. Nishiura, R. Gruhn, S. Nakamura
It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker image is also needed to realize a natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech. Uttered speech can be enhanced and speaker images can be captured by stee...
#Microphone arrays #Collaboration #Cameras #Teleconferencing #Speech synthesis #Direction of arrival estimation #Loudspeakers #Acoustic noise #Working environment noise #Natural languages
Language models beyond word strings
- Pages 167-176
E. Noth, A. Batliner, H. Niemann, G. Stemmer, F. Gallwitz, J. Spilker
In this paper we want to show how n-gram language models can be used to provide additional information in automatic speech understanding systems beyond the pure word chain. This becomes important in the context of conversational dialogue systems that have to recognize and interpret spontaneous speech. We show how n-grams can: (1) help to classify prosodic events like boundaries and accents; (2) be...
#Speech recognition #Speech processing #Speech analysis #Databases #Natural languages #Event detection #Phase detection #Stochastic systems #Automatic speech recognition #Virtual manufacturing
Multispeaker speech activity detection for the ICSI meeting recorder
- Pages 107-110
T. Pfau, D.P.W. Ellis, A. Stolcke
As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in cha...
#Crosstalk #Speech recognition #Microphones #Hidden Markov models #Labeling #Detectors #Microwave integrated circuits #Noise level #Silicon compounds #Computer science
Speech interfaces for mobile communications
- Pages 93-95
H. Nakano
This paper explains speech interfaces for mobile communication. Mobile interfaces have three important design rules: do not disturb the user's main task, work within the restrictions of user's ability, and minimize the resource requirements. Social acceptance is also important. In Japan, trial and regular services with speech interfaces in mobile environments have already been launched, but they a...
#Mobile communication #Cellular phones #Displays #Postal services #Weather forecasting #Portals #Privacy #Working environment noise #Automatic speech recognition #Voice mail
Incremental language models for speech recognition using finite-state transducers
- Pages 194-197
H.J.G.A. Dolfing, I.L. Hetherington
In the context of the weighted finite-state transducer approach to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sour...
#Natural languages #Speech recognition #Decoding #Hidden Markov models #Acoustic transducers #Laboratories #Context modeling #Oceans #Surveillance #Oxygen
High performance telephone bandwidth speaker independent continuous digit recognition
- Pages 405-408
P. Cosi, J.-P. Hosoma, A. Valente
The development of a high-performance telephone-bandwidth speaker independent connected digit recognizer for Italian is described. The CSLU Speech Toolkit was used to develop and implement the hybrid ANN/HMM system, which is trained on context-dependent categories to account for coarticulatory variation. Various front-end processing and system architectures were compared and, when the best feature...
#Telephony #Bandwidth #Speech recognition #Automatic speech recognition #Natural languages #Hidden Markov models #System testing #Mel frequency cepstral coefficient #Collision mitigation #Feedforward systems