IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

Governing body: N/A

Featured papers

Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition
- Pages 49-52
Shang-Ming Lee, Shi-Hau Fang, Jeih-weih Hung, Lin-Shan Lee
Although Mel-frequency cepstral coefficients (MFCC) have been proven to perform very well under most conditions, some limited efforts have been made in optimizing the shape of the filters in the filter-bank in the conventional MFCC approach. This paper presents a new feature extraction approach that designs the shapes of the filters in the filter-bank. In this new approach, the filter-bank coeffic...
#Mel frequency cepstral coefficient #Feature extraction #Speech recognition #Shape #Filters #Principal component analysis #Additive noise #Working environment noise #Noise shaping #Cepstral analysis
Recognition of negative emotions from the speech signal
- Pages 240-243
C.M. Lee, S. Narayanan, R. Pieraccini
This paper reports on methods for automatic classification of spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogues recorded from a commercial application deployed by SpeechWorks. Linear discriminant classification with Gaussian class-conditional probability distribution and k-nearest neighbors methods are u...
#Emotion recognition #Speech recognition #Principal component analysis #Automatic speech recognition #Speech analysis #Man machine systems #Linear discriminant analysis #Probability distribution #Statistical distributions #Frequency
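As an illustration of the k-nearest neighbors method named in the abstract above, here is a minimal sketch. The two-dimensional feature vectors, the label names, and the function `knn_classify` are hypothetical and not taken from the paper, which does not specify its features in the visible part of the abstract.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs (hypothetical features).
    query: feature vector with the same dimensionality.
    """
    # Sort training items by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters standing in for utterance features.
train = [
    ((0.0, 0.0), "non-negative"), ((0.0, 1.0), "non-negative"), ((1.0, 0.0), "non-negative"),
    ((5.0, 5.0), "negative"), ((5.0, 6.0), "negative"), ((6.0, 5.0), "negative"),
]
print(knn_classify(train, (5.2, 5.1)))
```

The choice of k and of the distance metric would normally be tuned on held-out data; k=3 here is only a placeholder.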
Computing consensus translation from multiple machine translation systems
- Pages 351-354
S. Bangalore, G. Bordel, G. Riccardi
We address the problem of computing a consensus translation given the outputs from a set of machine translation (MT) systems. The translations from the MT systems are aligned with a multiple string alignment algorithm and the consensus translation is then computed. We describe the multiple string alignment algorithm and the consensus MT hypothesis computation. We report on the subjective and objec...
#Natural languages #Tagging #Text categorization #Performance evaluation #Optical wavelength conversion #Impedance matching #Robustness #Speech recognition #Stochastic processes #Automatic speech recognition
Verification of multi-class recognition decision using classification approach
- Pages 123-126
T. Matsui, F.K. Soong, Biing-Hwang Juang
We investigate various strategies to improve the utterance verification performance using a 2-class pattern classifier. They include utilizing N-best candidate scores, modifying segmentation boundaries, applying background and out-of-vocabulary filler models, incorporating contexts, and minimizing verification errors via discriminative training. A connected-digit database containing utterances rec...
#Testing #Automatic speech recognition #Natural languages #Context modeling #Databases #Microphones #Performance evaluation #Man machine systems #Degradation #Working environment noise
Task-specific adaptation of speech recognition models
- Pages 433-436
A. Sankar, A. Kannan, B. Shahshahani, E. Jackson
Most published adaptation research focuses on speaker adaptation, and on adaptation for noisy channels and background environments. We study acoustic, grammar, and combined acoustic and grammar adaptation for creating task-specific recognition models. Comprehensive experimental results are presented using data from natural language quotes and a trading application. The results show that task adapt...
#Speech recognition #Hidden Markov models #Distributed computing #Loudspeakers #Acoustic applications #Adaptation model #Smoothing methods #Acoustic noise #Background noise #Working environment noise
Improved pronunciation modelling by inverse word frequency and pronunciation entropy
- Pages 53-56
Ming-yi Tsai, Fu-chiang Chou, Lin-shan Lee
We propose a new approach to rank the potential pronunciations for each word by their pronunciation frequency and inverse word frequency (pf-iwf) weights. The pronunciation set obtained in this way can then be pruned with different criteria. This approach not only considers the frequencies of occurrence of the pronunciations, but tries to minimize the extra confusion which may be introduced by pro...
#Inverse problems #Frequency #Entropy #Automatic speech recognition #Vocabulary #Natural languages #Costs #Training data #Dynamic programming #Heuristic algorithms
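The pf-iwf ranking described in the abstract above can be sketched as follows. The abstract does not give the exact weighting, so the formula here is an assumption modeled on tf-idf: pf is the relative frequency of a pronunciation for its word, and iwf is a smoothed log of the inverse fraction of words sharing that pronunciation. The function name `pf_iwf_rank` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def pf_iwf_rank(observations):
    """Rank each word's pronunciations by an assumed tf-idf-style pf-iwf weight.

    observations: iterable of (word, pronunciation) pairs, e.g. from aligned
    training data. Returns {word: [(pronunciation, weight), ...]} sorted by
    descending weight.
    """
    counts = defaultdict(Counter)   # word -> pronunciation -> count
    words_with = defaultdict(set)   # pronunciation -> words that use it
    for word, pron in observations:
        counts[word][pron] += 1
        words_with[pron].add(word)

    n_words = len(counts)
    ranked = {}
    for word, prons in counts.items():
        total = sum(prons.values())
        scored = []
        for pron, c in prons.items():
            pf = c / total  # pronunciation frequency within this word
            # Assumed iwf: pronunciations shared by many words are penalized.
            iwf = math.log(n_words / len(words_with[pron])) + 1.0
            scored.append((pron, pf * iwf))
        scored.sort(key=lambda x: (-x[1], x[0]))
        ranked[word] = scored
    return ranked


# Toy data: "dh iy" is more frequent for "the" but is shared with "thee".
obs = [("the", "dh ax")] * 2 + [("the", "dh iy")] * 3 + [("thee", "dh iy")] * 2
for pron, weight in pf_iwf_rank(obs)["the"]:
    print(pron, round(weight, 3))
```

In this toy example the less frequent but unambiguous pronunciation outranks the more frequent shared one, which is the kind of confusion-aware ordering the abstract motivates. Pruning would then keep, for instance, the top-ranked entries per word.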
Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation
- Pages 119-122
T. Nishiura, R. Gruhn, S. Nakamura
It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker image is also needed to realize a natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech. Uttered speech can be enhanced and speaker images can be captured by stee...
#Microphone arrays #Collaboration #Cameras #Teleconferencing #Speech synthesis #Direction of arrival estimation #Loudspeakers #Acoustic noise #Working environment noise #Natural languages
Dialogue management in the Talk'n'Travel system
- Pages 235-239
D. Stallard
A central problem for mixed-initiative dialogue management is coping with user utterances that fall outside of the expected sequence of dialogue. Independent initiative by the user may require a complete revision of the future course of the dialogue, even when the system is engaged in activities of its own, such as querying a database, etc. This paper presents an event-driven, goal-based dialogue ...
#Databases #Natural languages #Technology management #Prototypes #Telephony #Speech recognition #Robustness #Communication system control
Time-varying noise compensation by sequential Monte Carlo method
- Pages 163-166
K. Yao, S. Nakamura
We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. At each frame, the method generates a set of samples, approximating the posterior distribution of speech and noise parameters for given observation sequences to the current frame. An explicit model representing noise effects on speech features is used, so that an e...
#Noise generators #Additive noise #Speech enhancement #Noise robustness #Speech recognition #Predictive models #State estimation #Mean square error methods #Smoothing methods #Inference algorithms
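To illustrate the frame-by-frame sampling idea in the abstract above, here is a minimal bootstrap particle filter (one common sequential Monte Carlo variant). It is a generic sketch, not the paper's method: the scalar random-walk noise level, the Gaussian observation model, and all parameter values are assumptions, whereas the paper uses an explicit model of noise effects on speech features.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_std=0.1,
                    obs_std=0.5, seed=0):
    """Track a slowly drifting scalar noise level from noisy observations.

    Assumed model: x_t = x_{t-1} + N(0, process_std^2), y_t = x_t + N(0, obs_std^2).
    Returns the posterior-mean estimate of x_t at every frame.
    """
    rng = random.Random(seed)
    # Initialize particles around the first observation.
    particles = [rng.gauss(observations[0], obs_std) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: propagate each particle through the random-walk dynamics.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            weights = [1.0 / n_particles] * n_particles  # degenerate case
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean approximates the posterior mean.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Synthetic time-varying noise level drifting from 0 to 2 over 100 frames.
data_rng = random.Random(1)
truth = [2.0 * t / 99 for t in range(100)]
observed = [x + data_rng.gauss(0.0, 0.5) for x in truth]
est = particle_filter(observed)
print(round(est[-1], 2), "vs true", truth[-1])
```

Resampling at every frame keeps the sketch short; practical systems often resample only when the effective sample size drops.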
Robust speech recognition with multi-channel codebook dependent cepstral normalization (MCDCN)
- Pages 151-154
S. Deligne, R. Gopinath
We address the issue of speech recognition in the presence of interfering signals, in cases where the signals corrupting the speech are recorded in separate channels. We propose to combine a trivial form of filtering with MCDCN, a multi-channel version of codebook dependent cepstral normalization, where the cepstra of the noise are estimated from the reference signals. We report on recognition exp...
#Robustness #Speech recognition #Cepstral analysis #Speech synthesis #Adaptive filters #Decorrelation #Filtering #Nonlinear filters #Linear systems #Acoustic noise