IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

Governing body: N/A

Featured papers

Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition
- Pages 49-52
Shang-Ming Lee, Shi-Hau Fang, Jeih-weih Hung, Lin-Shan Lee
Although Mel-frequency cepstral coefficients (MFCC) have been proven to perform very well under most conditions, some limited efforts have been made in optimizing the shape of the filters in the filter-bank in the conventional MFCC approach. This paper presents a new feature extraction approach that designs the shapes of the filters in the filter-bank. In this new approach, the filter-bank coeffic...
#Mel frequency cepstral coefficient #Feature extraction #Speech recognition #Shape #Filters #Principal component analysis #Additive noise #Working environment noise #Noise shaping #Cepstral analysis
Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation
- Pages 119-122
T. Nishiura, R. Gruhn, S. Nakamura
It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker image is also needed to realize a natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech. Uttered speech can be enhanced and speaker images can be captured by stee...
#Microphone arrays #Collaboration #Cameras #Teleconferencing #Speech synthesis #Direction of arrival estimation #Loudspeakers #Acoustic noise #Working environment noise #Natural languages
Computing consensus translation from multiple machine translation systems
- Pages 351-354
B. Bangalore, G. Bordel, G. Riccardi
We address the problem of computing a consensus translation given the outputs from a set of machine translation (MT) systems. The translations from the MT systems are aligned with a multiple string alignment algorithm and the consensus translation is then computed. We describe the multiple string alignment algorithm and the consensus MT hypothesis computation. We report on the subjective and objec...
#Natural languages #Tagging #Text categorization #Performance evaluation #Optical wavelength conversion #Impedance matching #Robustness #Speech recognition #Stochastic processes #Automatic speech recognition
Language modeling for multi-domain speech-driven text retrieval
- Pages 327-330
K. Itou, A. Fujii, T. Ishikawa
We report experimental results associated with speech-driven text retrieval, which facilitates retrieving information in multiple domains with spoken queries. Since users speak contents related to a target collection, we produce language models used for speech recognition based on the target collection, so as to improve both the recognition and retrieval accuracy. Experiments using existing test c...
#Natural languages #Speech recognition #Information retrieval #Automatic speech recognition #Testing #Decoding #Libraries #Information science #Target recognition #Content based retrieval
Improvements on a semi-automatic grammar induction framework
- Pages 288-291
Chin-Chung Wong, H. Meng
This work extends the semi-automatic grammar induction approach previously proposed (see Meng, H. and Siu, K.C., IEEE Trans. on Knowledge and Data Engineering). The data-driven approach learns semantic and phrasal categories from a training corpus of unannotated natural language queries in a specific domain. The approach can be seeded with prespecified semantic categories to expedite the learning ...
#Natural languages #Testing #Equations #Databases #Laboratories #Systems engineering and theory #Research and development management #Scalability #Humans #Speech recognition
Multispeaker speech activity detection for the ICSI meeting recorder
- Pages 107-110
T. Pfau, D.P.W. Ellis, A. Stolcke
As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in cha...
#Crosstalk #Speech recognition #Microphones #Hidden Markov models #Labeling #Detectors #Microwave integrated circuits #Noise level #Silicon compounds #Computer science
Speech interfaces for mobile communications
- Pages 93-95
H. Nakano
This paper explains speech interfaces for mobile communication. Mobile interfaces have three important design rules: do not disturb the user's main task, work within the restrictions of user's ability, and minimize the resource requirements. Social acceptance is also important. In Japan, trial and regular services with speech interfaces in mobile environments have already been launched, but they a...
#Mobile communication #Cellular phones #Displays #Postal services #Weather forecasting #Portals #Privacy #Working environment noise #Automatic speech recognition #Voice mail
Speech recognition of broadcast news for the European Portuguese language
- Pages 319-322
H. Meinedo, N. Souto, J.P. Neto
This paper describes our work on the development of a large vocabulary continuous speech recognition system applied to a broadcast news task for the European Portuguese language in the scope of the ALERT project. We start by presenting the baseline recogniser AUDIMUS, which was originally developed with a corpus of read newspaper text. This is a hybrid system that uses a combination of phone proba...
#Speech recognition #Broadcasting #Natural languages #Streaming media #System testing #Databases #Vocabulary #Multimedia systems #TV #Audio recording
Recognition experiments with the SpeechDat-Car Aurora Spanish database using 8 kHz- and 16 kHz-sampled signals
- Pages 135-138
C. Nadeu, M. Tolos
Like the other SpeechDat-Car databases, the Spanish one has been collected using a 16 kHz sampling frequency, and several microphone positions and environmental noises. We aim at clarifying whether there is any advantage in terms of recognition performance from processing the 16 kHz-sampled signals instead of the usual 8 kHz-sampled ones. Recognition tests have been carried out within the Aurora e...
#Databases #Microphones #Frequency #Working environment noise #Testing #Sampling methods #Bandwidth #Speech recognition #Telecommunication standards #Standards development
Smoothed language model incorporation for efficient time-synchronous beam search decoding in LVCSR
- Pages 178-181
D. Willett, E. McDermott, S. Katagiri
For performing the decoding search in large vocabulary continuous speech recognition (LVCSR) with hidden Markov models (HMM) and statistical language models, the most straightforward and popular approach is the time-synchronous beam search procedure. A drawback of this approach is that the time-asynchrony of the language model weight application during search leads to performance degradations. Thi...
#Decoding #Hidden Markov models #Acoustic beams #Degradation #Smoothing methods #Viterbi algorithm #Context modeling #Laboratories #Speech recognition #Natural languages