IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

Governing body: N/A

Featured papers

European Language Resources Association history and recent developments
- Pages 465-466
K. Choukri
This paper aims at briefly describing the rationale behind the foundation of the European Language Resources Association (ELRA) in 1995 and its activities since then. We would like to focus on the issues involved in making language resources available to different sectors of the language engineering community. ELRA is presented as a conduit for the distribution of speech, written and terminology d...
#History #Speech #Terminology #Databases #Natural languages #Research and development #Law #Legal factors #Logistics #Investments
Acoustic analysis and recognition of whispered speech
- Pages 429-432
T. Itoh, K. Takeda, F. Itakura
The acoustic properties and a recognition method of whispered speech are discussed. A whispered speech database that consists of whispered speech, normal speech and the corresponding facial video images of more than 6,000 sentences from 100 speakers was prepared. The comparison between whispered and normal utterances shows that: 1) the cepstrum distance between them is 4 dB for voiced and 2 dB for ...
#Speech analysis #Speech recognition #Speech processing #Hidden Markov models #Image databases #Maximum likelihood linear regression #Video recording #Loudspeakers #Cepstrum #Frequency
n-gram and decision tree based language identification for written words
- Pages 335-338
J. Hakkinen, Jilei Tian
As the demand for multilingual speech recognizers increases, the development of systems which combine automatic language identification, language-specific pronunciation modeling and language-independent acoustic models becomes increasingly important. When the recognition grammar is dynamic and obtained directly from written text, the language associated with each grammar item has to be identified ...
#Decision trees #Natural languages #Speech recognition #Mobile handsets #Automatic speech recognition #Testing #Vocabulary #Usability #Signal processing #Embedded computing
Histogram based normalization in the acoustic feature space
- Pages 21-24
S. Molau, M. Pitz, H. Ney
We describe a technique called histogram normalization that aims at normalizing feature space distributions at different stages in the signal analysis front-end, namely the log-compressed filterbank vectors, cepstrum coefficients, and LDA (linear discriminant analysis) transformed acoustic vectors. Best results are obtained at the filterbank, and in most cases there is a minor additional gain when ...
#Histograms #Filter bank #Signal analysis #Cepstrum #Linear discriminant analysis #Target recognition #Smoothing methods #Training data #Speech recognition #Error analysis
Construction of model-space constraints
- Pages 69-72
P. Nguyen, L. Rigazio, C. Wellekens, J.-C. Junqua
HMM systems exhibit a large amount of redundancy. To this end, a technique called eigenvoices was found to be very effective for speaker adaptation. The correlation between HMM parameters is exploited via a linear constraint called eigenspace. This constraint is obtained through a PCA of the training speakers. We show how PCA can be linked to the maximum-likelihood criterion. Then, we extend the m...
#Maximum likelihood linear regression #Covariance matrix #Maximum likelihood estimation #Hidden Markov models #Principal component analysis #Speech #Linear discriminant analysis #Piecewise linear techniques #Gaussian processes #Vocabulary
Introduction of speech interface for mobile information services
- Pages 462-463
H. Nakano
Popular Japanese mobile Web-phones are widely used to connect to Internet providers (IP). The most popular service on mobile Web-phones is E-mail. Currently, users type the messages using the ten standard keys on the phone. Several letters and Kana (Japanese phonetic characters) are assigned to each key, and the user steps through them by tapping the key repeatedly. After inputting several words, ...
#Automatic speech recognition #Working environment noise #Background noise #Electronic mail #Speech recognition #Cellular phones #Switches #Laboratories #Web and internet services #Lapping
Language modeling for multi-domain speech-driven text retrieval
- Pages 327-330
K. Itou, A. Fujii, T. Ishikawa
We report experimental results associated with speech-driven text retrieval, which facilitates retrieving information in multiple domains with spoken queries. Since users speak contents related to a target collection, we produce language models used for speech recognition based on the target collection, so as to improve both the recognition and retrieval accuracy. Experiments using existing test c...
#Natural languages #Speech recognition #Information retrieval #Automatic speech recognition #Testing #Decoding #Libraries #Information science #Target recognition #Content based retrieval
Bridging the gap between mixed-initiative dialogs and reusable sub-dialogs
- Pages 276-279
S. Kronenberg, P. Regel-Brietzman
To ease the development process for dialog systems, it is desirable that reusable dialog components provide pre-packaged functionality 'out-of-the-box' that enables developers to quickly build applications by providing standard default settings and behavior. Additionally, human-computer interaction should become more human-like in that mixed-initiative dialogs are supported. Mixed-initiative inter...
#Application software #Banking #Speech processing #Standards development #Expert systems #Vocabulary #Navigation
Recognition experiments with the SpeechDat-Car Aurora Spanish database using 8 kHz- and 16 kHz-sampled signals
- Pages 135-138
C. Nadeu, M. Tolos
Like the other SpeechDat-Car databases, the Spanish one has been collected using a 16 kHz sampling frequency, and several microphone positions and environmental noises. We aim at clarifying whether there is any advantage in terms of recognition performance from processing the 16 kHz-sampled signals instead of the usual 8 kHz-sampled ones. Recognition tests have been carried out within the Aurora e...
#Databases #Microphones #Frequency #Working environment noise #Testing #Sampling methods #Bandwidth #Speech recognition #Telecommunication standards #Standards development
Smoothed language model incorporation for efficient time-synchronous beam search decoding in LVCSR
- Pages 178-181
D. Willett, E. McDermott, S. Katagiri
For performing the decoding search in large vocabulary continuous speech recognition (LVCSR) with hidden Markov models (HMM) and statistical language models, the most straightforward and popular approach is the time-synchronous beam search procedure. A drawback of this approach is that the time-asynchrony of the language model weight application during search leads to performance degradations. Thi...
#Decoding #Hidden Markov models #Acoustic beams #Degradation #Smoothing methods #Viterbi algorithm #Context modeling #Laboratories #Speech recognition #Natural languages