Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition - Pages 49-52
Shang-Ming Lee, Shi-Hau Fang, Jeih-weih Hung, Lin-Shan Lee
Although Mel-frequency cepstral coefficients (MFCC) have been proven to perform
very well under most conditions, only limited efforts have been made in
optimizing the shape of the filters in the filter-bank in the conventional MFCC
approach. This paper presents a new feature extraction approach that designs the
shapes of the filters in the filter-bank. In this new approach, the filter-bank
coeffic...
#Mel frequency cepstral coefficient #Feature extraction #Speech recognition #Shape #Filters #Principal component analysis #Additive noise #Working environment noise #Noise shaping #Cepstral analysis
Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation - Pages 119-122
T. Nishiura, R. Gruhn, S. Nakamura
It is very important for multilingual teleconferencing through speech-to-speech
translation to capture distant-talking speech with high quality. In addition,
the speaker image is also needed to realize natural communication in such a
conference. A microphone array is an ideal candidate for capturing
distant-talking speech. Uttered speech can be enhanced and speaker images can be
captured by stee...
#Microphone arrays #Collaboration #Cameras #Teleconferencing #Speech synthesis #Direction of arrival estimation #Loudspeakers #Acoustic noise #Working environment noise #Natural languages
Computing consensus translation from multiple machine translation systems - Pages 351-354
B. Bangalore, G. Bordel, G. Riccardi
We address the problem of computing a consensus translation given the outputs
from a set of machine translation (MT) systems. The translations from the MT
systems are aligned with a multiple string alignment algorithm and the consensus
translation is then computed. We describe the multiple string alignment
algorithm and the consensus MT hypothesis computation. We report on the
subjective and objec...
#Natural languages #Tagging #Text categorization #Performance evaluation #Optical wavelength conversion #Impedance matching #Robustness #Speech recognition #Stochastic processes #Automatic speech recognition
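The consensus step described in the abstract above can be illustrated with a minimal sketch. It assumes the hypotheses have already been aligned into equal-length token rows and simply takes a majority vote per column; the gap symbol, the function name `consensus`, and the example sentences are hypothetical, and this is not the authors' alignment or scoring method.

```python
from collections import Counter

GAP = "-"  # hypothetical gap symbol marking an empty alignment slot


def consensus(aligned):
    """Majority vote per column over equal-length, pre-aligned token rows."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("all aligned hypotheses must have the same length")
    result = []
    for column in zip(*aligned):
        # Ties are broken in favor of the token seen first in the column.
        token, _ = Counter(column).most_common(1)[0]
        if token != GAP:  # a winning gap means "emit nothing here"
            result.append(token)
    return result


# Three hypothetical MT outputs, already aligned column by column.
hyps = [
    "the cat sat on - mat".split(),
    "the cat sits on the mat".split(),
    "a cat sat on the mat".split(),
]
print(" ".join(consensus(hyps)))
```

A real system would also need the alignment itself and, most likely, a weighted vote; both are outside this sketch.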
Language modeling for multi-domain speech-driven text retrieval - Pages 327-330
K. Itou, A. Fujii, T. Ishikawa
We report experimental results associated with speech-driven text retrieval,
which facilitates retrieving information in multiple domains with spoken
queries. Since users speak contents related to a target collection, we produce
language models used for speech recognition based on the target collection, so
as to improve both the recognition and retrieval accuracy. Experiments using
existing test c...
#Natural languages #Speech recognition #Information retrieval #Automatic speech recognition #Testing #Decoding #Libraries #Information science #Target recognition #Content based retrieval
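The idea of building the recognizer's language model from the target collection can be sketched as follows. This is an illustrative add-one-smoothed bigram model, not the model used in the paper; the function names, the toy collection, and the queries are all assumptions.

```python
import math
from collections import Counter


def train_bigram(docs):
    """Count bigrams over a (hypothetical) target collection of documents."""
    histories, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for doc in docs:
        words = doc.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])            # counts of each history word
        bigrams.update(zip(tokens, tokens[1:]))  # counts of adjacent pairs
    return histories, bigrams, len(vocab)


def log_score(query, model):
    """Add-one smoothed bigram log score of a query.

    Out-of-vocabulary words are not added to the vocabulary size, so this is
    only a relative score for comparing queries, not a proper probability.
    """
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + query.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


collection = [
    "speech recognition for text retrieval",
    "text retrieval with spoken queries",
]
model = train_bigram(collection)
print(log_score("text retrieval", model) > log_score("banana weather", model))
```

A query that matches the collection's wording scores higher than an unrelated one of the same length, which is the effect a collection-specific language model is meant to give the recognizer.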
Improvements on a semi-automatic grammar induction framework - Pages 288-291
Chin-Chung Wong, H. Meng
This work extends the semi-automatic grammar induction approach previously
proposed (see Meng, H. and Siu, K.C., IEEE Trans. on Knowledge and Data
Engineering). The data-driven approach learns semantic and phrasal categories
from a training corpus of unannotated natural language queries in a specific
domain. The approach can be seeded with prespecified semantic categories to
expedite the learning ...
#Natural languages #Testing #Equations #Databases #Laboratories #Systems engineering and theory #Research and development management #Scalability #Humans #Speech recognition
Multispeaker speech activity detection for the ICSI meeting recorder - Pages 107-110
T. Pfau, D.P.W. Ellis, A. Stolcke
As part of a project on speech recognition in meeting environments, we have
collected a corpus of multichannel meeting recordings. We expected the
identification of speaker activity to be straightforward given that the
participants had individual microphones, but simple approaches yielded
unacceptably erroneous labelings, mainly due to crosstalk between nearby
speakers and wide variations in cha...
#Crosstalk #Speech recognition #Microphones #Hidden Markov models #Labeling #Detectors #Microwave integrated circuits #Noise level #Silicon compounds #Computer science
Speech interfaces for mobile communications - Pages 93-95
H. Nakano
This paper explains speech interfaces for mobile communication. Mobile
interfaces have three important design rules: do not disturb the user's main
task, work within the restrictions of the user's ability, and minimize the resource
requirements. Social acceptance is also important. In Japan, trial and regular
services with speech interfaces in mobile environments have already been
launched, but they a...
#Mobile communication #Cellular phones #Displays #Postal services #Weather forecasting #Portals #Privacy #Working environment noise #Automatic speech recognition #Voice mail
Speech recognition of broadcast news for the European Portuguese language - Pages 319-322
H. Meinedo, N. Souto, J.P. Neto
This paper describes our work on the development of a large vocabulary
continuous speech recognition system applied to a broadcast news task for the
European Portuguese language in the scope of the ALERT project. We start by
presenting the baseline recogniser AUDIMUS, which was originally developed with
a corpus of read newspaper text. This is a hybrid system that uses a combination
of phone proba...
#Speech recognition #Broadcasting #Natural languages #Streaming media #System testing #Databases #Vocabulary #Multimedia systems #TV #Audio recording
Recognition experiments with the SpeechDat-Car Aurora Spanish database using 8 kHz- and 16 kHz-sampled signals - Pages 135-138
C. Nadeu, M. Tolos
Like the other SpeechDat-Car databases, the Spanish one has been collected using
a 16 kHz sampling frequency, and several microphone positions and environmental
noises. We aim at clarifying whether there is any advantage in terms of
recognition performance from processing the 16 kHz-sampled signals instead of
the usual 8 kHz-sampled ones. Recognition tests have been carried out within the
Aurora e...
#Databases #Microphones #Frequency #Working environment noise #Testing #Sampling methods #Bandwidth #Speech recognition #Telecommunication standards #Standards development
Smoothed language model incorporation for efficient time-synchronous beam search decoding in LVCSR - Pages 178-181
D. Willett, E. McDermott, S. Katagiri
For performing the decoding search in large vocabulary continuous speech
recognition (LVCSR) with hidden Markov models (HMM) and statistical language
models, the most straightforward and popular approach is the time-synchronous
beam search procedure. A drawback of this approach is that the time-asynchrony
of the language model weight application during search leads to performance
degradations. Thi...
#Decoding #Hidden Markov models #Acoustic beams #Degradation #Smoothing methods #Viterbi algorithm #Context modeling #Laboratories #Speech recognition #Natural languages