Speech recognition using advanced HMM2 features - Pages 65-68
K. Weber, S. Bengio, H. Bourlard
HMM2 is a particular hidden Markov model where state emission probabilities of
the temporal (primary) HMM are modeled through (secondary) state-dependent
frequency-based HMMs (see Weber, K. et al., Proc. ICSLP, vol.III, p.147-50,
2000). As we show in another paper (see Weber et al., Proc. Eurospeech, Sep.
2001), a secondary HMM can also be used to extract robust ASR features. Here, we
further inve...
#Speech recognition #Hidden Markov models #Frequency #Feature extraction #Robustness #Automatic speech recognition #Data mining #Spatial databases #Error analysis #Indexes
An open concept metric for assessing dialog system complexity - Pages 264-267
T.M. DuBois, A.I. Rudnicky
Techniques for assessing dialog system performance commonly focus on
characteristics of the interaction, using metrics such as completion,
satisfaction or time on task. However, such metrics are not always capable of
differentiating systems that operate on fundamentally different principles,
particularly when tested on tasks that focus on common-denominator capabilities.
We introduce a new metric,...
#System performance #Computer science #System testing #Problem-solving #Equations #Performance analysis #Particle measurements #Cities and towns #History
Investigating stochastic speech understanding - Pages 260-263
H. Bonneau-Maynard, F. Lefevre
The need for human expertise in the development of a speech understanding system
can be greatly reduced by the use of stochastic techniques. However, corpus-based
techniques require the annotation of large amounts of training data. Manual
semantic annotation of such corpora is tedious, expensive, and subject to
inconsistencies. This work investigates the influence of the training corpus
size on the...
#Stochastic processes #Costs #Natural languages #Stochastic systems #Humans #Data mining #Speech analysis #Training data #Performance evaluation #Telephony
Recursive noise estimation using iterative stochastic approximation for stereo-based robust speech recognition - Pages 81-84
Li Deng, J. Droppo, A. Acero
We present an algorithm for recursive estimation of parameters in a mildly
nonlinear model involving incomplete data. In particular, we focus on the
time-varying deterministic parameters of additive noise in the nonlinear model.
For the nonstationary noise that we encounter in robust speech recognition,
different observation data segments correspond to different noise parameter
values. Hence, recu...
#Recursive estimation #Stochastic resonance #Noise robustness #Speech enhancement #Working environment noise #Iterative algorithms #Testing #Cepstral analysis #Acoustic noise #Piecewise linear approximation
Time-varying noise compensation by sequential Monte Carlo method - Pages 163-166
K. Yao, S. Nakamura
We present a sequential Monte Carlo method applied to additive noise
compensation for robust speech recognition in time-varying noise. At each frame,
the method generates a set of samples, approximating the posterior distribution
of speech and noise parameters for given observation sequences to the current
frame. An explicit model representing noise effects on speech features is used,
so that an e...
#Noise generators #Additive noise #Speech enhancement #Noise robustness #Speech recognition #Predictive models #State estimation #Mean square error methods #Smoothing methods #Inference algorithms
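The per-frame sampling idea in the abstract above can be illustrated with a generic bootstrap particle filter. This is only a minimal sketch under simplifying assumptions, not the authors' method: the noise parameter is a single scalar following a random walk, the observation is that scalar plus Gaussian error, and the function name `bootstrap_filter` and the parameters `drift_std` and `obs_std` are hypothetical. The paper's explicit model of noise effects on speech features is not reproduced here.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, drift_std=0.1,
                     obs_std=0.5, seed=0):
    """Track a slowly drifting scalar with a bootstrap particle filter.

    Assumed toy model (not taken from the paper):
        state:        x_t = x_{t-1} + N(0, drift_std^2)
        observation:  y_t = x_t + N(0, obs_std^2)
    Returns the posterior-mean estimate of x_t for every frame.
    """
    rng = random.Random(seed)
    # Initial samples drawn from an assumed N(0, 1) prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: propagate each sample through the random-walk model.
        particles = [p + rng.gauss(0.0, drift_std) for p in particles]
        # Weight: Gaussian likelihood of the current observation.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean approximates the posterior mean.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw a new equally weighted sample set.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Calling `bootstrap_filter([2.0] * 50)` returns one estimate per frame, and the estimates move from the prior mean toward the observed level as frames accumulate.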
Robust speech recognition with multi-channel codebook dependent cepstral normalization (MCDCN) - Pages 151-154
S. Deligne, R. Gopinath
We address the issue of speech recognition in the presence of interfering
signals, in cases where the signals corrupting the speech are recorded in
separate channels. We propose to combine a trivial form of filtering with MCDCN,
a multi-channel version of codebook dependent cepstral normalization, where the
cepstra of the noise are estimated from the reference signals. We report on
recognition exp...
#Robustness #Speech recognition #Cepstral analysis #Speech synthesis #Adaptive filters #Decorrelation #Filtering #Nonlinear filters #Linear systems #Acoustic noise
MLLR adaptation techniques for pronunciation modeling - Pages 421-424
U. Venkataramani, W. Byrne
Multiple regression class MLLR (maximum likelihood linear regression) transforms
are investigated for use with pronunciation models that predict variation in the
observed pronunciations given the phonetic context. Regression classes can be
constructed so that MLLR transforms can be estimated and used to model specific
acoustic changes associated with pronunciation variation. The effectiveness of
t...
#Maximum likelihood linear regression #Automatic speech recognition #Predictive models #Dictionaries #Natural languages #Speech processing #Context modeling #Speech analysis #Surface treatment #Decision trees
Improved pronunciation modelling by inverse word frequency and pronunciation entropy - Pages 53-56
Ming-yi Tsai, Fu-chiang Chou, Lin-shan Lee
We propose a new approach to rank the potential pronunciations for each word by
their pronunciation frequency and inverse word frequency (pf-iwf) weights. The
pronunciation set obtained in this way can then be pruned with different
criteria. This approach not only considers the frequencies of occurrence of the
pronunciations, but tries to minimize the extra confusion which may be
introduced by pro...
#Inverse problems #Frequency #Entropy #Automatic speech recognition #Vocabulary #Natural languages #Costs #Training data #Dynamic programming #Heuristic algorithms
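The truncated abstract above does not give the pf-iwf formula, so the following is only a sketch by analogy with tf-idf, under assumptions that are not from the paper: pf is taken as the relative frequency of a pronunciation for a word, and iwf as a smoothed log of the number of words divided by the number of words sharing that pronunciation. The function name `pf_iwf_rank` and the input format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def pf_iwf_rank(observations):
    """Rank each word's pronunciations by an assumed pf-iwf weight.

    observations: iterable of (word, pronunciation) pairs, one pair per
    observed token. The weight is a tf-idf style analogy, not the
    paper's definition:
        pf  = count(word, pron) / count(word)
        iwf = log(n_words / n_words_sharing_pron) + 1
    """
    counts = defaultdict(Counter)    # word -> pronunciation -> count
    words_with = defaultdict(set)    # pronunciation -> words that use it
    for word, pron in observations:
        counts[word][pron] += 1
        words_with[pron].add(word)

    n_words = len(counts)
    ranked = {}
    for word, prons in counts.items():
        total = sum(prons.values())
        scored = []
        for pron, count in prons.items():
            pf = count / total
            # Pronunciations shared by many words are more confusable,
            # so they receive a smaller iwf factor.
            iwf = math.log(n_words / len(words_with[pron])) + 1.0
            scored.append((pron, pf * iwf))
        scored.sort(key=lambda item: item[1], reverse=True)
        ranked[word] = scored
    return ranked
```

The ranked list for each word could then be pruned, for example by keeping the top few entries or those above a threshold. The pruning criteria used in the paper are not visible in the truncated abstract.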
Trend tying in the segmental-feature HMM - Pages 45-48
Young-Sun Yun
We present a method for reducing the number of parameters in the
segmental-feature HMM (SFHMM). Although the SFHMM gives better results than the
CHMM, it has more parameters than the CHMM. A new approach is therefore needed
to reduce the number of parameters. A trajectory can likewise be separated into
a trend and a location. Since the trend represents the variation of the
segmental features and accounts for a large part of...
#Hidden Markov models #Speech #Polynomials #Information technology #Electronic mail #Quantization #Linear systems #Working environment noise #Gaussian distribution #Feature extraction
Speaker-trained recognition using allophonic enrollment models - Pages 61-64
V. Vanhoucke, M.M. Hochberg, C.J. Leggetter
We introduce a method for performing speaker-trained recognition based on
context-dependent allophone models from a large-vocabulary, speaker-independent
recognition system. A set of speaker-enrollment templates is selected from the
context-dependent allophone models. These templates are used to build
representations of the speaker-enrolled utterances. The advantages of this
approach include impro...
#Context modeling #Engines #Acoustics #Databases #Testing #Degradation #Speech recognition #Natural languages #Vocabulary #Data mining