This paper discusses the use of pseudo 2-dimensional hidden Markov models (HMMs)
for speech recognition. This image processing method is intended to model the
time-frequency structure of speech signals more accurately. For each state of a
standard HMM, the method calculates the emission probability with an embedded
HMM. If a temporal sequence of spectral vectors is viewed as a spectrogram, this
leads to a
2-dimensio...
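The idea of computing a state's emission probability with an embedded HMM can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's model: the embedded HMM is assumed to be left-to-right along the frequency axis with one scalar Gaussian per sub-state, and the names `embedded_emission_logprob` and `stay_prob` and all parameter values are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def embedded_emission_logprob(frame, means, variances, stay_prob=0.5):
    """Log-likelihood of one spectral frame under an embedded HMM.

    The embedded HMM (an assumption of this sketch) runs left-to-right over
    the frequency bins of the frame. The returned value plays the role of
    the emission log-probability of one state of the outer, temporal HMM.
    """
    n = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log(1.0 - stay_prob)
    # Forward algorithm: the path must start in the first sub-state.
    alpha = [-math.inf] * n
    alpha[0] = log_gauss(frame[0], means[0], variances[0])
    for x in frame[1:]:
        new_alpha = []
        for j in range(n):
            candidates = [alpha[j] + log_stay]
            if j > 0:
                candidates.append(alpha[j - 1] + log_move)
            new_alpha.append(
                logsumexp(candidates) + log_gauss(x, means[j], variances[j])
            )
        alpha = new_alpha
    # The path must end in the last sub-state.
    return alpha[-1]


# Toy superstate: low energy in the lower bins, high energy in the upper bins.
MEANS = [0.0, 5.0]
VARIANCES = [1.0, 1.0]
matching = embedded_emission_logprob([0.0, 0.0, 5.0, 5.0], MEANS, VARIANCES)
reversed_frame = embedded_emission_logprob([5.0, 5.0, 0.0, 0.0], MEANS, VARIANCES)
```

In this toy setting the frame whose spectral shape follows the sub-state order scores higher than the reversed frame, which is the kind of frequency-axis structure a single Gaussian per temporal state would not capture.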
In this study, we examine how fast decoding of conversational speech with large
vocabularies benefits from the efficient use of linguistic information, i.e.,
language models and grammars. Based on a re-entrant single pronunciation prefix tree, we
use the concept of linguistic context polymorphism to allow an early
incorporation of language model information. This approach allows us to use all
available ...
#Decoding #Context modeling #Automatic speech recognition #Vocabulary #Acoustic beams #Speech recognition #History #Interactive systems #Laboratories #Natural languages
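As background for the pronunciation prefix tree mentioned in the abstract above, here is a minimal sketch of how such a tree shares phone prefixes across words. The lexicon, phone symbols, and function names are made up for illustration, and the sketch covers neither the re-entrant tree nor linguistic context polymorphism.

```python
class Node:
    """One node of a pronunciation prefix tree."""

    def __init__(self):
        self.children = {}  # phone symbol -> child Node
        self.words = []     # words whose pronunciation ends at this node


def build_prefix_tree(lexicon):
    """Build a prefix tree from a mapping of word -> list of phones."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count all nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical toy lexicon; the phone symbols are illustrative only.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speak": ["s", "p", "iy", "k"],
    "spin": ["s", "p", "ih", "n"],
    "tree": ["t", "r", "iy"],
}

tree = build_prefix_tree(LEXICON)
total_phones = sum(len(phones) for phones in LEXICON.values())
```

Here the tree needs 10 phone nodes plus the root, against 15 phones in the flat lexicon. The word identity only becomes known at the node where a pronunciation ends, which is why applying language model information early in such a tree requires extra machinery.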
A new language model adaptation scheme is proposed to cope with multiple, varied
speech recognition tasks. The proposed adaptation reflects both topic differences
and sentence-style differences that result from the speaker's role. Adaptation
is carried out using two different language corpora where only the topic or the
speaker's style is matched. New word clustering
techniques ar...
#Adaptation model #Natural languages #Speech recognition #Data mining #Frequency #Error analysis #Vocabulary
This paper investigates the use of richer syntactic dependencies in the
structured language model (SLM). We present two simple methods for enriching the
dependencies in the syntactic parse trees used to initialize the SLM. We
evaluate the impact of both methods on the perplexity (PPL) and word error rate
(WER, N-best rescoring) of the SLM. We show that...
#Natural languages #Humans #Audio processing #Predictive models
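The two metrics named in the abstract above, perplexity and word error rate, can be computed as in the following sketch. It uses the standard textbook definitions rather than the paper's evaluation code; the function names and the example sentences are hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity from the model probability assigned to each word in a text."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                       # deletion
                cur[j - 1] + 1,                    # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


ppl = perplexity([0.25, 0.25, 0.25, 0.25])
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A model that assigns probability 0.25 to every word has a perplexity of 4, and the example hypothesis contains one substitution and one deletion against six reference words, giving a WER of 2/6.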
We describe the new implementation of a speech-to-speech translation system at
ATR Spoken Language Translation Research Laboratories (SLT). We use the CORBA
(Common Object Request Broker Architecture) standard to interface between a
speech recognizer, a translation system, and a TTS engine.
Various input types are supported, including close-talking microphone and
telephony hardware.
#Speech recognition #Natural languages #Computer architecture #Communication standards #Access protocols #Speech synthesis #Standards development #Standards publication #Network servers #Web server
The main goal of this paper is to propose automatic schemes for the translation
paired comparison method, which the authors previously proposed for precisely
evaluating the capability of a speech translation system. In this method, the
outputs of the speech translation system are subjectively compared with the
results of native Japanese speakers who have taken the Test of English for
International Communication
(TOEIC), which i...
The need for human expertise in the development of a speech understanding system
can be greatly reduced by the use of stochastic techniques. However, corpus-based
techniques require the annotation of large amounts of training data. Manual
semantic annotation of such corpora is tedious, expensive, and subject to
inconsistencies. This work investigates the influence of the training corpus
size on the...
#Stochastic processes #Costs #Natural languages #Stochastic systems #Humans #Data mining #Speech analysis #Training data #Performance evaluation #Telephony
Even a modest degree of room reverberation can greatly increase the difficulty
of automatic speech recognition. We have observed large increases in speech
recognition word error rates when using a far-field (3-6 feet) microphone in a
conference room, in comparison with recordings from head-mounted microphones. In
this paper, we describe experiments with a proposed remedy based on the
subtraction o...
We present a sequential Monte Carlo method applied to additive noise
compensation for robust speech recognition in time-varying noise. At each frame,
the method generates a set of samples that approximates the posterior
distribution of the speech and noise parameters given the observation sequence
up to the current frame. An explicit model representing noise effects on speech features is used,
so that an e...
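The frame-by-frame sampling described above can be illustrated with a generic bootstrap particle filter. This is a sketch under simplifying assumptions, not the method of the paper: the hidden state is a single scalar noise level that follows a random walk, and the observation is that level plus Gaussian error, whereas the paper uses an explicit model of noise effects on speech features. The function name and every parameter value are hypothetical.

```python
import math
import random


def track_noise_level(observations, num_particles=500, drift_std=0.1,
                      obs_std=0.5, seed=0):
    """Bootstrap particle filter for a slowly drifting scalar noise level.

    Returns the posterior-mean estimate of the level at each frame.
    """
    rng = random.Random(seed)
    # Assumed prior: particles scattered around the first observation.
    particles = [rng.gauss(observations[0], 1.0) for _ in range(num_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each sample through the assumed random-walk dynamics.
        particles = [p + rng.gauss(0.0, drift_std) for p in particles]
        # 2. Weight each sample by the likelihood of the current observation.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case: fall back to uniform weights.
            weights = [1.0 / num_particles] * num_particles
        else:
            weights = [w / total for w in weights]
        # 3. The weighted samples approximate the posterior; report its mean.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 4. Resample so that later frames focus on likely values.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates


# Synthetic time-varying noise level ramping from 1.0 to 3.0 over 50 frames.
true_levels = [1.0 + 2.0 * t / 49 for t in range(50)]
estimates = track_noise_level(true_levels)
```

Because the samples are propagated and re-weighted at every frame, the estimate follows the ramp with a small lag instead of assuming a stationary noise level.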