Many techniques for speaker adaptation have been successfully applied to automatic speech recognition. This paper compares the performance of several adaptation methods with respect to their memory need and processing demand. For adaptation of a compact acoustic model with 4k densities, eigenvoices and structural MAP (SMAP) are investigated next to the well-known techniques of MAP (maximum a poste...... hiện toàn bộ
#Adaptation model #Speech recognition #Loudspeakers #Automatic speech recognition #Maximum likelihood linear regression #Laboratories #Error analysis #Command and control systems #Degradation #Regression tree analysis
This paper deals with the simultaneous recognition of distant-talking speech of multiple talkers using the 3D N-best search algorithm. We describe the basic idea of the 3D N-best search and we address two additional techniques implemented into the baseline system. Namely, a path distance-based clustering and a likelihood normalization technique appeared to be necessary in order to build an efficie...... hiện toàn bộ
#Speech recognition #Hidden Markov models #Viterbi algorithm #Search methods #Natural languages #Clustering algorithms #Reverberation #Feature extraction #Adaptive systems #Sorting
Many state-of-the-art conversational systems use semantic-based robust understanding and manually derived grammars, a very time-consuming and error-prone process. This paper describes a machine-aided grammar authoring system that enables a programmer to develop rapidly a high quality grammar for conversational systems. This is achieved with a combination of domain-specific semantics, a library gra...... hiện toàn bộ
#Natural languages #Robustness #Authoring systems #Programming profession #Libraries #Information systems #Computer errors #Writing #Law #Legal factors
This paper presents a new approach to rule based pronunciation generation. The system presented can automatically learn a new language pronunciation structure and use this knowledge for pronunciation generation for an arbitrary context sensitive language. Unlike conventional text-to-speech systems which are based on the cost expensive human expert knowledge about a specific language, this system c...... hiện toàn bộ
#Statistical learning #Speech synthesis #Humans #Dictionaries #Databases #Natural languages #Speech recognition #Multimedia communication #Costs #Decision trees
Adaptive training is a powerful training technique for building speech recognition systems on nonhomogeneous data. The aim is to remove unwanted variability, such as changes in speaker, channel or acoustic environment, from desired changes, the acoustic differences between words. During training, two sets of models are generated: a canonical model set for the desired "true" variability of the spee...... hiện toàn bộ
#Robustness #Automatic speech recognition #Loudspeakers #Speech recognition #Training data #Target recognition #Feature extraction #Acoustical engineering #Data engineering #Power engineering and energy
This paper reports on methods for automatic classification of spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogues recorded from a commercial application deployed by SpeechWorks. Linear discriminant classification with Gaussian class-conditional probability distribution and k-nearest neighbors methods are u...... hiện toàn bộ
The World Wide Web is the greatest information space ever seen, distributed all over the world, in many languages, on many various topics. We first describe the evolution of a French subset of this space during the last 3 years. During this time, the size of automatically extracted text for language modelling has multiplied by 6.5. Moreover, French coverage has grown from 140,000 to 200,000 lexica...... hiện toàn bộ
#Internet #Natural languages #Speech recognition #Web server #Robots #HTML #Web sites #Data mining #Crawlers #Stochastic processes
From an historical review of how we got to where we are now, we discuss the interrelationship between our system design objectives and goals, our modeling of the speech signal and its generation and parameterization, and the broadly developing DSP methodology. We take a critical look at some of the underlying assumptions in. our modeling to see if they may be limiting the performance that can be o...... hiện toàn bộ
#Symbiosis #Digital signal processing #Speech recognition #Automatic speech recognition #Telephony #Speech synthesis #Physics #Laboratories #Speech coding #Mathematical model
Multiple regression class MLLR (maximum likelihood linear regression) transforms are investigated for use with pronunciation models that predict variation in the observed pronunciations given the phonetic context. Regression classes can be constructed so that MLLR transforms can be estimated and used to model specific acoustic changes associated with pronunciation variation. The effectiveness of t...... hiện toàn bộ
#Maximum likelihood linear regression #Automatic speech recognition #Predictive models #Dictionaries #Natural languages #Speech processing #Context modeling #Speech analysis #Surface treatment #Decision trees