International Journal of Speech Technology

Featured scientific publications

Bitter Pills to Swallow. ASR and TTS have Drug Problems
International Journal of Speech Technology - Volume 8, Issue 3 - Pages 247-257 - 2005
Caroline Henton
Voice assessments for detecting patients with neurological diseases using PCA and NPCA
International Journal of Speech Technology - 2017
Achraf Benba, Abdelilah Jilbab, Ahmed Hammouch
Applications of Language Modeling in Speech-To-Speech Translation
International Journal of Speech Technology - Volume 7 - Pages 221-229 - 2004
Fu-Hua Liu, Liang Gu, Yuqing Gao, Michael Picheny
This paper describes various language modeling issues in a speech-to-speech translation system. These issues are addressed in the IBM speech-to-speech system we developed for the DARPA Babylon program in the context of two-way translation between English and Mandarin Chinese. First, the language models for the speech recognizer had to be adapted to the specific domain to improve the recognition performance for in-domain utterances, while keeping the domain coverage as broad as possible. This involved considerations of disfluencies and lack of punctuation, as well as domain-specific utterances. Second, we used a hybrid semantic/syntactic representation to minimize the data sparseness problem in a statistical natural language generation framework. Serious inflection and synonym issues arise when words in the target language are to be determined in the translation output. Instead of relying on tedious handcrafted grammar rules, we used N-gram models as a post-processing step to enhance the generation performance. When an interpolated language model was applied to a Chinese-to-English translation task, the translation performance, measured by an objective metric of BLEU, improved substantially to 0.514 from 0.318 when we used the correct transcription as input. Similarly, the BLEU score improved to 0.300 from 0.194 for the same task when the input was speech data.
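The abstract mentions an interpolated language model without giving its form. As a minimal sketch, assuming plain linear interpolation of an in-domain model and a general model, P(w) = λ·P_in(w) + (1 − λ)·P_gen(w), the idea can be illustrated with toy unigram models. The corpora, the function names, and the weight λ = 0.7 are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_in, p_gen, lam):
    """Linear interpolation lam * p_in + (1 - lam) * p_gen over the joint vocabulary."""
    vocab = set(p_in) | set(p_gen)
    return {
        word: lam * p_in.get(word, 0.0) + (1.0 - lam) * p_gen.get(word, 0.0)
        for word in vocab
    }


# Toy corpora (hypothetical): a narrow in-domain set and a broader general set.
in_domain = "where is the checkpoint where is the doctor".split()
general = "the weather is nice and the doctor is in".split()

p_in = unigram_model(in_domain)
p_gen = unigram_model(general)

# lam = 0.7 is an arbitrary illustrative weight; in practice it would be tuned
# on held-out data.
p_mix = interpolate(p_in, p_gen, lam=0.7)
```

The mixed model keeps nonzero probability for words seen in only one of the two corpora, which is one way to favour in-domain utterances while keeping broad coverage.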
Wearable sensor based acoustic gait analysis using phase transition-based optimization algorithm on IoT
International Journal of Speech Technology - Pages 1-11 - 2021
Sampath Dakshina Murthy Achanta, Thangavel Karthikeyan, R. Vinoth Kanna
Gait monitoring with IoT has emerged as an important area of research because of the need to assess the daily activities of patients and elderly people. Ailments such as Parkinson's disease and stroke, and the need to monitor physically challenged persons in a crowd, have been the driving force behind gait analysis research. The evaluation of athletic performance is yet another area of application. Current measurement techniques rely on gait parameters, and their accuracy across different gait-related occurrences is very restricted. Many sophisticated sensor-based gait pattern systems have been established to keep patients from falling and to raise an alert in an emergency. The main objective of this paper is to use phase transition-based optimization in an IoT environment to develop characteristic phases, which may be stable, unstable, or metastable. The proposed IoT-based method detects early-stage failure from data produced by signals from wearable sensors. Moreover, optimization is performed to forecast and detect falls more effectively than conventional gait analysis. In this phase transition-based optimization, the fitness function of the subject is defined by degrees of order and disorder. As in a genetic algorithm, the elements of individual nodes are considered based on the initial population and its size. The current generation evolves into the next through operators, subject to a terminal condition. A high fitness value indicates worse stability, and the three phases are defined according to fitness. For the experiments, real-time data from 50 participants, comprising 20 elderly persons, 20 physically challenged persons, and the remainder stroke cases, are processed in MATLAB 14.1. Sensors are placed at the leg, hip, and toe; the collected data are processed in the processing unit before classification. This is followed by the cuckoo search method with many iterations. The false alarm probability and detection probability are plotted using the ROC, with a threshold between them on the histogram in the dynamic range. The proposed method shows a lower false ratio and greater accuracy than the KNN (88%) and HMDTW models. Moreover, the method achieves an average precision of 96.42%, and the maximum detection rate is 96% for a given gait cycle. It is inferred that phase transition and the adaptive cuckoo search method can be effectively combined to give better classification accuracy, detection sets, and time of duration. Interpolated IoT adds to the effectiveness of the proposed system, reaching an accuracy of 98.44% and a false ratio of 2.02%.
GMM based language identification system using robust features
International Journal of Speech Technology - Volume 17 - Pages 99-105 - 2013
Sadanandam Manchala, V. Kamakshi Prasad, V. Janaki
In this work, we propose new feature vectors for a spoken language identification (LID) system. Mel frequency cepstral coefficients (MFCC) and formant frequencies are derived from short-time windows of the speech signal. The formant frequencies are extracted by linear prediction (LP) analysis of the speech signal. From these two kinds of features, new feature vectors are derived using cluster-based computation. A GMM-based classifier has been designed using these new feature vectors. Language-specific a priori knowledge is applied to the recognition output. The experiments are carried out on the OGI database, and LID recognition performance is improved.
Usefulness, localizability, humanness, and language-benefit: additional evaluation criteria for natural language dialogue systems
International Journal of Speech Technology - Volume 19 - Pages 373-383 - 2016
Bayan AbuShawar, Eric Atwell
Human–computer dialogue systems interact with human users using natural language. We used the ALICE/AIML chatbot architecture as a platform to develop a range of chatbots covering different languages, genres, text-types, and user-groups, to illustrate qualitative aspects of natural language dialogue system evaluation. We present some of the different evaluation techniques used in natural language dialogue systems, including black box and glass box, comparative, quantitative, and qualitative evaluation. Four aspects of NLP dialogue system evaluation are often overlooked: “usefulness” in terms of a user’s qualitative needs, “localizability” to new genres and languages, “humanness” or “naturalness” compared to human–human dialogues, and “language benefit” compared to alternative interfaces. We illustrated these aspects with respect to our work on machine-learnt chatbot dialogue systems; we believe these aspects are worthwhile in impressing potential new users and customers.
A voice command system for AUTONOMY using a novel speech alignment algorithm
International Journal of Speech Technology - Volume 16 - Pages 461-469 - 2013
Helmut Hickersberger, Wolfgang L. Zagler
The Viterbi dynamic programming algorithm is currently the de-facto standard for speech recognizers to deal with duration variations of the sub-word units of speech by properly aligning the sub-word units to the sub-word unit models. The algorithm is an integral part of the hidden Markov model speech recognizers. In this work a robust and simple voice command system is developed, implemented and tested. It uses a novel speech alignment algorithm, the so-called “run-length limited dynamic programming algorithm” (RLL-DP) instead. The voice command system described hereinafter facilitates the operation of the AUTONOMY system, which is an environmental control system combined with an alternative and augmentative communication system, using isolated words as voice commands. The activation of “run-length limits” causes a statistically significant reduction of the word error rate, even when using simple “centroid sequence word models” instead of acoustic models based on “hidden control neural networks” used in previous versions.
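The abstract does not spell out the RLL-DP algorithm, so the sketch below does not attempt to reproduce it. It only illustrates the general idea of dynamic-programming alignment for isolated-word commands, using textbook dynamic time warping on one-dimensional toy feature sequences. The template values, the `recognize` helper, and the absolute-difference local cost are all hypothetical choices made for illustration.

```python
def dtw_distance(seq_a, seq_b):
    """Textbook dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            # Allow a frame to be stretched, compressed, or matched one-to-one.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the command whose template has the lowest alignment cost."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical word templates; real systems would use multidimensional features.
templates = {"on": [1, 3, 3, 1], "off": [4, 4, 0, 0]}

# The same "on" pattern spoken with different segment durations.
command = recognize([1, 1, 3, 1], templates)
```

Unconstrained warping like this lets a single frame absorb arbitrarily many frames of the other sequence; limiting how long such runs may be is the kind of restriction the name "run-length limited" suggests, though the paper's exact formulation is not given here.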
Assessing American presidential candidates using principles of ontological engineering, word sense disambiguation, data envelope analysis and qualitative comparative analysis
International Journal of Speech Technology - Volume 26 - Pages 743-764 - 2023
James A. Rodger, Justin Piper
Word sense disambiguation (WSD) is the process of automatically identifying the appropriate meaning of a word in its sentence. WSD is a promising research area in computational linguistics, especially in a wide range of advanced applications, such as the medical and social sciences. This research employs WSD to determine the inherent meaning of voter intentions regarding possible political candidates, so that candidates can be examined and their true assets and competencies in three major input areas (eligibility, education, and experience) can be deciphered. Data envelope analysis (DEA) is used to determine underlying word instances for elected and successful outputs. The results demonstrate the validity of using DEA as a tool for WSD. The results also indicate that the survey administered by the website developed for this research is a promising tool for predicting successful presidential candidates. We further validated our research findings by employing a qualitative comparative analysis approach to define the fuzzy relationships found in our data.
Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning
International Journal of Speech Technology - Volume 25 - Pages 679-687 - 2022
Samia Abd El-Moneim, Eman Abd El-Mordy, M. A. Nassar, Moawad I. Dessouky, Nabil A. Ismail, Adel S. El-Fishawy, Sami El-Dolil, Ibrahim M. El-Dokany, Fathi E. Abd El-Samie
Automatic Speaker Recognition (ASR) in mismatched conditions is a challenging task, since robust feature extraction and classification techniques are required. Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is an efficient network that can learn to recognize speakers, text-independently, when the recording circumstances are similar. Unfortunately, when the recording circumstances differ, its performance degrades. In this paper, Radon projection of the spectrograms of speech signals is implemented to get the features, since Radon Transform (RT) has less sensitivity to noise and reverberation conditions. The Radon projection is implemented on the spectrograms of speech signals, and then 2-D Discrete Cosine Transform (DCT) is computed. This technique improves the system recognition accuracy, text-independently with less sensitivity to noise and reverberation effects. The ASR system performance with the proposed features is compared to that of the system that depends on Mel Frequency Cepstral Coefficients (MFCCs) and spectrum features. For noisy utterances at 25 dB, the recognition rate with the proposed feature reaches 80%, while it is 27% and 28% with MFCCs and spectrum, respectively. For reverberant speech, the recognition rate reaches 80.67% with the proposed features, while it reaches 54% and 62.67% with the MFCCs and spectrum, respectively.
Image processing for time-frequency speech analysis
International Journal of Speech Technology - Volume 11, Issue 1 - Pages 43-49 - 2008
M. Benyoucef
Total: 849