Multispeaker speech activity detection for the ICSI meeting recorder

T. Pfau (1), D.P.W. Ellis (2), A. Stolcke (3, 1)
(1) International Computer Science Institute, Berkeley, CA, USA
(2) Department of Electrical Engineering, Columbia University, NY, USA
(3) Speech Technology and Research Laboratory, SRI International, Inc., Menlo Park, CA, USA

Abstract

As part of a project on speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM). A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation-based postprocessing results in a 35% relative reduction of the frame error rate. Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.
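The two signal-level ideas named in the abstract, energy normalization and crosscorrelation between channels, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the frame length and hop, the minimum-subtraction normalization, and the lag window are all assumptions chosen for illustration, and the HMM stage is omitted entirely.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160):
    """Log energy per frame. frame_len and hop are illustrative values."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = np.log(np.sum(frame ** 2) + 1e-10)
    return energies

def normalize_energy(log_energy):
    """Shift a channel's log energies so its quietest frame sits at zero.

    Assumed normalization scheme, meant to reduce dependence on channel gain.
    """
    return log_energy - np.min(log_energy)

def peak_crosscorrelation(a, b, max_lag=50):
    """Peak absolute normalized cross-correlation of two equal-length frames,
    searched over lags in [-max_lag, max_lag]. max_lag must be < len(a)."""
    a = a - np.mean(a)
    b = b - np.mean(b)
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-10
    full = np.correlate(a, b, mode="full")  # zero lag is at index len(a) - 1
    mid = len(a) - 1
    window = full[mid - max_lag : mid + max_lag + 1]
    return float(np.max(np.abs(window)) / denom)

# Synthetic example: one "speaker" frame leaking into a second microphone.
rng = np.random.default_rng(0)
speaker = rng.standard_normal(400)
leaked = 0.3 * np.roll(speaker, 5)       # attenuated, delayed copy (crosstalk)
unrelated = rng.standard_normal(400)     # independent signal

crosstalk_score = peak_crosscorrelation(speaker, leaked)
independent_score = peak_crosscorrelation(speaker, unrelated)
```

In this synthetic case the leaked channel correlates strongly with the source frame while the independent channel does not, which is the kind of cue a postprocessing step could use to reject activity that is really crosstalk. Any decision threshold on the score would need to be tuned on real data.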

Keywords

#Crosstalk #Speech recognition #Microphones #Hidden Markov models #Labeling #Detectors #Microwave integrated circuits #Noise level #Silicon compounds #Computer science
