Step-by-step and integrated approaches in broadcast news speaker diarization

Computer Speech & Language - Volume 20 - Pages 303-330 - 2006

Sylvain Meignier¹,², Daniel Moraru³, Corinne Fredouille¹, Jean-François Bonastre¹, Laurent Besacier³

¹Laboratoire Informatique d’Avignon (LIA)/CNRS, Department of Computing, University of Avignon, BP1228, 84911 Avignon Cedex 9, France

²LIUM/CNRS, Université du Maine, Avenue Laennec, 72085 Le Mans Cedex 9, France

³CLIPS, IMAG (UJF & CNRS), BP 53, 38041 Grenoble Cedex 9, France

References

Adami, A., Kajarekar, S.S., Hermansky, H., 2002. A new speaker change detection method for two-speaker segmentation. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2002), vol. IV, pp. 3908–3911.

Ajmera, J., Wooters, C., 2003. A robust speaker clustering algorithm. In: Automatic Speech Recognition and Understanding, IEEE, ASRU 2003, St. Thomas, US Virgin Islands, pp. 411–416.

Chen, S., Gopalakrishnan, P., 1998. Speaker, environment and channel change detection and clustering via the Bayesian information criterion. In: DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA.

DARPA speech recognition evaluation workshop. Available from: <http://www.nist.gov/speech/publications/>.

Delacourt, 2000. DISTBIC: a speaker based segmentation for audio data indexing. Speech Communication, 32, 111. doi:10.1016/S0167-6393(00)00027-3

ELISA, 2000. The ELISA systems for the NIST 99 evaluation in speaker detection and tracking. Digital Signal Processing (DSP), a review journal – Special issue on NIST 1999 speaker recognition workshop 10 (1–3), pp. 143–153.

Fredouille, C., Moraru, D., Meignier, S., Besacier, L., Bonastre, J.-F., 2004. The NIST 2004 spring rich transcription evaluation: two-axis merging strategy in the context of multiple distance microphone based meeting speaker segmentation. In: RT2004 Spring Meeting Recognition Workshop, p. 5.

Gauvain, 1994. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2 (2), 291. doi:10.1109/89.279278

Gauvain, J.-L., Lamel, L., Adda, G., 1998. Partitioning and transcription of broadcast news data. In: Proceedings of International Conference on Spoken Language Processing (ICSLP 98).

Gauvain, 2001. Audio partitioning and transcription for broadcast data indexation. Multimedia Tools and Applications, 187. doi:10.1023/A:1011303401042

Gauvain, 2002. The LIMSI broadcast news transcription system. Speech Communication, 37, 89. doi:10.1016/S0167-6393(01)00061-9

Hain, T., Woodland, P., 1998. Segmentation and classification of broadcast news audio. In: Proceedings of International Conference on Spoken Language Processing (ICSLP 98), Sydney, Australia.

Kim, D.Y., Evermann, G., Hain, T., Mrva, D., Tranter, S., Wang, L., Woodland, P.C., 2003. Recent advances in broadcast news transcription. In: Automatic Speech Recognition and Understanding, IEEE, ASRU 2003, St. Thomas, US Virgin Islands, pp. 105–110.

Magrin-Chagnolleau, I., Gravier, G., Blouet, R., for the ELISA consortium, 2001. Overview of the ELISA consortium research activities. In: 2001: A Speaker Odyssey. The Speaker Recognition Workshop, Chania, Crete, pp. 67–72.

Meignier, S., Bonastre, J.-F., Fredouille, C., Merlin, T., 2000. Evolutive HMM for speaker tracking system. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2000), Istanbul, Turkey, pp. 1177–1180.

Meignier, S., Bonastre, J.-F., Igounet, S., 2001. E-HMM approach for learning and adapting sound models for speaker indexing. In: 2001: A Speaker Odyssey. The Speaker Recognition Workshop, Chania, Crete, pp. 175–180.

Moraru, D., Meignier, S., Besacier, L., Bonastre, J.-F., Magrin-Chagnolleau, I., 2003. The ELISA consortium approaches in speaker segmentation during the NIST 2002 speaker recognition evaluation. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2003), vol. II, Hong Kong, pp. 89–92.

Moraru, D., Meignier, S., Fredouille, C., Besacier, L., Bonastre, J.-F., 2004. The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2004), Montreal, Canada.

Moraru, D., Besacier, L., Castelli, E., 2004. Using a priori information for speaker diarization. In: 2004: A Speaker Odyssey. The Speaker Recognition Workshop, Toledo, Spain, pp. 355–362.

Nguyen, L., Xiang, B., 2004. Light supervision in acoustic model training. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2004), Montreal, Canada.

NIST, 2003. Reference data cookbook for who spoke when diarization task, v2.4. Available from: <http://www.nist.gov/speech/tests/rt/rt2003/spring/docs/ref-cookbook-v2_4.pdf>.

NIST. RT-03S workshop agenda and presentations. Available from: <http://www.nist.gov/speech/tests/rt/rt2003/spring/presentations>.

NIST, March 2001. The NIST 2001 speaker recognition evaluation plan. Available from: <http://www.nist.gov/speech/tests/spk/2001/doc/2001-spkrec-evalplan-v05.9.pdf>.

NIST, February 2002. The NIST year 2002 speaker recognition evaluation plan. Available from: <http://www.nist.gov/speech/tests/spk/2002/doc/2002-spkrec-evalplan-v60.pdf>.

NIST, February 2003. The rich transcription spring 2003 (RT-03S) evaluation plan, Version 4, updated 02/25/2003. Available from: <http://www.nist.gov/speech/tests/rt/rt2003/spring/docs/rt03-spring-eval-plan-v4.pdf>.

NIST, February 2004. Spring 2004 (RT-04S) rich transcription meeting recognition evaluation plan. Available from: <http://www.nist.gov/speech/tests/rt/rt2004/spring/documents/rt04s-meeting-eval-plan-v1.pdf>.

Quénot, G., Moraru, D., Besacier, L., Mulhem, P., 2002. CLIPS-IMAG at TREC-11: Experiments in video retrieval. In: TREC 2002, Gaithersburg, MD, USA.

Quénot, G., Moraru, D., Besacier, L., 2003. CLIPS at TRECVID: Shot boundary detection and feature detection. In: TREC 2003, Gaithersburg, MD, USA.

Reynolds, D.A., Dunn, R.B., Laughlin, J.J., 2000. The Lincoln speaker recognition system: NIST EVAL2000. In: Proceedings of International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, Beijing, China, pp. 470–473.

Reynolds, D.A., Quatieri, T.F., Dunn, R.B., 2000. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing (DSP), a review journal – Special issue on NIST 1999 speaker recognition workshop 10 (1–3), pp. 19–41.

Schwarz, 1978. Estimating the dimension of a model. The Annals of Statistics, 6, 461. doi:10.1214/aos/1176344136

Siegler, M., Jain, U., Raj, B., Stern, R., 1997. Automatic segmentation and clustering of broadcast news audio. In: DARPA Speech Recognition Workshop, Westfields, Chantilly, Virginia.

Siu, M.-H., Rohlicek, R., Gish, H., 1992. An unsupervised, sequential learning algorithm for segmentation of speech waveforms with multi-speakers. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 92), vol. 2, San Francisco, CA, pp. 189–192.

Smeaton, A., Kraaij, W., Over, P., 2003. TRECVID 2003 – an introduction. In: 12th Text Retrieval Conference.

Wilcox, L., Chen, F., Kimber, D., Balasubramanian, V., 1994. Segmentation of speech using speaker identification. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 94), Adelaide, Australia, pp. 161–164.

Wilcox, L., Kimber, D., Chen, F., 1994. Audio indexing using speaker identification. In: Proceedings SPIE Conference on Automatic Systems for the Inspection and Identification of Humans, San Diego, CA, pp. 149–157.

Woodland, 2002. The development of the HTK broadcast news transcription system: an overview. Speech Communication, 37, 291.
