Step-by-step and integrated approaches in broadcast news speaker diarization
References
Adami, A., Kajarekar, S.S., Hermansky, H., 2002. A new speaker change detection method for two-speaker segmentation. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2002), vol. IV, pp. 3908–3911.
Ajmera, J., Wooters, C., 2003. A robust speaker clustering algorithm. In: Automatic Speech Recognition and Understanding, IEEE, ASRU 2003, St. Thomas, US Virgin Islands, pp. 411–416.
Chen, S., Gopalakrishnan, P., 1998. Speaker, environment and channel change detection and clustering via the Bayesian information criterion. In: DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA.
DARPA speech recognition evaluation workshop. Available from: <http://www.nist.gov/speech/publications/>.
Delacourt, 2000, DISTBIC: a speaker based segmentation for audio data indexing, Speech Communication, 32, 111, 10.1016/S0167-6393(00)00027-3
ELISA, 2000. The ELISA systems for the NIST 99 evaluation in speaker detection and tracking. Digital Signal Processing (DSP), a review journal – Special issue on NIST 1999 speaker recognition workshop 10 (1–3), pp. 143–153.
Fredouille, C., Moraru, D., Meignier, S., Besacier, L., Bonastre, J.-F., 2004. The NIST 2004 spring rich transcription evaluation: two-axis merging strategy in the context of multiple distance microphone based meeting speaker segmentation. In: RT2004 Spring Meeting Recognition Workshop, p. 5.
Gauvain, 1994, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, 22, 291, 10.1109/89.279278
Gauvain, J.-L., Lamel, L., Adda, G., 1998. Partitioning and transcription of broadcast news data. In: Proceedings of International Conference on Spoken Language Processing (ICSLP 98).
Gauvain, 2001, Audio partitioning and transcription for broadcast data indexation, Multimedia Tools and Applications, 187, 10.1023/A:1011303401042
Gauvain, 2002, The LIMSI broadcast news transcription system, Speech Communication, 37, 89, 10.1016/S0167-6393(01)00061-9
Hain, T., Woodland, P., 1998. Segmentation and classification of broadcast news audio. In: Proceedings of International Conference on Spoken Language Processing (ICSLP 98), Sydney, Australia.
Kim, D.Y., Evermann, G., Hain, T., Mrva, D., Tranter, S., Wang, L., Woodland, P.C., 2003. Recent advances in broadcast news transcription. In: Automatic Speech Recognition and Understanding, IEEE, ASRU 2003, St. Thomas, US Virgin Islands, pp. 105–110.
Magrin-Chagnolleau, I., Gravier, G., Blouet, R., for the ELISA consortium, 2001. Overview of the ELISA consortium research activities. In: 2001: A Speaker Odyssey. The Speaker Recognition Workshop, Chania, Crete, pp. 67–72.
Meignier, S., Bonastre, J.-F., Fredouille, C., Merlin, T., 2000. Evolutive HMM for speaker tracking system. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2000), Istanbul, Turkey, pp. 1177–1180.
Meignier, S., Bonastre, J.-F., Igounet, S., 2001. E-HMM approach for learning and adapting sound models for speaker indexing. In: 2001: A Speaker Odyssey. The Speaker Recognition Workshop, Chania, Crete, pp. 175–180.
Moraru, D., Meignier, S., Besacier, L., Bonastre, J.-F., Magrin-Chagnolleau, I., 2003. The ELISA consortium approaches in speaker segmentation during the NIST 2002 speaker recognition evaluation. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2003), vol. II, Hong Kong, pp. 89–92.
Moraru, D., Meignier, S., Fredouille, C., Besacier, L., Bonastre, J.-F., 2004. The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2004), Montreal, Canada.
Moraru, D., Besacier, L., Castelli, E., 2004. Using a priori information for speaker diarization. In: 2004: A Speaker Odyssey. The Speaker Recognition Workshop, Toledo, Spain, pp. 355–362.
Nguyen, L., Xiang, B., 2004. Light supervision in acoustic model training. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 2004), Montreal, Canada.
NIST, Reference data cookbook for who spoke when diarization task. Available from: <http://www.nist.gov/speech/tests/rt/rt2003/spring/docs/ref-cookbook-v2_4.pdf>, v2.4 (2003).
NIST, RT-03S workshop agenda and presentations. Available from: <http://www.nist.gov/speech/tests/rt/rt2003/spring/presentations>.
NIST, The NIST 2001 speaker recognition evaluation plan. Available from: <http://www.nist.gov/speech/tests/spk/2001/doc/2001-spkrec-evalplan-v05.9.pdf> (March 2001).
NIST, The NIST year 2002 speaker recognition evaluation plan. Available from: <http://www.nist.gov/speech/tests/spk/2002/doc/2002-spkrec-evalplan-v60.pdf> (February 2002).
NIST, The rich transcription spring 2003 (RT-03S) evaluation plan. Available from: <http://www.nist.gov/speech/tests/rt/rt2003/spring/docs/rt03-spring-eval-plan-v4.pdf>, (Version 4, Updated 02/25/2003) (February 2003).
NIST, Spring 2004 (RT-04S) rich transcription meeting recognition evaluation plan. Available from: <http://www.nist.gov/speech/tests/rt/rt2004/spring/documents/rt04s-meeting-eval-plan-v1.pdf> (February 2004).
Quénot, G., Moraru, D., Besacier, L., Mulhem, P., 2002. CLIPS-IMAG at TREC-11: Experiments in video retrieval. In: TREC 2002, Gaithersburg, MD, USA.
Quénot, G., Moraru, D., Besacier, L., 2003. CLIPS at TRECVID: Shot boundary detection and feature detection. In: TREC 2003, Gaithersburg, MD, USA.
Reynolds, D.A., Dunn, R.B., Laughlin, J.J., 2000. The Lincoln speaker recognition system: NIST EVAL2000. In: Proceedings of International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, Beijing, China, pp. 470–473.
Reynolds, D.A., Quatieri, T.F., Dunn, R.B., 2000. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing (DSP), a review journal – Special issue on NIST 1999 speaker recognition workshop 10 (1–3), pp. 19–41.
Schwarz, 1978, Estimating the dimension of a model, The Annals of Statistics, 6, 461, 10.1214/aos/1176344136
Siegler, M., Jain, U., Raj, B., Stern, R., 1997. Automatic segmentation and clustering of broadcast news audio. In: DARPA Speech Recognition Workshop, Westfields, Chantilly, Virginia.
Siu, M.-H., Rohlicek, R., Gish, H., 1992. An unsupervised, sequential learning algorithm for segmentation of speech waveforms with multi-speakers. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 92), vol. 2, San Francisco, CA, pp. 189–192.
Smeaton, A., Kraaij, W., Over, P., 2003. TRECVID 2003 – an introduction. In: 12th Text Retrieval Conference.
Wilcox, L., Chen, F., Kimber, D., Balasubramanian, V., 1994. Segmentation of speech using speaker identification. In: Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP 94), Adelaide, Australia, pp. 161–164.
Wilcox, L., Kimber, D., Chen, F., 1994. Audio indexing using speaker identification. In: Proceedings SPIE Conference on Automatic Systems for the Inspection and Identification of Humans, San Diego, CA, pp. 149–157.
Woodland, 2002, The development of the HTK broadcast news transcription system: an overview, Speech Communication, 37, 291