Improving robustness of MLLR adaptation with speaker-clustered regression class trees
References
Anastasakos, T., McDonough, J., Makhoul, J., 1997. Speaker adaptive training: a maximum likelihood approach to speaker normalization. In: Proc. of ICASSP, vol. 2, pp. 1043–1046.
Bocchieri, E., Digalakis, V., Corduneanu, A., Boulis, C., 1999. Correlation modeling of MLLR transform biases for rapid HMM adaptation to new speakers. In: Proc. of ICASSP, vol. 2, pp. 773–776.
Boulis, 2001. Maximum likelihood stochastic transformations adaptation for medium and small data sets. Computer Speech & Language, vol. 15, p. 257. doi:10.1006/csla.2001.0168
Chen, K.T., Liau, W.W., Wang, H.M., Lee, L.S., 2000. Fast speaker adaptation using eigenspace-based maximum likelihood linear regression. In: Proc. of ICSLP, vol. III, pp. 742–745.
Cieri, C., Miller, D., Walker, K., 2004. The Fisher corpus: a resource for the next generations of speech-to-text. In: Fourth International Conference on Language Resources and Evaluation.
Dempster, 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, vol. 39, p. 1.
Digalakis, 1995. Speaker adaptation using constrained estimation of Gaussian mixtures. IEEE Transactions on Speech and Audio Processing, vol. 3, p. 357. doi:10.1109/89.466659
Ferrer, L., Sönmez, K., Kajarekar, S., 2005. Class-based score combination for speaker recognition. In: Proc. of Eurospeech, pp. 2173–2176.
Gales, M., 1996. The generation and use of regression class trees for MLLR adaptation. Tech. Rep. CUED/F-INFENG/TR263, Cambridge University.
Gales, M., 1997. Transformation smoothing for speaker and environmental adaptation. In: Proc. of Eurospeech, vol. 4, pp. 2067–2070.
Gales, 1998. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, vol. 12, p. 75. doi:10.1006/csla.1998.0043
Gales, 2000. Cluster adaptive training of hidden Markov models. IEEE Transactions on Speech and Audio Processing, vol. 8, p. 417. doi:10.1109/89.848223
Gales, 1996. Mean and variance compensation within the MLLR framework. Computer Speech & Language, vol. 10, p. 249. doi:10.1006/csla.1996.0013
Haeb-Umbach, 2001. Automatic generation of phonetic regression class trees for MLLR adaptation. IEEE Transactions on Speech and Audio Processing, vol. 9, p. 299. doi:10.1109/89.906003
Huang, C., Chen, T., Li, S., Chang, E., Zhou, J., 2001. Analysis of speaker variability. In: Proc. of Eurospeech, vol. 2, pp. 1377–1380.
Hwang, M.-Y., Huang, X., 1998. Dynamically configurable acoustic models for speech recognition. In: Proc. of ICASSP, vol. 2, pp. 669–672.
Hwang, M.-Y., Lei, X., Wang, W., Shinozaki, T., 2006. Investigation on Mandarin broadcast news speech recognition. In: Proc. of ICSLP, pp. 1233–1236.
Imamura, A., 1991. Speaker adaptive HMM-based speech recognition with a stochastic speaker classifier. In: Proc. of ICASSP, vol. 2, pp. 841–844.
Kannan, 1994. Maximum likelihood clustering of Gaussians for speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 2, p. 453. doi:10.1109/89.294362
Kosaka, T., Sagayama, S., 1994. Tree structured speaker clustering for fast speaker adaptation. In: Proc. of ICASSP, vol. 1, pp. 245–248.
Kuhn, 2000. Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, vol. 8, p. 695. doi:10.1109/89.876308
Labov, W., 1996. The organization of dialect diversity in North America. In: Fourth International Conference on Spoken Language Processing.
Leggetter, C., 1995. Improved acoustic modelling for HMMs using linear transformations. Ph.D. Thesis, University of Cambridge.
Leggetter, 1995. Maximum likelihood linear regression for speaker adaptation of HMMs. Computer Speech & Language, vol. 9, p. 171. doi:10.1006/csla.1995.0010
Mak, B., Hsiao, R., 2004. Improving eigenspace-based MLLR adaptation by kernel PCA. In: Proc. of ICSLP, vol. I, pp. 13–16.
Mandal, A., Ostendorf, M., Stolcke, A., 2005. Leveraging speaker-dependent variation of adaptation. In: Proc. of Eurospeech, pp. 1793–1796.
Mandal, A., Ostendorf, M., Stolcke, A., 2006. Speaker clustered regression-class trees for MLLR adaptation. In: Proc. of ICSLP, pp. 1133–1136.
Mardia, 1979.
Padmanabhan, 1998. Speaker clustering and transformation for speaker adaptation in speech recognition systems. IEEE Transactions on Speech and Audio Processing, vol. 6, p. 71. doi:10.1109/89.650313
R Development Core Team, 2005. R: a language and environment for statistical computing. R Foundation for Statistical Computing, ISBN 3-900051-07-0. <http://www.R-project.org>.
Sankar, A., Beaufays, F., Digalakis, V., 1995. Training data clustering for improved speech recognition. In: Proc. of Eurospeech, vol. 1, pp. 502–505.
Sankar, A., Neumeyer, L., Weintraub, M., 1996. An experimental study of acoustic adaptation algorithms. In: Proc. of ICASSP, vol. 2, pp. 713–716.
Sankar, A., Gadde, R., Weng, F., 1999. SRI’s 1998 broadcast news system – towards faster, smaller, and better speech recognition. In: DARPA Broadcast News Workshop, pp. 281–286.
Stolcke, A., Ferrer, L., Kajarekar, S., Shriberg, E., Venkataraman, A., 2005. MLLR transforms as features in speaker recognition. In: Proc. of Eurospeech, pp. 2425–2428.
Stolcke, 2006. Recent innovations in speech-to-text transcription at SRI-ICSI-UW. IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, p. 1729. doi:10.1109/TASL.2006.879807
Venkataraman, A., Stolcke, A., Wang, W., Vergyri, D., Gadde, V., Zheng, J., 2004. SRI’s 2004 broadcast news speech to text system. In: EARS RT04 Workshop.
Woodland, P., Gales, M., Pye, D., 1996. Improving environmental robustness in large vocabulary speech recognition. In: Proc. of ICASSP, vol. 1, pp. 65–68.
Young, S., Odell, J., Woodland, P., 1994. Tree based state tying for high accuracy modelling. In: Proc. ARPA Spoken Language Technology Workshop, pp. 405–410.