Sparse canonical correlation analysis

Machine Learning - Volume 83 - Pages 331-353 - 2010
David R. Hardoon (1, 2), John Shawe-Taylor (2)
(1) Data Mining Department, Institute for Infocomm Research (I2R), A*STAR, Singapore, Singapore
(2) Centre for Computational Statistics and Machine Learning, Department of Computer Science, University College London, London, UK

Abstract

We present a novel method for solving Canonical Correlation Analysis (CCA) in a sparse convex framework using a least squares approach. The method targets the scenario in which one is interested in (or limited to) a primal representation for the first view while having a dual representation for the second view. Sparse CCA (SCCA) minimises the number of features used in both the primal and dual projections while maximising the correlation between the two views. The method is compared with alternative sparse solutions and demonstrated on paired corpora for mate-retrieval. In the mate-retrieval experiments, we observe that when the number of original features is large, SCCA outperforms Kernel CCA (KCCA), learning the common semantic space from a sparse set of features.
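To make the primal/dual setting concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a primal weight vector `w` acting on the first view `Xa`, a dual vector `e` acting on a kernel matrix `Kb` of the second view, and an L1-penalised least squares fit solved by plain iterative soft-thresholding (ISTA). The function names, the synthetic data, the linear kernel, the one-hot choice of `e`, and the value of `mu` are all illustrative assumptions; in particular, `e` is held fixed here, whereas the abstract describes sparsity in both projections.

```python
import numpy as np

def projection_correlation(Xa, Kb, w, e):
    """Correlation between the primal projection Xa @ w and the dual projection Kb @ e."""
    za = Xa @ w
    zb = Kb @ e
    za = za - za.mean()
    zb = zb - zb.mean()
    denom = np.linalg.norm(za) * np.linalg.norm(zb)
    return float(za @ zb / denom) if denom > 0 else 0.0

def sparse_primal_fit(Xa, target, mu, n_iter=500):
    """ISTA for min_w 0.5 * ||Xa w - target||^2 + mu * ||w||_1 (illustrative solver)."""
    w = np.zeros(Xa.shape[1])
    # Step size 1/L, where L is the squared spectral norm of Xa.
    step = 1.0 / (np.linalg.norm(Xa, 2) ** 2)
    for _ in range(n_iter):
        grad = Xa.T @ (Xa @ w - target)
        v = w - step * grad
        # Soft-thresholding sets small coefficients exactly to zero.
        w = np.sign(v) * np.maximum(np.abs(v) - step * mu, 0.0)
    return w

# Synthetic paired views sharing one latent signal (assumed data, for illustration only).
rng = np.random.default_rng(0)
n, p, q = 60, 40, 25
latent = rng.standard_normal(n)
Xa = 0.5 * rng.standard_normal((n, p))
Xa[:, :3] += latent[:, None]
Xb = 0.5 * rng.standard_normal((n, q))
Xb[:, :3] += latent[:, None]

Kb = Xb @ Xb.T                 # linear kernel on the second view (assumption)
e = np.zeros(n)
e[0] = 1.0                     # hypothetical fixed dual vector: a single kernel column
target = Kb @ e

mu = 0.2 * np.max(np.abs(Xa.T @ target))   # arbitrary penalty level for the demo
w = sparse_primal_fit(Xa, target, mu)

print("non-zero primal weights:", np.count_nonzero(w), "of", p)
print("correlation of projections:", projection_correlation(Xa, Kb, w, e))
```

Raising `mu` drives more entries of `w` to zero at the cost of a looser fit, which is the trade-off between sparsity and correlation that the abstract refers to.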
