Regularized Generalized Canonical Correlation Analysis: A Framework for Sequential Multiblock Component Methods

Michel Tenenhaus¹, Arthur Tenenhaus²,³, Patrick J. F. Groenen⁴
¹HEC Paris, Jouy-en-Josas, France
²Laboratoire des Signaux et Systèmes (L2S, UMR CNRS 8506), CentraleSupélec-L2S-Université Paris-Sud, Gif-sur-Yvette Cedex, France
³Bioinformatics and Biostatistics Core Facility, Brain and Spine Institute, Paris, France
⁴Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, the Netherlands

Abstract

A new framework for sequential multiblock component methods is presented. This framework relies on a new version of regularized generalized canonical correlation analysis (RGCCA) in which various scheme functions and shrinkage constants are considered. Two types of between-block connections are considered: either the blocks are fully connected, or each block is connected to the superblock (the concatenation of all blocks). The proposed iterative algorithm is monotonically convergent and guarantees that a stationary point of RGCCA is obtained at convergence. In some cases, the RGCCA solution is given by the first eigenvalue and the associated eigenvector of a specific matrix. For the scheme functions $x$, $|x|$, $x^{2}$ or $x^{4}$ and shrinkage constants 0 or 1, many multiblock component methods are recovered.
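To make the algorithmic claims concrete, here is a minimal NumPy sketch of a first-component solver. It assumes the usual RGCCA formulation: maximize $\sum_{j,k} c_{jk}\, g(\operatorname{cov}(X_j a_j, X_k a_k))$ subject to $(1-\tau_j)\operatorname{var}(X_j a_j) + \tau_j \lVert a_j \rVert^2 = 1$, updating one block at a time. The function name `rgcca_first_component`, the scheme labels and the use of $1/n$ variances are illustrative assumptions; this is not the interface of any published RGCCA implementation. When $g$ is convex and all $c_{jk} \ge 0$, each block update maximizes a linear lower bound of the criterion over the constraint set, so the recorded criterion values cannot decrease.

```python
import numpy as np

# Scheme functions g named in the abstract, paired with their derivatives up to
# a positive constant (the constant cancels in the normalization step).
SCHEMES = {
    "horst": (lambda c: c, lambda c: 1.0),         # g(x) = x
    "centroid": (np.abs, np.sign),                 # g(x) = |x|
    "factorial": (lambda c: c ** 2, lambda c: c),  # g(x) = x^2
    "x4": (lambda c: c ** 4, lambda c: c ** 3),    # g(x) = x^4
}


def rgcca_first_component(blocks, C, tau, scheme="factorial",
                          n_iter=500, tol=1e-12, seed=0):
    """Hypothetical sketch of a first-component RGCCA solver.

    blocks : list of J column-centered arrays X_j of shape (n, p_j)
    C      : symmetric (J, J) design matrix, nonnegative, zero diagonal
    tau    : list of J shrinkage constants in [0, 1]
    Returns the weight vectors a_j and the criterion value after each sweep.
    """
    n, J = blocks[0].shape[0], len(blocks)
    g, dg = SCHEMES[scheme]
    # Constraint a_j' M_j a_j = 1 with M_j = tau_j I + (1 - tau_j) X_j'X_j / n,
    # i.e. (1 - tau_j) var(X_j a_j) + tau_j ||a_j||^2 = 1 (assumes M_j invertible).
    M_inv = [np.linalg.inv(t * np.eye(X.shape[1]) + (1 - t) * X.T @ X / n)
             for X, t in zip(blocks, tau)]

    def outer_weights(j, z):
        # Maximizer of a' X_j' z over the ellipsoid a' M_j a = 1.
        grad = blocks[j].T @ z
        u = M_inv[j] @ grad
        return u / np.sqrt(grad @ u)

    def criterion(Y):
        return sum(C[j, k] * g(Y[j] @ Y[k] / n)
                   for j in range(J) for k in range(J))

    rng = np.random.default_rng(seed)
    A = [outer_weights(j, rng.standard_normal(n)) for j in range(J)]  # feasible start
    Y = [X @ a for X, a in zip(blocks, A)]
    crit = [criterion(Y)]
    for _ in range(n_iter):
        for j in range(J):  # Gauss-Seidel: blocks updated earlier are reused
            # Inner component: neighbours weighted by g'(cov(y_j, y_k)).
            z = sum(C[j, k] * dg(Y[j] @ Y[k] / n) * Y[k] for k in range(J))
            A[j] = outer_weights(j, z)
            Y[j] = blocks[j] @ A[j]
        crit.append(criterion(Y))
        if crit[-1] - crit[-2] < tol:
            break
    return A, crit
```

In this sketch, a fully connected design is `C = np.ones((J, J)) - np.eye(J)`, while a superblock design would append `np.hstack(blocks)` as an extra block and connect only that block to the others. With two blocks, $\tau_1 = \tau_2 = 1$ and the factorial scheme, the weights should agree up to sign with the leading singular vectors of $X_1^\top X_2$, i.e. the first eigenvector of $X_1^\top X_2 X_2^\top X_1$ for $a_1$, one of the eigenvector cases the abstract refers to.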
