mixOmics: An R package for ‘omics feature selection and multiple data integration

PLoS Computational Biology - Volume 13, Issue 11 - Page e1005752
Florian Rohart1, Benoît Gautier1, Amrit Singh2,3, Kim‐Anh Lê Cao4,1
1The University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Queensland, Australia
2Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
3Prevention of Organ Failure (PROOF) Centre of Excellence, Vancouver, British Columbia, Canada
4Melbourne Integrative Genomics and School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia

Abstract

Keywords


References

Lê Cao KA, Rohart F, Gonzalez I, Déjean S, Gautier B, Bartolo F, et al. mixOmics: Omics Data Integration Project; 2017. Available from: https://CRAN.R-project.org/package=mixOmics.

AL Boulesteix, 2007, Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, Brief Bioinform, 8, 32, 10.1093/bib/bbl016

C Meng, 2016, Dimension reduction techniques for the integrative analysis of multi-omics data, Briefings in bioinformatics

JS Labus, 2015, Multivariate morphological brain signatures predict patients with chronic abdominal pain from healthy control subjects, Pain, 156, 1545, 10.1097/j.pain.0000000000000196

JA Cook, 2016, Mass Spectrometry–Based Metabolomics Identifies Longitudinal Urinary Metabolite Profiles Predictive of Radiation-Induced Cancer, Cancer research, 76, 1569, 10.1158/0008-5472.CAN-15-2416

L Guidi, 2016, Plankton networks driving carbon export in the oligotrophic ocean, Nature

D Mahana, 2016, Antibiotic perturbation of the murine gut microbiome enhances the adiposity, insulin resistance, and liver disease associated with high-fat diet, Genome medicine, 8, 1, 10.1186/s13073-016-0297-9

D Ramanan, 2016, Helminth infection promotes colonization resistance via type 2 immunity, Science, 352, 608, 10.1126/science.aaf3229

S Rollero, 2016, Key role of lipid management in nitrogen and aroma metabolism in an evolved wine yeast strain, Microbial cell factories, 15, 1, 10.1186/s12934-016-0434-6

KA Lê Cao, 2011, Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems, BMC bioinformatics, 12, 253, 10.1186/1471-2105-12-253

A Singh, 2016, DIABLO-an integrative, multi-omics, multivariate method for multi-group classification, bioRxiv, 067611

F Rohart, 2017, MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms, BMC Bioinformatics, 18

Y Liu, 2013, Multilevel omic data integration in cancer cell lines: advanced annotation and emergent properties, BMC systems biology, 7, 14, 10.1186/1752-0509-7-14

OP Günther, 2012, A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers, BMC bioinformatics, 13, 326, 10.1186/1471-2105-13-326

M Teng, 2016, A benchmark for RNA-seq quantification pipelines, Genome biology, 17, 74, 10.1186/s13059-016-0940-1

M Arumugam, 2011, Enterotypes of the human gut microbiome, Nature, 473, 174, 10.1038/nature09944

KA Lê Cao, 2016, MixMC: Multivariate insights into Microbial Communities, PloS one, 11, e0160169, 10.1371/journal.pone.0160169

H Wold, 1975, Path models with latent variables: The NIPALS approach, 10.1016/B978-0-12-103950-9.50017-4

F Yao, 2012, Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets, BMC bioinformatics, 13, 24, 10.1186/1471-2105-13-24

H Wold, 1966, Estimation of principal components and related models by iterative least squares, J Multivar Anal, 391

A Eslami, 2013, New Perspectives in Partial Least Squares and Related Methods, 243

I González, 2008, CCA: An R package to extend canonical correlation analysis, Journal of Statistical Software, 23, 1, 10.18637/jss.v023.i12

A Tenenhaus, 2011, Regularized generalized canonical correlation analysis, Psychometrika, 76, 257, 10.1007/s11336-011-9206-8

DV Nguyen, 2002, Tumor classification by partial least squares using microarray gene expression data, Bioinformatics, 18, 39, 10.1093/bioinformatics/18.1.39

DV Nguyen, 2002, Multi-class cancer classification via partial least squares with gene expression profiles, Bioinformatics, 18, 1216, 10.1093/bioinformatics/18.9.1216

AL Boulesteix, 2004, PLS dimension reduction for classification with microarray data, Statistical applications in genetics and molecular biology, 3, 1, 10.2202/1544-6115.1075

R Tibshirani, 1996, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society Series B (Methodological), 267, 10.1111/j.2517-6161.1996.tb02080.x

L Wangen, 1989, A multiblock partial least squares algorithm for investigating complex chemical systems, Journal of chemometrics, 3, 3, 10.1002/cem.1180030104

JA Westerhuis, 2001, Deflation in multiblock PLS, Journal of chemometrics, 15, 485, 10.1002/cem.652

İ Karaman, 2015, Sparse multi-block PLSR for biomarker discovery when integrating data from LC–MS and NMR metabolomics, Metabolomics, 11, 367, 10.1007/s11306-014-0698-y

A Kawaguchi, 2017, Supervised multiblock sparse multivariable analysis with application to multimodal brain imaging genetics, Biostatistics, kxx011

Tenenhaus A, Guillemot V. RGCCA: Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data; 2017. Available from: https://CRAN.R-project.org/package=RGCCA.

A Tenenhaus, 2014, Variable selection for generalized canonical correlation analysis, Biostatistics, 15, 569, 10.1093/biostatistics/kxu001

I González, 2012, Visualising associations between paired ‘omics’ data sets, BioData mining, 5, 19, 10.1186/1756-0381-5-19

J Khan, 2001, Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature medicine, 7, 673, 10.1038/89044

F Rohart, 2016, A molecular classification of human mesenchymal stromal cells, PeerJ, 4, e1845, 10.7717/peerj.1845

AK Shah, 2016, Glyco-centric lectin magnetic bead array (LeMBA)- proteomics dataset of human serum samples from healthy, Barrett’s esophagus and esophageal adenocarcinoma individuals, Data in Brief, 7, 1058, 10.1016/j.dib.2016.03.081

J Friedman, 2010, Regularization paths for generalized linear models via coordinate descent, Journal of statistical software, 33, 1, 10.18637/jss.v033.i01

Witten D, Tibshirani R, Gross S, Narasimhan B. PMA: Penalized Multivariate Analysis; 2013. Available from: https://CRAN.R-project.org/package=PMA.

Husson F, Josse J, Le S, Mazet J. FactoMineR: factor analysis and data mining with R; 2017. Available from: https://cran.r-project.org/web/packages/FactoMineR.

Chung D, Chun H, Keles S. SPLS: Sparse partial least squares (SPLS) regression and classification; 2013. Available from: https://CRAN.R-project.org/package=spls.

Kraemer N, Boulesteix A. ppls: Penalized Partial Least Squares; 2014. Available from: https://CRAN.R-project.org/package=ppls.

Del Ferraro M, Kiers H, Giordani P. ThreeWay: Three-Way Component Analysis; 2015. Available from: https://cran.r-project.org/web/packages/ThreeWay.

Leibovici D. PTAk: Principal Tensor Analysis on k Modes; 2015. Available from: https://cran.r-project.org/web/packages/PTAk.

Thioulouse J, Chessel D, Dolédec S, Olivier J, Goreaud F, Pelissier R. ADE-4: Ecological data analysis. Exploratory and euclidean methods in environmental sciences; 2017. Available from: https://cran.r-project.org/web/packages/ade4.

N Krämer, 2011, The degrees of freedom of partial least squares regression, Journal of the American Statistical Association, 106, 697, 10.1198/jasa.2011.tm10107

R Rosipal, 2010, Nonlinear partial least squares: An overview, Chemoinformatics and advanced machine learning perspectives: complex computational methods and collaborative techniques, 169