Joint Principal Trend Analysis for Longitudinal High-dimensional Data

Biometrics - Volume 74, Issue 2 - Pages 430-438 - 2018
Yuping Zhang (1,2,3), Zhengqing Ouyang (4,5,6)
1 Center for Quantitative Medicine, University of Connecticut Health Center, Farmington, Connecticut, U.S.A.
2 Department of Statistics, University of Connecticut, Storrs, Connecticut, U.S.A.
3 Institute for Systems Genomics, Institute for Collaboration on Health, Intervention, and Policy, CT Institute of the Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, U.S.A.
4 Department of Biomedical Engineering, Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, U.S.A.
5 Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut, U.S.A.
6 The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, U.S.A.

Abstract

We consider a research scenario motivated by integrating multiple sources of information for better knowledge discovery in diverse dynamic biological processes. Given two longitudinal high-dimensional datasets for a group of subjects, we want to extract shared latent trends and identify relevant features. To solve this problem, we present a new statistical method named joint principal trend analysis (JPTA). We demonstrate the utility of JPTA through simulations and applications to gene expression data of the mammalian cell cycle and longitudinal transcriptional profiling data in response to influenza viral infections.
