Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer
Abstract
Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer, one that would not be apparent from the analysis of any single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm, which jointly analyzes the data matrices of an MMDS using sparse matrix approximations of rank-1. The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer products composed of “sparse” left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that, in aggregate or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single “sparsity” parameter, which is selected based on the false discovery rate (FDR) estimated by permutation testing. Multiple signals of interest in a given MMDS are sequentially detected and modeled by iterating JAMMIT on the “residual” data matrices that result from each sparse approximation. We show that JAMMIT outperforms other joint analysis algorithms in detecting multiple signatures embedded in simulated MMDSs.
On real multi-modal data for ovarian and liver cancer, we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology. Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired.
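The abstract does not spell out JAMMIT's update rules, so what follows is only a minimal Python/NumPy sketch of the idea under stated assumptions, not the published algorithm. It assumes a block-coordinate scheme with L1 soft-thresholding (borrowed from penalized matrix decomposition) for the objective sum_k ||X_k - u_k v^T||_F^2 + 2*alpha*sum_k ||u_k||_1 with ||v||_2 = 1, where each X_k is a variables-by-samples matrix, u_k is its sparse eigen-array, v is the shared eigen-signal, and alpha stands in for the single sparsity parameter. The deflation loop, the row-wise permutation null, the FDR estimate (mean null selections divided by observed selections), and the rule "smallest alpha on a grid that meets the target FDR" are likewise hypothetical choices, and every function name is illustrative. `Generator.permuted` requires NumPy 1.20 or later.

```python
# Hypothetical sketch of a JAMMIT-style joint sparse rank-1 fit; NOT the published implementation.
import numpy as np


def soft_threshold(z, alpha):
    """Elementwise soft-thresholding: sign(z) * max(|z| - alpha, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)


def joint_sparse_rank1(mats, alpha, n_iter=100, tol=1e-8):
    """Fit X_k ~ u_k v^T for every X_k (p_k x n) with sparse u_k and one shared unit v.

    Block coordinate descent on
        sum_k ||X_k - u_k v^T||_F^2 + 2 * alpha * sum_k ||u_k||_1,  ||v||_2 = 1.
    """
    # Start from the leading right-singular vector of the row-stacked matrices.
    _, _, vt = np.linalg.svd(np.vstack(mats), full_matrices=False)
    v = vt[0]
    for _ in range(n_iter):
        # For a fixed unit-norm v, the exact minimiser for each u_k is soft(X_k v, alpha).
        us = [soft_threshold(X @ v, alpha) for X in mats]
        # For fixed u_k, the best unit v is proportional to sum_k X_k^T u_k.
        w = sum(X.T @ u for X, u in zip(mats, us))
        norm = np.linalg.norm(w)
        if norm == 0.0:  # alpha zeroed every variable: no signal at this sparsity
            break
        v_new = w / norm
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    # Recompute the eigen-arrays so they match the final eigen-signal v.
    us = [soft_threshold(X @ v, alpha) for X in mats]
    return us, v


def sequential_sparse_rank1(mats, alpha, n_components=2):
    """Extract several signals by refitting on the residual matrices (deflation)."""
    residuals = [np.asarray(X, dtype=float).copy() for X in mats]
    components = []
    for _ in range(n_components):
        us, v = joint_sparse_rank1(residuals, alpha)
        components.append((us, v))
        residuals = [R - np.outer(u, v) for R, u in zip(residuals, us)]
    return components, residuals


def n_selected(us):
    """Total number of non-zero coefficients across all eigen-arrays."""
    return sum(int(np.count_nonzero(u)) for u in us)


def permutation_fdr(mats, alpha, n_perm=20, seed=0):
    """Estimate the FDR at alpha as (mean null selections) / (observed selections)."""
    rng = np.random.default_rng(seed)
    observed = n_selected(joint_sparse_rank1(mats, alpha)[0])
    if observed == 0:
        return 0.0
    null = []
    for _ in range(n_perm):
        # Shuffle every row independently to destroy any signal shared across columns.
        permuted = [rng.permuted(X, axis=1) for X in mats]
        null.append(n_selected(joint_sparse_rank1(permuted, alpha)[0]))
    return min(1.0, float(np.mean(null)) / observed)


def select_alpha(mats, alphas, target_fdr=0.05, **kwargs):
    """Return the smallest alpha on the grid whose estimated FDR <= target_fdr, else None."""
    for alpha in sorted(alphas):
        if permutation_fdr(mats, alpha, **kwargs) <= target_fdr:
            return alpha
    return None
```

Because each eigen-array update is the exact penalized least-squares solution for the current eigen-signal, the Frobenius norm of every residual matrix never increases from one extracted component to the next; raising alpha selects fewer variables and, in this sketch, typically lowers the estimated FDR.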