Statistical genomics in rare cancer

Seminars in Cancer Biology - Tập 61 - Trang 1-10 - 2020
Farnoosh Abbas-Aghababazadeh1, Qianxing Mo1, Brooke L. Fridley1
1Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL, 33612, USA

Tài liệu tham khảo

Keat, 2013, International rare cancers initiative, Lancet Oncol., 14, 109, 10.1016/S1470-2045(12)70570-3 DeSantis, 2017, The burden of rare cancers in the United States, CA Cancer J. Clin., 67, 261, 10.3322/caac.21400 Gatta, 2011, Rare cancers are not so rare: the rare cancer burden in Europe, Eur. J. Cancer, 47, 2493, 10.1016/j.ejca.2011.08.008 Edgar, 2002, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res., 30, 207, 10.1093/nar/30.1.207 Barrett, 2013, NCBI GEO: archive for functional genomics data sets--update, Nucleic Acids Res., 41, D991, 10.1093/nar/gks1193 Cancer Genome Atlas Research, 2013, The cancer genome atlas pan-cancer analysis project, Nat. Genet., 45, 1113, 10.1038/ng.2764 Zheng, 2016, Comprehensive pan-genomic characterization of adrenocortical carcinoma, Cancer Cell, 29, 723, 10.1016/j.ccell.2016.04.002 Farshidfar, 2017, Integrative genomic analysis of cholangiocarcinoma identifies distinct IDH-Mutant molecular profiles, Cell Rep., 18, 2780, 10.1016/j.celrep.2017.02.033 Hoadley, 2018, Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of Cancer, Cell, 173, 10.1016/j.cell.2018.03.022 Hmeljak, 2018, Integrative molecular characterization of malignant pleural mesothelioma, Cancer Discov., 8, 1548, 10.1158/2159-8290.CD-18-0804 Fishbein, 2017, Comprehensive molecular characterization of pheochromocytoma and paraganglioma, Cancer Cell, 31, 181, 10.1016/j.ccell.2017.01.001 Cancer Genome Atlas Research Network, 2017, Comprehensive and integrated genomic characterization of adult soft tissue sarcomas, Cell, 171, e928 Shen, 2018, Integrated molecular characterization of testicular germ cell tumors, Cell Rep., 23, 3392, 10.1016/j.celrep.2018.05.039 Cherniack, 2017, Integrated molecular characterization of uterine carcinosarcoma, Cancer Cell, 31, 411, 10.1016/j.ccell.2017.02.010 Robertson, 2017, Integrative analysis identifies four molecular and clinical subsets in uveal melanoma, Cancer Cell, 32, e215 Liu, 2017, The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia, Nat. Genet., 49, 1211, 10.1038/ng.3909 Bolouri, 2018, The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions, Nat. Med., 24, 103, 10.1038/nm.4439 Pugh, 2013, The genetic landscape of high-risk neuroblastoma, Nat. Genet., 45, 279, 10.1038/ng.2529 Armstrong, 2018, A unique subset of low-risk Wilms tumors is characterized by loss of function of TRIM28 (KAP1), a gene critical in early renal development: a children’s oncology group study, PLoS One, 13, 10.1371/journal.pone.0208936 Blay, 2016, The value of research collaborations and consortia in rare cancers, Lancet Oncol., 17, e62, 10.1016/S1470-2045(15)00388-5 Ovarian Cancer Association Consortium, 2015, No clinical utility of KRAS variant rs61764370 for ovarian or breast cancer, Gynecol. Oncol. Phelan, 2017, Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer, Nat. Genet., 49, 680, 10.1038/ng.3826 Easton, 2007, Genome-wide association study identifies novel breast cancer susceptibility loci, Nature, 447, 1087, 10.1038/nature05887 Zhang, 2011, International cancer genome consortium data portal--a one-stop shop for cancer genomics data, Database, 2011, 10.1093/database/bar026 Zhang, 2019, The international cancer genome consortium data portal, Nat. Biotechnol., 37, 367, 10.1038/s41587-019-0055-9 Varley, 1997, Germ-line mutations of TP53 in Li-Fraumeni families: an extended study of 39 families, Cancer Res., 57, 3245 Eng, 1997, Third international workshop on collaborative interdisciplinary studies of p53 and other predisposing genes in Li-Fraumeni syndrome, Cancer Epidemiol. Biomarkers Prev., 6, 379 Johnson, 2007, Adjusting batch effects in microarray expression data using empirical Bayes methods, Biostatistics, 8, 118, 10.1093/biostatistics/kxj037 Abbas-Aghababazadeh, 2018, Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing, PLoS One, 13, 10.1371/journal.pone.0206312 Price, 2010, New approaches to population stratification in genome-wide association studies, Nature reviews, 11, 459, 10.1038/nrg2813 Deb, 2014, Mutational profiling of familial male breast cancers reveals similarities with luminal A female breast cancer with rare TP53 mutations, Br. J. Cancer, 111, 2351, 10.1038/bjc.2014.511 Weiss, 2005, Epidemiology of male breast cancer, Cancer Epidemiol. Biomarkers Prev., 14, 20, 10.1158/1055-9965.20.14.1 Korde, 2010, Multidisciplinary meeting on male breast cancer: summary and research recommendations, J. Clin. Oncol., 28, 2114, 10.1200/JCO.2009.25.5729 Harlan, 2010, Breast cancer in men in the United States: a population-based study of diagnosis, treatment, and survival, Cancer, 116, 3558, 10.1002/cncr.25153 Giordano, 2005, A review of the diagnosis and management of male breast cancer, Oncologist, 10, 471, 10.1634/theoncologist.10-7-471 Chang, 2013, Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline, BMC Bioinformatics, 14, 368, 10.1186/1471-2105-14-368 Wang, 2013, Comparing methods for performing trans-ethnic meta-analysis of genome-wide association studies, Hum. Mol. Genet., 22, 2303, 10.1093/hmg/ddt064 Ramasamy, 2008, Key issues in conducting a meta-analysis of gene expression microarray datasets, PLoS Med., 5, e184, 10.1371/journal.pmed.0050184 Thompson, 2011, The meta-analysis of genome-wide association studies, Brief Bioinform, 12, 259, 10.1093/bib/bbr020 Mo, 2018, Prognostic power of a tumor differentiation gene signature for bladder urothelial carcinomas, J. Natl. Cancer Inst., 110, 448, 10.1093/jnci/djx243 Richardson, 2016, Statistical methods in integrative genomics, Annu. Rev. Stat. Appl., 3, 181, 10.1146/annurev-statistics-041715-033506 Tseng, 2012, Comprehensive literature review and statistical considerations for microarray meta-analysis, Nucleic Acids Res., 40, 3785, 10.1093/nar/gkr1265 M. Borenstein, L.V. Hedges, J. Higgins, Rothstein, Introduction to Meta-Analysis, (Chichester, UK), (2009). Rhodes, 2002, Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer, Cancer Res., 62, 4427 Fisher, 1932 Stouffer, 1949, The American soldier, Vol 1 van Zwet, 1967, On the combination of independent test statistics, Ann. Math. Stat., 38, 659, 10.1214/aoms/1177698861 Won, 2009, Choosing an optimal method to combine P-values, Stat. Med., 28, 1537, 10.1002/sim.3569 Tippett, 1931 Li, 2011, An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies, Ann. Appl. Stat., 5, 994, 10.1214/10-AOAS393 Barton, 2013, Correction of unexpected distributions of P values from analysis of whole genome arrays by rectifying violation of statistical assumptions, BMC Genomics, 14, 161, 10.1186/1471-2164-14-161 Fodor, 2007, Towards the uniform distribution of null P values on Affymetrix microarrays, Genome Biol., 8, R69, 10.1186/gb-2007-8-5-r69 Borenstein, 2010, A basic introduction to fixed-effect and random-effects models for meta-analysis, Res. Synth. Methods, 1, 97, 10.1002/jrsm.12 Brockwell, 2001, A comparison of statistical methods for meta-analysis, Stat. Med., 20, 825, 10.1002/sim.650 Goldstein, 2011 Viechtbauer, 2005, Bias and efficiency of meta-analytic variance estimators in the random-effects model, J. Educ. Behav. Stat., 30, 261, 10.3102/10769986030003261 Cochran, 1954, The combination of estimates from different experiments, Biometrics, 10, 101, 10.2307/3001666 Paul, 1992, Small sample performance of tests of homogeneity of odds ratios in K 2 x 2 tables, Stat. Med., 11, 159, 10.1002/sim.4780110203 Hardy, 1998, Detecting and describing heterogeneity in meta-analysis, Stat. Med., 17, 841, 10.1002/(SICI)1097-0258(19980430)17:8<841::AID-SIM781>3.0.CO;2-D Higgins, 2003, Measuring inconsistency in meta-analyses, Bmj, 327, 557, 10.1136/bmj.327.7414.557 Higgins, 2002, Quantifying heterogeneity in a meta-analysis, Stat. Med., 21, 1539, 10.1002/sim.1186 Lin, 2009, Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA Studies, Biometrics, 65, 9, 10.1111/j.1541-0420.2008.01044.x Deng, 2014, Bayesian aggregation of order-based rank data, J. Am. Stat. Assoc., 109, 1023, 10.1080/01621459.2013.878660 Hong, 2006, RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis, Bioinformatics, 22, 2825, 10.1093/bioinformatics/btl476 Dreyfuss, 2009, Meta-analysis of glioblastoma multiforme versus anaplastic astrocytoma identifies robust gene markers, Mol. Cancer, 8, 71, 10.1186/1476-4598-8-71 Zintzaras, 2008, Meta-analysis for ranked discovery datasets: theoretical framework and empirical demonstration for microarrays, Comput. Biol. Chem., 32, 38, 10.1016/j.compbiolchem.2007.09.003 DeConde, 2006 Hong, 2008, A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments, Bioinformatics, 24, 374, 10.1093/bioinformatics/btm620 Li, 2019, A comparative study of rank aggregation methods for partial and top ranked lists in genomic applications, Brief. Bioinformatics, 20, 178, 10.1093/bib/bbx101 Balding, 2007 Liang, 2000, Statistical designs for familial aggregation, Stat. Methods Med. Res., 9, 543, 10.1177/096228020000900603 Jarvik, 1998, Complex segregation analyses: uses and limitations, Am. J. Hum. Genet., 63, 942, 10.1086/302075 Genetic Approaches to Familial Aggregation. II. Segregation Analysis. In Fundamentals of Genetic Epidemiology. pp 233-283. Elston, 1998, Methods of linkage analysis--and the assumptions underlying them [see comment], Am. J. Hum. Genet., 63, 931, 10.1086/302073 MD, 2005, Genetic genetic linkage, Lancet, 366, 1036, 10.1016/S0140-6736(05)67382-5 Kruglyak, 1996, Parametric and nonparametric linkage analysis: a unified multipoint approach, Am. J. Hum. Genet., 58, 1347 Malkin, 2011, Li-fraumeni syndrome, Genes Cancer, 2, 475, 10.1177/1947601911413466 Varley, 1997, Li-Fraumeni syndrome--a molecular and clinical review, Br. J. Cancer, 76, 1, 10.1038/bjc.1997.328 Balding, 2006, A tutorial on statistical methods for population association studies, Nature reviews, 7, 781, 10.1038/nrg1916 Chung, 2010, Genome-wide association studies in cancer--current and future directions, Carcinogenesis, 31, 111, 10.1093/carcin/bgp273 Capasso, 2009, Common variations in BARD1 influence susceptibility to high-risk neuroblastoma, Nat. Genet., 41, 718, 10.1038/ng.374 Maris, 2008, Chromosome 6p22 locus associated with clinically aggressive neuroblastoma, N. Engl. J. Med., 358, 2585, 10.1056/NEJMoa0708698 Kanehisa, 2017, KEGG: new perspectives on genomes, pathways, diseases and drugs, Nucleic Acids Res., 45, D353, 10.1093/nar/gkw1092 Kanehisa, 2000, KEGG: kyoto encyclopedia of genes and genomes, Nucleic Acids Res., 28, 27, 10.1093/nar/28.1.27 Ashburner, 2000, Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat Genet, 25, 25, 10.1038/75556 Lachmann, 2010, ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments, Bioinformatics, 26, 2438, 10.1093/bioinformatics/btq466 Mezzapelle, 2013, Mutation analysis of the EGFR gene and downstream signalling pathway in histologic samples of malignant pleural mesothelioma, Br. J. Cancer, 108, 1743, 10.1038/bjc.2013.130 Goeman, 2007, Analyzing gene expression data in terms of gene sets: methodological issues, Bioinformatics, 23, 980, 10.1093/bioinformatics/btm051 Fridley, 2011, Gene set analysis of SNP data: benefits, challenges, and future directions, Eur. J. Hum. Genet., 19, 837, 10.1038/ejhg.2011.57 Subramanian, 2005, Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. U. S. A., 102, 15545, 10.1073/pnas.0506580102 Chen, 2013, Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool, BMC Bioinformatics, 14, 128, 10.1186/1471-2105-14-128 Kuleshov, 2016, Enrichr: a comprehensive gene set enrichment analysis web server 2016 update, Nucleic Acids Res., 44, W90, 10.1093/nar/gkw377 Huang da, 2009, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources, Nat. Protoc., 4, 44, 10.1038/nprot.2008.211 Dennis, 2003, DAVID: database for annotation, visualization, and integrated discovery, Genome Biol., 4, 10.1186/gb-2003-4-9-r60 Ferreira, 2008, Array CGH and gene-expression profiling reveals distinct genomic instability patterns associated with DNA repair and cell-cycle checkpoint pathways in Ewing’s sarcoma, Oncogene, 27, 2084, 10.1038/sj.onc.1210845 Kikuta, 2009, Nucleophosmin as a candidate prognostic biomarker of Ewing’s sarcoma revealed by proteomics, Clin. Cancer Res., 15, 2885, 10.1158/1078-0432.CCR-08-1913 Goeman, 2004, A global test for groups of genes: testing association with a clinical outcome, Bioinformatics, 20, 93, 10.1093/bioinformatics/btg382 Biernacka, 2012, Use of the gamma method for self-contained gene-set analysis of SNP data, Eur. J. Hum. Genet., 20, 565, 10.1038/ejhg.2011.236 Fridley, 2013, Soft truncation thresholding for gene set analysis of RNA-seq data: application to a vaccine study, Sci. Rep., 3, 2898, 10.1038/srep02898 de Rooij, 2017, Pediatric non-Down syndrome acute megakaryoblastic leukemia is characterized by distinct genomic subsets with varying outcomes, Nat. Genet., 49, 451, 10.1038/ng.3772 Saelens, 2018, A comprehensive evaluation of module detection methods for gene expression data, Nat. Commun., 9, 1090, 10.1038/s41467-018-03424-4 Werhli, 2006, Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks, Bioinformatics, 22, 2523, 10.1093/bioinformatics/btl391 Grzegorczyk, 2007, Extracting protein regulatory networks with graphical models, Proteomics, 1, 51, 10.1002/pmic.200700466 Butte, 2000, Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks, Proc. Natl. Acad. Sci. U. S. A., 97, 12182, 10.1073/pnas.220392197 Langfelder, 2008, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics, 9, 559, 10.1186/1471-2105-9-559 Zhang, 2005, A general framework for weighted gene co-expression network analysis, Stat. Appl. Genet. Mol. Biol., 4, 10.2202/1544-6115.1128 Yip, 2007, Gene network interconnectedness and the generalized topological overlap measure, BMC Bioinformatics, 8, 22, 10.1186/1471-2105-8-22 Wang, 2019, Weighted gene coexpression network analysis for identifying hub genes in association with prognosis in Wilms tumor, Mol. Med. Rep., 19, 2041 Yuan, 2018, Co-expression network analysis of biomarkers for adrenocortical carcinoma, Front. Genet., 9, 328, 10.3389/fgene.2018.00328 Zhang, 2019, Co-expression network analysis identified gene signatures in Osteosarcoma as a predictive tool for lung metastasis and survival, J. Cancer, 10, 3706, 10.7150/jca.32092 Schafer, 2005, An empirical Bayes approach to inferring large-scale gene association networks, Bioinformatics, 21, 754, 10.1093/bioinformatics/bti062 Zhao, 2019, Cancer genetic network inference using gaussian graphical models, Bioinform. Biol. Insights, 13, 10.1177/1177932219839402 Friedman, 2000, Using bayesian networks to analyze expression data, J. Comput. Biol., 7, 601, 10.1089/106652700750050961 Ni, 2018, Bayesian graphical models for computational network biology, BMC Bioinformatics, 19, 63, 10.1186/s12859-018-2063-z Bulashevska, 2010, Bayesian statistical modelling of human protein interaction network incorporating protein disorder information, BMC Bioinformatics, 11, 46, 10.1186/1471-2105-11-46 Hill, 2012, Bayesian inference of signaling network topology in a cancer cell line, Bioinformatics, 28, 2804, 10.1093/bioinformatics/bts514 Kramer, 2009, Regularized estimation of large-scale gene association networks using graphical Gaussian models, BMC Bioinformatics, 10, 384, 10.1186/1471-2105-10-384 Yin, 2011, A sparse conditional gaussian graphical model for analysis of genetical genomics data, Ann. Appl. Stat., 5, 2630, 10.1214/11-AOAS494 Chun, 2015, Gene regulation network inference with joint sparse Gaussian graphical models, J. Comput. Graph. Stat., 24, 954, 10.1080/10618600.2014.956876 Blum, 2016, Sparse factor model for co-expression networks with an application using prior biological knowledge, Stat. Appl. Genet. Mol. Biol., 15, 253, 10.1515/sagmb-2015-0002 Serra, 2018, Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data, Bioinformatics, 34, 625, 10.1093/bioinformatics/btx642 Schafer, 2005, A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics, Stat. Appl. Genet. Mol. Biol., 4, 10.2202/1544-6115.1175 Kristensen, 2014, Principles and methods of integrative genomic analyses in cancer, Nat. Rev. Cancer, 14, 299, 10.1038/nrc3721 Wu, 2019, A selective review of multi-level omics data integration using variable selection, High Throughput, 8 Jiang, 2016, Integrated analysis of multidimensional omics data on cutaneous melanoma prognosis, Genomics, 107, 223, 10.1016/j.ygeno.2016.04.005 Zhao, 2015, Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA, Brief Bioinform, 16, 291, 10.1093/bib/bbu003 Kandoth, 2013, Mutational landscape and significance across 12 major cancer types, Nature, 502, 333, 10.1038/nature12634 Zack, 2013, Pan-cancer patterns of somatic copy number alteration, Nat. Genet., 45, 1134, 10.1038/ng.2760 Chen, 2018, A pan-cancer analysis of enhancer expression in nearly 9000 patient samples, Cell, 173, 10.1016/j.cell.2018.03.027 Sanchez-Vega, 2018, Oncogenic signaling pathways in the Cancer genome atlas, Cell, 173, e310 Rosario, 2018, Pan-cancer analysis of transcriptional metabolic dysregulation using the cancer genome atlas, Nat. Commun., 9, 5330, 10.1038/s41467-018-07232-8 Network, 2012, Comprehensive molecular portraits of human breast tumours, Nature, 490, 61, 10.1038/nature11412 Radovich, 2018, The integrated genomic landscape of thymic epithelial tumors, Cancer Cell, 33, e210 Shen, 2009, Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis, Bioinformatics, 25, 2906, 10.1093/bioinformatics/btp543 Shen, 2013, Sparse integrative clustering of multiple omics data sets, Ann. Appl. Stat., 7, 269, 10.1214/12-AOAS578 Mo, 2013, Pattern discovery and cancer gene identification in integrated cancer genomic data, Proc. Natl. Acad. Sci. U. S. A., 110, 4245, 10.1073/pnas.1208949110 Mo, 2018, A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data, Biostatistics, 19, 71, 10.1093/biostatistics/kxx017 Brunet, 2004, Metagenes and molecular pattern discovery using matrix factorization, Proc. Natl. Acad. Sci. U. S. A., 101, 4164, 10.1073/pnas.0308531101 Gao, 2005, Improving molecular cancer class discovery through sparse non-negative matrix factorization, Bioinformatics, 21, 3970, 10.1093/bioinformatics/bti653 Kim, 2007, Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis, Bioinformatics, 23, 1495, 10.1093/bioinformatics/btm134 Monti, 2003, Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data, Mach. Learn., 52, 91, 10.1023/A:1023949509487 Zhang, 2012, Discovery of multi-dimensional modules by integrative analysis of cancer genomic data, Nucleic Acids Res., 40, 9379, 10.1093/nar/gks725 Yang, 2016, A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data, Bioinformatics, 32, 1, 10.1093/bioinformatics/btv544 Chalise, 2017, Integrative clustering of multi-level’ omic data based on non-negative matrix factorization algorithm, PLoS One, 12, 10.1371/journal.pone.0176278