Comparison of RNA-seq and microarray-based models for clinical endpoint prediction

Genome Biology - Volume 16 - Pages 1-12 - 2015
Davide Albanese1, Benedikt Brors2, Zhiyu Peng3,4, Yuri Nikolsky5, Marco Chierici6, Linda H. Malkas7, Chen Zhao8, Zirui Dong3, Wenwei Zhang3, Simon M Lin9, Wenjun Bao10, Cesare Furlanello6, Jian Wang11, Gian Paolo Tonini12, Ke K. Zhang13, Matthias Fischer14,15, Zhenqiang Su16, Charles Wang17, Frederik Roels14,15, Min Max He9, Shahab Asgharzadeh18, Smadar Avigad19, Leming Shi16,8, Marina Bessarabova20, Barbara Hero14, Joshua Xu16, Jo Vandesompele21, Tao Qing8, Li Li10, Yong Yang11, Jean Thierry-Mieg22, Jiekun Xuan16, Wenzhong Xiao23, Wenqian Zhang3, Meiwen Jia8, Murray H. Brilliant24, Tzu-Ming Chu10, Richard G. Grundy25, Jie Cheng26, Susan Shao10, Jessica Theissen14, Huixiao Hong16, Samir Lababidi27, Howard L. Kaufman28, Yan Li28, Falk Hertwig14,15, Zhan Ye9, Ying Yu8, Weihong Xu29, May D. Wang30, Xiwen Ma31, Weida Tong16, Xin X. Lu32, Baitang Ning16, Scott Hebbring24, Jie Shen16, Ruth Volland14, Tieliu Shi33, André Oberthuer14, Lee J. Lancashire20, Carolina Rosswog14, Youping Deng28, Russell D. Wolfinger10, Frank Berthold14,15, Rosa Noguera34, Junmei Ai28, Jibin Zhang3, Po-Yen Wu35, Ye Yin3, Danielle Thierry-Mieg22, Jun Wang36,37,38,3, Yuanting Zheng8, Heng Luo16,39, Viswanath Devanarayan40, John H. Phan30, Martin Peifer41,15
1Fondazione Edmund Mach, CRI-CBC, San Michele all’Adige, Italy
2Department of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
3BGI-Shenzhen, Main Building, Bei Shan Industrial Zone, Shenzhen, China
4BGI-Guangzhou, Guangzhou Higher Education Mega Center, Guangzhou, China
5Thomson Reuters, IP & Science, Carlsbad, USA
6Fondazione Bruno Kessler (FBK), Trento Povo, Italy
7Department of Molecular & Cellular Biology, Beckman Research Institute, City of Hope Comprehensive Cancer Center, Duarte, USA
8Collaborative Innovation Center for Genetics and Development, State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences and School of Pharmacy, Fudan University, Shanghai, China
9Marshfield Clinic Research Foundation, Biomedical Informatics Research Center, Marshfield, USA
10SAS Institute Inc., Cary, USA
11Eli Lilly and Company Research Informatics, Lilly Corporate Center, Indianapolis, USA
12Neuroblastoma Laboratory, Onco/Hematology Laboratory, SDB Department, University of Padua, Pediatric Research Institute, Padua, Italy
13Department of Pathology, University of North Dakota School of Medicine, Grand Forks, USA
14Department of Pediatric Oncology and Hematology, University Children’s Hospital of Cologne, Cologne, Germany
15University of Cologne, Center for Molecular Medicine (CMMC), Medical Faculty, Cologne, Germany
16National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, USA
17Center for Genomics and Division of Microbiology & Molecular Genetics, School of Medicine, Loma Linda University, Loma Linda, USA
18Children’s Hospital Los Angeles, Los Angeles, USA
19Department of Pediatric Hematology-Oncology, Molecular Oncology, Felsenstein Medical Research Center, Schneider Children’s Medical Center of Israel, Petach Tikva, Israel
20Thomson Reuters IP & Science, Carlsbad, USA
21Department of Pediatrics and Genetics, Ghent University, Center for Medical Genetics, Ghent University, Ghent, Belgium
22NIH/NCBI, Bethesda, USA
23Harvard Medical School, Massachusetts General Hospital, Boston, USA
24Marshfield Clinic Research Foundation, Center of Human Genetics, Marshfield, USA
25University of Nottingham, Children’s Brain Tumour Research Centre, Queen’s Medical Centre, University of Nottingham, Nottingham, UK
26GlaxoSmithKline, Discovery Analytics, Collegeville, USA
27Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, WOC1 RM400S, HFM-210, Rockville, USA
28Department of Internal Medicine, Rush University Cancer Center, Chicago, USA
29Stanford University, Stanford Genome Technology Center, Palo Alto, USA
30Department of Biomedical Engineering, GeorgiaTech and Emory University, Atlanta, USA
31Eli Lilly and Company, Discovery Statistics, Lilly Corporate Center, Indianapolis, USA
32AbbVie Inc., Global Pharmaceutical Research and Development, North Chicago, USA
33East China Normal University, Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, Shanghai, China
34Department of Pathology, University of Valencia Medical School, Valencia, Spain
35Georgia Institute of Technology, School of Electrical and Computer Engineering, Atlanta, USA
36Department of Biology, University of Copenhagen, Copenhagen, Denmark
37Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
38King Abdulaziz University, Jeddah, Saudi Arabia
39University of Arkansas at Little Rock, UALR/UAMS Joint Bioinformatics Graduate Program, Little Rock, USA
40AbbVie Inc., Global Pharmaceutical R&D, Souderton, USA
41Department of Translational Genomics, University of Cologne, Cologne, Germany

Abstract

Gene expression profiling is widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Because RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers for clinical endpoint prediction in this MAQC-III/SEQC study, using neuroblastoma as a model.

We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are expressed in this malignancy. We also find that RNA-seq provides much more detailed information than microarrays on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups. To systematically compare the power of RNA-seq-based and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performance reveals that prediction accuracy is most strongly influenced by the nature of the clinical endpoint, whereas the technological platform (RNA-seq vs. microarrays), the RNA-seq data analysis pipeline, and the feature level (gene vs. transcript vs. exon-junction level) do not significantly affect model performance.

We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq-based and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
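The random division of the cohort into training and validation sets can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the sample identifiers, the 50/50 split fraction, the fixed seed, and the helper names `split_cohort` and `accuracy` are all assumptions made for the example. Only the cohort size of 498 comes from the abstract.

```python
import random


def split_cohort(sample_ids, train_fraction=0.5, seed=0):
    """Randomly partition sample IDs into training and validation sets.

    train_fraction and seed are illustrative assumptions; the abstract
    states only that the cohort was divided randomly.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(sample_ids)    # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predicted, actual):
    """Fraction of validation samples whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical identifiers for the 498 primary neuroblastomas.
sample_ids = [f"NB{i:03d}" for i in range(1, 499)]
train_ids, valid_ids = split_cohort(sample_ids)

print(len(train_ids), len(valid_ids))
```

Because both platforms profile the same tumours, one split of this kind could be reused for the RNA-seq and the microarray models, so that any difference in validation accuracy reflects the platform and not the partition. Whether the study did exactly this is not stated in the abstract.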
