Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists
Tóm tắt
Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at
http://garmiregroup.org/granatum/app
Tài liệu tham khảo
Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–401.
Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20.
Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32:381–6.
Brennecke P, Anders S, Kim JK, Kołodziejczyk AA, Zhang X, Proserpio V, et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods. 2013;10:1093–5.
Poirion OB, Zhu X, Ching T, Garmire L. Single-cell transcriptomics bioinformatics and computational challenges. Front. Genet. 2016;7:163.
Team RC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. 2015. http://www.R-project.org. Accessed 15 Oct 2017.
McCarthy DJ, Campbell KR, Lun ATL, Wills QF. scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R. bioRxiv. 2016. http://biorxiv.org/content/early/2016/08/15/069633. Accessed 15 Oct 2017.
Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5:299–314.
RStudio, Inc. Easy web applications in R. 2013.
Attali D. shinyjs: easily improve the user experience of your shiny apps in seconds. 2016. https://cran.r-project.org/package=shinyjs.
Almende BV, Thieurmel B. visNetwork: network visualization using “vis.js” library. 2016. https://cran.r-project.org/package=visNetwork.
Xie Y. DT: a wrapper of the JavaScript library “DataTables”. 2016. https://cran.r-project.org/package=DT.
Sievert C, Parmer C, Hocking T, Chamberlain S, Ram K, Corvellec M, et al. plotly: create interactive web graphics via “plotly.js”. 2016. https://cran.r-project.org/package=plotly.
Wickham H. ggplot2: elegant graphics for data analysis. 2009. http://ggplot2.org.
Hicks SC, Teng M, Irizarry RA. On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data. bioRxiv. 2015;25528.
Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–27.
Kim K-T, Lee HW, Lee H-O, Kim SC, Seo YJ, Chung W, et al. Single-cell mRNA sequencing identifies subclonal heterogeneity in anti-cancer drug responses of lung adenocarcinoma cells. Genome Biol. 2015;16:127.
Kim K-T, Lee HW, Lee H-O, Song HJ, Shin S, Kim H, et al. Application of single-cell RNA sequencing in optimizing a combinatorial therapeutic strategy in metastatic renal cell carcinoma. Genome Biol. 2016;17:80.
Petropoulos S, Edsgärd D, Reinius B, Deng Q, Panula SP, Codeluppi S, et al. Single-cell RNA-Seq reveals lineage and X chromosome dynamics in human preimplantation embryos. Cell. 2016:165:1012–26.
Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:e161.
Iglewicz B, Hoaglin DC. How to detect and handle outliers. Milwaukee: Asq Press; 1993.
Zhu X, Ching T, Pan X, Weissman S, Garmire L. Detecting heterogeneity in single-cell RNA-Seq data by non-negative matrix factorization. PeerJ Prepr. 2016;4:e1839v1.
Chang, Winston, et al. Shiny: Web Application Framework for R, 2015. R package version 0.11 (2015). https://cran.r-project.org/package=shiny.
Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization BMC Bioinformatics. 2010;11:367.
Lloyd S. Least squares quantization in PCM. IEEE Trans Inf Theory IEEE. 1982;28:129–37.
Murtagh F, Contreras P. Methods of hierarchical clustering. arXiv prepr. arXiv1105.0121. 2011. https://arxiv.org/abs/1105.0121.
Krijthe J. Rtsne: t-distributed stochastic neighbor embedding using Barnes-Hut implementation. R Package version 0.10. 2015. http://CRAN.R-project.org/package=Rtsne. Accessed 15 Oct 2017.
Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinburgh Dublin Philos Mag J Sci. 1901;2:559–72.
Ji Z, Zhou W, Ji H. Single-cell regulome data analysis by SCRAT. Bioinformatics. 2017;33:2930–32.
Sengupta D, Rayan NA, Lim M, Lim B, Prabhakar S. Fast, scalable and accurate differential expression analysis for single cells. bioRxiv. 2016;49734.
Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014;11:740–2.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
Fan X, Zhang X, Wu X, Guo H, Hu Y, Tang F, et al. Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 2015;16:148.
Tasic B, Menon V, Nguyen TN, Kim TK, Jarsky T, Yao Z, et al. Adult mouse cortical cell taxonomy by single cell transcriptomics. Nat Neurosci. 2016;19:335.
Sergushichev A. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv. 2016. http://biorxiv.org/content/early/2016/06/20/060012. Accessed 15 Oct 2017.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–50.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B. 1995;57:289–300.
Gardeux V, David F, Shajkofci A, Schwalie PC, Deplancke B. ASAP: a web-based platform for the analysis and inter-active visualization of single-cell RNA-seq data. bioRxiv. 2016;96222.
Zappia L, Phipson B, Oshlack A. Splatter: simulation of single-cell RNA sequencing data. bioRxiv. 2017;133173.
van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605.
Bolstad BM, Irizarry RA, Åstrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–93.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 2014;15:550.
Law CW, Chen Y, Shi W, Smyth GK. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15:R29.
Xue Z, Huang K, Cai C, Cai L, Jiang C, Feng Y, et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature. 2013;500:593.
Hansen KD, Irizarry RA, Wu Z. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics. 2012;13:204–16.
Pierson E, Yau C. ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol. 2015;16:1–10.
Li WV, Li JJ. scImpute: accurate and robust imputation for single cell RNA-seq data. bioRxiv. 2017;141598.
Huang M, Wang J, Torre E, Dueck H, Shaffer S, Bonasio R, et al. Gene expression recovery for single cell RNA sequencing. bioRxiv. 2017;138677.
Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D. GeneCards: integrating information about genes, proteins and diseases. Trends Genet. 1997;13:163.
Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–61.
Consortium GO. Gene ontology consortium: going forward. Nucleic Acids Res. 2015;43:D1049–56.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9.
Fritz JH, Ferrero RL, Philpott DJ, Girardin SE. Nod-like proteins in immunity, inflammation and disease. Nat Immunol. 2006;7:1250–7.
Belfiore A, Genua M, Malaguarnera R. PPAR-γ agonists and their effects on IGF-I receptor signaling: implications for cancer. PPAR Res. 2009;2009:830501.
Watkins DN, Berman DM, Burkholder SG, Wang B, Beachy PA, Baylin SB. Hedgehog signalling within airway epithelial progenitors and in small-cell lung cancer. Nature. 2003;422:313–7.
Santoro MG. Heat shock factors and the control of the stress response. Biochem Pharmacol. 2000;59:55–63.
Tamura Y, Peng P, Liu K, Daou M, Srivastava PK. Immunotherapy of tumors with autologous tumor-derived heat shock protein preparations. Science. 1997;278:117–20.
Eccles SA, Massey A, Raynaud FI, Sharp SY, Box G, Valenti M, et al. NVP-AUY922: a novel heat shock protein 90 inhibitor active against xenograft tumor growth, angiogenesis, and metastasis. Cancer Res. 2008;68:2850–60.
Zheng, Grace XY, et al. Massively parallel digital transcriptional profiling of single cells. Nature communications 8. 2017:14049.
Satija R, Butler, Andrew. Integrated analysis of single cell transcriptomic data across conditions, technologies, and species. bioRxiv. 2017:164889.
Juliá, Miguel, Telenti A, Rausell A. Sincell: an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell RNA-seq. Bioinformatics 31.20. 2015:3380–3382.
Guo M, et al. SINCERA: a pipeline for single-cell RNA-Seq profiling analysis. PLoS computational biology 11.11. 2015:e1004575.