SCANPY: large-scale single-cell gene expression data analysis

Genome Biology - Volume 19 - Pages 1-5 - 2018
F. Alexander Wolf1, Philipp Angerer1, Fabian J. Theis1,2
1Helmholtz Zentrum München – German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
2Department of Mathematics, Technische Universität München, Munich, Germany

Abstract

Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells (https://github.com/theislab/Scanpy). Along with Scanpy, we present AnnData, a generic class for handling annotated data matrices (https://github.com/theislab/anndata).
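
To make the idea of an "annotated data matrix" concrete, the following is a minimal sketch in plain Python. It is not the AnnData implementation or its API: the class name `AnnotatedMatrix` and the method `subset_obs` are hypothetical and exist only for illustration. The sketch assumes a matrix whose rows are cells and whose columns are genes, with per-cell and per-gene annotations kept aligned to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotatedMatrix:
    """Toy annotated data matrix (hypothetical; not the AnnData API)."""

    X: List[List[float]]                                 # rows = cells, columns = genes
    obs: Dict[str, list] = field(default_factory=dict)   # per-cell annotations
    var: Dict[str, list] = field(default_factory=dict)   # per-gene annotations

    def __post_init__(self):
        # Every annotation must have one entry per row (obs) or per column (var).
        n_obs, n_var = self.shape
        for key, values in self.obs.items():
            if len(values) != n_obs:
                raise ValueError(f"obs[{key!r}] has {len(values)} entries, expected {n_obs}")
        for key, values in self.var.items():
            if len(values) != n_var:
                raise ValueError(f"var[{key!r}] has {len(values)} entries, expected {n_var}")

    @property
    def shape(self):
        return len(self.X), (len(self.X[0]) if self.X else 0)

    def subset_obs(self, mask):
        """Return a new object with only the rows where mask is True."""
        if len(mask) != self.shape[0]:
            raise ValueError("mask length must equal the number of rows")
        keep = [i for i, flag in enumerate(mask) if flag]
        return AnnotatedMatrix(
            X=[self.X[i] for i in keep],
            obs={k: [v[i] for i in keep] for k, v in self.obs.items()},
            var={k: list(v) for k, v in self.var.items()},
        )


# Three cells, three genes, with a made-up cluster label per cell.
adata = AnnotatedMatrix(
    X=[[1.0, 0.0, 3.0], [0.0, 2.0, 0.0], [4.0, 0.0, 1.0]],
    obs={"cluster": ["a", "b", "a"]},
    var={"gene_name": ["g1", "g2", "g3"]},
)
cluster_a = adata.subset_obs([c == "a" for c in adata.obs["cluster"]])
print(cluster_a.shape)  # (2, 3)
```

The point of the design is that subsetting the matrix also subsets its annotations, so labels cannot drift out of step with the data. The real AnnData class is considerably richer, and its documentation should be consulted for the actual interface.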
