SCANPY: large-scale single-cell gene expression data analysis
Tóm tắt
Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells (
https://github.com/theislab/Scanpy
). Along with Scanpy, we present AnnData, a generic class for handling annotated data matrices (
https://github.com/theislab/anndata
).
Tài liệu tham khảo
Wagner A, Regev A, Yosef N. Revealing the vectors of cellular identity with single-cell genomics. Nat Biotechnol. 2016; 34:1145–60.
Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015; 33:495–502.
Trapnell C, et al.The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014; 32:381–6.
Kharchenko PV, Silberstein L, Scadden DT, Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014; 11:740–2.
Finak, G, et al.MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015; 16:278.
Zheng GXY, et al.Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017; 8:14049.
McCarthy D, Wills Q, Campbell K. scater: single-cell analysis toolkit for gene expression data in R. Bioinformatics. 2017; 33:1179.
Lun A, McCarthy D, Marioni J. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research. 2016; 5:2122.
Abadi M, et al.TensorFlow: large-scale machine learning on heterogeneous systems. 2015. https://www.tensorflow.org/about/bib.
Macosko EZ, et al.Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015; 161:1202–14.
Coifman RR, et al.Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc Natl Acad Sci. 2005; 102:7426–31.
Amir EAD, Davis KL, Tadmor MD, Simonds EF, Levine JH, Bendall SC, et al.viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat Biotechnol. 2013; 31:545–52.
Reingold EM. Graph drawing by force-directed placement. Softw Pract Exp. 1991; 21:1129–64.
Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal Compl Syst. 2006; 2006:1695.
Weinreb C, Wolock S, Klein A. Spring: a kinetic interface for visualizing high dimensional single-cell expression data. bioRxiv. 2017. https://doi.org/10.1093/bioinformatics/btx792.
Buettner F, Theis FJ. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics. 2015; 31:2989–98.
Angerer P, et al.destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics. 2015; 32:1241.
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008; 2008:P10008.
Levine JH, et al.Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell. 2015; 162:184–97.
Xu C, Su Z. Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics. 2015; 31:1974–80.
Haghverdi L, Buttner, M̈, Wolf FA, Buettner F, Theis FJ. Diffusion pseudotime robustly reconstructs branching cellular lineages. Nat Methods. 2016; 13:845–8.
Qiu X, et al.Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017; 14:979–82.
Setty, M, et al.Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat Biotechnol. 2016; 34:637–45.
Wittmann, DM, et al.Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling. BMC Syst Biol. 2009; 3:98.
Eulenberg P, et al.Reconstructing cell cycle and disease progression using deep learning. Nat Commun. 2017; 8:463.
Huber, W, et al.Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015; 12:115–21.
Pedregosa F, et al.Scikit-learn: machine learning in Python. J Mach Learn Res. 2011; 12:2825–30.
Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using networkx. In: Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena: 2008. p. 11–15.
Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. 2009.
Angerer, P, et al.Single cells make big data: new challenges and opportunities in transcriptomics. Curr Opin Syst Biol. 2017; 4:85–91.
Regev A, et al.Science forum: the human cell atlas. eLife. 2017; 6:e27041.
Lun ATL, Pages̀ H, Smith ML. beachmat: a Bioconductor C++ API for accessing single-cell genomics data from a variety of R matrix types. bioRxiv. 2017. https://doi.org/10.1101/167445.
van der Walt S, Colbert SC, Varoquaux G. The NumPy array: a structure for efficient numerical computation. Comput Sci Eng. 2011; 13:22–30.
Jones E, Oliphant T, Peterson P, et al.SciPy: open source scientific tools for Python. 2001. https://www.scipy.org/citing.html.
Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007; 9:90–5.
McKinney W. Data structures for statistical computing in Python In: van der Walt S, Millman J, editors. Proceedings of the 9th Python in Science Conference: 2010. p. 51–6.
Collette A. Python and HDF5. Sebasto pol: O’Reilly; 2013.
Seabold S, Perktold J. Statsmodels: econometric and statistical modeling with Python. 9th Python in Science Conference. 2010.
Waskom, M, et al. In: Varoquaux G, Vaught T, Millman J, (eds).Seaborn; 2016. http://doi.org/10.5281/zenodo.12710, https://networkx.github.io/documentation/networkx-1.10/reference/citing.html.
Ulyanov D. Multicore-tsne. 2016. https://github.com/DmitryUlyanov/Multicore-TSNE.
Traag V, Louvain. GitHub. 2017. https://doi.org/10.5281/zenodo.595481.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521:436–44.
Lippert C, Casale FP, Rakitsch B, Stegle O. In: van der Walt S, Millman J, (eds).Limix: genetic analysis of multiple traits; 2014. https://doi.org/10.1101/003905, http://conference.scipy.org/proceedings/scipy2010/mckinney.html. bioRxiv.
Matthews AGdeG, van der Wilk M, Nickson T, Fujii K, Boukouvalas A, Le’on-Villagr’a P, Ghahramani Z, Hensman J. GPflow: A Gaussian process library using TensorFlow. J Mach Learn Res. 2017; 18(40):1–6. http://jmlr.org/papers/v18/16-537.html.
Matthews de, G, Alexander G, et al.GPflow: a Gaussian process library using TensorFlow. J Mach Learn Res. 2017; 18:1–6. https://github.com/SheffieldML/GPy.
Buettner F, et al.Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol. 2015; 33:155.
Buettner F, Pratanwanich N, McCarthy DJ, Marioni JC, Stegle O. f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biol. 2017; 18:212.
DeTomaso D, Yosef N. Fastproject: a tool for low-dimensional analysis of single-cell RNA-seq data. BMC Bioinform. 2016; 17:315.
Shekhar K, Brodin P, Davis MM, Chakraborty AK. Automatic classification of cellular expression by nonlinear stochastic embedding (accense): 2013. p 202–7.
Dixit A, et al.Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell. 2016; 167:1853–66.e17.
Svensson V, et al.Power analysis of single cell RNA-sequencing experiments. Nat Methods. 2017; 14:381.
Giecold G, Marco E, Garcia SP, Trippa L, Yuan G-C. Robust lineage reconstruction from high-dimensional single-cell data. Nucleic Acids Res. 2016; 44:e122.