Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data

Cell Systems - Volume 12 - Pages 176-194.e6 - 2021
Nan Miles Xi1, Jingyi Jessica Li1,2,3
1Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
2Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA
3Department of Computational Medicine, University of California, Los Angeles, CA 90095-1766, USA

References

Allaire, 2018, Reticulate: interface to Python, R Package Version, 1
Amezquita, 2020, Orchestrating single-cell analysis with Bioconductor, Nat. Methods, 17, 137, 10.1038/s41592-019-0654-x
Andrews, 2018, False signals induced by single-cell imputation, F1000Res, 7, 1740, 10.12688/f1000research.16613.1
Bais, 2020, scds: computational annotation of doublets in single-cell RNA sequencing data, Bioinformatics, 36, 1150, 10.1093/bioinformatics/btz698
Bernstein, 2020, Solo: doublet identification in single-cell RNA-Seq via semi-supervised deep learning, Cell Syst., 11, 95, 10.1016/j.cels.2020.05.010
Blondel, 2008, Fast unfolding of communities in large networks, J. Stat. Mech., 2008, 10008, 10.1088/1742-5468/2008/10/P10008
Bloom, 2018, Estimating the frequency of multiplets in single-cell RNA sequencing from cell-mixing experiments, PeerJ, 6, e5578, 10.7717/peerj.5578
Branco, 2016, A survey of predictive modeling on imbalanced domains, ACM Comput. Surv., 49, 1, 10.1145/2907070
Butler, 2018, Integrating single-cell transcriptomic data across different conditions, technologies, and species, Nat. Biotechnol., 36, 411, 10.1038/nbt.4096
Chen, 2019, Single-cell RNA-Seq technologies and related computational data analysis, Front. Genet., 10, 317, 10.3389/fgene.2019.00317
Chen, T., and Guestrin, C. (2016). XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
DePasquale, 2019, DoubletDecon: deconvoluting doublets from single-cell RNA-sequencing data, Cell Rep., 29, 1718, 10.1016/j.celrep.2019.09.082
Dietterich, 2000, Ensemble methods in machine learning, 1, 10.1007/3-540-45014-9_1
Domingues, 2018, A comparative evaluation of outlier detection algorithms: experiments and analyses, Pattern Recognit., 74, 406, 10.1016/j.patcog.2017.09.037
Duò, 2018, A systematic performance evaluation of clustering methods for single-cell RNA-seq data, F1000Res, 7, 1141, 10.12688/f1000research.15666.2
Durinck, 2009, Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt, Nat. Protoc., 4, 1184, 10.1038/nprot.2009.97
Edgar, 2002, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res., 30, 207, 10.1093/nar/30.1.207
Efron, 2016
Ester, M., Kriegel, H.-P., Sander, J., and Xiaowei, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. KDD'96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231.
Fay, 2010, Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules, Stat. Surv., 4, 1, 10.1214/09-SS051
Feng, 2020, Dimension reduction and clustering models for single-cell RNA sequencing data: a comparative study, Int. J. Mol. Sci., 21, 10.3390/ijms21062181
Feurer, 2019, Hyperparameter optimization, 3
Finak, 2015, MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data, Genome Biol., 16, 278, 10.1186/s13059-015-0844-5
Gayoso, 2018 Github
Gong, 2013, DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data, Bioinformatics, 29, 1083, 10.1093/bioinformatics/btt090
Grau, 2015, PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R, Bioinformatics, 31, 2595, 10.1093/bioinformatics/btv153
Hastie, 2009
Hastie, 1990
Herring, 2018, Single-cell computational strategies for lineage reconstruction in tissue systems, Cell. Mol. Gastroenterol. Hepatol., 5, 539, 10.1016/j.jcmgh.2018.01.023
Hwang, 2018, Single-cell RNA sequencing technologies and bioinformatics pipelines, Exp. Mol. Med., 50, 96, 10.1038/s12276-018-0071-8
Ji, 2016, TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis, Nucleic Acids Res., 44, e117, 10.1093/nar/gkw430
Kang, 2018, Multiplexed droplet single-cell RNA-sequencing using natural genetic variation, Nat. Biotechnol., 36, 89, 10.1038/nbt.4042
Kolodziejczyk, 2015, The technology and biology of single-cell RNA sequencing, Mol. Cell, 58, 610, 10.1016/j.molcel.2015.04.005
Lähnemann, 2020, Eleven grand challenges in single-cell data science, Genome Biol., 21, 31, 10.1186/s13059-020-1926-6
Li, 2018, An accurate and robust imputation method scImpute for single-cell RNA-seq data, Nat. Commun., 9, 997, 10.1038/s41467-018-03405-7
Li, 2019, A statistical simulator scDesign for rational scRNA-seq experimental design, Bioinformatics, 35, i41, 10.1093/bioinformatics/btz321
Liu, 2016, Single-cell transcriptome sequencing: recent advances and remaining challenges, F1000Res, 5, 10.12688/f1000research.7223.1
Lopez, 2018, Deep generative modeling for single-cell transcriptomics, Nat. Methods, 15, 1053, 10.1038/s41592-018-0229-2
Love, 2014, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol., 15, 550, 10.1186/s13059-014-0550-8
Luecken, 2019, Current best practices in single-cell RNA-seq analysis: a tutorial, Mol. Syst. Biol., 15, e8746, 10.15252/msb.20188746
Lun, 2016, A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor, F1000Res, 5, 2122
Mangul, 2019, Improving the usability and archival stability of bioinformatics software, Genome Biol., 20, 47, 10.1186/s13059-019-1649-8
McGinnis, 2019, DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors, Cell Syst., 8, 329, 10.1016/j.cels.2019.03.003
McGinnis, 2019, Multi-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices, Nat. Methods, 16, 619, 10.1038/s41592-019-0433-8
Natarajan, 2013, Learning with noisy labels, 1196
Nettleton, 2010, A study of the effect of different types of noise on the precision of supervised learning techniques, Artif. Intell. Rev., 33, 275, 10.1007/s10462-010-9156-z
Pfister, 2013, Good things peak in pairs: a note on the bimodality coefficient, Front. Psychol., 4, 700, 10.3389/fpsyg.2013.00700
Pierre-Luc, 2020
Regev, 2017, The human cell atlas, eLife, 6, e27041, 10.7554/eLife.27041
Risso, 2018, A general and flexible method for signal extraction from single-cell RNA-seq data, Nat. Commun., 9, 284, 10.1038/s41467-017-02554-5
Saelens, 2019, A comparison of single-cell trajectory inference methods: towards more accurate and robust tools, bioRXiv
Saito, 2015, The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets, PLoS One, 10, e0118432, 10.1371/journal.pone.0118432
Saliba, 2014, Single-cell RNA-seq: advances and future challenges, Nucleic Acids Res., 42, 8845, 10.1093/nar/gku555
Stoeckius, 2018, Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics, Genome Biol., 19, 224, 10.1186/s13059-018-1603-1
Street, 2018, Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics, BMC Genomics, 19, 477, 10.1186/s12864-018-4772-0
Stuart, 2019, Comprehensive integration of single-cell data, Cell, 177, 1888, 10.1016/j.cell.2019.05.031
Tian, 2019, Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments, Nat. Methods, 16, 479, 10.1038/s41592-019-0425-8
Trapnell, 2014, The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells, Nat. Biotechnol., 32, 381, 10.1038/nbt.2859
Vallejos, 2015, BASiCS: bayesian analysis of single-cell sequencing data, PLoS Comp. Biol., 11, e1004333, 10.1371/journal.pcbi.1004333
van Dijk, 2018, Recovering gene interactions from single-cell data using data diffusion, Cell, 174, 716, 10.1016/j.cell.2018.05.061
Wang, 2019, Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data, BMC Bioinformatics, 20, 40, 10.1186/s12859-019-2599-6
Waring, 2020, Automated machine learning: review of the state-of-the-art and opportunities for healthcare, Artif. Intell. Med., 104, 101822, 10.1016/j.artmed.2020.101822
Weber, 2019, Essential guidelines for computational method benchmarking, Genome Biol., 20, 125, 10.1186/s13059-019-1738-8
Wolock, 2019, Scrublet: computational identification of cell doublets in single-cell transcriptomic data, Cell Syst., 8, 281, 10.1016/j.cels.2018.11.005
Yang, 2020, Decontamination of ambient RNA in single-cell RNA-seq with DecontX, Genome Biol., 21, 57, 10.1186/s13059-020-1950-6
Yip, 2019, Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data, Brief. Bioinform., 20, 1583, 10.1093/bib/bby011
Young, 2020, SoupX removes ambient RNA contamination from droplet based single cell RNA sequencing data, bioRxiv
Zappia, 2017, Splatter: simulation of single-cell RNA sequencing data, Genome Biol., 18, 174, 10.1186/s13059-017-1305-0
Zappia, 2018, Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database, PLoS Comp. Biol., 14, e1006245, 10.1371/journal.pcbi.1006245
Zheng, 2017, Massively parallel digital transcriptional profiling of single cells, Nat. Commun., 8, 14049, 10.1038/ncomms14049