PreCanCell: An ensemble learning algorithm for predicting cancer and non-cancer cells from single-cell transcriptomes

Computational and Structural Biotechnology Journal - Volume 21 - Pages 3604-3614 - 2023
Tao Yang1,2,3, Qiyu Yan1,2,3, Rongzhuo Long1,2,3, Zhixian Liu4, Xiaosheng Wang1,2,3
1Biomedical Informatics Research Lab, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China
2Cancer Genomics Research Center, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China
3Big Data Research Institute, China Pharmaceutical University, Nanjing 211198, China
4Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, Jiangsu Province, China
