Ensemble deep learning in bioinformatics
Abstract
Keywords
References
Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C. & Collins, J. J. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).
Hansen, L. K. & Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 993–1001 (1990).
Yang, P., Hwa Yang, Y., Zhou, B. B. & Zomaya, A. Y. A review of ensemble methods in bioinformatics. Curr. Bioinform. 5, 296–308 (2010).
Min, S., Lee, B. & Yoon, S. Deep learning in bioinformatics. Briefings Bioinform. 18, 851–869 (2017).
Dietterich, T. G. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems 1–15 (Springer, 2000).
Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
Schapire, R. E., Freund, Y., Bartlett, P. & Lee, W. S. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat. 26, 1651–1686 (1998).
Vega-Pons, S. & Ruiz-Shulcloper, J. A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. Artif. Intell. 25, 337–372 (2011).
Altman, N. & Krzywinski, M. Points of significance: ensemble methods: bagging and random forests. Nat. Methods 14, 933–935 (2017).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Proc. 26th Int. Conf. Advances in Neural Information Processing Systems 1097–1105 (NIPS, 2012).
Williams, R. J. & Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1, 270–280 (1989).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. 2014 Conf. Empirical Methods in Natural Language Processing 1724–1734 (EMNLP, 2014).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Baldi, P. Autoencoders, unsupervised learning, and deep architectures. In Proc. ICML Workshop on Unsupervised and Transfer Learning 37–49 (ICML, 2012).
Ju, C., Bibaut, A. & van der Laan, M. The relative performance of ensemble methods with deep convolutional neural networks for image classification. J. Appl. Stat. 45, 2800–2818 (2018).
Lee, S., Purushwalkam, S., Cogswell, M., Crandall, D. & Batra, D. Why M heads are better than one: training a diverse ensemble of deep networks. Preprint at https://arxiv.org/abs/1511.06314 (2015).
Granitto, P. M., Verdes, P. F. & Ceccatto, H. A. Neural network ensembles: evaluation of aggregation algorithms. Artif. Intell. 163, 139–162 (2005).
Lee, S. et al. Stochastic multiple choice learning for training diverse deep ensembles. In Proc. 30th Int. Conf. Advances in Neural Information Processing Systems 2119–2127 (NIPS, 2016).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at http://arxiv.org/abs/1503.02531 (2015).
Shen, Z., He, Z. & Xue, X. MEAL: multi-model ensemble via adversarial learning. In Proc. AAAI Conf. Artificial Intelligence Vol. 33 4886–4893 (AAAI, 2019).
Parisotto, E., Ba, J. & Salakhutdinov, R. Actor-mimic: deep multitask and transfer reinforcement learning. In Proc. Int. Conf. Learning Representations (ICLR, 2016).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Baldi, P. & Sadowski, P. J. Understanding dropout. In Proc. 27th Int. Conf. Advances in Neural Information Processing Systems 2814–2822 (NIPS, 2013).
Hara, K., Saitoh, D. & Shouno, H. Analysis of dropout learning regarded as ensemble learning. In Proc. 25th Int. Conf. Artificial Neural Networks 72–79 (ICANN, 2016).
Huang, G., Sun, Y., Liu, Z., Sedra, D. & Weinberger, K. Q. Deep networks with stochastic depth. In 14th European Conf. Computer Vision 646–661 (Springer, 2016).
Singh, S., Hoiem, D. & Forsyth, D. Swapout: learning an ensemble of deep architectures. In Proc. 30th Int. Conf. Advances in Neural Information Processing Systems 28–36 (NIPS, 2016).
Huang, G. et al. Snapshot ensembles: train 1, get M for free. Preprint at https://arxiv.org/abs/1704.00109 (2017).
Han, B., Sim, J. & Adam, H. BranchOut: regularization for online ensemble tracking with convolutional neural networks. In Proc. IEEE Conf. Computer Vision and Pattern Recognition 3356–3365 (IEEE, 2017).
Wang, X., Bao, A., Cheng, Y. & Yu, Q. Multipath ensemble convolutional neural network. IEEE Trans. Emerg. Topics Comput. Intell. https://doi.org/10.1109/TETCI.2018.2877154 (2018).
Lan, X., Zhu, X. & Gong, S. Knowledge distillation by on-the-fly native ensemble. In Proc. 32nd Int. Conf. Advances in Neural Information Processing Systems 7517–7527 (NIPS, 2018).
Geddes, T. A. et al. Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis. BMC Bioinform. 20, 660 (2019).
Shao, H., Jiang, H., Lin, Y. & Li, X. A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders. Mech. Syst. Signal Process. 102, 278–297 (2018).
Wang, W., Arora, R., Livescu, K. & Bilmes, J. On deep multi-view representation learning. In Proc. 32nd Int. Conf. Machine Learning 1083–1092 (ICML, 2015).
Huang, Z. et al. Multi-view spectral clustering network. In Proc. 28th Int. Joint Conf. Artificial Intelligence 2563–2569 (IJCAI, 2019).
Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proc. 25th Int. Conf. Machine Learning 1096–1103 (ICML, 2008).
Bachman, P., Alsharif, O. & Precup, D. Learning with pseudo-ensembles. In Proc. 28th Int. Conf. Advances in Neural Information Processing Systems 3365–3373 (NIPS, 2014).
Antelmi, L., Ayache, N., Robert, P. & Lorenzi, M. Sparse multi-channel variational autoencoder for the joint analysis of heterogeneous data. In Proc. 36th Int. Conf. Machine Learning 302–311 (ICML, 2019).
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P.-A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010).
Geman, S., Bienenstock, E. & Doursat, R. Neural networks and the bias/variance dilemma. Neural Comput. 4, 1–58 (1992).
Keskar, N. S., Nocedal, J., Tang, P. T. P., Mudigere, D. & Smelyanskiy, M. On large-batch training for deep learning: generalization gap and sharp minima. In Proc. 5th Int. Conf. Learning Representations (ICLR, 2017).
Zhao, D., Yu, G., Xu, P. & Luo, M. Equivalence between dropout and data augmentation: a mathematical check. Neural Netw. 115, 82–89 (2019).
Bartoszewicz, J. M., Seidel, A., Rentzsch, R. & Renard, B. Y. DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks. Bioinformatics 36, 81–89 (2020).
Cao, Z., Pan, X., Yang, Y., Huang, Y. & Shen, H.-B. The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 34, 2185–2194 (2018).
Zhang, S., Hu, H., Jiang, T., Zhang, L. & Zeng, J. TITER: predicting translation initiation sites by deep learning. Bioinformatics 33, i234–i242 (2017).
Zhang, Y., Qiao, S., Ji, S. & Zhou, J. Ensemble-CNN: predicting DNA binding sites in protein sequences by an ensemble deep learning method. In Proc. 14th Int. Conf. Intelligent Computing 301–306 (ICIC, 2018).
He, F. et al. Protein ubiquitylation and sumoylation site prediction based on ensemble and transfer learning. In Proc. 2019 IEEE Int. Conf. Bioinformatics and Biomedicine 117–123 (IEEE, 2019).
Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–1068 (2010).
Karim, M. R., Rahman, A., Jares, J. B., Decker, S. & Beyan, O. A snapshot neural ensemble method for cancer-type prediction based on copy number variations. Neural Comput. Appl. https://doi.org/10.1007/s00521-019-04616-9 (2019).
Erhan, D. et al. Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660 (2010).
Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017).
Hu, H. et al. DeepHINT: understanding HIV-1 integration via deep learning with attention. Bioinformatics 35, 1660–1667 (2019).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https://arxiv.org/abs/1409.0473 (2014).
Yang, Y. H. & Speed, T. Design issues for cDNA microarray experiments. Nat. Rev. Genet. 3, 579–588 (2002).
Ozsolak, F. & Milos, P. M. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 12, 87–98 (2011).
Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C. & Teichmann, S. A. The technology and biology of single-cell RNA sequencing. Mol. Cell 58, 610–620 (2015).
Grewal, J. K. et al. Application of a neural network whole transcriptome-based pan-cancer method for diagnosis of primary and metastatic cancers. JAMA Netw. Open 2, e192597 (2019).
Xiao, Y., Wu, J., Lin, Z. & Zhao, X. A deep learning-based multi-model ensemble method for cancer prediction. Comput. Methods Prog. Biomed. 153, 1–9 (2018).
West, M. D. et al. Use of deep neural network ensembles to identify embryonic-fetal transition markers: repression of COX7A1 in embryonic and cancer cells. Oncotarget 9, 7796–7811 (2018).
Tan, J. et al. Unsupervised extraction of stable expression signatures from public compendia with an ensemble of neural networks. Cell Syst. 5, 63–71 (2017).
Lee, D., Redfern, O. & Orengo, C. Predicting protein function from sequence and structure. Nat. Rev. Mol. Cell Biol. 8, 995–1005 (2007).
Li, Z. & Yu, Y. Protein secondary structure prediction using cascaded convolutional and recurrent neural networks. In Proc. 25th Int. Joint Conf. Artificial Intelligence 2560–2567 (AAAI, 2016).
Torrisi, M., Kaleel, M. & Pollastri, G. Deeper profiles and cascaded recurrent and convolutional neural networks for state-of-the-art protein secondary structure prediction. Sci. Rep. 9, 12374 (2019).
Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).
Zhang, B., Li, J. & Lü, Q. Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC Bioinform. 19, 293 (2018).
Zacharaki, E. I. Prediction of protein function using a deep convolutional neural network ensemble. PeerJ Comput. Sci. 3, e124 (2017).
Singh, J. et al. Detecting proline and non-proline cis isomers in protein structures from sequences using deep residual ensemble learning. J. Chem. Inf. Model. 58, 2033–2042 (2018).
Walther, T. C. & Mann, M. Mass spectrometry-based proteomics in cell biology. J. Cell Biol. 190, 491–500 (2010).
Cox, J. & Mann, M. Quantitative, high-resolution proteomics for data-driven systems biology. Annu. Rev. Biochem. 80, 273–299 (2011).
Zohora, F. T. et al. DeepIso: a deep learning model for peptide feature detection from LC-MS map. Sci. Rep. 9, 17168 (2019).
Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44 (2020).
Hu, Y. et al. ACME: pan-specific peptide–MHC class I binding prediction through attention-based deep neural networks. Bioinformatics 35, 4946–4954 (2019).
Zhang, L., Yu, G., Xia, D. & Wang, J. Protein–protein interactions prediction based on ensemble deep neural networks. Neurocomputing 324, 10–19 (2019).
Karimi, M., Wu, D., Wang, Z. & Shen, Y. DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics 35, 3329–3338 (2019).
Hu, S. et al. Predicting drug-target interactions from drug structure and protein sequence using novel convolutional neural networks. BMC Bioinform. 20, 689 (2019).
Yang, P. et al. Multi-omic profiling reveals dynamics of the phased progression of pluripotency. Cell Syst. 8, 427–445 (2019).
Kim, H. J. et al. Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning. Nucl. Acids Res. 48, 1828–1842 (2020).
Ramazzotti, D., Lal, A., Wang, B., Batzoglou, S. & Sidow, A. Multi-omic tumor data reveal diversity of molecular mechanisms that correlate with survival. Nat. Commun. 9, 4453 (2018).
Liang, M., Li, Z., Chen, T. & Zeng, J. Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach. IEEE/ACM Trans. Comput. Biol. Bioinform. 12, 928–937 (2014).
Arefeen, A., Xiao, X. & Jiang, T. DeepPASTA: deep neural network based polyadenylation site analysis. Bioinformatics 35, 4577–4585 (2019).
Gala, R. et al. A coupled autoencoder approach for multi-modal analysis of cell types. In Proc. 33rd Int. Conf. Advances in Neural Information Processing Systems 9263–9272 (NIPS, 2019).
Zhang, X. et al. Integrated multi-omics analysis using variational autoencoders: application to pan-cancer classification. In Proc. 2019 IEEE Int. Conf. Bioinformatics and Biomedicine 765–769 (IEEE, 2019).
Sharifi-Noghabi, H., Zolotareva, O., Collins, C. C. & Ester, M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 35, i501–i509 (2019).
Lu, Z. et al. The classification of gliomas based on a pyramid dilated convolution resnet model. Pattern Recognit. Lett. 133, 173–179 (2020).
Codella, N. C. F. et al. Deep learning ensembles for melanoma recognition in dermoscopy images. IBM J. Res. Dev. 61, 5 (2017).
Song, Y. et al. Accurate segmentation of cervical cytoplasm and nuclei based on multiscale convolutional network and graph partitioning. IEEE Trans. Biomed. Eng. 62, 2421–2433 (2015).
Rasti, R., Teshnehlab, M. & Phung, S. L. Breast cancer diagnosis in DCE-MRI using mixture ensemble of convolutional neural networks. Pattern Recognit. 72, 381–390 (2017).
Yuan, X., Xie, L. & Abouelenien, M. A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data. Pattern Recognit. 77, 160–172 (2018).
Xie, J., Xu, B. & Chuang, Z. Horizontal and vertical ensemble with deep representation for classification. Preprint at https://arxiv.org/abs/1306.2759 (2013).
Dvornik, N., Schmid, C. & Mairal, J. Diversity with cooperation: ensemble methods for few-shot classification. In Proc. IEEE Int. Conf. Computer Vision 3723–3731 (IEEE, 2019).
Bzdok, D., Nichols, T. E. & Smith, S. M. Towards algorithmic analytics for large-scale datasets. Nat. Mach. Intell. 1, 296–306 (2019).
Yang, P. et al. Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications. IEEE Trans. Cybern. 44, 445–455 (2014).
Yang, P. et al. AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications. IEEE Trans. Cybern. 49, 1932–1943 (2019).
Abeel, T., Helleputte, T., Van de Peer, Y., Dupont, P. & Saeys, Y. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26, 392–398 (2010).
Pusztai, L., Hatzis, C. & Andre, F. Reproducibility of research and preclinical validation: problems and solutions. Nat. Rev. Clin. Oncol. 10, 720–724 (2013).
Dean, J. et al. Large scale distributed deep networks. In Proc. 26th Int. Conf. Advances in Neural Information Processing Systems 1223–1231 (NIPS, 2012).
Smith, V., Chiang, C.-K., Sanjabi, M. & Talwalkar, A. S. Federated multi-task learning. In Proc. 31st Int. Conf. Advances in Neural Information Processing Systems 4424–4434 (NIPS, 2017).