AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA
Abstract
Long non-coding RNA (lncRNA) is a class of non-coding RNA longer than 200 nucleotides with no protein-coding function. LncRNAs play key roles in many biological processes. Studying the RNA-binding protein (RBP) binding sites on lncRNA chains helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. Many computational methods based on domain knowledge have been proposed to predict RBP binding sites, improving the recognition rate and reducing experimental time and cost. However, these methods treat nucleotides as independent and do not take nucleotide statistics into account. In this paper, we use a high-order statistics-based encoding scheme and feed the encoded lncRNA sequences into a hybrid deep learning architecture named AC-Caps. AC-Caps consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated on 31 independent experimental datasets from 12 lncRNA-binding proteins. In these experiments, our method achieves excellent performance, with an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%. The average AUC and ACC are higher than those of the compared methods by the following margins: 0.014 and 2.3% over HOCCNNLB, 0.261 and 28.9% over iDeepS, and 0.189 and 21.8% over DeepBind. The results show that AC-Caps can reliably process large-scale RBP binding site data on lncRNA chains and that its prediction performance is better than that of the existing deep learning models it was compared with. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps.
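The abstract does not spell out the high-order encoding scheme. The block below is a minimal sketch that assumes a k-mer one-hot scheme of the kind used by high-order CNN models for nucleic acids. A window of k nucleotides slides along the sequence with stride 1, and each k-mer is one-hot encoded as a 4^k-dimensional vector, so the input carries statistics of neighbouring nucleotides rather than single bases. Several details are illustrative assumptions and are not taken from AC-Caps:

- the value of k;
- the ACGU alphabet, with T mapped to U;
- the use of an all-zero row for k-mers containing other symbols (such as N).

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; T is converted to U before encoding


def kmer_index(k: int) -> dict[str, int]:
    """Map every possible k-mer over ACGU to a column index in [0, 4**k)."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def high_order_one_hot(seq: str, k: int = 3) -> list[list[int]]:
    """Encode a sequence as a (len(seq) - k + 1) x 4**k one-hot matrix.

    Row t is the one-hot vector of the k-mer starting at position t (stride 1).
    k-mers containing symbols outside ACGU (e.g. 'N') become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")
    index = kmer_index(k)
    rows = []
    for start in range(len(seq) - k + 1):
        row = [0] * (4 ** k)
        col = index.get(seq[start:start + k])
        if col is not None:
            row[col] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    m = high_order_one_hot("ACGUAC", k=2)
    print(len(m), len(m[0]))  # prints: 5 16
```

For a sequence of length L this yields an (L - k + 1) x 4^k matrix. A larger k captures longer nucleotide dependencies at the cost of a wider, sparser input.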
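The abstract describes the joint processing layer only as a combination of an attention mechanism and a CNN. The block below sketches one plausible arrangement, which is assumed here rather than taken from the paper. It applies a 1D convolution with ReLU over the encoded matrix. Soft attention then scores each position against a query vector and rescales the feature map by the resulting weights, which keeps the positional structure for a downstream capsule layer. The function names and the fixed query vector are hypothetical; in a trained model the query vector would be learned.

```python
import math


def conv1d_relu(x, kernels, bias):
    """Valid (no padding) 1D convolution followed by ReLU.

    x: L rows of C channels; kernels: F filters, each W rows of C weights;
    bias: F values. Returns (L - W + 1) rows of F features.
    """
    width = len(kernels[0])
    channels = len(x[0])
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for f, ker in enumerate(kernels):
            s = bias[f] + sum(
                ker[w][c] * x[t + w][c] for w in range(width) for c in range(channels)
            )
            row.append(max(0.0, float(s)))  # ReLU
        out.append(row)
    return out


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    z = sum(e)
    return [a / z for a in e]


def attention_reweight(h, query):
    """Dot-product attention over the positions of a feature map h (T x F).

    Returns h with each position scaled by its attention weight, plus the
    weights themselves (which sum to 1).
    """
    scores = [sum(a * b for a, b in zip(row, query)) for row in h]
    alpha = softmax(scores)
    weighted = [[alpha[t] * v for v in h[t]] for t in range(len(h))]
    return weighted, alpha


if __name__ == "__main__":
    x = [[1, 0], [0, 1], [1, 1]]   # 3 positions, 2 input channels
    kernels = [[[1, 0], [0, 1]]]   # one filter of width 2
    h = conv1d_relu(x, kernels, bias=[0.0])
    weighted, alpha = attention_reweight(h, query=[1.0])
    print(h, alpha)
```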
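The abstract names a capsule network but not its routing procedure. The block below is a minimal sketch that assumes the dynamic routing-by-agreement of Sabour et al. (2017), with the squash nonlinearity and 3 routing iterations. It also assumes two output capsules, one for binding and one for non-binding, with the predicted class given by the longer capsule. The prediction vectors `u_hat` are passed in directly here; in a real model they come from learned transformation matrices applied to the lower-level capsules.

```python
import math


def squash(s, eps=1e-9):
    """Capsule nonlinearity: keeps direction, maps length |s| to |s|^2 / (1 + |s|^2)."""
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / math.sqrt(sq + eps)
    return [scale * x for x in s]


def softmax(v):
    """Numerically stable softmax over a list of logits."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    z = sum(e)
    return [a / z for a in e]


def length(v):
    """Euclidean length of a capsule vector."""
    return math.sqrt(sum(x * x for x in v))


def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement.

    u_hat[i][j] is the prediction vector from input capsule i for output
    capsule j. Coupling coefficients c[i][j] = softmax_j(b[i][j]) are refined
    by adding the agreement u_hat[i][j] . v[j] to the routing logits.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # logits start at zero (uniform coupling)
    v = []
    for r in range(iterations):
        c = [softmax(row) for row in b]  # each input capsule distributes over outputs
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        if r < iterations - 1:  # a logit update after the last pass would be unused
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v


def predict(u_hat, iterations=3):
    """Index of the longest output capsule (assumed 0 = binding, 1 = non-binding)."""
    v = dynamic_routing(u_hat, iterations)
    lengths = [length(vj) for vj in v]
    return max(range(len(v)), key=lengths.__getitem__)
```

Because squash maps every vector to a length below 1, capsule lengths can be read as class probabilities. Routing shifts coupling toward the output capsule whose vector agrees with the inputs' predictions.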
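The reported figures are per-dataset AUC and ACC values averaged over the 31 datasets. The block below is a minimal sketch of both metrics. It assumes binary labels (1 = binding site) and a 0.5 decision threshold for ACC; the abstract states neither. AUC is computed in its rank form: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The helper names are hypothetical.

```python
from statistics import mean


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs. negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def average_metrics(datasets):
    """datasets: list of (labels, scores) pairs, one per experimental dataset.

    Returns (mean AUC, mean ACC) across datasets.
    """
    return (
        mean(roc_auc(y, s) for y, s in datasets),
        mean(accuracy(y, s) for y, s in datasets),
    )
```

Averaging per-dataset scores weights every RBP dataset equally, regardless of its size.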
