AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA

Jinmiao Song1,2, Shengwei Tian3, Long Yu4, Yan Xing5, Qimeng Yang2, Xiaodong Duan1, Qiguo Dai1
1Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, China
2School of Information Science and Engineering, Xinjiang University, Urumqi, China
3School of Software, Xinjiang University, Urumqi, China
4Network Center, Xinjiang University, Urumqi, China
5Imaging Center, Xinjiang Medical University Affiliated First Hospital, Urumqi, China

Abstract

Long non-coding RNA (lncRNA) is a class of non-coding RNA longer than 200 nucleotides that has no protein-coding function. LncRNA plays a key role in many biological processes. Studying the RNA-binding protein (RBP) binding sites on lncRNA chains helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and reduce experimental time and cost, many computational methods that use domain knowledge to predict RBP binding sites have emerged. However, these prediction methods treat nucleotides as independent of one another and do not take nucleotide statistics into account. In this paper, we encode lncRNA sequences with a high-order statistics-based encoding scheme and feed the encoded sequences into a hybrid deep learning architecture named AC-Caps, which consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated on 31 independent experimental data sets from 12 lncRNA-binding proteins. In these experiments, our method achieves an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%. These values exceed those of HOCCNNLB by 0.014 and 2.3%, those of iDeepS by 0.261 and 28.9%, and those of DeepBind by 0.189 and 21.8% (AUC and ACC, respectively). The results show that AC-Caps can reliably process large-scale RBP binding site data on lncRNA chains and that its prediction performance is better than that of the existing deep-learning models it was compared with. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps .
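
To make the idea of a high-order encoding concrete, here is a minimal Python sketch. The abstract does not specify the exact scheme, so this assumes one common variant: each overlapping k-mer is mapped to a one-hot vector of length 4^k, so neighbouring nucleotides are encoded jointly instead of independently. The function name `kmer_one_hot`, the parameter `k`, and the handling of ambiguous symbols are illustrative assumptions, not taken from the AC-Caps source code.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGU"  # RNA alphabet; T is mapped to U below (assumption)


def kmer_one_hot(seq: str, k: int = 2) -> np.ndarray:
    """One-hot encode the overlapping k-mers of an RNA sequence.

    Returns an array of shape (len(seq) - k + 1, 4 ** k). A k-mer that
    contains a symbol outside ACGU (for example N) becomes an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    # Column index of every possible k-mer, in lexicographic order.
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    n_rows = max(len(seq) - k + 1, 0)
    encoded = np.zeros((n_rows, len(index)), dtype=np.float32)
    for pos in range(n_rows):
        col = index.get(seq[pos:pos + k])
        if col is not None:
            encoded[pos, col] = 1.0
    return encoded
```

With `k=1` this reduces to ordinary per-nucleotide one-hot encoding. With `k=2`, the sequence `"ACGU"` yields a 3 x 16 matrix with one row for each of the dinucleotides AC, CG, and GU. A matrix of this kind is the sort of input a convolutional layer could consume.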
