Computational Methods for Prediction of Human Protein-Phenotype Associations: A Review

Springer Science and Business Media LLC - Volume 1 - Pages 171-185 - 2021
Lizhi Liu¹, Shanfeng Zhu²,³,⁴,⁵,⁶
¹ School of Computer Science, Fudan University, Shanghai, China
² Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
³ Key Laboratory of Computational Neuroscience and Brain Inspired Intelligence, Fudan University, Ministry of Education, Shanghai, China
⁴ MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
⁵ Zhangjiang Fudan International Innovation Center, Shanghai, China
⁶ Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China

Abstract

Deciphering the relationship between human proteins (genes) and phenotypes is one of the fundamental tasks in phenomics research. The Human Phenotype Ontology (HPO) builds upon a standardized logical vocabulary to describe the abnormal phenotypes encountered in human diseases and paves the way towards the computational analysis of their genetic causes. To date, many computational methods have been proposed to predict the HPO annotations of proteins. In this paper, we conduct a comprehensive review of the existing approaches to three tasks: predicting HPO annotations of novel proteins, identifying missing HPO annotations, and prioritizing candidate proteins with respect to a given HPO term. For each task, we first give a formal description of the problem, then systematically revisit the published literature, highlighting the advantages and disadvantages of each method, and finally discuss the challenges and promising future directions. In addition, we point out several topics worthy of further exploration, including the selection of negative HPO annotations and the detection of HPO misannotations. We believe that this review will give researchers in the field of computational phenotype analysis insight into understanding existing prediction algorithms and developing novel ones.
