A new ensemble coevolution system for detecting HIV-1 protein coevolution

Guangdi Li¹, Kristof Theys¹, Jens Verheyen², Andrea-Clemencia Pineda-Peña¹,³, Ricardo Khouri¹, Supinya Piampongsant¹, Mónica Eusébio⁴, Jan Ramon⁵, Anne-Mieke Vandamme⁴,¹
¹ KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
² Institute of Virology, University Hospital, University Duisburg-Essen, Essen, Germany
³ Clinical and Molecular Infectious Disease Group, Faculty of Sciences and Mathematics, Universidad del Rosario, Bogotá, Colombia
⁴ Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal
⁵ Department of Computer Science, KU Leuven - University of Leuven, Leuven, Belgium

Abstract

A key challenge in the study of HIV-1 protein evolution is identifying coevolving amino acids at the molecular level. Over the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between proteins. However, no ensemble coevolution system has yet been developed that integrates different methods to improve the detection of HIV-1 protein coevolution. We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system, which allows different sequence-based methods to be combined for coevolution prediction. Using HIV-1 protein structures and experimental data, we evaluated how well individual and combined sequence-based methods predict HIV-1 intra- and inter-protein coevolution. Sequence-based methods clustered according to their methodology, and a combination of four methods outperformed each of the 27 individual methods. This four-method combination indicated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and were in physical contact with each other in the protein tertiary structures. In the analysis of inter-protein coevolution between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under the selective pressure of protease inhibitors. This study presents a new ensemble coevolution system that detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution. This article was reviewed by Dr. Zoltán Gáspári.
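The abstract does not say how the system combines the scores of its member methods, so the following is only an illustrative sketch of the general idea of an ensemble over sequence-based coevolution scores. It assumes two simplified stand-in scores (plain mutual information and an OMES-style observed-minus-expected score) and assumes that methods are combined by averaging normalized ranks. The function names, the two scores chosen, and the rank-averaging rule are all assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Plain mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum((c / n) * log2(c * n / (pa[a] * pb[b])) for (a, b), c in pab.items())


def omes(col_a, col_b):
    """Simplified OMES-style score: squared observed-minus-expected pair counts."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum((pab[(a, b)] - pa[a] * pb[b] / n) ** 2 for a in pa for b in pb) / n


def normalized_ranks(scores):
    """Map each key to its rank scaled to [0, 1]; tied scores share the mean rank."""
    keys = sorted(scores, key=scores.get)
    n = len(keys)
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[keys[j + 1]] == scores[keys[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[keys[k]] = mean_rank / (n - 1) if n > 1 else 1.0
        i = j + 1
    return ranks


def ensemble_scores(alignment, methods):
    """Assumed combination rule: average the normalized ranks over all methods."""
    pairs = list(combinations(range(len(alignment[0])), 2))
    per_method = []
    for method in methods:
        raw = {(i, j): method(column(alignment, i), column(alignment, j))
               for i, j in pairs}
        per_method.append(normalized_ranks(raw))
    return {p: sum(r[p] for r in per_method) / len(per_method) for p in pairs}


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 2 covary, column 3 is conserved.
    toy = ["AKLG", "ARLG", "VKIG", "VRIG", "AKLG", "VRIG"]
    combined = ensemble_scores(toy, [mutual_information, omes])
    for pair, score in sorted(combined.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Rank averaging is used here because raw scores from different methods live on different scales; converting each method's output to ranks before combining avoids letting one method dominate purely through its units. A real system would also need to handle gaps, sequence weighting, and phylogenetic bias, none of which this sketch attempts.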
