Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences

Castrense Savojardo¹, Matteo Manfredi¹, Pier Luigi Martelli¹, Rita Casadio¹,²
¹ Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy
² Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council, Bari, Italy

Abstract

Solvent accessible surface area (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning based approaches trained on solved structures. Here we ask to what extent the solvent exposure of a residue can be associated with the pathogenicity of its variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations on a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among the different residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease associated than others. For all residues, the proportion of disease-related SRVs increases substantially when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of an in-house predictor, based on a deep learning procedure and performing at the state of the art, we confirm this tendency by analyzing a large data set of residues subjected to variations and occurring in 12,494 human protein sequences that still lack a three-dimensional structure (derived from HUMSAVAR). Our data support the notion that solvent accessible surface area is a distinguishing property of residues that undergo variation, and that pathogenicity is more frequently associated with buried residues than with exposed ones.
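
The comparison described above, the proportion of disease-related SRVs among buried versus exposed wild-type residues, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The 20% relative solvent accessibility cutoff, the function names, and the toy records are all assumptions introduced here for illustration; the values are invented and are not results from the study.

```python
from collections import Counter

# Assumed cutoff on relative solvent accessibility (RSA, in the range 0..1).
# The abstract does not state the threshold used; 0.20 is illustrative only.
RSA_THRESHOLD = 0.20


def burial_class(rsa, threshold=RSA_THRESHOLD):
    """Label a residue as 'buried' if its RSA is below the cutoff, else 'exposed'."""
    return "buried" if rsa < threshold else "exposed"


def disease_fraction_by_class(variants, threshold=RSA_THRESHOLD):
    """Return the fraction of disease-related variants in each burial class.

    `variants` is an iterable of (wild_type_residue, rsa, is_disease) tuples.
    """
    totals = Counter()
    disease = Counter()
    for _residue, rsa, is_disease in variants:
        cls = burial_class(rsa, threshold)
        totals[cls] += 1
        if is_disease:
            disease[cls] += 1
    return {cls: disease[cls] / totals[cls] for cls in totals}


# Hypothetical toy records (invented values, not data from the paper).
toy_variants = [
    ("G", 0.05, True),
    ("W", 0.10, True),
    ("C", 0.15, False),
    ("K", 0.60, False),
    ("E", 0.75, False),
    ("Y", 0.30, True),
]

fractions = disease_fraction_by_class(toy_variants)
for cls, frac in sorted(fractions.items()):
    print(f"{cls}: {frac:.2f} of variants are disease-related")
```

Grouping the same records by wild-type residue as well as by burial class would give the per-residue breakdown the abstract refers to.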
