Development of an accurate classification system of proteins into structured and unstructured regions that uncovers novel structural domains: its application to human transcription factors
Abstract
In addition to structural domains, most eukaryotic proteins possess intrinsically disordered (ID) regions. Although ID regions often play important functional roles, they are difficult to identify accurately. Because human transcription factors (TFs) are a typical group of proteins with long ID regions, we treated them as a model for proteins in general and attempted to classify TFs accurately into structural domains and ID regions. Our previous investigation detected an extremely high fraction of ID regions in human TFs, in addition to DNA-binding and/or other domains, but 20% of the residues were left unassigned. In this report, we exploit the generally higher sequence divergence of ID regions relative to structural regions to divide proteins completely into structural domains and ID regions. The new dichotomic system first identifies domains of known structure and then assigns structural domains and ID regions using a combination of pre-existing tools and a newly developed program based on sequence divergence that takes unaligned regions into account. The system proved highly accurate: applied to a set of proteins with experimentally verified ID regions, it had an error rate of only 2%. Applied to human TFs (401 proteins), it showed that 38% of the residues were in structural domains and 62% were in ID regions. This preponderance of ID regions contrasts sharply with the TFs of Escherichia coli (229 proteins), in which only 5% of the residues fell in ID regions. The method also revealed that structural domains whose structures have not been determined account for 4.0% and 11.8% of the total length of human and E. coli TFs, respectively. The present system confirms that sequence divergence, including information from unaligned regions, is a good indicator of ID regions.
For the first time, the system estimates the complete partitioning of human TFs into structured and unstructured regions, and it also reveals structural domains without homology to known structures. These predicted novel structural domains are good targets for structural genomics. When applied to other proteins, the system is expected to uncover more novel structural domains.
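The divergence idea summarized above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' program. It assumes homologs arrive as query-anchored rows (one character per query residue, with '-' where that residue is unaligned in the homolog, e.g. a gap or a position outside a local alignment). It scores each residue by the fraction of rows in which it is unaligned or substituted, smooths the scores over a window, fixes residues in already-known domains as structured, and calls the rest disordered above a threshold. The function names (`divergence_profile`, `smooth`, `classify`, `pooled_id_fraction`) and the values `threshold=0.5` and `half_window=7` are hypothetical placeholders, and the sketch does not reproduce the paper's combination with pre-existing prediction tools.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical sketch: names, threshold and window size are placeholders,
# not the parameters of the published system.


def divergence_profile(query: str, homologs: Sequence[str]) -> List[float]:
    """Fraction of homolog rows in which each query residue is unaligned or substituted.

    Each row is query-anchored: one character per query residue, where '-'
    marks a residue left unaligned in that homolog.
    """
    for row in homologs:
        if len(row) != len(query):
            raise ValueError("homolog rows must match the query length")
    if not homologs:
        # Assumption: with no homologs at all, every residue counts as divergent.
        return [1.0] * len(query)
    profile = []
    for i, aa in enumerate(query):
        # '-' never equals a query residue, so unaligned positions are counted
        # as divergent together with substitutions.
        divergent = sum(1 for row in homologs if row[i] != aa)
        profile.append(divergent / len(homologs))
    return profile


def smooth(values: Sequence[float], half_window: int = 7) -> List[float]:
    """Sliding-window mean; the window is truncated at the sequence ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def classify(
    query: str,
    homologs: Sequence[str],
    known_domains: Iterable[Tuple[int, int]] = (),
    threshold: float = 0.5,
    half_window: int = 7,
) -> str:
    """Label each residue 'S' (structured) or 'D' (disordered).

    Residues inside known_domains (0-based, half-open intervals) are forced
    to 'S'; every other residue is 'D' when its smoothed divergence exceeds
    the threshold.
    """
    scores = smooth(divergence_profile(query, homologs), half_window)
    labels = ["D" if s > threshold else "S" for s in scores]
    for start, end in known_domains:
        for i in range(max(0, start), min(len(labels), end)):
            labels[i] = "S"
    return "".join(labels)


def pooled_id_fraction(label_strings: Iterable[str]) -> float:
    """Fraction of all residues, pooled over proteins, labelled 'D'."""
    total = disordered = 0
    for labels in label_strings:
        total += len(labels)
        disordered += labels.count("D")
    return disordered / total if total else 0.0
```

For example, with `query = "ACDEFGHIKL"`, homolog rows `"ACDEF-----"` and `"ACDEFGH---"`, and `half_window=0`, `classify` returns `"SSSSSSSDDD"`: only the three C-terminal residues that neither homolog aligns exceed the threshold. `pooled_id_fraction` weights every residue equally across proteins, which is one plausible reading of a "fraction of the residues" figure such as the 62% reported for human TFs; a per-protein average would generally give a different number.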