Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination
Abstract
There is a limited repertoire of domain families in nature, and these are duplicated and combined in different ways to form the set of proteins in a genome. Most proteins in both prokaryote and eukaryote genomes consist of two or more domains, and we show that the family size distribution of multi-domain protein families follows a power law, like that of individual domain families. Most domain pairs occur in four to six different domain architectures: in isolation and in combinations with different partners. We showed previously that, within the set of all pairwise domain combinations, most small and medium-sized families are observed in combination with one or two other families, while a few large families are very versatile and combine with many different partners. Though this may appear to be a stochastic pattern, in which large families have more combination partners simply by virtue of their size, we establish here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance, given their number of neighbouring domains. This duplication of domain pairs is statistically significant for between one quarter and three quarters of all families with seven or more members. For the majority of pairwise domain combinations, there is no known three-dimensional structure of the two domains together, and we term these novel combinations. Novel domain combinations are interesting and important targets for structural elucidation, as the geometry of, and interaction between, the domains will help in understanding the function and evolution of multi-domain proteins. Of particular interest are those combinations that occur in the largest number of multi-domain proteins, and several of these frequent novel combinations contain DNA-binding domains.

Abbreviations: SCOP, Structural Classification of Proteins database; PDB, Protein Data Bank; HMM, hidden Markov model.
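To make the comparison between observed domain pairs and a chance expectation concrete, the following is a minimal sketch in Python. It is not the random model used in the paper, whose details are not given in the abstract. It assumes each protein is written as an N-to-C list of domain family names, counts adjacent pairs, and estimates by permutation how often a pair would occur at least as often as observed if the same domains were shuffled across the same architectures. The function names, the toy proteins, and the shuffling null model are all illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def adjacent_pairs(architecture):
    """Ordered (N-terminal, C-terminal) pairs of neighbouring domains."""
    return list(zip(architecture, architecture[1:]))


def pair_counts(proteins):
    """Count how many times each adjacent domain pair occurs."""
    counts = Counter()
    for architecture in proteins:
        counts.update(adjacent_pairs(architecture))
    return counts


def partner_counts(proteins):
    """Number of distinct neighbouring families for each domain family."""
    partners = defaultdict(set)
    for first, second in pair_counts(proteins):
        partners[first].add(second)
        partners[second].add(first)
    return {family: len(found) for family, found in partners.items()}


def shuffled(proteins, rng):
    """Assumed null model: reassign all domains at random, keeping
    the number of domains in each protein unchanged."""
    pool = [domain for architecture in proteins for domain in architecture]
    rng.shuffle(pool)
    stream = iter(pool)
    return [[next(stream) for _ in architecture] for architecture in proteins]


def empirical_p(proteins, pair, trials=1000, seed=0):
    """Fraction of shuffles in which `pair` occurs at least as often as
    observed (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = pair_counts(proteins)[pair]
    hits = sum(
        pair_counts(shuffled(proteins, rng))[pair] >= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


# Hypothetical toy "genome"; the family names are placeholders only.
proteins = [
    ["Rossmann", "FAD_binding"],
    ["Rossmann", "FAD_binding"],
    ["Rossmann", "Kinase"],
    ["SH3", "SH2", "Kinase"],
    ["Kinase"],
]

pairs = pair_counts(proteins)
partners = partner_counts(proteins)
p_value = empirical_p(proteins, ("Rossmann", "FAD_binding"))
```

In this sketch a small `p_value` would suggest that the pair recurs more often than the shuffled architectures predict. A realistic analysis would need genome-scale domain assignments and a null model that accounts for family size and the number of neighbouring domains.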