The COG database: an updated version includes eukaryotes

BMC Bioinformatics - Tập 4 Số 1
Roman L. Tatusov1, Natalie D. Fedorova1, John D. Jackson1, Aviva R. Jacobs1, Boris Kiryutin1, Eugene V. Koonin1, Dmitri M. Krylov1, Raja Mazumder2, Sergei Mekhedov1, A. N. NIKOL'SKAYA2, B Sridhar Rao1, Sergei Smirnov1, Alexander V. Sverdlov1, Sona Vasudevan1, Yuri I. Wolf1, Jodie J Yin1, Darren A. Natale2
1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
2Protein Information Resource, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC, 20007, USA

Tóm tắt

Từ khóa


Tài liệu tham khảo

Fitch WM: Distinguishing homologous from analogous proteins. Systematic Zoology 1970, 19: 99–106.

Fitch WM: Homology a personal view on some of the problems. Trends Genet 2000, 16: 227–231. 10.1016/S0168-9525(00)02005-9

Henikoff S, Greene EA, Pietrokovski S, Bork P, Attwood TK, Hood L: Gene families: the taxonomy of protein paralogs and chimeras. Science 1997, 278: 609–614. 10.1126/science.278.5338.609

Sonnhammer EL, Koonin EV: Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet 2002, 18: 619–620. 10.1016/S0168-9525(02)02793-2

Wilson CA, Kreychman J, Gerstein M: Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 2000, 297: 233–249. 10.1006/jmbi.2000.3550

Sicheritz-Ponten T, Andersson SG: A phylogenomic approach to microbial evolution. Nucleic Acids Res 2001, 29: 545–552. 10.1093/nar/29.2.545

Zmasek CM, Eddy SR: RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 2002, 3: 14. 10.1186/1471-2105-3-14

Storm CE, Sonnhammer EL: Automated ortholog inference from phylogenetic trees and calculation of orthology reliability. Bioinformatics 2002, 18: 92–99. 10.1093/bioinformatics/18.1.92

Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science 1997, 278: 631–637. 10.1126/science.278.5338.631

Huynen MA, Bork P: Measuring genome evolution. Proc Natl Acad Sci U S A 1998, 95: 5849–5856. 10.1073/pnas.95.11.5849

Montague MG, Hutchison CA 3rd: Gene content phylogeny of herpesviruses. Proc Natl Acad Sci U S A 2000, 97: 5334–5339. 10.1073/pnas.97.10.5334

Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV: Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res 2001, 11: 555–565. 10.1101/gr.GR-1660R

Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 2001, 314: 1041–1052. 10.1006/jmbi.2000.5197

Lespinet O, Wolf YI, Koonin EV, Aravind L: The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 2002, 12: 1048–1059. 10.1101/gr.174302

Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 2001, 29: 22–28. 10.1093/nar/29.1.22

Slesarev AI, Mezhevaya KV, Makarova KS, Polushin NN, Shcherbinina OV, Shakhova VV, Belova GI, Aravind L, Natale DA, Rogozin IB, Tatusov RL, Wolf YI, Stetter KO, Malykh AG, Koonin EV, Kozyavkin SA: The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc Natl Acad Sci U S A 2002, 99: 4644–4649. 10.1073/pnas.032671499

Natale DA, Shankavaram UT, Galperin MY, Wolf YI, Aravind L, Koonin EV: Genome annotation using clusters of orthologous groups of proteins (COGs) – towards understanding the first genome of a Crenarchaeon. Genome Biology 2001, 5: RESEARCH0009.

Nolling J, Breton G, Omelchenko MV, Makarova KS, Zeng Q, Gibson R, Lee HM, Dubois J, Qiu D, Hitti J, Wolf YI, Tatusov RL, Sabathe F, Doucette-Stamm L, Soucaille P, Daly MJ, Bennett GN, Koonin EV, Smith DR: Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol 2001, 183: 4823–4838. 10.1128/JB.183.16.4823-4838.2001

McClelland M, Sanderson KE, Spieth J, Clifton SW, Latreille P, Courtney L, Porwollik S, Ali J, Dante M, Du F, Hou S, Layman D, Leonard S, Nguyen C, Scott K, Holmes A, Grewal N, Mulvaney E, Ryan E, Sun H, Florea L, Miller W, Stoneking T, Nhan M, Waterston R, Wilson RK: Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 2001, 413: 852–856. 10.1038/35101614

Makarova KS, Aravind L, Wolf YI, Tatusov RL, Minton KW, Koonin EV, Daly MJ: Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev 2001, 65: 44–79. 10.1128/MMBR.65.1.44-79.2001

Jordan IK, Kondrashov FA, Rogozin IB, Tatusov RL, Wolf YI, Koonin EV: Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins. Genome Biol 2001, 2: RESEARCH0053. 10.1186/gb-2001-2-12-research0053

Yanai I, Derti A, DeLisi C: Genes linked by fusion events are generally of the same functional category: A systematic analysis of 30 microbial genomes. Proc Natl Acad Sci U S A 2001, 98: 7940–7945. 10.1073/pnas.141236298

Lecompte O, Ripp R, Puzos-Barbe V, Duprat S, Heilig R, Dietrich J, Thierry JC, Poch O: Genome evolution at the genus level: comparison of three complete genomes of hyperthermophilic archaea. Genome Res 2001, 11: 981–993. 10.1101/gr.GR1653R

Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol 2001, 55: 709–742. 10.1146/annurev.micro.55.1.709

Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 2002, 12: 962–968. 10.1101/gr.87702. Article published online before print in May 2002

Wernegreen JJ: Genome evolution in bacterial endosymbionts of insects. Nat Rev Genet 2002, 3: 850–861. 10.1038/nrg931

Casjens S: Borrelia genomes in the year 2000. J Mol Microbiol Biotechnol 2000, 2: 401–410.

Koonin EV, Mushegian AR, Galperin MY, Walker DR: Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol 1997, 25: 619–637. 10.1046/j.1365-2958.1997.4821861.x

Snel B, Bork P, Huynen MA: Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res 2002, 12: 17–25. 10.1101/gr.176501

Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV: Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol 2001, 1: 8. 10.1186/1471-2148-1-8

Mirkin BG, Fenner TI, Galperin MY, Koonin EV: Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol Biol 2003, 3: 2. 10.1186/1471-2148-3-2

Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860–921. 10.1038/35057062

Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al.: The genome sequence of Drosophila melanogaster. Science 2000, 287: 2185–2195. 10.1126/science.287.5461.2185

Consortium TCeS: Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 1998, 282: 2012–2018. 10.1126/science.282.5396.2012

Initiative: TAG Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408: 796–815. 10.1038/35048692

Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG: Life with 6000 genes. Science 1996, 274: 563–567. 10.1126/science.274.5287.546

Wood V, Gwilliam R, Rajandream MA, Lyne M, Lyne R, Stewart A, Sgouros J, Peat N, Hayles J, Baker S, Basham D, Bowman S, Brooks K, Brown D, Brown S, Chillingworth T, Churcher C, Collins M, Connor R, Cronin A, Davis P, Feltwell T, Fraser A, Gentles S, Goble A, Hamlin N, Harris D, Hidalgo J, Hodgson G, Holroyd S, Hornsby T, Howarth S, Huckle EJ, Hunt S, Jagels K, James K, Jones L, Jones M, Leather S, McDonald S, McLean J, Mooney P, Moule S, Mungall K, Murphy L, Niblett D, Odell C, Oliver K, O'Neil S, Pearson D, Quail MA, Rabbinowitsch E, Rutherford K, Rutter S, Saunders D, Seeger K, Sharp S, Skelton J, Simmonds M, Squares R, Squares S, Stevens K, Taylor K, Taylor RG, Tivey A, Walsh S, Warren T, Whitehead S, Woodward J, Volckaert G, Aert R, Robben J, Grymonprez B, Weltjens I, Vanstreels E, Rieger M, Schafer M, Muller-Auer S, Gabel C, Fuchs M, Dusterhoft A, Fritzc C, Holzer E, Moestl D, Hilbert H, Borzym K, Langer I, Beck A, Lehrach H, Reinhardt R, Pohl TM, Eger P, Zimmermann W, Wedler H, Wambutt R, Purnelle B, Goffeau A, Cadieu E, Dreano S, Gloux S, et al.: The genome sequence of Schizosaccharomyces pombe. Nature 2002, 415: 871–880. 10.1038/nature724

Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier G, Barbe V, Peyretaillade E, Brottier P, Wincker P, Delbac F, El Alaoui H, Peyret P, Saurin W, Gouy M, Weissenbach J, Vivares CP: Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 2001, 414: 450–453. 10.1038/35106579

Koonin EV, Aravind L, Kondrashov AS: The impact of comparative genomics on our understanding of evolution. Cell 2000, 101: 573–576.

Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler GH, Mazumder R, Nikolskaya AN, Panchenko AR, Rao BS, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, Bryant SH: CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 2003, 31: 383–387. 10.1093/nar/gkg087

Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV: Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res 1999, 9: 608–628.

Hedges SB: The origin and evolution of model organisms. Nat Rev Genet 2002, 3: 838–849. 10.1038/nrg929

Pesce A, Bolognesi M, Bocedi A, Ascenzi P, Dewilde S, Moens L, Hankeln T, Burmester T: Neuroglobin and cytoglobin. Fresh blood for the vertebrate globin family. EMBO Rep 2002, 3: 1146–1151. 10.1093/embo-reports/kvf248

Neuwald AF, Liu JS, Lipman DJ, Lawrence CE: Extracting protein alignment models from the sequence database. Nucleic Acids Res 1997, 25: 1665–1677. 10.1093/nar/25.9.1665

Gardner PR, Gardner AM, Martin LA, Dou Y, Li T, Olson JS, Zhu H, Riggs AF: Nitric-oxide dioxygenase activity and function of flavohemoglobins. sensitivity to nitric oxide and carbon monoxide inhibition. J Biol Chem 2000, 275: 31581–31587. 10.1074/jbc.M004141200

Huynen MJ, Snel B: Gene and context: integrative approaches to genome analysis. Adv Prot Chem 2000, 54: 345–379.

Huynen M, Snel B, Lathe W, Bork P: Exploitation of gene context. Curr Opin Struct Biol 2000, 10: 366–370. 10.1016/S0959-440X(00)00098-1

Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization and prediction of gene function using genomic context. Genome Res 2001, 11: 356–372. 10.1101/gr.GR-1619R

Olsen GJ, Woese CR: Archaeal genomics: an overview. Cell 1997, 89: 991–994.

Winkler HH, Neuhaus HE: Non-mitochondrial ATP transport. Trends Biochem Sci 1999, 24: 64–68. 10.1016/S0968-0004(98)01334-6

Wolf YI, Aravind L, Koonin EV: Rickettsiae and Chlamydiae: evidence of horizontal gene transfer and gene exchange. Trends Genet 1999, 15: 173–175. 10.1016/S0168-9525(99)01704-7

Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL, Baker S, Basham D, Bentley SD, Brooks K, Cerdeno-Tarraga AM, Chillingworth T, Cronin A, Davies RM, Davis P, Dougan G, Feltwell T, Hamlin N, Holroyd S, Jagels K, Karlyshev AV, Leather S, Moule S, Oyston PC, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S, Barrell BG: Genome sequence of Yersinia pestis, the causative agent of plague. Nature 2001, 413: 523–527. 10.1038/35097083

Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, et al.: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420: 520–562. 10.1038/nature01262

Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, Gelpke MD, Roach J, Oh T, Ho IY, Wong M, Detter C, Verhoef F, Predki P, Tay A, Lucas S, Richardson P, Smith SF, Clark MS, Edwards YJ, Doggett N, Zharkikh A, Tavtigian SV, Pruss D, Barnstead M, Evans C, Baden H, Powell J, Glusman G, Rowen L, Hood L, Tan YH, Elgar G, Hawkins T, Venkatesh B, Rokhsar D, Brenner S: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 2002, 297: 1301–1310. 10.1126/science.1072104

Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chatuverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, et al.: The genome sequence of the malaria mosquito Anopheles gambiae. Science 2002, 298: 129–149. 10.1126/science.1076181

Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM, Harafuji N, Hastings KE, Ho I, Hotta K, Huang W, Kawashima T, Lemaire P, Martinez D, Meinertzhagen IA, Necula S, Nonaka M, Putnam N, Rash S, Saiga H, Satake M, Terry A, Yamada L, Wang HG, Awazu S, Azumi K, Boore J, Branno M, Chin-Bow S, DeSantis R, Doyle S, Francino P, Keys DN, Haga S, Hayashi H, Hino K, Imai KS, Inaba K, Kano S, Kobayashi K, Kobayashi M, Lee BI, Makabe KW, Manohar C, Matassi G, Medina M, Mochizuki Y, Mount S, Morishita T, Miura S, Nakayama A, Nishizaka S, Nomoto H, Ohta F, Oishi K, Rigoutsos I, Sano M, Sasaki A, Sasakura Y, Shoguchi E, Shin-i T, Spagnuolo A, Stainier D, Suzuki MM, Tassy O, Takatori N, Tokuoka M, Yagi K, Yoshizaki F, Wada S, Zhang C, Hyatt PD, Larimer F, Detter C, Doggett N, Glavina T, Hawkins T, Richardson P, Lucas S, Kohara Y, Levine M, Satoh N, Rokhsar DS: The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 2002, 298: 2157–2167. 10.1126/science.1080049

Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 2002, 419: 498–511. 10.1038/nature01097

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389

Wootton JC, Federhen S: Analysis of compositionally biased regions in sequence databases. Methods Enzymol 1996, 266: 554–571.

Schultz J, Milpetz F, Bork P, Ponting CP: SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Acad Sci U S A 1998, 95: 5857–5864. 10.1073/pnas.95.11.5857