An automated method for finding molecular complexes in large protein interaction networks

BMC Bioinformatics - Tập 4 - Trang 1-27 - 2003
Gary D Bader1,2, Christopher WV Hogue1
1Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto ON Canada M5G 1X5, Dept. of Biochemistry, University of Toronto, Toronto, Canada
2Current address: Memorial Sloan-Kettering Cancer Center, New York, USA

Tóm tắt

Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE .

Tài liệu tham khảo

Fields S: Proteomics. Proteomics in genomeland. Science 2001, 291: 1221–1224. 10.1126/science.291.5507.1221 Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403: 623–627. 10.1038/35001009 Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A 2001, 98: 4569–4574. 10.1073/pnas.061034498 Drees BL, Sundin B, Brazeau E, Caviston JP, Chen GC, Guo W, et al.: A protein interaction map for cell polarity development. J Cell Biol 2001, 154: 549–571. 10.1083/jcb.200104057 Fromont-Racine M, Mayes AE, Brunet-Simon A, Rain JC, Colley A, Dix I, et al.: Genome-wide protein interaction screens reveal functional networks involving Sm-like proteins. Yeast 2000, 17: 95–110. 10.1002/1097-0061(20000630)17:2<95::AID-YEA16>3.0.CO;2-H Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, et al.: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415: 180–183. 10.1038/415180a Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, et al.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415: 141–147. 10.1038/415141a Christendat D, Yee A, Dharamsi A, Kluger Y, Savchenko A, Cort JR, et al.: Structural proteomics of an archaeon. Nat Struct Biol 2000, 7: 903–909. 10.1038/82823 Kim SK, Lund J, Kiraly M, Duke K, Jiang M, Stuart JM, et al.: A gene expression map for Caenorhabditis elegans. Science 2001, 293: 2087–2092. 10.1126/science.1061603 Tong AH, Drees B, Nardelli G, Bader GD, Brannetti B, Castagnoli L, et al.: A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 2002, 295: 321–324. 10.1126/science.1064987 Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, et al.: Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 1999, 285: 901–906. 10.1126/science.285.5429.901 Chervitz SA, Hester ET, Ball CA, Dolinski K, Dwight SS, Harris MA, et al.: Using the Saccharomyces Genome Database (SGD) for analysis of protein similarities and structure. Nucleic Acids Res 1999, 27: 74–78. 10.1093/nar/27.1.74 Mewes HW, Frishman D, Gruber C, Geier B, Haase D, Kaps A, et al.: MIPS: a database for genomes and protein sequences. Nucleic Acids Res 2000, 28: 37–40. 10.1093/nar/28.1.37 Costanzo MC, Crawford ME, Hirschman JE, Kranz JE, Olsen P, Robertson LS, et al.: YPD, PombePD and WormPD: model organism volumes of the BioKnowledge library, an integrated resource for protein information. Nucleic Acids Res 2001, 29: 75–79. 10.1093/nar/29.1.75 Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW: BIND-The biomolecular interaction network database. Nucleic Acids Res 2001, 29: 242–245. 10.1093/nar/29.1.242 Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002, 30: 303–305. 10.1093/nar/30.1.303 Takai-Igarashi T, Nadaoka Y, Kaminuma T: A database for cell signaling networks. J Comput Biol 1998, 5: 747–754. Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, et al.: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 2000, 28: 316–319. 10.1093/nar/28.1.316 Karp PD, Riley M, Saier M, Paulsen IT, Paley SM, Pellegrini-Toole A: The EcoCyc and MetaCyc databases. Nucleic Acids Res 2000, 28: 56–59. 10.1093/nar/28.1.56 Overbeek R, Larsen N, Pusch GD, D'Souza M, Selkov EJ, Kyrpides N, et al.: WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 2000, 28: 123–125. 10.1093/nar/28.1.123 Wagner A, Fell DA: The small world inside large metabolic networks. Proc R Soc Lond B Biol Sci 2001, 268: 1803–1810. 10.1098/rspb.2001.1711 Flake GW, Lawrence S, Giles CL, Coetzee FM: Self-Organization of the Web and Identification of Communities. IEEE Computer 2002, 35: 66–71. 10.1109/2.989932 Goldberg AV: Finding a Maximum Density Subgraph. Technical Report UCB/CSD University of California, Berkeley, CA 1984., 84: Ng A, Jordan M, Weiss Y: On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14: Proceedings of the 2001 2001. Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature 1998, 393: 440–442. 10.1038/30918 Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature 2000, 407: 651–654. 10.1038/35036627 Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks. Nature 2000, 406: 378–382. 10.1038/35019019 Barabasi AL, Albert R: Emergence of scaling in random networks. Science 1999, 286: 509–512. 10.1126/science.286.5439.509 Fell DA, Wagner A: The small world of metabolism. Nat Biotechnol 2000, 18: 1121–1122. 10.1038/81025 Hartuv E, Shamir R: A clustering algorithm based on graph connectivity. Information processing letters 1999, 76: 175–181. 10.1016/S0020-0190(00)00142-3 Bader GD, Hogue CW: Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol 2002, 20: 991–997. 10.1038/nbt1002-991 Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16: 412–424. 10.1093/bioinformatics/16.5.412 Robinson RC, Turbedsky K, Kaiser DA, Marchand JB, Higgs HN, Choe S, et al.: Crystal structure of Arp2/3 complex. Science 2001, 294: 1679–1684. 10.1126/science.1066333 Mayes AE, Verdone L, Legrain P, Beggs JD: Characterization of Sm-like proteins in yeast and their association with U6 snRNA. EMBO J 1999, 18: 4321–4331. 10.1093/emboj/18.15.4321 von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, et al.: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417: 399–403. 10.1038/nature750 Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature 2001, 411: 41–42. 10.1038/35075138 Maslov S, Sneppen K: Specificity and stability in topology of protein networks. Science 2002, 296: 910–913. 10.1126/science.1065103 Gonzalez F, Delahodde A, Kodadek T, Johnston SA: Recruitment of a 19S proteasome subcomplex to an activated promoter. Science 2002, 296: 548–550. 10.1126/science.1069490 Bochtler M, Ditzel L, Groll M, Hartmann C, Huber R: The proteasome. Annu Rev Biophys Biomol Struct 1999, 28: 295–317. 10.1146/annurev.biophys.28.1.295 Batagelj V, Mrvar A: Pajek – Program for Large Network Analysis. Connections 1998, 2: 47–57. Kamada T, Kawai S: An algorithm for drawing general indirect graphs. Information processing letters 1989, 31: 7–15. 10.1016/0020-0190(89)90102-6 Dobzhansky T: Nothing in Biology Makes Sense Except in the Light of Evolution. American Biology Teacher 1973, 35: 125–129. The Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nat Genet 2000, 25: 25–29. 10.1038/75556 Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 2001, 29: 137–140. 10.1093/nar/29.1.137