An automated method for finding molecular complexes in large protein interaction networks
Tóm tắt
Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from
ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE
.
Tài liệu tham khảo
Fields S: Proteomics. Proteomics in genomeland. Science 2001, 291: 1221–1224. 10.1126/science.291.5507.1221
Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403: 623–627. 10.1038/35001009
Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A 2001, 98: 4569–4574. 10.1073/pnas.061034498
Drees BL, Sundin B, Brazeau E, Caviston JP, Chen GC, Guo W, et al.: A protein interaction map for cell polarity development. J Cell Biol 2001, 154: 549–571. 10.1083/jcb.200104057
Fromont-Racine M, Mayes AE, Brunet-Simon A, Rain JC, Colley A, Dix I, et al.: Genome-wide protein interaction screens reveal functional networks involving Sm-like proteins. Yeast 2000, 17: 95–110. 10.1002/1097-0061(20000630)17:2<95::AID-YEA16>3.0.CO;2-H
Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, et al.: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415: 180–183. 10.1038/415180a
Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, et al.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415: 141–147. 10.1038/415141a
Christendat D, Yee A, Dharamsi A, Kluger Y, Savchenko A, Cort JR, et al.: Structural proteomics of an archaeon. Nat Struct Biol 2000, 7: 903–909. 10.1038/82823
Kim SK, Lund J, Kiraly M, Duke K, Jiang M, Stuart JM, et al.: A gene expression map for Caenorhabditis elegans. Science 2001, 293: 2087–2092. 10.1126/science.1061603
Tong AH, Drees B, Nardelli G, Bader GD, Brannetti B, Castagnoli L, et al.: A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 2002, 295: 321–324. 10.1126/science.1064987
Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, et al.: Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 1999, 285: 901–906. 10.1126/science.285.5429.901
Chervitz SA, Hester ET, Ball CA, Dolinski K, Dwight SS, Harris MA, et al.: Using the Saccharomyces Genome Database (SGD) for analysis of protein similarities and structure. Nucleic Acids Res 1999, 27: 74–78. 10.1093/nar/27.1.74
Mewes HW, Frishman D, Gruber C, Geier B, Haase D, Kaps A, et al.: MIPS: a database for genomes and protein sequences. Nucleic Acids Res 2000, 28: 37–40. 10.1093/nar/28.1.37
Costanzo MC, Crawford ME, Hirschman JE, Kranz JE, Olsen P, Robertson LS, et al.: YPD, PombePD and WormPD: model organism volumes of the BioKnowledge library, an integrated resource for protein information. Nucleic Acids Res 2001, 29: 75–79. 10.1093/nar/29.1.75
Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW: BIND-The biomolecular interaction network database. Nucleic Acids Res 2001, 29: 242–245. 10.1093/nar/29.1.242
Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002, 30: 303–305. 10.1093/nar/30.1.303
Takai-Igarashi T, Nadaoka Y, Kaminuma T: A database for cell signaling networks. J Comput Biol 1998, 5: 747–754.
Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, et al.: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 2000, 28: 316–319. 10.1093/nar/28.1.316
Karp PD, Riley M, Saier M, Paulsen IT, Paley SM, Pellegrini-Toole A: The EcoCyc and MetaCyc databases. Nucleic Acids Res 2000, 28: 56–59. 10.1093/nar/28.1.56
Overbeek R, Larsen N, Pusch GD, D'Souza M, Selkov EJ, Kyrpides N, et al.: WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 2000, 28: 123–125. 10.1093/nar/28.1.123
Wagner A, Fell DA: The small world inside large metabolic networks. Proc R Soc Lond B Biol Sci 2001, 268: 1803–1810. 10.1098/rspb.2001.1711
Flake GW, Lawrence S, Giles CL, Coetzee FM: Self-Organization of the Web and Identification of Communities. IEEE Computer 2002, 35: 66–71. 10.1109/2.989932
Goldberg AV: Finding a Maximum Density Subgraph. Technical Report UCB/CSD University of California, Berkeley, CA 1984., 84:
Ng A, Jordan M, Weiss Y: On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14: Proceedings of the 2001 2001.
Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature 1998, 393: 440–442. 10.1038/30918
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature 2000, 407: 651–654. 10.1038/35036627
Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks. Nature 2000, 406: 378–382. 10.1038/35019019
Barabasi AL, Albert R: Emergence of scaling in random networks. Science 1999, 286: 509–512. 10.1126/science.286.5439.509
Fell DA, Wagner A: The small world of metabolism. Nat Biotechnol 2000, 18: 1121–1122. 10.1038/81025
Hartuv E, Shamir R: A clustering algorithm based on graph connectivity. Information processing letters 1999, 76: 175–181. 10.1016/S0020-0190(00)00142-3
Bader GD, Hogue CW: Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol 2002, 20: 991–997. 10.1038/nbt1002-991
Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16: 412–424. 10.1093/bioinformatics/16.5.412
Robinson RC, Turbedsky K, Kaiser DA, Marchand JB, Higgs HN, Choe S, et al.: Crystal structure of Arp2/3 complex. Science 2001, 294: 1679–1684. 10.1126/science.1066333
Mayes AE, Verdone L, Legrain P, Beggs JD: Characterization of Sm-like proteins in yeast and their association with U6 snRNA. EMBO J 1999, 18: 4321–4331. 10.1093/emboj/18.15.4321
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, et al.: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417: 399–403. 10.1038/nature750
Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature 2001, 411: 41–42. 10.1038/35075138
Maslov S, Sneppen K: Specificity and stability in topology of protein networks. Science 2002, 296: 910–913. 10.1126/science.1065103
Gonzalez F, Delahodde A, Kodadek T, Johnston SA: Recruitment of a 19S proteasome subcomplex to an activated promoter. Science 2002, 296: 548–550. 10.1126/science.1069490
Bochtler M, Ditzel L, Groll M, Hartmann C, Huber R: The proteasome. Annu Rev Biophys Biomol Struct 1999, 28: 295–317. 10.1146/annurev.biophys.28.1.295
Batagelj V, Mrvar A: Pajek – Program for Large Network Analysis. Connections 1998, 2: 47–57.
Kamada T, Kawai S: An algorithm for drawing general indirect graphs. Information processing letters 1989, 31: 7–15. 10.1016/0020-0190(89)90102-6
Dobzhansky T: Nothing in Biology Makes Sense Except in the Light of Evolution. American Biology Teacher 1973, 35: 125–129.
The Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nat Genet 2000, 25: 25–29. 10.1038/75556
Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 2001, 29: 137–140. 10.1093/nar/29.1.137