SPICi: a fast clustering algorithm for large biological networks

Bioinformatics - Tập 26 Số 8 - Trang 1105-1111 - 2010
Peng Jiang1, Mona Singh1
11 Lewis-Sigler Institute for Integrative Genomics and 2 Department of Computer Science, Princeton University, Princeton, NJ 08544, USA

Tóm tắt

Abstract Motivation: Clustering algorithms play an important role in the analysis of biological networks, and can be used to uncover functional modules and obtain hints about cellular organization. While most available clustering algorithms work well on biological networks of moderate size, such as the yeast protein physical interaction network, they either fail or are too slow in practice for larger networks, such as functional networks for higher eukaryotes. Since an increasing number of larger biological networks are being determined, the limitations of current clustering approaches curtail the types of biological network analyses that can be performed. Results: We present a fast local network clustering algorithm SPICi. SPICi runs in time O(V log V+E) and space O(E), where V and E are the number of vertices and edges in the network, respectively. We evaluate SPICi's performance on several existing protein interaction networks of varying size, and compare SPICi to nine previous approaches for clustering biological networks. We show that SPICi is typically several orders of magnitude faster than previous approaches and is the only one that can successfully cluster all test networks within very short time. We demonstrate that SPICi has state-of-the-art performance with respect to the quality of the clusters it uncovers, as judged by its ability to recapitulate protein complexes and functional modules. Finally, we demonstrate the power of our fast network clustering algorithm by applying SPICi across hundreds of large context-specific human networks, and identifying modules specific for single conditions. Availability: Source code is available under the GNU Public License at http://compbio.cs.princeton.edu/spici Contact:  [email protected] Supplementary information:  Supplementary data are available at Bioinformatics online.

Từ khóa


Tài liệu tham khảo

Altaf-Ul-Amin, 2006, Development and implementation of an algorithm for detection of protein complexes in large interaction networks, BMC Bioinformatics, 7, 207, 10.1186/1471-2105-7-207

Ashburner, 2000, Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet., 25, 25, 10.1038/75556

Bader, 2003, An automated method for finding molecular complexes in large protein interaction networks, BMC Bioinformatics, 4, 2, 10.1186/1471-2105-4-2

Blatt, 1996, Superparamagnetic clustering of data, Phys. Rev. Lett., 76, 3251, 10.1103/PhysRevLett.76.3251

Breitkreutz, 2008, The BioGRID Interaction Database: 2008 Update, Nucleic Acids Res., 36, D637, 10.1093/nar/gkm1001

Brohée, 2006, Evaluation of clustering algorithms for protein-protein interaction networks, BMC Bioinformatics, 7, 488, 10.1186/1471-2105-7-488

Brun, 2003, Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network, Genome Biol., 5, R6, 10.1186/gb-2003-5-1-r6

Chen, 2006, Detecting functional modules in the yeast protein-protein interaction network, Bioinformatics, 22, 2283, 10.1093/bioinformatics/btl370

Colak, 2009, Dense graphlet statistics of protein interaction and random networks, Pacific Symposium on Biocomputing, 178

Enright, 2002, An efficient algorithm for large-scale detection of protein families, Nucleic Acids Res., 30, 1575, 10.1093/nar/30.7.1575

Fredman, 1987, Fibonacci heaps and their uses in improved network optimization algorithms, J. ACM, 34, 596, 10.1145/28869.28874

Georgii, 2009, Enumeration of condition-dependent dense modules in protein interaction networks, Bioinformatics, 25, 933, 10.1093/bioinformatics/btp080

Jensen, 2009, STRING 8–a global view on proteins and their functional interactions in 630 organisms, Nucleic Acids Res., 37, D412, 10.1093/nar/gkn760

Hartwell, 1999, From molecular to modular cell biology, Nature, 402, 6761, 10.1038/35011540

Huttenhower, 2009, Exploring the human genome with functional maps, Genome Res., 19, 1093, 10.1101/gr.082214.108

Jain, 1988, Algorithms for Clustering Data, 2nd

Jensen, 2004, ArrayProspector: a web resource of functional associations inferred from microarray expression data, Nucleic Acids Res., 32, W445, 10.1093/nar/gkh407

King, 2004, An efficient algorithm for large-scale detection of protein families, Bioinformatics, 20, 3013, 10.1093/bioinformatics/bth351

Lord, 2003, Semantic similarity measures as tools for exploring the gene ontology, Pac. Symp. Biocomput., 8, 601

Loewenstein, 2008, Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space, Bioinformatics, 24, i41, 10.1093/bioinformatics/btn174

Mewes, 2004, MIPS: analysis and annotation of proteins from whole genomes, Nucleic Acids Res., 32, D41, 10.1093/nar/gkh092

Navlakha, 2009, Revealing biological modules via graph summarization, J. Comput. Biol., 16, 253, 10.1089/cmb.2008.11TT

Palla, 2005, Uncovering the overlapping community structure of complex networks in nature and society, Nature, 435, 814, 10.1038/nature03607

Pereira-Leal, 2004, Detection of functional modules from protein interaction networks, Proteins, 54, 49, 10.1002/prot.10505

Rhee, 2008, Use and misuse of the gene ontology annotations, Nat. Rev. Genet., 9, 509, 10.1038/nrg2363

Rives, 2003, Modular organization of cellular networks, Proc. Natl Acad. Sci. USA, 100, 1128, 10.1073/pnas.0237338100

Samanta, 2003, Predicting protein functions from redundancies in large-scale protein interaction networks, Proc. Natl Acad. Sci. USA, 100, 12579, 10.1073/pnas.2132527100

Sharan, 2005, Conserved patterns of protein interaction in multiple species, Proc. Natl Acad. Sci. USA, 102, 1974, 10.1073/pnas.0409522102

Song, 2009, How and when should interactome-derived clusters be used to predict functional modules and protein function?, Bioinformatics, 25, 3143, 10.1093/bioinformatics/btp551

Spirin, 2003, Protein complexes and functional modules in molecular networks, Proc. Natl Acad. Sci. USA, 100, 12123, 10.1073/pnas.2032324100