The role of indirect connections in gene networks in predicting function

Bioinformatics - Volume 27, Issue 13 - pp. 1860-1866 - 2011
Jesse Gillis¹, Paul Pavlidis¹
¹Centre for High-Throughput Biology and Department of Psychiatry, 177 Michael Smith Laboratories, 2185 East Mall, University of British Columbia, Vancouver, BC V6T1Z4, Canada

Abstract

Motivation: Gene networks have been used widely in gene function prediction algorithms, many of which are based on complex extensions of the ‘guilt by association’ principle. We sought to provide a unified explanation for how gene function prediction algorithms perform when exploiting network structure, and thereby to simplify future analysis.

Results: We use co-expression networks to show that most of the network structure these algorithms exploit simply reconstructs the original correlation matrices from which the co-expression networks were obtained. We show that the same principle works for predicting gene function in protein interaction networks, and that these methods perform comparably to much more sophisticated gene function prediction algorithms.

Availability and implementation: Data and algorithm implementation are fully described and available at http://www.chibi.ubc.ca/extended. Programs are provided in Matlab m-code.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
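The two ideas in the abstract, guilt-by-association scoring on a network and indirect (two-step) connections that echo the underlying correlation matrix, can be illustrated with a small sketch. It is written in Python rather than the authors' Matlab m-code and is not their implementation. The binary correlation threshold, the neighbour-voting score, the use of A·A (shared-neighbour counts) as the "indirect connection" matrix, and every function name below are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_matrix(expr: Sequence[Sequence[float]]) -> Matrix:
    """Gene-by-gene correlation matrix; expr holds one expression profile per gene."""
    g = len(expr)
    return [[pearson(expr[i], expr[j]) for j in range(g)] for i in range(g)]


def coexpression_network(corr: Matrix, threshold: float) -> Matrix:
    """Hypothetical binary network: link genes i != j when corr[i][j] >= threshold."""
    g = len(corr)
    return [[1.0 if i != j and corr[i][j] >= threshold else 0.0 for j in range(g)]
            for i in range(g)]


def indirect_connections(adj: Matrix) -> Matrix:
    """Two-step paths (A @ A): entry (i, j) counts the neighbours shared by i and j."""
    g = len(adj)
    return [[sum(adj[i][k] * adj[k][j] for k in range(g)) for j in range(g)]
            for i in range(g)]


def neighbor_vote(adj: Matrix, labels: Sequence[int]) -> List[float]:
    """Guilt-by-association score: fraction of each gene's neighbours with the label.

    The diagonal of adj is zero, so a gene's own label never counts towards its score.
    """
    scores = []
    for row in adj:
        degree = sum(row)
        hits = sum(w * lab for w, lab in zip(row, labels))
        scores.append(hits / degree if degree else 0.0)
    return scores


def upper_triangle(m: Matrix) -> List[float]:
    """Off-diagonal upper-triangle entries, i.e. one value per gene pair."""
    g = len(m)
    return [m[i][j] for i in range(g) for j in range(i + 1, g)]


if __name__ == "__main__":
    # Toy data (made up for illustration): 4 genes x 5 samples.
    expr = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 7], [5, 4, 3, 2, 1], [1, 3, 2, 4, 5]]
    corr = correlation_matrix(expr)
    adj = coexpression_network(corr, threshold=0.5)
    ind = indirect_connections(adj)
    # How closely do shared-neighbour counts track the original correlations?
    print(pearson(upper_triangle(ind), upper_triangle(corr)))
    print(neighbor_vote(adj, [1, 0, 0, 0]))
```

Comparing the upper triangles of the indirect-connection matrix and the correlation matrix gives one simple way to ask whether network structure beyond direct edges is mostly re-encoding the correlations the network was built from. Neighbour voting is only one of many guilt-by-association scores and was chosen here for brevity.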
