Assembly of an Interactive Correlation Network for the Arabidopsis Genome Using a Novel Heuristic Clustering Algorithm

Oxford University Press (OUP) - Volume 152, Issue 1 - Pages 29-43 - 2009
Marek Mutwil3, Björn Usadel3, Moritz Schütte3, Ann E. Loraine1, Oliver Ebenhöh2,3, Staffan Persson3
1Department of Bioinformatics and Genomics, North Carolina Research Campus, University of North Carolina at Charlotte, Kannapolis, North Carolina 28081 (A.L.)
2Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen AB24 3UE, United Kingdom (O.E.)
3Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany (M.M., B.U., M.S., O.E., S.P.)

Abstract

A vital quest in biology is the comprehensible visualization and interpretation of correlation relationships on a genome scale. Such relationships may be represented as networks, which usually must be disassembled into smaller, manageable units, or clusters, to facilitate interpretation. Several graph-clustering algorithms that may be used to visualize biological networks are available. However, only some of these support weighted edges, and none provides good control of cluster sizes, which is crucial for comprehensible visualization of large networks. We constructed an interactive coexpression network for the Arabidopsis (Arabidopsis thaliana) genome using a novel Heuristic Cluster Chiseling Algorithm (HCCA) that supports weighted edges and can control average cluster sizes. Comparative clustering analyses demonstrated that the HCCA performed as well as, or better than, the commonly used Markov, MCODE, and k-means clustering algorithms. We mapped MapMan ontology terms onto coexpressed node vicinities of the network, which revealed the transcriptional organization of previously unrelated cellular processes. We further explored the predictive power of this network through mutant analyses and identified six new genes that are essential to plant growth. We show that the HCCA-partitioned network constitutes an ideal “cartographic” platform for visualization of correlation networks. This approach rapidly provides network partitions with relatively uniform cluster sizes on a genome-scale level and may thus also be used for correlation network layouts for other species.
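To make the idea of a weighted correlation network concrete, the sketch below builds one from toy expression profiles. It is a generic illustration under stated assumptions, not the HCCA: the Pearson correlation measure, the hypothetical cutoff `threshold=0.8`, the gene names, and the data are all invented for the example. The naive partition it produces (connected components) offers no control over cluster size, which is the limitation the abstract says the HCCA is designed to address.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation; treat it as unlinked
    return cov / (sx * sy)


def coexpression_network(profiles, threshold=0.8):
    """Return weighted edges {(gene_a, gene_b): r} for gene pairs with r >= threshold.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges[(a, b)] = r  # the correlation value is kept as the edge weight
    return edges


def connected_components(nodes, edges):
    """Naive partition: each connected component becomes one 'cluster'.

    Unlike the HCCA, this ignores edge weights and cannot bound cluster sizes.
    """
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


# Hypothetical toy data: expression levels of five genes across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.0],
    "geneE": [1.0, 3.0, 1.0, 3.0],
}

if __name__ == "__main__":
    edges = coexpression_network(profiles, threshold=0.8)
    for (a, b), r in edges.items():
        print(f"{a} -- {b}  r = {r:.4f}")
    print(connected_components(profiles, edges))
```

On a genome-scale network a single connected component can contain thousands of genes, which is why a method that balances cluster sizes matters for visualization.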
