Exploring soybean metabolic pathways based on probabilistic graphical model and knowledge-based methods

Jie Hou¹, Gary Stacey², Jianlin Cheng¹
¹Department of Computer Science, University of Missouri, Columbia, USA
²Divisions of Biochemistry and Plant Science, National Center for Soybean Biotechnology, C. Bond Life Science Center, University of Missouri, Columbia, USA

Abstract

Soybean (Glycine max) is a major source of vegetable oil and protein for both animal and human consumption. Completion of the soybean genome sequence led to a number of transcriptomic (RNA-seq) studies, which provide a resource for gene discovery and functional analysis. Several data-driven methods (e.g., based on gene expression data) and knowledge-based methods (e.g., predictions of molecular interactions) have been proposed and implemented. To better understand gene relationships and protein interactions, we applied a probabilistic graphical model, a Bayesian network learned from gene expression data under knowledge-base constraints, to reconstruct soybean metabolic pathways. The results show that this method can predict new relationships between genes, improving on traditional reference pathway maps.
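The abstract names the approach (a Bayesian network learned from expression data and constrained by prior knowledge) but does not give its algorithmic details. As an illustration only, the following is a minimal sketch under assumptions of our own: expression profiles are discretized into three equal-width levels, candidate networks are scored with BIC, the structure is found by greedy edge addition, and knowledge-based constraints are supplied as lists of required (`whitelist`) and forbidden (`blacklist`) edges. All function names, gene names, and toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import permutations


def discretize(values, n_bins=3):
    """Map continuous expression values to equal-width bins 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant profiles
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def family_bic(data, child, parents, n_states):
    """BIC score of one node given its parent set (multinomial model)."""
    n = len(data[child])
    joint, parent_counts = Counter(), Counter()
    for i in range(n):
        pa = tuple(data[p][i] for p in parents)
        joint[(pa, data[child][i])] += 1
        parent_counts[pa] += 1
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    n_params = (n_states - 1) * n_states ** len(parents)
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a directed cycle."""
    # A cycle appears exactly when dst is already an ancestor of src.
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def learn_structure(data, n_states, whitelist=(), blacklist=(), max_parents=2):
    """Greedy edge-addition hill climbing under knowledge-based edge constraints."""
    parents = {g: [] for g in data}
    for src, dst in whitelist:  # required edges (e.g. from a reference pathway)
        parents[dst].append(src)
    forbidden = set(blacklist)  # edges ruled out by prior knowledge
    while True:
        best_gain, best_edge = 0.0, None
        for src, dst in permutations(data, 2):
            if (src, dst) in forbidden or src in parents[dst]:
                continue
            if len(parents[dst]) >= max_parents or creates_cycle(parents, src, dst):
                continue
            gain = (family_bic(data, dst, parents[dst] + [src], n_states)
                    - family_bic(data, dst, parents[dst], n_states))
            if gain > best_gain:
                best_gain, best_edge = gain, (src, dst)
        if best_edge is None:  # no single edge addition improves the score
            return parents
        parents[best_edge[1]].append(best_edge[0])


# Hypothetical toy profiles: geneB tracks geneA, geneC is unrelated to both.
n = 60
raw = {
    "geneA": [(0.1, 5.0, 9.9)[i % 3] for i in range(n)],
    "geneC": [(1.0, 2.0, 3.0)[(i // 3) % 3] for i in range(n)],
}
raw["geneB"] = [2.0 * x for x in raw["geneA"]]
data = {gene: discretize(vals) for gene, vals in raw.items()}

print(learn_structure(data, n_states=3))
print(learn_structure(data, n_states=3, blacklist=[("geneA", "geneB")]))
```

Encoding prior knowledge as hard edge constraints is only one simple choice; a softer variant would add a score bonus to edges supported by reference pathway maps instead of forcing or forbidding them outright.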
