P-value based visualization of codon usage data

Springer Science and Business Media LLC - Volume 1 - Pages 1-7 - 2006
Peter Meinicke1, Thomas Brodag2, Wolfgang Florian Fricke3, Stephan Waack2
1Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Göttingen, Germany
2Institut für Numerische und Angewandte Mathematik, Universität Göttingen, Göttingen, Germany
3Göttingen Genomics Laboratory, Universität Göttingen, Göttingen, Germany

Abstract

Two important unsolved problems in bacterial genome research are the identification of horizontally transferred genes and the prediction of gene expression levels. Both problems can be addressed by multivariate analysis of codon usage data. In particular, dimensionality reduction methods for visualizing multivariate data have been shown to be effective tools for codon usage analysis. Here we propose a multidimensional scaling approach using a novel similarity measure for codon usage tables. Our probabilistic similarity measure is based on P-values derived from the well-known chi-square test for comparing two distributions. Experimental results on four microbial genomes indicate that the new method is well suited to the analysis of horizontal gene transfer and translational selection. Compared with the widely used correspondence analysis, our method did not suffer from outlier sensitivity and showed better clustering of putative alien genes in most cases.
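The core idea of the abstract, using the P-value of a chi-square test between two codon usage tables as their similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each gene is represented by a vector of raw codon counts, that the two-sample chi-square statistic for binned data with unequal totals is used, and that the degrees of freedom equal the number of non-empty bins minus one; the abstract does not state these details. The function names (`chi2_sf`, `pvalue_similarity`, `similarity_matrix`) and the toy count vectors are hypothetical, and the conversion `1 - P` to a dissimilarity is likewise an assumption made only for illustration.

```python
import math


def chi2_sf(chi2, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if chi2 <= 0.0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    prefactor = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1.0:
        # Series expansion of the regularized lower incomplete gamma function.
        term = total = 1.0 / a
        n = a
        for _ in range(1000):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * prefactor)
    # Continued fraction (modified Lentz method) for the upper incomplete gamma function.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return prefactor * h


def pvalue_similarity(counts_a, counts_b):
    """P-value of the two-sample chi-square test between two codon count vectors."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each codon usage table needs at least one count")
    scale_a = math.sqrt(total_b / total_a)
    scale_b = math.sqrt(total_a / total_b)
    chi2, bins = 0.0, 0
    for ra, rb in zip(counts_a, counts_b):
        if ra + rb == 0:
            continue  # codon absent from both genes: the bin carries no information
        bins += 1
        chi2 += (scale_a * ra - scale_b * rb) ** 2 / (ra + rb)
    if bins < 2:
        return 1.0  # a single shared bin cannot distinguish the two tables
    # Assumption: degrees of freedom = non-empty bins - 1.
    return chi2_sf(chi2, bins - 1)


def similarity_matrix(tables):
    """Symmetric matrix of pairwise P-value similarities (diagonal fixed at 1)."""
    n = len(tables)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = pvalue_similarity(tables[i], tables[j])
    return sim


# Hypothetical toy data: three "genes" described by counts over four codons.
toy_tables = [
    [50, 5, 5, 40],
    [48, 6, 4, 42],
    [5, 50, 40, 5],
]
similarities = similarity_matrix(toy_tables)
# Assumed conversion to dissimilarities for a multidimensional scaling routine.
dissimilarities = [[1.0 - p for p in row] for row in similarities]
```

In this sketch, genes with similar codon usage receive a P-value close to 1 and genes with clearly different usage a P-value close to 0. The resulting dissimilarity matrix could then be passed to any multidimensional scaling routine that accepts precomputed dissimilarities to obtain a two-dimensional map of the genes.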
