Genome display tool: visualizing features in complex data sets
Tóm tắt
The enormity of the information contained in large data sets makes it difficult to develop intuitive understanding. It would be useful to have software that allows visualization of possible correlations between properties that can be associated with a core data set. In the case of bacterial genomes, existing visualization tools focus on either global properties such as variations in composition or detailed local displays of the features that comprise the annotation. It is not easy to visualize other information in the context of this core information. A Java based software known as the Genome Display Tool (GDT), allows the user to simultaneously view the distribution of multiple attributes pertaining to genes and intragenic regions in a single bacterial genome using different colours and shapes on a single screen. The display represents each gene by small boxes that correlate with physical position in the genome. The size of the boxes is dynamically allocated based on the number of genes and a zoom feature allows close-up inspection of regions of interest. The display is interfaced with a MS-Access relational database and can display any feature in the database that can be represented by discrete values. Data is readily added to the database from an MS-Excel spread sheet. The functionality of GDT is demonstrated by comparing the results of two predictions of recent horizontal transfer events in the genome of Synechocystis PCC-6803. The resulting display allows the user to immediately see how much agreement exists between the two methods and also visualize how genes in various categories (e.g. predicted in both methods, one method etc) are distributed in the genome. The GDT software provides the user with a powerful tool that allows development of an intuitive understanding of the relative distribution of features in a large data set. As additional features are added to the data set, the number of possible correlations that can be visualized grows rapidly. Although described here for use in bacterial genomics, the principle is general and similar software might be useful in other contexts such as patient studies.
Tài liệu tham khảo
GeneVision. [http://www.dnastar.com/products/genvision.php]
Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res. 2002, 10: 1599-1610. 10.1101/gr.403602.
Rutherford K, Parkill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 10: 944-945. 10.1093/bioinformatics/16.10.944.
Genboree-Baylor College of Medicine. [http://www.genboree.org/java-bin/login.jsp]
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The Human Genome Browser at UCSC. Genome Res. 2002, 12: 996-1006. 10.1101/gr.229102. Article published online before print in May 2002.
Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM, Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006, 34: D590-598. 10.1093/nar/gkj144.
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41-10.1186/1471-2105-4-41.
National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov/]
Escherichia coli K12 COG Table at NCBI. [http://www.ncbi.nlm.nih.gov/sutils/coxik.cgi?cut=45&gi=115]
Java is freely at the JAVA SE download site. [http://java.sun.com/javase/downloads/index.jsp]
Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouch , Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, Yamada M, Yasuda M, Tabata S: "Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions". DNA Res. 1996, 3: 109-136. 10.1093/dnares/3.3.109.
Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature. 2000, 405: 299-304. 10.1038/35012500.
Garcia-Vallve S, Romeu A, Palau J: Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res. 2000, 10: 1719-1725. 10.1101/gr.130000.