Identification of mammalian orthologs using local synteny

Springer Science and Business Media LLC - Volume 10 - Pages 1-13 - 2009
Jin Jun1, Ion I Mandoiu1, Craig E Nelson2
1Computer Science & Engineering Department, University of Connecticut, Storrs, USA
2Department of Molecular and Cell Biology, University of Connecticut, Storrs, USA

Abstract

Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based upon analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny-based and sequence-based definitions of orthology has been reported in vertebrates, or, more specifically, in mammals. We test a simple method for the measurement and utilization of gene order (local synteny) in the identification of mammalian orthologs by investigating the agreement between coding-sequence-based orthology (Inparanoid) and local-synteny-based orthology. In the 5 mammalian genomes studied, 93% of the sampled inter-species pairs were found to be concordant between the two orthology methods, illustrating that local synteny is a robust substitute for coding sequence in identifying orthologs. The remaining 7% of pairs were found to be discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements. By analyzing cases of discordance between local synteny and Inparanoid we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.
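To make the idea of local synteny concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each genome is given as an ordered list of gene identifiers along one chromosome and that homology between genes is supplied as a lookup table. The window size `k`, the threshold `min_score`, and all function and gene names are hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Set


def neighbors(order: List[str], gene: str, k: int) -> List[str]:
    """Return up to k genes on each side of `gene` in the chromosome order."""
    i = order.index(gene)
    return order[max(0, i - k):i] + order[i + 1:i + 1 + k]


def local_synteny_score(order_a: List[str], gene_a: str,
                        order_b: List[str], gene_b: str,
                        homologs: Dict[str, Set[str]], k: int = 3) -> int:
    """Count neighbors of gene_a with at least one homolog among neighbors of gene_b."""
    nb_b = set(neighbors(order_b, gene_b, k))
    return sum(1 for g in neighbors(order_a, gene_a, k)
               if homologs.get(g, set()) & nb_b)


def concordant(sequence_ortholog: bool, score: int, min_score: int = 1) -> bool:
    """True when the sequence-based call and the synteny-based call agree."""
    return sequence_ortholog == (score >= min_score)


# Toy data (hypothetical): a conserved block and a retrocopy inserted elsewhere.
human = ["h1", "h2", "h3", "h4", "h5"]
mouse = ["m1", "m2", "m3", "m4", "m5"]
mouse_retro = ["x1", "x2", "m3r", "x3", "x4"]  # m3r: retrocopy in an unrelated region
homologs = {"h1": {"m1"}, "h2": {"m2"}, "h3": {"m3", "m3r"},
            "h4": {"m4"}, "h5": {"m5"}}

score_ortholog = local_synteny_score(human, "h3", mouse, "m3", homologs, k=2)
score_retro = local_synteny_score(human, "h3", mouse_retro, "m3r", homologs, k=2)
```

In this toy case the pair in the conserved block shares all four flanking genes, while the retrocopy shares none. That is the kind of signal the abstract describes for separating true orthologs from recent retrogenes, even when both candidates look similar at the sequence level.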
