Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes

Genome Biology - Volume 7 - Pages 1-11 - 2006
Mark J Lawson (1), Liqing Zhang (1)
(1) Department of Computer Science, Virginia Tech, Blacksburg, USA

Abstract

Simple sequence repeats (SSRs) in DNA have traditionally been thought of as functionally unimportant and have been studied mainly as genetic markers. A handful of recent studies have shown, however, that SSRs in different positions of a gene can play important roles in determining protein function, genetic development, and regulation of gene expression. We have performed a detailed comparative study of the distribution of SSRs in the sequenced genomes of Arabidopsis thaliana and rice. SSRs in different genic regions (5' untranslated region (UTR), 3' UTR, exon, and intron) show distinct patterns of distribution both within and between the two genomes. Especially notable are the much higher density of SSRs in 5' UTRs compared with the other regions and a strong affinity towards trinucleotide repeats in these regions in both rice and Arabidopsis. At the genomic level, mononucleotide repeats are the most prevalent type of SSR in Arabidopsis, and trinucleotide repeats are the most prevalent type in rice. Both plants share the same most common mononucleotide (A/T) and dinucleotide (AT and AG) repeats, but have little in common for the other repeat types. Our work provides insight into the evolution and distribution of SSRs in the two sequenced model plant genomes, one a dicot (Arabidopsis) and one a monocot (rice). Our analyses reveal that the distribution of SSRs is highly non-random and varies a great deal among the different regions of genes in the two genomes.
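To make the notions of "repeat type" and "SSR density" concrete, the following is a minimal Python sketch of how perfect SSRs could be located in a sequence and summarized per kilobase. It is not the authors' pipeline: the regular-expression approach, the function names `find_ssrs` and `ssr_density_per_kb`, and the 12-base minimum length are all assumptions chosen for illustration, and imperfect repeats are ignored.

```python
import re


def find_ssrs(seq, min_len=12, max_unit=6):
    """Find perfect tandem repeats (illustrative sketch only).

    Returns a list of (start, end, unit) tuples, sorted by start, for
    repeats whose unit is 1..max_unit bases long and whose total span
    is at least min_len bases. min_len=12 is an assumed threshold.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # Smallest copy number that reaches min_len bases (at least 2).
        min_copies = max(2, -(-min_len // k))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are copies of a shorter unit, e.g. "AA" or
            # "CTTCTT", so each repeat is reported under its shortest unit.
            if any(k % d == 0 and unit == unit[:d] * (k // d)
                   for d in range(1, k)):
                continue
            hits.append((m.start(), m.end(), unit))
    return sorted(hits)


def ssr_density_per_kb(seq, hits):
    """Number of SSRs per 1,000 bases of sequence."""
    return 1000.0 * len(hits) / len(seq)


if __name__ == "__main__":
    # Toy sequence: a mononucleotide run and a trinucleotide repeat.
    toy = "GGC" + "A" * 15 + "CGA" + "CTT" * 5 + "GA"
    hits = find_ssrs(toy)
    for start, end, unit in hits:
        print(start, end, unit, (end - start) // len(unit))
    print(round(ssr_density_per_kb(toy, hits), 2))
```

Applying the same two functions separately to 5' UTR, 3' UTR, exon, and intron sequences would give per-region densities of the kind compared in the abstract, although a real analysis would also need genome annotation to delimit those regions.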
