A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica)

American Association for the Advancement of Science (AAAS) - Vol. 296, No. 5565 - pp. 92-100 - 2002
Stephen A. Goff1, Darrell Ricke1, Tian Lan1, Gernot G. Presting1, Ronglin Wang1, Mary Dunn1, Jane Glazebrook1, Allen Sessions1, Paul W. Oeller1, Hemant Varma1, David Hadley1, Don Hutchison1, Chris Martin1, Fumiaki Katagiri1, B. Markus Lange1, Todd Moughamer1, Yu Xia1, Paul Budworth1, Jingping Zhong1, Trini Miguel1, Uta Paszkowski1, Shiping Zhang1, Michelle Colbert1, Weilin Sun1, Lili Chen1, Bret Cooper1, Sylvia Park1, Todd C. Wood2, Long Mao3, Peter H. Quail4, Rod A. Wing5, Ralph A. Dean5, Yeisoo Yu5, Andrey Zharkikh6, Richard Shen6, Sudhir Sahasrabudhe6, Alun Thomas6, Rob Cannings6, Alexander Gutin6, Dmitry Pruss6, Julia Reid6, Sean V. Tavtigian6, Jeff T. Mitchell6, Glenn Eldredge6, Terri Scholl6, Rose Mary Miller6, S. K. Bhatnagar6, Nils B. Adey6, Todd Rubano6, Nadeem Tusneem6, Rosann Robinson6, Jane Feldhaus6, Teresita Macalma6, Arnold Oliphant6, Steven P. Briggs1
1Torrey Mesa Research Institute, Syngenta, 3115 Merryfield Row, San Diego, CA 92121, USA (www.tmri.org).
2Bryan College, Dayton, TN 37321, USA.
3Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA.
4Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA.
5Clemson University Genomics Institute, 100 Jordan Hall, Clemson, SC 29630, USA.
6Myriad Genetics, 320 Wakara Way, Salt Lake City, UT 84108, USA.

Abstract

The genome of the japonica subspecies of rice, an important cereal and model monocot, was sequenced and assembled by whole-genome shotgun sequencing. The assembled sequence covers 93% of the 420-megabase genome. Gene predictions on the assembled sequence suggest that the genome contains 32,000 to 50,000 genes. Homologs of 98% of the known maize, wheat, and barley proteins are found in rice. Synteny and gene homology between rice and the other cereal genomes are extensive, whereas synteny with Arabidopsis is limited. Assignment of candidate rice orthologs to Arabidopsis genes is possible in many cases. The rice genome sequence provides a foundation for the improvement of cereals, our most important crops.

References

J. R. Harlan, The Living Fields: Our Agricultural Heritage (Cambridge Univ. Press, New York, 1995), pp. 30–31.

World Agricultural Supply and Demand Estimates (WASDE).

10.1023/A:1005810616885

10.1073/pnas.95.5.2005

10.1126/science.282.5389.656

10.1104/pp.125.3.1155

10.1104/pp.125.3.1191

10.1016/S1369-5266(99)00047-3

National Center for Biotechnology Information Database of Expressed Sequence Tags (www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html).

10.1104/pp.125.3.1164

J. Yu, S. Hu, J. Wang, S. Li, Chin. Sci. Bull. 46, 1937 (2001).

10.1073/pnas.86.16.6201

The Arabidopsis Genome Initiative, Nature 408, 796 (2000).

10.1073/pnas.92.24.10831

10.1073/pnas.95.5.1993

M. D. Adams et al., Science 287, 2185 (2000).

J. C. Venter et al., Science 291, 1304 (2001).

J. C. Venter et al., Science 280, 1540 (1998).

G. G. Presting et al., Novartis Found. Symp. 236, 13 (2001).

L. Mao et al., Genome Res. 10, 982 (2000).

M. Chen et al., Plant Cell 14, 1 (2002).

R. A. Wing et al., in Rice Genetics IV, Proceedings of the Fourth International Rice Genetics Symposium, G. S. Khush, D. S. Brar, B. Hardy, Eds. (IRRI Press, Makati City, Philippines, 2001), pp. 215–225.

About 80% of the sequences were from paired (forward and reverse) reads with an average clone size of ∼1700 bp (18.5-fold genome coverage). More than fivefold coverage was from randomly selected clones, with the remainder from resequencing gaps or low-quality regions. Low-voltage electrophoresis was used for resequencing, which provided longer sequences of better quality and in many cases closed gaps between contigs. The resulting sequences were analyzed for contamination from nonrice DNA sources (∼500,000 reads) or rice repetitive DNA (∼1,500,000 reads), and the remainder were assembled using the Myriad Assembly Program.

J. B. Hogenesch et al., Cell 106, 413 (2001).

10.1093/nar/27.1.215

A. Bateman et al., Nucleic Acids Res. 28, 263 (2000).

R. Apweiler et al., Nucleic Acids Res. 29, 37 (2001).

R. Apweiler et al., Bioinformatics 16, 1145 (2000).

10.1016/S0959-440X(00)00095-6

The C. elegans Sequencing Consortium, Science 282, 2012 (1998).

Y. Harushima et al., Genetics 148, 479 (1998).

S. J. O'Brien et al., Nature Genet. 3, 103 (1993).

T. H. Lan et al., Genome Res. 10, 776 (2000).

Rice genome duplications were dated by calculating amino acid divergence rates of all possible paralogous protein pairs. A total of 14,345 high-evidence rice proteins were grouped by chromosome. Paralogous protein pairs were identified by comparing groups (BLASTP). Protein pairs are defined as those with 80% identity over a minimum of 30 amino acids. Protein pairs were aligned with CLUSTALW, and amino acid divergence rates (dA) were estimated by PAML (Phylogenetic Analysis by Maximum Likelihood, version 3.0, University College London) using the Dayhoff matrix. The divergence time calculation was based on a molecular clock rate of 9 × 10⁻¹⁰ nonsynonymous substitutions per site per lineage per year and 2.25 nonsynonymous substitutions per amino acid change.

10.1126/science.290.5499.2114

M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, Atlas of Protein Sequence and Structure, Vol. 5 (National Biomedical Research Foundation, Washington, DC, 1978), pp. 345–352.

10.1023/A:1006319803002

10.1073/pnas.94.13.6809

W. A. Wilson et al., Genetics 153, 453 (1999).

10.1093/oxfordjournals.molbev.a025677

A. M. van Dodeweerd et al., Genome 42, 887 (1999).

K. Mayer et al., Genome Res. 11, 1167 (2001).

A. H. Paterson et al., Nature Genet. 14, 380 (1996).

Arabidopsis annotated proteins of chromosomes 1, 2, and 4 were obtained from GenBank, and annotated proteins of chromosomes 3 and 5 were obtained from The Institute for Genomic Research (TIGR) (May 2001). Arabidopsis proteins from each chromosome were compared to anchored rice sequence contigs by BLAST, effectively linking the Arabidopsis and rice maps and enabling a study of syntenic relationships between the two species. Requiring at least 70% identity over a minimum of 30 contiguous amino acids, 98% of BLAST hits achieved E values of ≤ –7. Syntenic groups are defined as three or more Arabidopsis proteins from the same chromosome mapping to one rice BAC contig. Bootstrap analysis was used to determine the significance threshold (Table 4).
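
The grouping rule in this note (three or more Arabidopsis proteins from the same chromosome mapping to one rice BAC contig) can be illustrated with a short Python sketch. It is not the authors' pipeline. It assumes the BLAST hits have already been filtered by the identity and length criteria, and the function name, the tuple layout, and all identifiers in the sample data are hypothetical placeholders.

```python
from collections import defaultdict


def syntenic_groups(hits, min_proteins=3):
    """Group filtered hits into candidate syntenic groups.

    hits: iterable of (arabidopsis_protein_id, arabidopsis_chromosome,
          rice_contig_id) tuples. This layout is an assumption made for
          the sketch, not a format taken from the paper.
    Returns a dict mapping (rice_contig_id, arabidopsis_chromosome) to a
    sorted list of distinct protein IDs, keeping only groups that contain
    at least min_proteins distinct proteins.
    """
    groups = defaultdict(set)
    for protein, chromosome, contig in hits:
        # A set ensures a protein with several hits on one contig counts once.
        groups[(contig, chromosome)].add(protein)
    return {
        key: sorted(proteins)
        for key, proteins in groups.items()
        if len(proteins) >= min_proteins
    }


# Hypothetical example data (made-up identifiers).
example_hits = [
    ("ath_p1", 1, "contig_A"),
    ("ath_p2", 1, "contig_A"),
    ("ath_p3", 1, "contig_A"),
    ("ath_p3", 1, "contig_A"),  # duplicate hit, counted once
    ("ath_p9", 2, "contig_A"),  # different chromosome, separate group
    ("ath_p1", 1, "contig_B"),
    ("ath_p2", 1, "contig_B"),  # only two proteins, below the threshold
]

groups = syntenic_groups(example_hits)
for (contig, chromosome), proteins in groups.items():
    print(contig, chromosome, proteins)
```

The sketch covers only the counting rule. The bootstrap analysis the note uses to set the significance threshold is not reproduced here.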

10.1073/pnas.160271297

10.1073/pnas.070430597

10.1038/35081161

BLAST E score < –3 searching the draft sequence with the pfam0093 NB-ARC consensus sequence as the query.

10.1016/S1369-5266(00)00177-1

M. Yano et al., Plant Cell 12, 2473 (2000).

L. Pnueli et al., Development 125, 1979 (1998).

D. Bradley et al., Nature 379, 791 (1996).

10.1016/S0092-8674(00)81188-5

J. Peng et al., Nature 400, 256 (1999).

J. M. Thornsberry et al., Nature Genet. 28, 286 (2001).

B. A. Ambrose et al., Mol. Cell 5, 569 (2000).

10.1073/pnas.95.5.1979

10.1023/A:1026429922616

Y. Y. Chung et al., Plant Sci. 109, 45 (1995).

10.1023/A:1006051911291

10.1073/pnas.240454797

10.1038/35081178

10.1016/S1360-1385(99)01476-4

10.1073/pnas.95.8.4126

10.1146/annurev.arplant.47.1.245

10.1016/S1360-1385(00)01741-6

10.1016/S0031-9422(00)00450-7

K. F. Tierens et al., Plant Physiol. 125, 1688 (2001).

10.1104/pp.98.4.1304

Dictionary of Natural Products on CD-ROM (Chapman & Hall/CRC Press, Boca Raton, FL, 2000).

10.1016/S1360-1385(00)01746-5

10.1105/tpc.11.4.661

M. Bun-Ya, M. Nishimura, S. Harashima, Y. Oshima, Mol. Cell. Biol. 11, 3229 (1991).

10.1128/JB.180.8.2253-2256.1998

C. Rausch et al., Nature 414, 462 (2001).

P. Daram et al., Plant Cell 11, 2153 (1999).

GenBank accession number .

10.1073/pnas.93.19.10519

GenBank accession number .

The 3501 transcription factors (TFs) in the TRANSFAC data set (v5.2) were compared against the rice gene predictions (no size cutoff) using TBLASTN. Only matches with an E value ≤ –4 and in which the subject extended at least 70% of the length of the TF-specific motif or domain in the query were included. In a parallel analysis of the Arabidopsis genome, 1799 TF genes were identified.
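
The two inclusion criteria in this note (an E-value cutoff and at least 70% coverage of the TF-specific motif or domain) can be expressed as a simple filter. The Python sketch below is illustrative only. The `Hit` record, its field names, and the sample values are hypothetical, and the default cutoff of 1e-4 assumes that the "–4" in the note denotes a power-of-ten exponent, which the text as extracted does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One TBLASTN match; all field names are assumptions for this sketch."""
    query_id: str              # TF protein used as the query
    subject_id: str            # predicted rice gene that was matched
    e_value: float             # E value reported for the match
    aligned_motif_length: int  # residues of the motif/domain covered by the match
    motif_length: int          # full length of the TF-specific motif/domain


def passes_filter(hit, max_e_value=1e-4, min_motif_fraction=0.70):
    """Return True if the hit meets both criteria.

    max_e_value is an assumed reading of the note's cutoff;
    min_motif_fraction is the 70% coverage requirement.
    """
    covered = hit.aligned_motif_length / hit.motif_length
    return hit.e_value <= max_e_value and covered >= min_motif_fraction


# Hypothetical example hits (made-up identifiers and values).
example_hits = [
    Hit("tf_a", "gene_1", 1e-10, 58, 60),  # passes both criteria
    Hit("tf_b", "gene_2", 1e-3, 60, 60),   # fails the E-value cutoff
    Hit("tf_c", "gene_3", 1e-8, 30, 60),   # covers only 50% of the motif
]

kept = [hit.subject_id for hit in example_hits if passes_filter(hit)]
print(kept)
```

Exposing both thresholds as parameters makes it easy to test how sensitive the resulting count is to either cutoff.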

J. L. Riechmann et al., Science 290, 2105 (2000).

10.1073/pnas.95.5.1971

M. Gale, G. Moore, K. Devos, Novartis Found. Symp. 236, 46 (2001).

10.1073/pnas.96.14.8265

10.1101/gr.7.4.301

10.1104/pp.125.1.152

10.1016/S1360-1385(00)01629-0

10.1016/S0959-437X(96)80025-6

M. Lee, Symp. Soc. Exp. Biol. 50, 31 (1996).

10.1016/S0168-9525(00)89157-X

10.1093/genetics/141.1.333

10.1093/genetics/140.2.745

J. C. Lanceras et al., DNA Res. 7, 93 (2000).

T. J. Flowers et al., J. Exp. Bot. 51, 99 (2000).

10.1023/A:1005764209331

10.1101/gr.5.4.321

10.1093/genetics/149.1.383

S. R. McCouch et al., Plant Mol. Biol. 35, 89 (1997).

10.2135/cropsci1996.0011183X003600050040x

10.1016/S1369-5266(99)00041-2

10.1016/S1369-5266(00)00137-0

O. J. Ratcliffe et al., Development 125, 1609 (1998).

10.1016/0092-8674(94)90291-7

We thank D. Patton, J. Salmeron, B. Dietrich, A. Binder, and L. Mattle for critical reading of the manuscript and S. Guimil for artwork.