A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica)
References
J. R. Harlan, The Living Fields: Our Agricultural Heritage (Cambridge Univ. Press, New York, 1995), pp. 30–31.
World Agricultural Supply and Demand Estimates (WASDE) .
National Center for Biotechnology Information, Database of Expressed Sequence Tags (www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html).
J. Yu, S. Hu, J. Wang, S. Li, Chin. Sci. Bull. 46, 1937 (2001).
G. G. Presting et al., Novartis Found. Symp. 236, 13 (2001).
R. A. Wing et al., in Rice Genetics IV: Proceedings of the Fourth International Rice Genetics Symposium, G. S. Khush, D. S. Brar, B. Hardy, Eds. (IRRI Press, Makati City, Philippines, 2001), pp. 215–225.
About 80% of the sequences came from paired (forward and reverse) reads, with an average clone size of ∼1700 bp (18.5-fold genome coverage). More than fivefold coverage came from randomly selected clones, with the remainder from resequencing gaps or low-quality regions. Resequencing used low-voltage electrophoresis, which produced longer, higher-quality reads and in many cases closed gaps between contigs. The resulting sequences were screened for contamination from nonrice DNA sources (∼500 000 reads) and for rice repetitive DNA (∼1 500 000 reads), and the remaining reads were assembled using the Myriad Assembly Program.
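The coverage figures in this note are ratios of total sequenced (or cloned) length to genome size. The Python sketch below illustrates that arithmetic. It adds the Lander-Waterman expectation $1 - e^{-c}$ for the fraction of bases sampled at least once at $c$-fold coverage. That model and the function names are illustrative additions, not the Myriad Assembly Program's method. The note also does not state which genome size it used.

```python
import math


def fold_coverage(n_units: int, mean_length_bp: float, genome_size_bp: float) -> float:
    """Fold coverage = (number of units x mean length) / genome size.

    Pass reads to get sequence coverage, or clone inserts to get
    physical (clone) coverage.
    """
    return n_units * mean_length_bp / genome_size_bp


def expected_fraction_covered(c: float) -> float:
    """Lander-Waterman expectation for the fraction of bases seen at least once.

    Assumes reads land uniformly at random, so per-base depth is Poisson(c)
    and P(depth >= 1) = 1 - exp(-c).
    """
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    for c in (1.0, 5.0, 6.0):
        print(f"{c:.0f}x coverage -> {expected_fraction_covered(c):.4f} of bases covered")
```

Under this idealized model, fivefold random coverage corresponds to about 99.3% of bases being sampled at least once. Real libraries are not sampled uniformly, which is one reason targeted resequencing of gaps and low-quality regions is still needed.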
R. Apweiler et al., Bioinformatics 16, 1145 (2000).
S. J. O'Brien et al., Nature Genet. 3, 103 (1993).
Rice genome duplications were dated by calculating amino acid divergence rates for all possible paralogous protein pairs. A total of 14 345 high-evidence rice proteins were grouped by chromosome, and paralogous protein pairs were identified by comparing the groups with BLASTP. Protein pairs are defined as those with at least 80% identity over a minimum of 30 amino acids. Protein pairs were aligned with CLUSTALW, and amino acid divergence rates ($d_A$) were estimated with PAML (Phylogenetic Analysis by Maximum Likelihood, version 3.0; University College London) using the Dayhoff matrix. The divergence time calculation was based on a molecular clock rate of $9 \times 10^{-10}$ nonsynonymous substitutions per site per lineage per year and 2.25 nonsynonymous substitutions per amino acid change.
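The dating procedure has three steps: a pair filter, a distance estimate, and a molecular clock. The sketch below is a simplified illustration that rests on three assumptions:

- Identity is measured over the gap-free columns of a precomputed alignment.
- A Poisson correction $d = -\ln(1 - p)$ stands in for PAML's maximum-likelihood estimate under the Dayhoff matrix.
- The 2.25 factor multiplies $d_A$. This is a literal reading of the note's wording, not a documented formula.

All function names are hypothetical.

```python
import math


def aligned_columns(aln_a: str, aln_b: str) -> list[tuple[str, str]]:
    """Gap-free columns of two pre-aligned protein sequences (e.g. CLUSTALW output)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    return [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]


def is_paralog_pair(aln_a: str, aln_b: str,
                    min_identity: float = 0.80, min_columns: int = 30) -> bool:
    """Pair rule from the note: >= 80% identity over >= 30 amino acids.

    Assumption: identity is counted over gap-free aligned columns only.
    """
    cols = aligned_columns(aln_a, aln_b)
    if len(cols) < min_columns:
        return False
    matches = sum(x == y for x, y in cols)
    return matches / len(cols) >= min_identity


def poisson_distance(aln_a: str, aln_b: str) -> float:
    """Poisson-corrected amino acid distance d = -ln(1 - p).

    p is the fraction of differing gap-free columns. This is a simple
    stand-in for the PAML maximum-likelihood estimate under the Dayhoff matrix.
    """
    cols = aligned_columns(aln_a, aln_b)
    if not cols:
        raise ValueError("no gap-free aligned columns")
    p = sum(x != y for x, y in cols) / len(cols)
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)


def divergence_time_years(d_a: float, rate: float = 9e-10,
                          subs_per_aa_change: float = 2.25) -> float:
    """Molecular-clock age of a paralog pair: T = d / (2 * rate).

    The factor 2 is there because both copies accumulate substitutions after
    the duplication (the clock rate is per lineage). Assumption: d is taken
    as d_a * subs_per_aa_change, a literal reading of the note's wording.
    """
    d_nonsyn = d_a * subs_per_aa_change
    return d_nonsyn / (2.0 * rate)
```

With the defaults, a pair with $d_A = 0.08$ is dated at $0.08 \times 2.25 / (2 \times 9 \times 10^{-10}) = 10^8$ years. Replacing `poisson_distance` with a PAML estimate changes only the distance step. If the 2.25 factor was meant to enter differently, only `divergence_time_years` changes.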
M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, in Atlas of Protein Sequence and Structure, vol. 5 (National Biomedical Research Foundation, Washington, DC, 1978), pp. 345–352.
Annotated Arabidopsis proteins from chromosomes 1, 2, and 4 were obtained from GenBank, and those from chromosomes 3 and 5 were obtained from The Institute for Genomic Research (TIGR) (May 2001). Arabidopsis proteins from each chromosome were compared with anchored rice sequence contigs by BLAST, effectively linking the Arabidopsis and rice maps and enabling a study of syntenic relationships between the two species. When at least 70% identity over a minimum of 30 contiguous amino acids was required, 98% of BLAST hits had E values ≤ $10^{-7}$. Syntenic groups are defined as three or more Arabidopsis proteins from the same chromosome mapping to one rice BAC contig. Bootstrap analysis was used to determine the significance threshold (Table 4).
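The syntenic-group rule is a group-by-and-count over filtered BLAST hits: three or more proteins from one Arabidopsis chromosome hitting one rice BAC contig. The sketch assumes hits are already parsed into a hypothetical `Hit` record. The note does not describe its bootstrap procedure, so the sketch substitutes a label-shuffling permutation test to show one way a chance-level baseline could be estimated.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical parsed BLAST hit of an Arabidopsis protein on a rice BAC contig."""
    ath_protein: str
    ath_chromosome: int
    rice_contig: str
    identity: float   # percent identity
    aligned_len: int  # contiguous aligned amino acids


def passes_filter(h: Hit, min_identity: float = 70.0, min_len: int = 30) -> bool:
    """Hit filter from the note: >= 70% identity over >= 30 contiguous amino acids."""
    return h.identity >= min_identity and h.aligned_len >= min_len


def syntenic_groups(hits, min_proteins: int = 3) -> dict:
    """Map (Arabidopsis chromosome, rice contig) -> set of distinct proteins,
    keeping only groups with at least min_proteins members."""
    groups = defaultdict(set)
    for h in hits:
        if passes_filter(h):
            groups[(h.ath_chromosome, h.rice_contig)].add(h.ath_protein)
    return {key: prots for key, prots in groups.items() if len(prots) >= min_proteins}


def permuted_group_counts(hits, n_iter: int = 1000, seed: int = 0,
                          min_proteins: int = 3) -> list[int]:
    """Null distribution of the number of syntenic groups when rice contig
    labels are shuffled among the filtered hits (a permutation test used here
    in place of the note's unspecified bootstrap)."""
    rng = random.Random(seed)
    kept = [h for h in hits if passes_filter(h)]
    contigs = [h.rice_contig for h in kept]
    counts = []
    for _ in range(n_iter):
        rng.shuffle(contigs)
        shuffled = [h._replace(rice_contig=c) for h, c in zip(kept, contigs)]
        counts.append(len(syntenic_groups(shuffled, min_proteins)))
    return counts


def empirical_p_value(observed: int, null_counts: list[int]) -> float:
    """Share of permutations with at least as many groups as observed (+1 smoothing)."""
    return (1 + sum(c >= observed for c in null_counts)) / (1 + len(null_counts))
```

Shuffling contig labels keeps the number of hits per contig fixed while breaking any chromosome-to-contig association. The null counts therefore estimate how many three-protein groups would appear by chance alone.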
BLAST E score < $10^{-3}$ when searching the draft sequence with the pfam0093 NB-ARC consensus sequence as the query.
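The NB-ARC screen reduces to filtering BLAST hits by E value. The note does not say which BLAST program or output format was used. This sketch assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) and reports each draft-sequence subject with at least one hit below $10^{-3}$. The helper names are hypothetical.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_outfmt6(lines):
    """Yield one dict per hit from an iterable of tab-separated lines."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        rec["evalue"] = float(rec["evalue"])
        yield rec


def nb_arc_candidates(lines, max_evalue: float = 1e-3) -> list[str]:
    """Sorted subject IDs with at least one hit whose E value is below max_evalue."""
    return sorted({r["sseqid"] for r in read_outfmt6(lines) if r["evalue"] < max_evalue})
```

An open file handle can be passed directly, for example `nb_arc_candidates(open("hits.tsv"))`, where the file name is a placeholder.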
Dictionary of Natural Products on CD-ROM (Chapman & Hall/CRC Press, Boca Raton, FL, 2000).
M. Bun-Ya, M. Nishimura, S. Harashima, Y. Oshima, Mol. Cell. Biol. 11, 3229 (1991).
GenBank accession number .
GenBank accession number .
The 3501 transcription factors (TFs) in the TRANSFAC data set (v5.2) were compared against the rice gene predictions (no size cutoff) using TBLASTN. Only matches with an E value ≤ $10^{-4}$ in which the subject extended over at least 70% of the length of the TF-specific motif or domain in the query were included. In a parallel analysis of the Arabidopsis genome, 1799 TF genes were identified.
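The TF criteria combine an E-value cutoff with a coverage requirement on the TF-specific motif. The sketch below makes three assumptions:

- Each TBLASTN hit is available as a hypothetical `TblastnHit` record with 1-based inclusive query coordinates.
- Motif boundaries for each TRANSFAC query are supplied separately.
- "Extended at least 70% of the motif" means the aligned query range overlaps at least 70% of the motif span.

```python
from typing import NamedTuple


class TblastnHit(NamedTuple):
    """Hypothetical TBLASTN hit record (field names are illustrative)."""
    tf_id: str     # TRANSFAC query identifier
    gene_id: str   # rice gene prediction that was hit
    qstart: int    # 1-based inclusive start of the aligned region in the query
    qend: int      # 1-based inclusive end of the aligned region in the query
    evalue: float


def motif_coverage(hit: TblastnHit, motif_start: int, motif_end: int) -> float:
    """Fraction of the motif span [motif_start, motif_end] covered by the hit's query range."""
    overlap = min(hit.qend, motif_end) - max(hit.qstart, motif_start) + 1
    return max(0, overlap) / (motif_end - motif_start + 1)


def rice_tf_genes(hits, motifs: dict[str, tuple[int, int]],
                  max_evalue: float = 1e-4, min_coverage: float = 0.70) -> set[str]:
    """Rice genes with at least one hit passing both criteria.

    motifs maps tf_id -> (start, end) of its TF-specific motif or domain
    in query coordinates; queries without a known motif are skipped.
    """
    genes = set()
    for h in hits:
        span = motifs.get(h.tf_id)
        if span is None:
            continue
        if h.evalue <= max_evalue and motif_coverage(h, *span) >= min_coverage:
            genes.add(h.gene_id)
    return genes
```

The coverage test is applied to the motif rather than the whole query. A hit that aligns well outside the TF-specific domain therefore does not count, even if its E value is very low.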
M. Gale, G. Moore, K. Devos, Novartis Found. Symp. 236, 46 (2001).
M. Lee, Symp. Soc. Exp. Biol. 50, 31 (1996).
We thank D. Patton, J. Salmeron, B. Dietrich, A. Binder, and L. Mattle for critical reading of the manuscript and S. Guimil for artwork.