A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica)

American Association for the Advancement of Science (AAAS) - Volume 296, Issue 5565 - Pages 79-92 - 2002
Jun Yu1,2,3,4, Songnian Hu1, Jun Wang1,5,2, Gane Ka‐Shu Wong1,2,4, Songgang Li1,5, Bin Liu1, Yajun Deng1,6, Li Dai1, Yan Zhou7,2, Xiuqing Zhang1,3, Mengliang Cao8, Jing Liu2, Jiandong Sun1, Jiabin Tang1,3, Feng Chen1,6, Xiaobing Huang1, Wei‐Yu Lin2, Chen Ye1, Wei Tong1, Lijuan Cong1, Jianing Geng1, Yujun Han1, Kun-Lin Wu1, Wei Li1,9, Guangqiang Hu1, Xiangang Huang1, Wenjie Li1, Jian Li1, Zhanwei Liu1, Long Li1, Jianping Liu1, Qiuhui Qi1, Jinsong Liu1, Li Li1, Tao Li1, Xuegang Wang1, Hong Lü1, Tingting Wu1, Miao Zhu1, Peixiang Ni1, Hua Han1, Wei Dong1,3, Xiaoyu Ren1, Xiaoli Feng1,3, Peng Cui1, Xianran Li1, Hao Wang1, Xin Xu1, Wenxue Zhai3, Xu Zhao1, Jin‐Song Zhang3, Si-Jie He3, Jianguo Zhang1, Junfeng Xu3, Kunlin Zhang1,5, Xianwu Zheng3, Jianhai Dong2, Wanyong Zeng3, Lin Tao2, Jia Ye2, Jun Tan2, Xide Ren1, Xuewei Chen3, Jun He2, Daofeng Liu3, Wei Tian2,6, Chaoguang Tian1, Hongai Xia1, Qiyu Bao1, Gang Li1, Hui Gao1, Ting Cao1, Juan Wang1, Wenming Zhao1, Ping Li3, Wei Chen1, Xudong Wang3, Yong Zhang1,5, Jianfei Hu1,5, Jing Wang1,5, Song Liu1, Jian Yang1, Guangyu Zhang1, Yuqing Xiong1, Zhijie Li1, Long Mao3, Chengshu Zhou8, Zhen Zhu3, Runsheng Chen1,9, Bailin Hao2,10, Wei‐Mou Zheng1,10, Shou‐Yi Chen3, Wei Guo11, Guojie Li12, Siqi Liu1,2, Ming Tao1,2, Jian Wang1,2, Li Zhu3, Longping Yuan8, Huanming Yang1,2,3
1Beijing Genomics Institute/Center of Genomics and Bioinformatics, Chinese Academy of Sciences, Beijing 101300, China
2Hangzhou Genomics Institute, Institute of Bioinformatics of Zhejiang University, Key Laboratory of Bioinformatics of Zhejiang Province, Hangzhou 310007, China
3Institute of Genetics, Chinese Academy of Sciences, Beijing 100101, China
4University of Washington Genome Center, Department of Medicine, Seattle, WA 98195, USA
5College of Life Sciences, Peking University, Beijing 100871, China
6Medical College, Xi'an Jiaotong University, Xi'an 710061, China
7Fudan University, Shanghai 200433, China
8National Hybrid Rice R&D Center, Changsha 410125, China
9Laboratory of Bioinformatics, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
10Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, China
11Digital China Ltd., Beijing 100080, China
12Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China

Abstract

We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.

References and Notes

10.1016/S1369-5266(99)00047-3

10.1105/tpc.12.11.2011

10.1007/BF02672069

The Arabidopsis Genome Initiative, Nature 408, 796 (2000).

10.1016/S0092-8674(01)00382-8

10.1016/S1369-5266(00)00144-8

International Human Genome Sequencing Consortium, Nature 409, 860 (2001).

10.1126/science.1058040

10.1038/cr.1994.13

10.1007/BF01682091

10.1016/S0168-9525(99)01744-8

G. L. Wang et al., Plant J. 7, 525 (1995).

10.1073/pnas.95.5.1971

10.1073/pnas.95.5.2017

10.1016/S1369-5266(99)80018-1

Major Web sites for rice genome data: ; ; ; .

10.1126/science.291.5505.807A

10.1038/35054705

Z. Y. Dai, B. H. Zhao, X. J. Liu (in Chinese), Jiangsu Agric. Sci. 4, 13 (1997).

L. P. Yuan (in Chinese), Hybrid Rice 1, 1 (1997).

10.1007/BF02901901

J. Wang et al., Genome Res., in press. The software can be obtained by e-mailing the authors at [email protected].

10.1105/tpc.12.7.1021

10.1146/annurev.genet.33.1.479

10.1126/science.287.5461.2185

10.1101/gr.8.3.175

10.1101/gr.8.3.186

For the plasmid shotgun libraries, a DNA isolation protocol was modified from Sambrook and Russell (29). Fresh leaves at the seedling stage were ground in liquid nitrogen before complete lysis (30). Purified high-molecular weight genomic DNA was sonicated and sized on agarose gels, selecting for fragments of 1.5 to 3.0 kb. The QIAEX Gel Extraction Kit (QIAGEN) was used to purify DNA from the gel slices. Genomic fragments were ligated to Sma I-linearized pUC18 plasmids and transformed into DH10B-competent cells by electroporation.

J. Sambrook, J. D. Russell, Molecular Cloning (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, ed. 3, 2001).

10.1016/0168-9452(92)90062-Q

Single colonies were grown in 96-deep-well plates, and plasmid DNA was prepared by alkaline lysis (32). DNA quality and insert sizes were examined by agarose gel electrophoresis. Purified plasmid DNA (200 ng; Amersham Pharmacia Biotech Beijing) was used for the sequencing reactions. DNA sequencing was done with MegaBACE 1000 capillary sequencers (Amersham Pharmacia Biotech Beijing). Machine parameters were adjusted for high output (10 to 11 runs a day on average).

10.1016/0076-6879(83)00059-2

Jiang N., Wessler S. R., Plant Cell 13, 2553 (2001).

P. Green.

10.1016/0888-7543(88)90007-9

10.1016/S0076-6879(96)66029-7

Sources for STS, STR, and restriction fragment length polymorphism sequences: ; ; http://rgp.dna.affrc.go.jp/publicdata/geneticmap2000/index.html.

A. F. Smit, P. Green.

10.1101/gr.190201

10.1007/BF02173653

10.1073/pnas.45.7.1039

10.1038/1831429a0

10.1016/S0378-1119(00)00472-8

10.1016/S0378-1119(00)00441-8

10.1038/35080577

10.1016/S0959-437X(00)00144-1

10.1146/annurev.genet.32.1.185

10.1093/genetics/154.4.1819

10.1016/0022-2836(73)90240-4

10.1126/science.4001930

10.1101/gr.148900

G. K. S. Wong et al., Genome Res., in press.

Wong G. K. S., Passey D. A., Yu J., Genome Res. 11, 1672 (2001).

10.1073/pnas.96.14.8265

10.1046/j.1365-313x.2001.01062.x

10.1016/0888-7543(92)90024-M

Messing J., Trends Genet. 6, 196 (2001).

10.1016/S1360-1385(01)02038-6

10.1016/S0168-9525(00)02093-X

10.1101/gr.10.7.982

10.1046/j.1365-313x.2001.00945.x

S. Temnykh et al., Genome Res. 11, 1441 (2001).

10.1016/0092-8674(92)90302-S

10.1038/ng0294-114

10.1093/nar/20.2.211

10.1038/322652a0

10.1073/pnas.97.1.245

Tarchini R., Biddle P., Wineland R., Tingey S., Rafalski A., Plant Cell 12, 381 (2000).

10.1073/pnas.211442198

10.1016/S0959-437X(96)80030-X

10.1073/pnas.93.16.8524

10.1016/S0168-9525(01)02445-3

10.1016/S0959-437X(00)00252-5

10.1126/science.860134

10.1101/gr.10.4.516

10.1093/nar/26.4.1107

10.1006/jmbi.1997.0951

10.1093/nar/27.23.4636

K. Sakata et al., Abstracts of 4th Annual Conference on Computational Genomics (2000), p. 31.

10.1101/gr.147901

10.1016/S1369-5266(00)00149-7

10.1093/nar/29.1.44

The Gene Ontology Consortium, Nature Genet. 25, 25 (2000).

10.1126/science.282.5389.656

10.1016/S1369-5266(99)00048-5

10.1101/gr.9.9.825

10.1139/g99-033

10.1101/gr.GR-1617R

10.1101/gr.194501

V. Brendel, S. Kurtz, V. Walbot, Genome Biol. 3, reviews1005.1 (2002).

10.1016/S1369-5266(00)00144-8

10.2307/2412448

Arabidopsis genome annotations: .

Gasteiger E., Jung E., Bairoch A., Curr. Iss. Mol. Biol. 3, 47 (2001).

For each protein query, we created an array with one element for each amino acid position. Blast_hits() recorded the number of times that each position was covered by a TblastN hit. Each hit had associated with it a score for the percentage of identically matched amino acids. AA_identity() recorded the maximum and minimum score at each position across all TblastN hits. “Extent of hit,” quoted as a percentage of the protein length, is the number of nonzero elements in Blast_hits(). “AA identity” and “hits per gene” are the median values of AA_identity() and Blast_hits() computed over positions with one or more hits. We used the median instead of the mean to minimize the likelihood of counting a highly duplicated domain when the entire protein is not duplicated.
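
The per-position bookkeeping described in this note can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' software: the function name summarize_hits, the (start, end, percent identity) tuple layout, and the 1-based inclusive coordinates are all assumptions, and the sketch takes the median of the per-position maximum identity only, since the note does not say how the minimum score was used.

```python
from statistics import median


def summarize_hits(protein_length, hits):
    """Summarize TblastN-style hits against one protein query.

    hits is a list of (start, end, pct_identity) tuples, with start and end
    as 1-based inclusive amino acid positions (an assumed layout).
    """
    # One array element per amino acid position, as in the note.
    blast_hits = [0] * protein_length          # times each position is covered
    max_identity = [0.0] * protein_length      # best identity score seen per position

    for start, end, pct_identity in hits:
        for pos in range(start - 1, end):
            blast_hits[pos] += 1
            max_identity[pos] = max(max_identity[pos], pct_identity)

    # Positions covered by one or more hits.
    covered = [i for i, count in enumerate(blast_hits) if count > 0]
    if not covered:
        return {"extent_of_hit": 0.0, "aa_identity": None, "hits_per_gene": None}

    return {
        # Nonzero elements of blast_hits, as a percentage of protein length.
        "extent_of_hit": 100.0 * len(covered) / protein_length,
        # Medians are taken over covered positions only.
        "aa_identity": median(max_identity[i] for i in covered),
        "hits_per_gene": median(blast_hits[i] for i in covered),
    }


if __name__ == "__main__":
    # Hypothetical 10-residue protein with two overlapping hits.
    print(summarize_hits(10, [(1, 6, 80.0), (4, 8, 60.0)]))
```

Using the median means that a short domain hit many times raises the count at only a few positions, so it does not inflate the per-gene figure the way a mean would.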

10.1093/protein/8.6.513

10.1016/S1359-0278(96)00032-6

S. J. Wheelan, A. Marchler-Bauer, S. H. Bryant, Bioinformatics 16, 613 (2000).

10.1073/pnas.96.25.14400

10.1146/annurev.ecolsys.28.1.359

10.1126/science.264.5157.421

10.1023/A:1006480722854

10.1104/pp.125.3.1283

10.1073/pnas.151244298

10.1038/ng0997-21

10.1038/13833

10.1104/pp.124.4.1483

10.1016/S0168-9525(99)01830-2

10.1101/gr.9.12.1288

10.1038/74153

10.1016/S0014-5793(00)01581-7

10.1093/nar/29.13.2850

10.1074/jbc.270.6.2411

10.1016/S0968-0004(00)01549-8

10.1016/S0955-0674(00)00212-X

10.1016/S1360-1385(00)01595-8

10.1038/ng803

10.1016/S0168-9525(00)89009-5

Source for BAC-end sequences: .

We are indebted to faculty and staff at the Beijing Genomics Institute whose names were not listed but who also contributed to the team effort (www.genomics.org.cn). We are indebted to our scientific advisors M. V. Olson, L. Bolund, R. Waterston, E. Lander, and M.-C. King for their long-term support. We are grateful to R. Wu and C. Herlache for editorial assistance on the manuscript. We thank Amersham Pharmacia Biotech (China) Ltd., SUN Microsystems (China) Inc., and Dawning Computer Corp. for their support and service. This project was jointly sponsored by the Chinese Academy of Sciences, the Commission for Economy Planning, the Ministry of Science and Technology, the Zhejiang Provincial Government, the Hangzhou Municipal Government, the Beijing Municipal Government, and the National Natural Science Foundation of China. The analysis was supported in part by the National Institute of Environmental Health Sciences (grant 1 RO1 ES09909).