Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement

Nature Biotechnology - Volume 33, Issue 5 - Pages 531-537 - 2015
Tianzhen Zhang1, Yan Hu1, Wenkai Jiang2, Lei Fang1, Xueying Guan1, Jiedan Chen1, Jinbo Zhang2, Christopher Saski3, Brian E. Scheffler4, David M. Stelly5, Amanda M. Hulse‐Kemp5, Qun Wan1, Bingliang Liu1, Chunxiao Liu1, Sen Wang1, Mengqiao Pan1, Yangkun Wang1, Dawei Wang2, Wenxue Ye1, Lijing Chang1, Wenpan Zhang1, Qingxin Song6, Ryan C. Kirkbride6, Xiao‐Ya Chen7, Elizabeth S. Dennis8, Danny Llewellyn8, Daniel G. Peterson9, P. M. Thaxton10, Don C. Jones11, Qiong Wang1, Xiaoyang Xu1, Hua Zhang1, Huaitong Wu1, Lei Zhou1, Gaofu Mei1, Shuqi Chen1, Yue Tian1, Dan Xiang1, Xinghe Li1, Jian Ding1, Qiyang Zuo2, Linna Tao2, Yunchao Liu2, Ji Li2, Yu Lin2, Yuanyuan Hui2, Zhisheng Cao2, Caiping Cai1, Xiefei Zhu1, Zhi Jiang2, Baoliang Zhou1, Wangzhen Guo1, Ruiqiang Li2, Z. Jeffrey Chen6
1State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), Nanjing Agricultural University, Nanjing, Jiangsu, China
2Novogene Bioinformatics Institute, Beijing, China
3Clemson University Genomics Institute, Clemson University, Clemson, South Carolina, USA
4US Department of Agriculture (USDA), Agricultural Research Service (ARS), Middle Southern Area (MSA) Genomics Laboratory, Stoneville, Mississippi, USA
5Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
6Department of Molecular Biosciences, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, USA
7National Key Laboratory of Plant Molecular Genetics, National Plant Gene Research Center, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
8The Commonwealth Scientific and Industrial Research Organisation, Plant Industry, Black Mountain, Australia
9Department of Plant and Soil Sciences, Mississippi State University, Starkville, Mississippi, USA
10Delta Research and Extension Center, Mississippi State University, Stoneville, Mississippi, USA
11Cotton Incorporated, Cary, North Carolina, USA
