Whole Genome Analyses of Chinese Population and De Novo Assembly of A Northern Han Genome

Genomics, Proteomics & Bioinformatics - Volume 17 - Pages 229-247 - 2019
Zhenglin Du1,2, Liang Ma1,3, Hongzhu Qu1,4, Wei Chen1,3, Bing Zhang1, Xi Lu1, Weibo Zhai1, Xin Sheng1,2, Yongqiao Sun1, Wenjie Li1, Meng Lei1, Qiuhui Qi1, Na Yuan1,2, Shuo Shi1,2, Jingyao Zeng1,2, Jinyue Wang1,2, Yadong Yang1,4, Qi Liu1,3, Yaqiang Hong1,3, Lili Dong1,2
1Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
2BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
3CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
4CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
