CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing

Genome Research - Volume 21, Issue 6 - Pages 974-984 - 2011
Alexej Abyzov (1, 2), Alexander E. Urban (3, 4), M Snyder (3), Mark Gerstein (5, 1, 2)
(1) Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
(2) Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
(3) Department of Genetics, Stanford University, Stanford, California 94305, USA
(4) Department of Psychiatry and Behavioral Sciences, School of Medicine, Stanford University, Stanford, California 94305, USA
(5) Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA

Abstract

Copy number variation (CNV) in the genome is a complex phenomenon that is not yet completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method combines the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. This calibration allowed us to use CNVnator for CNV discovery and genotyping in a population and for characterizing atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%–96%), low false-discovery rate (3%–20%), high genotyping accuracy (93%–95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: it misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair methods. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distributions strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.
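The abstract names two RD processing steps, GC correction and mean-shift partitioning. The following is a minimal sketch of both on a toy RD signal, under stated assumptions; it is not CNVnator's implementation. The GC correction assumes the common scheme of rescaling each bin by the ratio of the genome-wide mean RD to the mean RD of bins with the same (discretized) GC content. The partitioning is a simplified, single-bandwidth mean-shift variant with Gaussian kernels in bin distance and in RD difference. The function names `gc_correct` and `mean_shift_breakpoints` and the parameters `bin_bandwidth` and `rd_bandwidth` are hypothetical.

```python
import math
from collections import defaultdict


def gc_correct(rd, gc):
    """Scale each bin's RD by (global mean RD) / (mean RD of bins with the same GC value).

    rd: raw read counts per bin.
    gc: per-bin GC content, assumed pre-binned to discrete values (e.g. integer
        percent) so that bins can be grouped.
    """
    global_mean = sum(rd) / len(rd)
    groups = defaultdict(list)
    for depth, g in zip(rd, gc):
        groups[g].append(depth)
    gc_mean = {g: sum(v) / len(v) for g, v in groups.items()}
    # Bins whose GC group has zero mean depth are left unchanged (avoids division by zero).
    return [depth * global_mean / gc_mean[g] if gc_mean[g] > 0 else float(depth)
            for depth, g in zip(rd, gc)]


def mean_shift_breakpoints(rd, bin_bandwidth, rd_bandwidth):
    """Return indices i such that a segment boundary lies between bins i-1 and i.

    Simplified sketch (hypothetical): for each bin i, a mean-shift-style vector
    along the genome axis is sum_d d * K_pos(d) * (K_rd(i, i+d) - K_rd(i, i-d)),
    with Gaussian kernels in distance (bin_bandwidth, in bins) and in RD
    difference (rd_bandwidth). The vector points toward nearby bins with similar
    RD; a boundary is called where bin i-1 points left (< 0) and bin i points right (> 0).
    """
    n = len(rd)
    window = int(3 * bin_bandwidth)  # truncate the positional kernel at 3 bandwidths

    def k_rd(i, j):
        # Gaussian similarity in RD; positions outside the signal contribute nothing.
        if j < 0 or j >= n:
            return 0.0
        return math.exp(-((rd[j] - rd[i]) ** 2) / (2 * rd_bandwidth ** 2))

    shifts = []
    for i in range(n):
        total = 0.0
        for d in range(1, window + 1):
            k_pos = math.exp(-(d * d) / (2 * bin_bandwidth ** 2))
            # Pair right and left neighbours at distance d so a flat
            # neighbourhood cancels exactly to 0.0.
            total += d * k_pos * (k_rd(i, i + d) - k_rd(i, i - d))
        shifts.append(total)
    return [i for i in range(1, n) if shifts[i - 1] < 0 < shifts[i]]


if __name__ == "__main__":
    rd = [100.0] * 20 + [200.0] * 20  # toy signal: a duplication-like step at bin 20
    gc = [45] * 40                    # uniform GC, so the correction leaves rd unchanged
    print(mean_shift_breakpoints(gc_correct(rd, gc), bin_bandwidth=2, rd_bandwidth=10))
    # → [20]
```

On real, noisy RD data a single bandwidth tends to produce many small spurious segments. The multiple-bandwidth partitioning mentioned in the abstract is the refinement intended to broaden the range of discovered CNVs, and it is not modelled in this sketch.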
