Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21
Abstract
Global patterns of human DNA sequence variation (haplotypes) defined by common single nucleotide polymorphisms (SNPs) have important implications for identifying disease associations and human traits. We have used high-density oligonucleotide arrays, in combination with somatic cell genetics, to identify a large fraction of all common human chromosome 21 SNPs and to directly observe the haplotype structure defined by these SNPs. This structure reveals blocks of limited haplotype diversity in which more than 80% of a global human sample can typically be characterized by only three common haplotypes.
References
G. R. Abecasis et al., Am. J. Hum. Genet. 68, 191 (2001).
Eight unique oligonucleotides, each 25 bases in length, were used to interrogate each of the unique chromosome 21 bases, for a total of 1.7 × 10⁸ different oligonucleotides. These oligonucleotides were distributed over a total of eight different wafer designs using a previously described tiling strategy [ ]. Light-directed chemical synthesis of oligonucleotides was carried out on 5-inch by 5-inch glass wafers by Affymetrix, Inc. (Santa Clara, CA).
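The probe count in this note can be illustrated with a small sketch. The note does not spell out the tiling design, so the code below assumes a common resequencing layout: for each interrogated base, four 25-mers per strand that differ only at the central position. The function names and the toy reference sequence are hypothetical.

```python
# Sketch of an ASSUMED tiling layout: 4 probes per strand, 2 strands,
# each probe a 25-mer whose central base is A, C, G, or T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def probes_for_position(ref, pos, length=25):
    """Return the 8 probes interrogating ref[pos], or [] near the ends."""
    half = length // 2
    if pos < half or pos + half >= len(ref):
        return []  # not enough flanking sequence for a full probe
    window = ref[pos - half : pos + half + 1]
    probes = []
    for strand in (window, reverse_complement(window)):
        for base in "ACGT":
            # Substitute each possible base at the central position.
            probes.append(strand[:half] + base + strand[half + 1 :])
    return probes


# Toy reference sequence (hypothetical, for illustration only).
toy_ref = "ACGTTGCAAGGCTTACGATCGGATCCATGCAAT"
example = probes_for_position(toy_ref, 16)

# Arithmetic implied by the note: 8 probes per base, 1.7e8 probes in total.
OLIGOS_PER_BASE = 8
TOTAL_OLIGOS = 1.7e8
bases_interrogated = TOTAL_OLIGOS / OLIGOS_PER_BASE
print(len(example), f"{bases_interrogated:.3g} bases interrogated")
```

Under these assumptions, the stated total corresponds to roughly 2.1 × 10⁷ interrogated bases.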
LR-PCR assays were designed using Oligo 6.23 primer design software with high to moderate stringency parameters. The resulting primers were typically 30 nucleotides in length, with a melting temperature of >65°C. Amplicon sizes ranged from 3 to 14 kb. A primer database for the entire chromosome was generated, and custom software (pPicker; Perlegen Sciences, Inc., Mountain View, CA) was designed to choose a minimal set of nonredundant primers that yields maximum coverage of chromosome 21 sequence with minimal overlap between adjacent amplicons. LR-PCR reactions were performed using the Expand Long Template PCR Kit (Roche Biosciences, Palo Alto, CA) with minor modifications.
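The amplicon selection problem described in this note resembles classic interval covering. The sketch below is not pPicker, whose algorithm is not described here. It is a standard greedy cover, shown only to illustrate the idea of choosing few amplicons that span a region. The function name, the candidate coordinates (in kb), and the gap handling are all assumptions.

```python
def pick_amplicons(candidates, region_start, region_end):
    """Greedy interval cover: repeatedly take the candidate that starts at or
    before the covered position and extends furthest to the right.

    candidates: list of (start, end) tuples, hypothetical amplicon coordinates.
    Returns the chosen amplicons in left-to-right order.
    """
    ordered = sorted(candidates)
    chosen = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Scan every candidate that begins inside the covered stretch.
        while i < len(ordered) and ordered[i][0] <= covered_to:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] <= covered_to:
            # Nothing extends coverage: accept a gap and resume at the
            # next candidate start, or stop if no candidates remain.
            if i >= len(ordered):
                break
            covered_to = ordered[i][0]
            continue
        chosen.append(best)
        covered_to = best[1]
    return chosen


# Hypothetical candidates, each 9 to 12 kb long, over a 30 kb region.
toy_candidates = [(0, 10), (5, 14), (8, 20), (12, 22), (19, 30)]
selected = pick_amplicons(toy_candidates, 0, 30)
print(selected)
```

This greedy rule minimizes the number of amplicons. It does not explicitly minimize overlap between neighbors, which a production tool would likely also weigh.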
LR-PCR targets were prepared as previously described, with some modifications (1). For each wafer hybridization, the corresponding LR-PCR products were pooled and purified using Qiagen tip 500 (Qiagen, Valencia, CA). A total of 280 μg of purified DNA was fragmented using 37 μl of 10× One-Phor-All buffer PLUS (Promega, Madison, WI) and 1 unit of DNase (Life Technologies/Invitrogen, Carlsbad, CA) in 370 μl total volume at 37°C for 10 min, and the reaction was then heat-inactivated at 99°C for 10 min. The fragmented products were end-labeled using 500 units of TdT (Boehringer Mannheim) and 20 nmoles of biotin-N6-ddATP (DuPont NEN, Boston, MA) at 37°C for 90 min and heat-inactivated at 95°C for 10 min. The labeled samples were hybridized to the wafers in 10 mM Tris-HCl (pH 8), 3 M tetramethylammonium chloride, 0.01% Tx-100, and 10 μg/ml denatured herring sperm DNA in a total volume of 14 ml per wafer at 50°C for 14 to 16 hours. The wafers were rinsed briefly in 4× SSPE, washed three times in 6× SSPE for 10 min each, and stained with streptavidin R-phycoerythrin (SAPE; 5 ng/ml) at room temperature for 10 min. The signal was amplified by staining with an antibody against streptavidin (1.25 ng/ml) and by repeating the staining step with SAPE. The wafers were scanned using a custom-built confocal scanner.
A combination of previously described algorithms (1) was used to detect SNPs based on altered hybridization patterns.
Consistent failure of LR-PCR in all samples analyzed accounts for 15% of the 35% false negative rate. The remaining 20% of false negatives are distributed between bases that never yield high-quality data (10%) and bases that yield high-quality data in only a fraction of the 20 chromosomes analyzed (10%). In general, it is the sequence context of a base that dictates whether or not it will yield high-quality data. Our finding that approximately 20% of all bases give consistently poor data is very similar to the finding that approximately 30% of bases in single dideoxy sequencing reads of 500 bases have quality scores too low for reliable SNP detection [ ]. The power to discover rare SNPs, as compared to more frequent SNPs, is disproportionately reduced in cases where only a limited number of the samples analyzed yield high-quality data for a given base. As a result, our SNP discovery is biased in favor of common SNPs.
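The bias toward common SNPs can be made concrete with a simple sampling calculation. This is a sketch under stated assumptions, not the authors' analysis: chromosomes are treated as independent random draws, and a SNP counts as discoverable whenever both alleles appear at least once among the chromosomes with high-quality data. The function name and the chosen allele frequencies are illustrative.

```python
def detection_probability(maf, n_chromosomes):
    """Probability that both alleles are seen at least once in a sample of
    n_chromosomes, assuming independent draws with minor allele frequency maf."""
    p_all_major = (1.0 - maf) ** n_chromosomes
    p_all_minor = maf ** n_chromosomes
    return 1.0 - p_all_major - p_all_minor


# Compare all 20 chromosomes with a base that yields data in only 10 of them.
for maf in (0.05, 0.10, 0.50):
    full = detection_probability(maf, 20)
    half = detection_probability(maf, 10)
    print(f"MAF {maf:.2f}: n=20 -> {full:.3f}, n=10 -> {half:.3f}, "
          f"retained {half / full:.2f}")
```

Under this model, halving the number of informative chromosomes barely affects a SNP with minor allele frequency 0.5, but removes a substantial share of the discovery power for a SNP at frequency 0.05.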
D. L. Hartl, A. G. Clark, Principles of Population Genetics (Sinauer, Sunderland, MA, 1997), pp. 57–60.
The International […]. We compared the overlap of 15,549 chromosome 21 SNPs discovered by The SNP Consortium (TSC) with the SNPs found in this study. Of the TSC SNPs, 5,087 were found to be in repeated DNA and were not tiled on our wafers. Of the remaining 10,462 TSC SNPs, we identified 4,705 (45%).
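The overlap figures in this note can be checked with a few lines of arithmetic. The variable names are illustrative, and the numbers are taken directly from the note.

```python
# Counts reported in the note above.
tsc_total = 15_549          # TSC SNPs on chromosome 21
tsc_in_repeats = 5_087      # in repeated DNA, not tiled on the wafers
found_in_this_study = 4_705 # tiled TSC SNPs also identified here

tsc_tiled = tsc_total - tsc_in_repeats
overlap_fraction = found_in_this_study / tsc_tiled

print(tsc_tiled, f"{overlap_fraction:.1%}")
```

The subtraction reproduces the 10,462 tiled TSC SNPs, and the ratio rounds to the reported 45%.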
In the course of SNP discovery we identified 339 SNPs that appeared to have more than two alleles. These SNPs were not included in any analyses.
L. Excoffier, M. Slatkin, Mol. Biol. Evol. 12, 921 (1995).
Supplementary data delineating the precise boundaries of the SNP blocks described in this paper, as well as the haplotypes identified for each block in the 20 chromosomes sampled, are available at Science Online at www.sciencemag.org/cgi/content/full/294/5547/1719/DC1 and at www.perlegen.com/haplotype.
D. Altshuler et al., Nature Genet. 26, 76 (2000).
We thank B. Margus, E. Rubin, A. Chakravarti, and E. Lander for helpful discussions and an anonymous reviewer for suggestions that significantly improved the manuscript.