Analysis and Exploration of the Use of Rule-Based Algorithms and Consensus Methods for the Inferral of Haplotypes

Genetics - Volume 165, Issue 2 - Pages 915-928 - 2003
Steven Hecht Orzack (1), Daniel Gusfield (2), Jeffrey J. Olson (3), Steven Nesbitt (3), Lakshman Subrahmanyan (3), Vincent P. Stanton (3)
(1)* Fresh Pond Research Institute, Cambridge, Massachusetts 02140
(2) Department of Computer Science, University of California, Davis, California 95616
(3) Variagenics, Cambridge, Massachusetts 02139

Abstract

The difficulty of experimental determination of haplotypes from phase-unknown genotypes has stimulated the development of nonexperimental inferral methods. One well-known approach for a group of unrelated individuals involves using the trivially deducible haplotypes (those found in individuals with zero or one heterozygous sites) and a set of rules to infer the haplotypes underlying ambiguous genotypes (those with two or more heterozygous sites). Neither the manner in which this “rule-based” approach should be implemented nor the accuracy of this approach has been adequately assessed. We implemented eight variations of this approach that differed in how a reference list of haplotypes was derived and in the rules for the analysis of ambiguous genotypes. We assessed the accuracy of these variations by comparing predicted and experimentally determined haplotypes involving nine polymorphic sites in the human apolipoprotein E (APOE) locus. The eight variations resulted in substantial differences in the average number of correctly inferred haplotype pairs. More than one set of inferred haplotype pairs was found for each of the variations we analyzed, implying that the rule-based approach is not sufficient by itself for haplotype inferral, despite its appealing simplicity. Accordingly, we explored consensus methods in which multiple inferrals for a given ambiguous genotype are combined to generate a single inferral; we show that the set of these “consensus” inferrals for all ambiguous genotypes is more accurate than the typical single set of inferrals chosen at random. We also use a consensus prediction to divide ambiguous genotypes into those whose algorithmic inferral is certain or almost certain and those whose less certain inferral makes molecular inferral preferable.
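
The rule-based and consensus ideas described above can be illustrated with a minimal Python sketch. It is not one of the eight variations evaluated in the paper; the genotype encoding (0 and 1 for homozygous sites, 2 for a heterozygous site), the function names, the random tie-breaking rule, and the majority-vote consensus are all assumptions made for illustration.

```python
import random
from collections import Counter

HET = 2  # assumed code for a heterozygous site; 0 and 1 are homozygous


def trivial_pair(genotype):
    """Phase a genotype that has at most one heterozygous site."""
    first = tuple(0 if g == HET else g for g in genotype)
    second = tuple(1 if g == HET else g for g in genotype)
    return first, second


def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, else None."""
    other = []
    for g, h in zip(genotype, hap):
        if g == HET:
            other.append(1 - h)   # heterozygous: partner carries the other allele
        elif g == h:
            other.append(g)       # homozygous: both haplotypes carry allele g
        else:
            return None           # hap is incompatible with this genotype
    return tuple(other)


def rule_based_run(genotypes, rng):
    """One randomized pass: returns {index: haplotype pair} for ambiguous genotypes."""
    reference, pending, resolved = [], [], {}
    for i, g in enumerate(genotypes):
        if g.count(HET) <= 1:     # trivially deducible haplotypes seed the list
            for hap in trivial_pair(g):
                if hap not in reference:
                    reference.append(hap)
        else:
            pending.append(i)
    progress = True
    while pending and progress:
        progress = False
        rng.shuffle(pending)      # the order of analysis can change the outcome
        for i in list(pending):
            options = [(h, complement(genotypes[i], h)) for h in reference]
            options = [(h, c) for h, c in options if c is not None]
            if not options:
                continue          # no reference haplotype fits yet
            hap, other = rng.choice(options)
            resolved[i] = tuple(sorted((hap, other)))
            if other not in reference:
                reference.append(other)
            pending.remove(i)
            progress = True
    return resolved


def consensus(genotypes, n_runs=100, seed=0):
    """Majority vote over many runs: {index: (most frequent pair, support)}."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_runs):
        for i, pair in rule_based_run(genotypes, rng).items():
            votes.setdefault(i, Counter())[pair] += 1
    result = {}
    for i, counter in votes.items():
        pair, count = counter.most_common(1)[0]
        result[i] = (pair, count / n_runs)
    return result


# Hypothetical three-site sample: two homozygotes and two ambiguous genotypes.
sample = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (2, 2, 2)]
result = consensus(sample)
```

In this toy sample, genotype 2 is resolved the same way in every run, while genotype 3 can be resolved in two different ways depending on which reference haplotype is chosen. The support value gives one possible way to separate inferrals that are certain or almost certain from those that might be better settled by molecular methods.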
