Genetic Epidemiology
Featured scientific publications
* Data is for reference only
Genome‐wide association studies (GWAS) can identify common alleles that contribute to susceptibility to complex diseases. Despite the large number of SNPs assessed in each study, the effects of most common SNPs must be evaluated indirectly, using genotyped markers or their haplotypes as proxies. We have implemented a computationally efficient Markov chain framework for genotype imputation and haplotyping in the freely available MaCH software package. The approach describes sampled chromosomes as mosaics of one another and uses available genotype and shotgun sequence data to estimate unobserved genotypes and haplotypes, together with useful measures of the quality of these estimates. Our approach is already widely used to facilitate comparison of results across studies as well as meta‐analyses of GWAS. Here, we use simulations and experimental genotypes to evaluate its accuracy and utility, considering choices of genotyping panels, reference panel configurations, and designs in which genotyping is replaced with shotgun sequencing. Importantly, we show that genotype imputation not only facilitates cross‐study analyses but also increases the power of genetic association studies. We show that genotype imputation of common variants using HapMap haplotypes as a reference is very accurate with either genome‐wide SNP data or the smaller amounts of data typical of fine‐mapping studies. Furthermore, we show that the approach is applicable in a variety of populations. Finally, we illustrate how association analyses of unobserved variants will benefit from ongoing advances such as larger HapMap reference panels and whole‐genome shotgun sequencing technologies.
Mutations in the gene encoding interferon regulatory factor 6 (
Statistical methods for haplotype inference from multi‐site genotypes of unrelated individuals have important application in association studies and population genetics. Understanding the factors that affect the accuracy of this inference is important, but their assessment has been restricted by the limited availability of biological data with known phase. We created hybrid cell lines monosomic for human chromosome 19 and produced single‐chromosome complete sequences of a 48 kb genomic region in 39 individuals of African American (AA) and European American (EA) origin. We employ these phase‐known genotypes and coalescent simulations to assess the accuracy of statistical haplotype reconstruction by several algorithms. Accuracy of phase inference was considerably low in our biological data even for regions as short as 25–50 kb, suggesting that caution is needed when analyzing reconstructed haplotypes. Moreover, the reliability of estimated confidence in phase inference is not high enough to allow for a reliable incorporation of site‐specific uncertainty information in subsequent analyses. We show that, in samples of certain mixed ancestry (AA and EA populations), the most accurate haplotypes are probably obtained when increasing sample size by considering the largest, pooled sample, despite the hypothetical problems associated with pooling across those heterogeneous samples. Strategies to improve confidence in reconstructed haplotypes, and realistic alternatives to the analysis of inferred haplotypes, are discussed.
The HLA DR genotype frequencies in insulin‐dependent diabetes mellitus (IDDM) patients and the frequencies of DR alleles transmitted from affected parent to affected child both indicate that the DR3‐associated predisposition is more “recessive” and the DR4‐associated predisposition more “dominant” in inheritance after allowing for the DR3/DR4 synergistic effect. B locus distributions on patient haplotypes indicate that only subsets of both DR3 and DR4 are predisposing. Heterogeneity is detected for both the DR3 and DR4 predisposing haplotypes based on DR genotypic class. With appropriate use of the family structure of the data, a control population of “unaffected” alleles can be defined. Application of this method confirms the predisposing effect associated with the class 1 allele of the polymorphic region 5′ to the insulin gene.
In the haplotype relative risk (HRR) statistic (Rubinstein et al.:
The genome of an admixed individual represents a mixture of alleles from different ancestries. In the United States, the two largest minority groups, African‐Americans and Hispanics, are both admixed. An understanding of the admixture proportion at an individual level (individual admixture, or IA) is valuable for both population geneticists and epidemiologists who conduct case‐control association studies in these groups. Here we present an extension of a previously described frequentist (maximum likelihood or ML) approach to estimate individual admixture that allows for uncertainty in ancestral allele frequencies. We compare this approach both to prior partial likelihood based methods as well as more recently described Bayesian MCMC methods. Our full ML method demonstrates increased robustness when compared to an existing partial ML approach. Simulations also suggest that this frequentist estimator achieves similar efficiency, measured by the mean squared error criterion, as Bayesian methods but requires just a fraction of the computational time to produce point estimates, allowing for extensive analysis (e.g., simulations) not possible by Bayesian methods. Our simulation results demonstrate that inclusion of ancestral populations or their surrogates in the analysis is required by any method of IA estimation to obtain reasonable results. Genet. Epidemiol. © 2005 Wiley‐Liss, Inc.
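The maximum likelihood idea behind individual admixture estimation can be illustrated with a deliberately simplified sketch. It assumes two ancestral populations whose allele frequencies are known without error, which is exactly the assumption the method in the abstract above relaxes, so this is not the authors' estimator. The function names, the binomial genotype model, and the golden‐section search are all choices made for this illustration.

```python
import math


def log_likelihood(q, genotypes, freq_a, freq_b):
    """Log-likelihood of admixture proportion q (share of ancestry from population A).

    genotypes[l] is the count (0, 1 or 2) of the reference allele at locus l.
    freq_a[l] and freq_b[l] are the reference-allele frequencies in the two
    ancestral populations, treated here as known constants (an assumption).
    """
    ll = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        f = q * pa + (1.0 - q) * pb          # allele frequency in the admixed gene pool
        f = min(max(f, 1e-12), 1.0 - 1e-12)  # keep log() finite at the boundaries
        ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll


def estimate_admixture(genotypes, freq_a, freq_b, tol=1e-6):
    """Maximise the log-likelihood over q in [0, 1] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if log_likelihood(x1, genotypes, freq_a, freq_b) < log_likelihood(x2, genotypes, freq_a, freq_b):
            lo = x1  # the maximum lies to the right of x1
        else:
            hi = x2  # the maximum lies to the left of x2
    return (lo + hi) / 2.0


# Toy data, invented for illustration: four fully informative loci
# (reference allele fixed in population A, absent in population B).
toy_genotypes = [2, 2, 1, 2]
toy_freq_a = [1.0, 1.0, 1.0, 1.0]
toy_freq_b = [0.0, 0.0, 0.0, 0.0]
q_hat = estimate_admixture(toy_genotypes, toy_freq_a, toy_freq_b)
```

With fully informative markers the estimate reduces to the observed share of population A alleles, which makes the toy case easy to check by hand. Real markers are only partially informative, and their ancestral frequencies are themselves estimated from samples.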
The number of Mendelian randomization (MR) analyses including large numbers of genetic variants is rapidly increasing. This is due to the proliferation of genome‐wide association studies, and the desire to obtain more precise estimates of causal effects. Since it is unlikely that all genetic variants will be valid instrumental variables, several robust methods have been proposed. We compare nine robust methods for MR based on summary data that can be implemented using standard statistical software. Methods were compared in three ways: by reviewing their theoretical properties, in an extensive simulation study, and in an empirical example. In the simulation study, the best method, judged by mean squared error, was the contamination mixture method. This method had well‐controlled Type 1 error rates with up to 50% invalid instruments across a range of scenarios. Other methods performed well according to different metrics. Outlier‐robust methods had the narrowest confidence intervals in the empirical example. With isolated exceptions, all methods performed badly when over 50% of the variants were invalid instruments. Our recommendation for investigators is to perform a variety of robust methods that operate in different ways and rely on different assumptions for valid inferences to assess the reliability of MR analyses.
Developments in genome‐wide association studies and the increasing availability of summary genetic association data have made application of Mendelian randomization relatively straightforward. However, obtaining reliable results from a Mendelian randomization investigation remains problematic, as the conventional inverse‐variance weighted method only gives consistent estimates if all of the genetic variants in the analysis are valid instrumental variables. We present a novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate. This estimator is consistent even when up to 50% of the information comes from invalid instrumental variables. In a simulation analysis, it is shown to have better finite‐sample Type 1 error rates than the inverse‐variance weighted method, and is complementary to the recently proposed MR‐Egger (Mendelian randomization‐Egger) regression method. In analyses of the causal effects of low‐density lipoprotein cholesterol and high‐density lipoprotein cholesterol on coronary artery disease risk, the inverse‐variance weighted method suggests a causal effect of both lipid fractions, whereas the weighted median and MR‐Egger regression methods suggest a null effect of high‐density lipoprotein cholesterol that corresponds with the experimental evidence. Both median‐based and MR‐Egger regression methods should be considered as sensitivity analyses for Mendelian randomization investigations with multiple genetic variants.
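The contrast between the inverse‐variance weighted and weighted median estimators can be sketched on summary data as follows. This is a minimal illustration rather than the authors' implementation. It assumes per‐variant ratio estimates, first‐order inverse‐variance weights of the form beta_x² / se_y², strictly positive weights, and linear interpolation at the 50th percentile of the weighted empirical distribution. The function names and the toy data are invented for the example.

```python
def ratio_estimates(beta_x, beta_y):
    """Per-variant causal estimates: outcome association divided by exposure association."""
    return [by / bx for bx, by in zip(beta_x, beta_y)]


def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate (weighted regression through the origin)."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den


def weighted_median(estimates, weights):
    """Weighted median by interpolation; assumes all weights are strictly positive."""
    if len(estimates) == 1:
        return estimates[0]
    order = sorted(range(len(estimates)), key=lambda j: estimates[j])
    est = [estimates[j] for j in order]
    w = [weights[j] for j in order]
    total = sum(w)
    # Position of each ordered estimate: cumulative weight up to and including
    # it, minus half of its own weight, as a fraction of the total weight.
    cum, running = [], 0.0
    for wj in w:
        running += wj
        cum.append((running - 0.5 * wj) / total)
    below = max(k for k in range(len(cum)) if cum[k] < 0.5)
    frac = (0.5 - cum[below]) / (cum[below + 1] - cum[below])
    return est[below] + (est[below + 1] - est[below]) * frac


# Toy summary statistics, invented for illustration: three variants agree on a
# causal effect of 0.5, two pleiotropic variants give much larger ratios.
beta_x = [0.1, 0.1, 0.1, 0.1, 0.1]
beta_y = [0.05, 0.05, 0.05, 0.2, 0.3]
se_y = [0.01, 0.01, 0.01, 0.01, 0.01]

ratios = ratio_estimates(beta_x, beta_y)
weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y)]
ivw = ivw_estimate(beta_x, beta_y, se_y)
wm = weighted_median(ratios, weights)
```

In this toy case the two invalid variants carry 40% of the weight. The inverse‐variance weighted estimate is pulled well above 0.5, while the weighted median stays at the value shared by the three consistent variants, in line with the 50% breakdown property described in the abstract.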
Heterogeneity in determinants of familial resemblance of lipid and lipoprotein levels between populations in North America and Israel was investigated using path analysis. A common protocol, identical measurement techniques, and the same statistical procedures were used in the two samples. Both genetic (h2) and cultural (c2) determinants of inheritance were significant for all lipid variables in the two studies. Genetic and cultural heritability of total cholesterol (h2 = 0.61, c2 = 0.02), low‐density lipoprotein cholesterol (h2 = 0.59, c2 = 0.02), and high‐density lipoprotein cholesterol (h2 = 0.55, c2 = 0.06) did not differ significantly between North America and Israel, while there was a significant difference for triglyceride (h2 = 0.41, c2 = 0.07 in North America; h2 = 0.61, c2 = 0.05 in Israel). Secondary parameters of the path model describing intrafamilial environmental relationships differed between the two countries. In particular, there was a higher correlation between marital environments in Israel for all traits except triglyceride, and a larger effect of father's environment on offspring's environment in Israel for all traits. Within both populations, variation of plasma lipids and lipoproteins was mostly explained by genetic factors and random unmeasured environmental factors. The contribution of common family environment was found to be small, though statistically significant. This is probably due to homogeneity of the distribution of familial environmental determinants within both countries.
Association tests of multilocus haplotypes are of interest both in linkage disequilibrium mapping and in candidate gene studies. For case‐parent trios, I discuss the extension of existing multilocus methods to include ambiguous haplotypes in tests of models which distinguish between the