The MR-Base platform supports systematic causal inference across the human phenome

eLife - Volume 7
Gibran Hemani1,2, Jie Zheng3,4,5,1,6,7,8, Benjamin Elsworth3,4,5,1,6,7,8, Kaitlin H. Wade3,4,5,1,6,7,8, Valeriia Haberland3,4,5,1,6,7,8, Denis Baird3,4,5,1,6,7,8, Charles Laurin3,4,5,1,6,7,8, Stephen Burgess9, Jack Bowden1,2, Ryan Langdon1,2, Vanessa Y. Tan1,2, James Yarmolinsky1,2, Hashem A. Shihab1,2, Nicholas J. Timpson1,2, David M. Evans1,2,10,11, Caroline L. Relton1,2, Richard M. Martin1,2, George Davey Smith3,4,5,1,6,7,8, Tom R. Gaunt1,2, Philip Haycock1,2
1Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
2National Institute for Health Research NIHR Bristol BRC
3Australian Research Council
4Cancer Research UK Population Research Postdoctoral Fellowship, C52724/A20138
5Castle Lung Cancer Foundation
6Medical Research Council Methodology Research Fellowship, MR/N501906/
7National Health and Medical Research Council.
8The Icahn School of Medicine at Mount Sinai, United States
9Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
10University of Queensland Diamantina Institute
11University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Australia

Abstract

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as two-sample Mendelian randomization (2SMR) that bypasses the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, which undermines efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (with no restriction according to statistical significance) with an application programming interface, a web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be evaluated efficiently in phenome-wide association studies.
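
To make the 2SMR idea concrete, the sketch below combines per-SNP summary statistics from two separate GWAS into a single causal estimate using a fixed-effect inverse-variance weighted (IVW) average of Wald ratios. It is an illustration only and not the MR-Base software or its API: the function names and the toy numbers are hypothetical, the instruments are assumed to be independent, and the standard errors use a first-order approximation that ignores uncertainty in the SNP-exposure estimates.

```python
import math


def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP causal estimates (outcome effect / exposure effect).

    Standard errors use the first-order approximation se_outcome / |beta_exposure|,
    which treats the SNP-exposure effects as measured without error (an assumption).
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_exposure, se_outcome)]
    return ratios, ses


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW combination of the Wald ratios.

    Assumes the SNPs are uncorrelated (no linkage disequilibrium between them).
    """
    ratios, ses = wald_ratios(beta_exposure, beta_outcome, se_outcome)
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical toy summary statistics for three SNPs (not real GWAS results).
toy_beta_exposure = [0.10, 0.20, 0.05]   # SNP-exposure effects, sample 1
toy_beta_outcome = [0.05, 0.10, 0.025]   # SNP-outcome effects, sample 2
toy_se_outcome = [0.01, 0.02, 0.01]      # standard errors of SNP-outcome effects

estimate, standard_error = ivw_estimate(
    toy_beta_exposure, toy_beta_outcome, toy_se_outcome
)
print(f"IVW estimate: {estimate:.3f} (SE {standard_error:.3f})")
```

Only summary-level effect sizes and standard errors enter the calculation, which is why the approach needs no individual-level data. The IVW estimator is unbiased only if horizontal pleiotropy is absent or balanced, which is what sensitivity analyses of the kind mentioned in the abstract are intended to probe.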
