Whole population, genome-wide mapping of hidden relatedness

Genome Research - Volume 19, Issue 2 - Pages 318-326 - 2009
Alexander Gusev1, Jennifer K. Lowe2,3,4, Markus Stoffel5, Mark J. Daly2,6,7, David Altshuler2,7,3, Jan L. Breslow4, Jeffrey M. Friedman8,9,4, Itsik Pe’er10,1
1Department of Computer Science, Columbia University, New York, New York 10027, USA
2Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
3Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
4The Rockefeller University, New York, New York 10065, USA
5ETH Zurich, Zurich 8093, Switzerland
6Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
7Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
8Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
9Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA
10Center for Computational Biology and Bioinformatics, New York, New York 10032, USA

Abstract

We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on a dictionary of haplotypes that is used to efficiently discover short exact matches between individuals. We then expand these matches using dynamic programming to identify long, nearly identical segmental sharing that is indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3000 individuals. We verify that GERMLINE is in concordance with other methods when they can process the data, and also facilitates analysis of larger scale studies. We bolster these results by demonstrating novel applications of precise analysis of hidden relatedness for (1) identification and resolution of phasing errors and (2) exposing polymorphic deletions that are otherwise challenging to detect. This finding is supported by concordance of detected deletions with other evidence from independent databases and statistical analyses of fluorescence intensity not used by GERMLINE.
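
The seed-and-extend idea described in the abstract can be illustrated with a small Python sketch. This is not the GERMLINE implementation: the function names, the fixed-size window scheme, the one-haplotype-per-sample input, and the parameters `window`, `max_mismatch`, and `min_windows` are all assumptions made for illustration, and a simple greedy window-by-window extension stands in for the dynamic programming step the abstract mentions.

```python
from collections import defaultdict
from itertools import combinations


def seed_matches(haplotypes, window):
    """Find exact window matches with a dictionary (hash table) of haplotype slices.

    haplotypes: dict mapping sample id -> string of alleles, all the same length.
    Returns a dict mapping (id_a, id_b) -> set of window indices that match exactly.
    """
    n_sites = len(next(iter(haplotypes.values())))
    seeds = defaultdict(set)
    for w in range(n_sites // window):
        start = w * window
        buckets = defaultdict(list)
        # One dictionary pass per window: identical slices land in the same bucket.
        for sample_id, hap in haplotypes.items():
            buckets[hap[start:start + window]].append(sample_id)
        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                seeds[(a, b)].add(w)
    return seeds


def window_ok(hap_a, hap_b, w, window, max_mismatch):
    """True if window w differs at no more than max_mismatch sites."""
    start = w * window
    pairs = zip(hap_a[start:start + window], hap_b[start:start + window])
    return sum(x != y for x, y in pairs) <= max_mismatch


def extend_seeds(haplotypes, seeds, window, max_mismatch, min_windows):
    """Greedily grow each seed left and right through nearly identical windows.

    Simplified stand-in for the extension step (assumption, not the paper's method).
    Returns a list of (id_a, id_b, start_site, end_site) with end_site exclusive.
    """
    n_sites = len(next(iter(haplotypes.values())))
    n_windows = n_sites // window
    segments = []
    for (a, b), seed_windows in seeds.items():
        hap_a, hap_b = haplotypes[a], haplotypes[b]
        visited = set()
        for w in sorted(seed_windows):
            if w in visited:
                continue  # already covered by an earlier extension
            lo = hi = w
            while lo > 0 and window_ok(hap_a, hap_b, lo - 1, window, max_mismatch):
                lo -= 1
            while hi < n_windows - 1 and window_ok(hap_a, hap_b, hi + 1, window, max_mismatch):
                hi += 1
            visited.update(range(lo, hi + 1))
            if hi - lo + 1 >= min_windows:
                segments.append((a, b, lo * window, (hi + 1) * window))
    return segments


# Toy data (invented): B equals A except for one site; C is unrelated.
haps = {
    "A": "0101100111010011",
    "B": "0101100111000011",
    "C": "1010011000101100",
}
seeds = seed_matches(haps, window=4)
shared = extend_seeds(haps, seeds, window=4, max_mismatch=1, min_windows=3)
```

In this toy setup the dictionary pass only compares samples that fall into the same bucket, which is what avoids an all-pairs scan when seeding; tolerating a small number of mismatches during extension lets a segment continue across an isolated genotyping or phasing error.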
