Initial sequencing and analysis of the human genome

Nature - Volume 409, Issue 6822 - Pages 860-921 - 2001
Sílvia Beà1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1, Michael C. Zody1, Jennifer Baldwin1, Keri Devon1, Ken Dewar1, Michael P. Doyle1, William W. Fitzhugh1, Roel Funke1, Diane Gage1, Katrina Harris1, Andrew Heaford1, John G. Howland1, Lisa Kann1, Jessica A. Lehoczky1, R. Paul Levine1, Paul McEwan1, Kevin McKernan1, James C. Meldrim1, Jill P. Mesirov1, Cher Miranda1, William Morris1, Jerome W. Naylor1, Christina Raymond1, Mark Rosetti1, Ralph Santos1, Andrew Sheridan1, Carrie Sougnez1, Nicole Stange-Thomann1, Nikola M. Stojanović1, Aravind Subramanian1, Dudley Wyman1, Jane Rogers2, John Sulston2, R. Ainscough2, Stephan Beck2, David Bentley2, John H. Burton2, Christopher Clee2, Nigel Carter2, Alan Coulson2, Rebecca Deadman2, Panos Deloukas2, Andrew Dunham2, Ian Dunham2, Richard Durbin2, Lisa French2, Darren Grafham2, Simon G. Gregory2, Tim Hubbard2, Sean Humphray2, Adrienne Hunt2, Matthew C. Jones2, Christine Lloyd2, Amanda McMurray2, Lucy Matthews2, Simon Mercer2, Sarah Milne2, James C. Mullikin2, Andrew J. Mungall2, R. W. Plumb2, Mark T. Ross2, R. Shownkeen2, Sarah Sims2, R. Waterston3, Richard K. Wilson3, LaDeana W. Hillier3, John D. McPherson3, Marco A. Marra3, Elaine R. Mardis3, Lucinda A. Fulton3, Asif Chinwalla3, Kymberlie Pepin3, Warren Gish3, Stephanie L. Chissoe3, Michael C. Wendl3, Kim D. Delehaunty3, Tracie L. Miner3, Andrew Delehaunty3, Jason Kramer3, Lisa L. Cook3, Robert S. Fulton3, D. Johnson3, Patrick Minx3, Sandra W. Clifton3, Trevor Hawkins4, Elbert Branscomb4, Paul Predki4, Daniel S. Rokhsar4, Sarah Wenning4, Tom Slezak4, Norman A. Doggett4, Jan-Fang Cheng4, Anne S. Olsen4, Susan Lucas4, Christopher J. Elkin4, Edward C. Uberbacher4, M. E. Frazier4, Richard A. Gibbs5, Donna M. Muzny5, Steven E. Scherer5, John Bouck5, Erica Sodergren5, Kim C. Worley5, Catherine Rives5, James H. Gorrell5, Michael L. Metzker5, Richard Reinhardt6, Raju Kucherlapati7, David L. Nelson, George M. Weinstock8, Yoshiyuki Sakaki9, Marianne Bronner-Fraser9, Masahira Hattori9, Tetsushi Yada9, Atsushi Toyoda9, Takehiko Itoh9, Chiharu Kawagoe9, Hidemi Watanabe9, Yasushi Totoki9, Todd D. Taylor9, Jean Weissenbach10, Roland Heilig10, William Saurin10, François Artiguenave10, Philippe Brottier10, Hervé M. Blottière10, Éric Pelletier10, Catherine Robert10, Patrick Wincker10, André Rosenthal11, Matthias Platzer11, Gerald Nyakatura11, Stefan Taudien11, Andreas Rump11, Andrew R. Smith12, Lynn Doucette-Stamm12, Pascale Roux12, Keith G. Weinstock12, Hong Mei Lee12, JoAnn Dubois12, Huanming Yang13, Jun Yu13, Jun Wang13, Guyang Matthew Huang14, Jun Gu15, Leroy Hood16, Lee Rowen16, Anup Madan16, Shizen Qin16, Ronald Davis17, Nancy A. Federspiel17, A. Pia Abola17, Michael Proctor17, Bruce A. Roe18, Feng Chen18, Huaqin Pan18, Juliane Ramser19, Hans Lehrach19, W. Richard McCombie20, M. de la Bastide20, Neilay Dedhia20, Helmut Blöcker21, Klaus Hornischer21, Gabriele Nordsiek21, Richa Agarwala22, L. Aravind22, Jeffrey A. Bailey23, Alex Bateman2, Serafim Batzoglou1, Ewan Birney24, Peer Bork25, Daniel G. Brown1, Christopher B. Burge26, Lorenzo Cerutti24, Hsiu-Chuan Chen22, Deanna M. Church22, Michèle Clamp2, Richard R. Copley27, Tobias Doerks25, Sean R. Eddy28, Evan E. Eichler23, Terrence S. Furey29, James Galagan1, James Gilbert2, Cyrus L. Harmon30, Sandrine Imbeaud31, David Haussler32, Henning Hermjakob24, Karsten Hokamp33, Wonhee Jang22, L. Steven Johnson28, Thomas A. Jones28, Simon Kasif34, Arek Kaspryzk24, Scot Kennedy35, W. James Kent36, Paul Kitts22, Eugene V. Koonin22, Ian Korf3, David Kulp30, Doron Lancet37, Todd M. Lowe38, Aoife McLysaght33, Tarjei S. Mikkelsen34, John V. Moran39, Nicola Mulder24, Victor J. Pollara1, Chris P. Ponting40, G. D. Schuler22, Jörg Schultz27, Guy Slater24, Arian F. A. Smit41, Elia Stupka24, Joseph D. Szustakowski34, Danielle Thierry-Mieg22, Jean Thierry-Mieg22, John W. Wallis3, Raymond M. Wheeler30, Alan J. Williams30, Yuri I. Wolf22, Kenneth H. Wolfe33, Shiaw-Pyng Yang3, Ru-Fang Yeh26, Pedro Jares42, Mark S. Guyer42, Jane L. Peterson42, Adam L. Felsenfeld42, Kris A. Wetterstrand42, R. Myers43, Jeremy Schmutz43, Mark Dickson43, Jonathan Wood43, David R. Cox43, Maynard V. Olson44, Rajinder Kaul44, Christopher K. Raymond44, Nobuyoshi Shimizu45, Kazuhiko Kawasaki45, Satoshi Minoshima45, Glen A. Evans46, Μαρία Αθανασίου47, Roger A. Schultz47, A. A. N. Patrinos48, Michael J. Morgan49
1Whitehead Institute for Biomedical Research, Center for Genome Research, Nine Cambridge Center, Cambridge, 02142, Massachusetts, USA
2The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, CB10 1RQ, Cambridgeshire, United Kingdom
3Washington University Genome Sequencing Center, Box 8501, 4444 Forest Park Avenue, St. Louis, 63108, Missouri, USA
4US DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, 94598, California, USA
5Department of Molecular and Human Genetics, Baylor College of Medicine Human Genome Sequencing Center, One Baylor Plaza, Houston, 77030, Texas, USA
6Department of Cellular and Structural Biology, The University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, 78229-3900, Texas, USA
7Department of Molecular Genetics, Albert Einstein College of Medicine, 1635 Poplar Street, Bronx, 10461, New York, USA
8Baylor College of Medicine Human Genome Sequencing Center and the Department of Microbiology & Molecular Genetics, University of Texas Medical School, PO Box 20708, Houston, 77225, Texas, USA
9RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama-city, 230-0045, Kanagawa, Japan
10Genoscope and CNRS UMR-8030, 2 Rue Gaston Cremieux, Evry Cedex, CP 5706, 91057, France
11Department of Genome Analysis, Institute of Molecular Biotechnology, Beutenbergstrasse 11, Jena, D-07745, Germany
12GTC Sequencing Center, Genome Therapeutics Corporation, 100 Beaver Street, Waltham, 02453-8443, Massachusetts, USA
13Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, 100101, China
14Southern China National Human Genome Research Center, Shanghai, 201203, China
15Northern China National Human Genome Research Center, Beijing, 100176, China
16Multimegabase Sequencing Center, The Institute for Systems Biology, 4225 Roosevelt Way, NE Suite 200, Seattle, 98105, Washington, USA
17Stanford Genome Technology Center, 855 California Avenue, Palo Alto, 94304, California, USA
18University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, 620 Parrington Oval, Rm 311, Norman, 73019, Oklahoma, USA
19Max Planck Institute for Molecular Genetics, Ihnestrasse 73, Berlin, 14195, Germany
20Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, 1 Bungtown Road, Cold Spring Harbor, 11724, New York, USA
21GBF - German Research Centre for Biotechnology, Mascheroder Weg 1, Braunschweig, D-38124, Germany
22National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, 8600 Rockville Pike, Bethesda, 20894, Maryland, USA
23Department of Genetics, Case Western Reserve School of Medicine and University Hospitals of Cleveland, BRB 720, 10900 Euclid Ave., Cleveland, 44106, Ohio, USA
24EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, Cambridge, United Kingdom
25Max Delbrück Center for Molecular Medicine, Robert-Rossle-Strasse 10, Berlin-Buch, 13125, Germany
26Dept. of Biology, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, 02139-4307, Massachusetts, USA
27EMBL, Meyerhofstrasse 1, Heidelberg, 69012, Germany
28Howard Hughes Medical Institute, Dept. of Genetics, Washington University School of Medicine, Saint Louis, 63110, Missouri, USA
29Dept. of Computer Science, University of California at Santa Cruz, Santa Cruz, 95064, California, USA
30Affymetrix, Inc., 2612 8th St, Berkeley, 94710, California, USA
31Genome Exploration Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Kanagawa, Japan
32Department of Computer Science, Howard Hughes Medical Institute, University of California at Santa Cruz, 95064, California, USA
33Department of Genetics, University of Dublin, Trinity College, Smurfit Institute, Dublin, 2, Ireland
34Cambridge Research Laboratory, Compaq Computer Corporation and MIT Genome Center, 1 Cambridge Center, Cambridge, 02142, Massachusetts, USA
35Dept. of Mathematics, University of California at Santa Cruz, Santa Cruz, 95064, California, USA
36Dept. of Biology, University of California at Santa Cruz, Santa Cruz, 95064, California, USA
37Crown Human Genetics Center and Department of Molecular Genetics, The Weizmann Institute of Science, Rehovot, 71600, Israel
38Dept. of Genetics, Stanford University School of Medicine, Stanford, 94305, California, USA
39Departments of Human Genetics and Internal Medicine, The University of Michigan Medical School, Ann Arbor, 48109, Michigan, USA
40Department of Human Anatomy and Genetics, MRC Functional Genetics Unit, University of Oxford, South Parks Road, Oxford, OX1 3QX, UK
41Institute for Systems Biology, 4225 Roosevelt Way NE, Seattle, 98105, WA, USA
42National Human Genome Research Institute, US National Institutes of Health, 31 Center Drive, Bethesda, 20892, Maryland, USA
43Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, 94305-5120, California, USA
44University of Washington Genome Center, 225 Fluke Hall on Mason Road, Seattle, 98195, Washington, USA
45Department of Molecular Biology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, 160-8582, Tokyo, Japan
46INRA, Station d’Amélioration des Plantes, 63039, Clermont-Ferrand Cedex, 2, France
47University of Texas Southwestern Medical Center at Dallas, 6000 Harry Hines Blvd., Dallas, 75235-8591, Texas, USA
48US Department of Energy, Office of Science, 19901 Germantown Road, Germantown, 20874, Maryland, USA
49The Wellcome Trust, 183 Euston Road, London, NW1 2BE, UK

Abstract

Keywords


References

Correns, C. Untersuchungen über die Xenien bei Zea mays. Berichte der Deutschen Botanischen Gesellschaft 17, 410–418 (1899).

De Vries, H. Sur la loi de disjonction des hybrides. Comptes Rendus Hebdomadaires, Acad. Sci. Paris 130, 845–847 (1900).

von Tschermak, E. Über künstliche Kreuzung bei Pisum sativum. Berichte der Deutschen Botanischen Gesellschaft 18, 232–239 (1900).

Sanger, F. et al. Nucleotide sequence of bacteriophage ΦX174 DNA. Nature 265, 687–695 (1977).

Sanger, F. et al. The nucleotide sequence of bacteriophage ΦX174. J. Mol. Biol. 125, 225–246 (1978).

Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F. & Petersen, G. B. Nucleotide-sequence of bacteriophage Lambda DNA. J. Mol. Biol. 162, 729–773 (1982).

Fiers, W. et al. Complete nucleotide sequence of SV40 DNA. Nature 273, 113–120 (1978).

Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).

Botstein, D., White, R. L., Skolnick, M. & Davis, R. W. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32, 314–331 (1980).

Olson, M. V. et al. Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl Acad. Sci. USA 83, 7826–7830 (1986).

Coulson, A., Sulston, J., Brenner, S. & Karn, J. Toward a physical map of the genome of the nematode Caenorhabditis elegans. Proc. Natl Acad. Sci. USA 83, 7821–7825 (1986).

Putney, S. D., Herlihy, W. C. & Schimmel, P. A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing. Nature 302, 718–721 (1983).

Milner, R. J. & Sutcliffe, J. G. Gene expression in rat brain. Nucleic Acids Res. 11, 5497–5520 (1983).

Adams, M. D. et al. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 1651–1656 (1991).

Adams, M. D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3–174 (1995).

Okubo, K. et al. Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nature Genet. 2, 173–179 (1992).

Hillier, L. D. et al. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828 (1996).

Strausberg, R. L., Feingold, E. A., Klausner, R. D. & Collins, F. S. The mammalian gene collection. Science 286, 455–457 (1999).

Berry, R. et al. Gene-based sequence-tagged-sites (STSs) as the basis for a human gene map. Nature Genet. 10, 415–423 (1995).

Houlgatte, R. et al. The Genexpress Index: a resource for gene discovery and the genic map of the human genome. Genome Res. 5, 272–304 (1995).

Sinsheimer, R. L. The Santa Cruz Workshop—May 1985. Genomics 5, 954–956 (1989).

Palca, J. Human genome—Department of Energy on the map. Nature 321, 371 (1986).

National Research Council Mapping and Sequencing the Human Genome (National Academy Press, Washington DC, 1988).

Bishop, J. E. & Waldholz, M. Genome (Simon and Schuster, New York, 1990).

Kevles, D. J. & Hood, L. (eds) The Code of Codes: Scientific and Social Issues in the Human Genome Project (Harvard Univ. Press, Cambridge, Massachusetts, 1992).

Cook-Deegan, R. The Gene Wars: Science, Politics, and the Human Genome (W. W. Norton & Co., New York, London, 1994).

Donis-Keller, H. et al. A genetic linkage map of the human genome. Cell 51, 319–337 (1987).

Gyapay, G. et al. The 1993–94 Genethon human genetic linkage map. Nature Genet. 7, 246–339 (1994).

Hudson, T. J. et al. An STS-based map of the human genome. Science 270, 1945–1954 (1995).

Dietrich, W. F. et al. A comprehensive genetic map of the mouse genome. Nature 380, 149–152 (1996).

Nusbaum, C. et al. A YAC-based physical map of the mouse genome. Nature Genet. 22, 388–393 (1999).

Oliver, S. G. et al. The complete DNA sequence of yeast chromosome III. Nature 357, 38–46 (1992).

Wilson, R. et al. 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368, 32–38 (1994).

Chen, E. Y. et al. The human growth hormone locus: nucleotide sequence, biology, and evolution. Genomics 4, 479–497 (1989).

McCombie, W. R. et al. Expressed genes, Alu repeats and polymorphisms in cosmids sequenced from chromosome 4p16.3. Nature Genet. 1, 348–353 (1992).

Martin-Gallardo, A. et al. Automated DNA sequencing and analysis of 106 kilobases from human chromosome 19q13.3. Nature Genet. 1, 34–39 (1992).

Edwards, A. et al. Automated DNA sequencing of the human HPRT locus. Genomics 6, 593–608 (1990).

Marshall, E. A strategy for sequencing the genome 5 years early. Science 267, 783–784 (1995).

Project to sequence human genome moves on to the starting blocks. Nature 375, 93–94 (1995).

Shizuya, H. et al. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl Acad. Sci. USA 89, 8794–8797 (1992).

Burke, D. T., Carle, G. F. & Olson, M. V. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236, 806–812 (1987).

Marshall, E. A second private genome project. Science 281, 1121 (1998).

Marshall, E. NIH to produce a ‘working draft’ of the genome by 2001. Science 281, 1774–1775 (1998).

Pennisi, E. Academic sequencers challenge Celera in a sprint to the finish. Science 283, 1822–1823 (1999).

Bouck, J., Miller, W., Gorrell, J. H., Muzny, D. & Gibbs, R. A. Analysis of the quality and utility of random shotgun sequencing at low redundancies. Genome Res. 8, 1074–1084 (1998).

Collins, F. S. et al. New goals for the U.S. Human Genome Project: 1998–2003. Science 282, 682–689 (1998).

Sanger, F. & Coulson, A. R. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94, 441–448 (1975).

Maxam, A. M. & Gilbert, W. A new method for sequencing DNA. Proc. Natl Acad. Sci. USA 74, 560–564 (1977).

Anderson, S. Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic Acids Res. 9, 3015–3027 (1981).

Gardner, R. C. et al. The complete nucleotide sequence of an infectious clone of cauliflower mosaic virus by M13mp7 shotgun sequencing. Nucleic Acids Res. 9, 2871–2888 (1981).

Deininger, P. L. Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal. Biochem. 129, 216–223 (1983).

Chissoe, S. L. et al. Sequence and analysis of the human ABL gene, the BCR gene, and regions involved in the Philadelphia chromosomal translocation. Genomics 27, 67–82 (1995).

Rowen, L., Koop, B. F. & Hood, L. The complete 685-kilobase DNA sequence of the human beta T cell receptor locus. Science 272, 1755–1762 (1996).

Koop, B. F. et al. Organization, structure, and function of 95 kb of DNA spanning the murine T-cell receptor C alpha/C delta region. Genomics 13, 1209–1230 (1992).

Wooster, R. et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 378, 789–792 (1995).

Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995).

Lander, E. S. & Waterman, M. S. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).

Weber, J. L. & Myers, E. W. Human whole-genome shotgun sequencing. Genome Res. 7, 401–409 (1997).

Green, P. Against a whole-genome shotgun. Genome Res. 7, 410–417 (1997).

Venter, J. C. et al. Shotgun sequencing of the human genome. Science 280, 1540–1542 (1998).

Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

Smith, L. M. et al. Fluorescence detection in automated DNA sequence analysis. Nature 321, 674–679 (1986).

Ju, J. Y., Ruan, C. C., Fuller, C. W., Glazer, A. N. & Mathies, R. A. Fluorescence energy-transfer dye-labeled primers for DNA sequencing and analysis. Proc. Natl Acad. Sci. USA 92, 4347–4351 (1995).

Lee, L. G. et al. New energy transfer dyes for DNA sequencing. Nucleic Acids Res. 25, 2816–2822 (1997).

Rosenblum, B. B. et al. New dye-labeled terminators for improved DNA sequencing patterns. Nucleic Acids Res. 25, 4500–4504 (1997).

Metzker, M. L., Lu, J. & Gibbs, R. A. Electrophoretically uniform fluorescent dyes for automated DNA sequencing. Science 271, 1420–1422 (1996).

Prober, J. M. et al. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238, 336–341 (1987).

Reeve, M. A. & Fuller, C. W. A novel thermostable polymerase for DNA sequencing. Nature 376, 796–797 (1995).

Tabor, S. & Richardson, C. C. Selective inactivation of the exonuclease activity of bacteriophage T7 DNA polymerase by in vitro mutagenesis. J. Biol. Chem. 264, 6447–6458 (1989).

Tabor, S. & Richardson, C. C. DNA sequence analysis with a modified bacteriophage T7 DNA polymerase—effect of pyrophosphorolysis and metal ions. J. Biol. Chem. 265, 8322–8328 (1990).

Murray, V. Improved double-stranded DNA sequencing using the linear polymerase chain reaction. Nucleic Acids Res. 17, 8889 (1989).

Guttman, A., Cohen, A. S., Heiger, D. N. & Karger, B. L. Analytical and micropreparative ultrahigh resolution of oligonucleotides by polyacrylamide-gel high-performance capillary electrophoresis. Anal. Chem. 62, 137–141 (1990).

Luckey, J. A. et al. High-speed DNA sequencing by capillary electrophoresis. Nucleic Acids Res. 18, 4417–4421 (1990).

Swerdlow, H., Wu, S., Harke, H. & Dovichi, N. J. Capillary gel-electrophoresis for DNA sequencing—laser-induced fluorescence detection with the sheath flow cuvette. J. Chromatogr. 516, 61–67 (1990).

Meldrum, D. Automation for genomics, part one: preparation for sequencing. Genome Res. 10, 1081–1092 (2000).

Meldrum, D. Automation for genomics, part two: sequencers, microarrays, and future trends. Genome Res. 10, 1288–1303 (2000).

Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).

Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).

Bentley, D. R. Genomic sequence information should be released immediately and freely in the public domain. Science 274, 533–534 (1996).

Guyer, M. Statement on the rapid release of genomic DNA sequence. Genome Res. 8, 413 (1998).

Dietrich, W. et al. A genetic map of the mouse suitable for typing intraspecific crosses. Genetics 131, 423–447 (1992).

Kim, U. J. et al. Construction and characterization of a human bacterial artificial chromosome library. Genomics 34, 213–218 (1996).

Osoegawa, K. et al. Bacterial artificial chromosome libraries for mouse sequencing and functional analysis. Genome Res. 10, 116–128 (2000).

Marra, M. A. et al. High throughput fingerprint analysis of large-insert clones. Genome Res. 7, 1072–1084 (1997).

Marra, M. et al. A map for sequence analysis of the Arabidopsis thaliana genome. Nature Genet. 22, 265–270 (1999).

The International Human Genome Mapping Consortium. A physical map of the human genome. Nature 409, 934–941 (2001).

Zhao, S. et al. Human BAC ends quality assessment and sequence analyses. Genomics 63, 321–332 (2000).

Mahairas, G. G. et al. Sequence-tagged connectors: A sequence approach to mapping and scanning the human genome. Proc. Natl Acad. Sci. USA 96, 9739–9744 (1999).

Tilford, C. A. et al. A physical map of the human Y chromosome. Nature 409, 943–945 (2001).

Bentley, D. R. et al. The physical maps for sequencing human chromosomes 1, 6, 9, 10, 13, 20 and X. Nature 409, 942–943 (2001).

Montgomery, K. T. et al. A high-resolution map of human chromosome 12. Nature 409, 945–946 (2001).

Brüls, T. et al. A physical map of human chromosome 14. Nature 409, 947–948 (2001).

Hattori, M. et al. The DNA sequence of human chromosome 21. Nature 405, 311–319 (2000).

Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).

Cox, D. et al. Radiation hybrid map of the human genome. Science (in the press).

Osoegawa, K. et al. An improved approach for construction of bacterial artificial chromosome libraries. Genomics 52, 1–8 (1998).

The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).

Collins, F. S., Brooks, L. D. & Chakravarti, A. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8, 1229–1231 (1998).

Stewart, E. A. et al. An STS-based radiation hybrid map of the human genome. Genome Res. 7, 422–433 (1997).

Deloukas, P. et al. A physical map of 30,000 human genes. Science 282, 744–746 (1998).

Dib, C. et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380, 152–154 (1996).

Broman, K. W., Murray, J. C., Sheffield, V. C., White, R. L. & Weber, J. L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63, 861–869 (1998).

The BAC Resource Consortium. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953–958 (2001).

Kent, W. J. & Haussler, D. GigAssembler: an algorithm for the initial assembly of the human working draft. Technical Report UCSC-CRL-00-17 (Univ. California at Santa Cruz, Santa Cruz, California, 2001).

Morton, N. E. Parameters of the human genome. Proc. Natl Acad. Sci. USA 88, 7474–7476 (1991).

Podugolnikova, O. A. & Blumina, M. G. Heterochromatic regions on chromosomes 1, 9, 16, and Y in children with some disturbances occurring during embryo development. Hum. Genet. 63, 183–188 (1983).

Lundgren, R., Berger, R. & Kristoffersson, U. Constitutive heterochromatin C-band polymorphism in prostatic cancer. Cancer Genet. Cytogenet. 51, 57–62 (1991).

Lee, C., Wevrick, R., Fisher, R. B., Ferguson-Smith, M. A. & Lin, C. C. Human centromeric DNAs. Hum. Genet. 100, 291–304 (1997).

Riethman, H. C. et al. Integration of telomere sequences with the draft human genome sequence. Nature 409, 948–951 (2001).

Pruitt, K. D. & Maglott, D. R. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137–140 (2001).

Wolfsberg, T. G., McEntyre, J. & Schuler, G. D. Guide to the draft human genome. Nature 409, 824–826 (2001).

Hurst, L. D. & Eyre-Walker, A. Evolutionary genomics: reading the bands. Bioessays 22, 105–107 (2000).

Saccone, S. et al. Correlations between isochores and chromosomal bands in the human genome. Proc. Natl Acad. Sci. USA 90, 11929–11933 (1993).

Zoubak, S., Clay, O. & Bernardi, G. The gene distribution of the human genome. Gene 174, 95–102 (1996).

Gardiner, K. Base composition and gene distribution: critical patterns in mammalian genome organization. Trends Genet. 12, 519–524 (1996).

Duret, L., Mouchiroud, D. & Gautier, C. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40, 308–317 (1995).

Saccone, S., De Sario, A., Della Valle, G. & Bernardi, G. The highest gene concentrations in the human genome are in telomeric bands of metaphase chromosomes. Proc. Natl Acad. Sci. USA 89, 4913–4917 (1992).

Bernardi, G. et al. The mosaic genome of warm-blooded vertebrates. Science 228, 953–958 (1985).

Bernardi, G. Isochores and the evolutionary genomics of vertebrates. Gene 241, 3–17 (2000).

Fickett, J. W., Torney, D. C. & Wolf, D. R. Base compositional structure of genomes. Genomics 13, 1056–1064 (1992).

Churchill, G. A. Stochastic models for heterogeneous DNA sequences. Bull. Math. Biol. 51, 79–94 (1989).

Bird, A., Taggart, M., Frommer, M., Miller, O. J. & Macleod, D. A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell 40, 91–99 (1985).

Bird, A. P. CpG islands as gene markers in the vertebrate nucleus. Trends Genet. 3, 342–347 (1987).

Chan, M. F., Liang, G. & Jones, P. A. Relationship between transcription and DNA methylation. Curr. Top. Microbiol. Immunol. 249, 75–86 (2000).

Holliday, R. & Pugh, J. E. DNA modification mechanisms and gene activity during development. Science 187, 226–232 (1975).

Larsen, F., Gundersen, G., Lopez, R. & Prydz, H. CpG islands as gene markers in the human genome. Genomics 13, 1095–1107 (1992).

Tazi, J. & Bird, A. Alternative chromatin structure at CpG islands. Cell 60, 909–920 (1990).

Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282 (1987).

Antequera, F. & Bird, A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA 90, 11995–11999 (1993).

Ewing, B. & Green, P. Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genet. 25, 232–234 (2000).

Yu, A. et al. Comparison of human genetic and sequence-based physical maps. Nature 409, 951–953 (2001).

Kaback, D. B., Guacci, V., Barber, D. & Mahon, J. W. Chromosome size-dependent control of meiotic recombination. Science 256, 228–232 (1992).

Riles, L. et al. Physical maps of the 6 smallest chromosomes of Saccharomyces cerevisiae at a resolution of 2.6-kilobase pairs. Genetics 134, 81–150 (1993).

Lynn, A. et al. Patterns of meiotic recombination on the long arm of human chromosome 21. Genome Res. 10, 1319–1332 (2000).

Laurie, D. A. & Hulten, M. A. Further studies on bivalent chiasma frequency in human males with normal karyotypes. Ann. Hum. Genet. 49, 189–201 (1985).

Roeder, G. S. Meiotic chromosomes: it takes two to tango. Genes Dev. 11, 2600–2621 (1997).

Wu, T.-C. & Lichten, M. Meiosis-induced double-strand break sites determined by yeast chromatin structure. Science 263, 515–518 (1994).

Gerton, J. L. et al. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 97, 11383–11390 (2000).

Li, W.-H. Molecular Evolution (Sinauer, Sunderland, Massachusetts, 1997).

Gregory, T. R. & Hebert, P. D. The modulation of DNA content: proximate causes and ultimate consequences. Genome Res. 9, 317–324 (1999).

Hartl, D. L. Molecular melodies in high and low C. Nature Rev. Genet. 1, 145–149 (2000).

Smit, A. F. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9, 657–663 (1999).

Prak, E. T. L. & Kazazian, H. H. Jr Mobile elements and the human genome. Nature Rev. Genet. 1, 134–144 (2000).

Okada, N., Hamada, M., Ogiwara, I. & Ohshima, K. SINEs and LINEs share common 3′ sequences: a review. Gene 205, 229–243 (1997).

Esnault, C., Maestre, J. & Heidmann, T. Human LINE retrotransposons generate processed pseudogenes. Nature Genet. 24, 363–367 (2000).

Wei, W. et al. Human L1 retrotransposition: cis-preference vs. trans-complementation. Mol. Cell. Biol. 21, 1429–1439 (2001).

Malik, H. S., Henikoff, S. & Eickbush, T. H. Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 10, 1307–1318 (2000).

Smit, A. F. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6, 743–748 (1996).

Clark, J. B. & Tidwell, M. G. A phylogenetic perspective on P transposable element evolution in Drosophila. Proc. Natl Acad. Sci. USA 94, 11428–11433 (1997).

Haring, E., Hagemann, S. & Pinsker, W. Ancient and recent horizontal invasions of Drosophilids by P elements. J. Mol. Evol. 51, 577–586 (2000).

Koga, A. et al. Evidence for recent invasion of the medaka fish genome by the Tol2 transposable element. Genetics 155, 273–281 (2000).

Robertson, H. M. & Lampe, D. J. Recent horizontal transfer of a mariner transposable element among and between Diptera and Neuroptera. Mol. Biol. Evol. 12, 850–862 (1995).

Simmons, G. M. Horizontal transfer of hobo transposable elements within the Drosophila melanogaster species complex: evidence from DNA sequencing. Mol. Biol. Evol. 9, 1050–1060 (1992).

Malik, H. S., Burke, W. D. & Eickbush, T. H. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16, 793–805 (1999).

Kordis, D. & Gubensek, F. Bov-B long interspersed repeated DNA (LINE) sequences are present in Vipera ammodytes phospholipase A2 genes and in genomes of Viperidae snakes. Eur. J. Biochem. 246, 772–779 (1997).

Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).

Sarich, V. M. & Wilson, A. C. Generation time and genome evolution in primates. Science 179, 1144–1147 (1973).

Smit, A. F., Toth, G., Riggs, A. D. & Jurka, J. Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J. Mol. Biol. 246, 401–417 (1995).

Lim, J. K. & Simmons, M. J. Gross chromosome rearrangements mediated by transposable elements in Drosophila melanogaster. Bioessays 16, 269–275 (1994).

Caceres, M., Ranz, J. M., Barbadilla, A., Long, M. & Ruiz, A. Generation of a widespread Drosophila inversion by a transposable element. Science 285, 415–418 (1999).

Gray, Y. H. It takes two transposons to tango: transposable-element-mediated chromosomal rearrangements. Trends Genet. 16, 461–468 (2000).

Zhang, J. & Peterson, T. Genome rearrangements by nonlinear transposition in maize. Genetics 153, 1403–1410 (1999).

Smit, A. F. Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic Acids Res. 21, 1863–1872 (1993).

Cordonnier, A., Casella, J. F. & Heidmann, T. Isolation of novel human endogenous retrovirus-like elements with foamy virus-related pol sequence. J. Virol. 69, 5890–5897 (1995).

Medstrand, P. & Mager, D. L. Human-specific integrations of the HERV-K endogenous retrovirus family. J. Virol. 72, 9782–9787 (1998).

Myers, E. W. et al. A whole-genome assembly of Drosophila. Science 287, 2196–2204 (2000).

Petrov, D. A., Lozovskaya, E. R. & Hartl, D. L. High intrinsic rate of DNA loss in Drosophila. Nature 384, 346–349 (1996).

Li, W. H., Ellsworth, D. L., Krushkal, J., Chang, B. H. & Hewett-Emmett, D. Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol. Phylogenet. Evol. 5, 182–187 (1996).

Goodman, M. et al. Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9, 585–598 (1998).

Kazazian, H. H. Jr & Moran, J. V. The impact of L1 retrotransposons on the human genome. Nature Genet. 19, 19–24 (1998).

Malik, H. S. & Eickbush, T. H. NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans. Genetics 154, 193–203 (2000).

Casavant, N. C. et al. The end of the LINE?: lack of recent L1 activity in a group of South American rodents. Genetics 154, 1809–1817 (2000).

Meunier-Rotival, M., Soriano, P., Cuny, G., Strauss, F. & Bernardi, G. Sequence organization and genomic distribution of the major family of interspersed repeats of mouse DNA. Proc. Natl Acad. Sci. USA 79, 355–359 (1982).

Soriano, P., Meunier-Rotival, M. & Bernardi, G. The distribution of interspersed repeats is nonuniform and conserved in the mouse and human genomes. Proc. Natl Acad. Sci. USA 80, 1816–1820 (1983).

Goldman, M. A., Holmquist, G. P., Gray, M. C., Caston, L. A. & Nag, A. Replication timing of genes and middle repetitive sequences. Science 224, 686–692 (1984).

Manuelidis, L. & Ward, D. C. Chromosomal and nuclear distribution of the HindIII 1.9-kb human DNA repeat segment. Chromosoma 91, 28–38 (1984).

Feng, Q., Moran, J. V., Kazazian, H. H. Jr & Boeke, J. D. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916 (1996).

Jurka, J. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc. Natl Acad. Sci. USA 94, 1872–1877 (1997).

Arcot, S. S. et al. High-resolution cartography of recently integrated human chromosome 19-specific Alu fossils. J. Mol. Biol. 281, 843–856 (1998).

Schmid, C. W. Does SINE evolution preclude Alu function? Nucleic Acids Res. 26, 4541–4550 (1998).

Chu, W. M., Ballard, R., Carpick, B. W., Williams, B. R. & Schmid, C. W. Potential Alu function: regulation of the activity of double-stranded RNA-activated kinase PKR. Mol. Cell. Biol. 18, 58–68 (1998).

Li, T., Spearow, J., Rubin, C. M. & Schmid, C. W. Physiological stresses increase mouse short interspersed element (SINE) RNA expression in vivo. Gene 239, 367–372 (1999).

Liu, W. M., Chu, W. M., Choudary, P. V. & Schmid, C. W. Cell stress and translational inhibitors transiently increase the abundance of mammalian SINE transcripts. Nucleic Acids Res. 23, 1758–1765 (1995).

Filipski, J. Correlation between molecular clock ticking, codon usage fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Lett. 217, 184–186 (1987).

Sueoka, N. Directional mutation pressure and neutral molecular evolution. Proc. Natl Acad. Sci. USA 85, 2653–2657 (1988).

Wolfe, K. H., Sharp, P. M. & Li, W. H. Mutation rates differ among regions of the mammalian genome. Nature 337, 283–285 (1989).

Bains, W. Local sequence dependence of rate of base replacement in mammals. Mutat. Res. 267, 43–54 (1992).

Mathews, C. K. & Ji, J. DNA precursor asymmetries, replication fidelity, and variable genome evolution. Bioessays 14, 295–301 (1992).

Holmquist, G. P. & Filipski, J. Organization of mutations along the genome: a prime determinant of genome evolution. Trends Ecol. Evol. 9, 65–68 (1994).

Eyre-Walker, A. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152, 675–683 (1999).

The International SNP Map Working Group. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407, 513–516 (2000).

Bohossian, H. B., Skaletsky, H. & Page, D. C. Unexpectedly similar rates of nucleotide substitution found in male and female hominids. Nature 406, 622–625 (2000).

Skowronski, J., Fanning, T. G. & Singer, M. F. Unit-length LINE-1 transcripts in human teratocarcinoma cells. Mol. Cell. Biol. 8, 1385–1397 (1988).

Boissinot, S., Chevret, P. & Furano, A. V. L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol. Biol. Evol. 17, 915–928 (2000).

Moran, J. V. Human L1 retrotransposition: insights and peculiarities learned from a cultured cell retrotransposition assay. Genetica 107, 39–51 (1999).

Kazazian, H. H. Jr et al. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature 332, 164–166 (1988).

Sheen, F.-m. et al. Reading between the LINEs: Human genomic variation introduced by LINE-1 retrotransposition. Genome Res. 10, 1496–1508 (2000).

Dombroski, B. A., Mathias, S. L., Nanthakumar, E., Scott, A. F. & Kazazian, H. H. Jr Isolation of an active human transposable element. Science 254, 1805–1808 (1991).

Holmes, S. E., Dombroski, B. A., Krebs, C. M., Boehm, C. D. & Kazazian, H. H. Jr A new retrotransposable human L1 element from the LRE2 locus on chromosome 1q produces a chimaeric insertion. Nature Genet. 7, 143–148 (1994).

Sassaman, D. M. et al. Many human L1 elements are capable of retrotransposition. Nature Genet. 16, 37–43 (1997).

Dombroski, B. A., Scott, A. F. & Kazazian, H. H. Jr Two additional potential retrotransposons isolated from a human L1 subfamily that contains an active retrotransposable element. Proc. Natl Acad. Sci. USA 90, 6513–6517 (1993).

Kimberland, M. L. et al. Full-length human L1 insertions retain the capacity for high frequency retrotransposition in cultured cells. Hum. Mol. Genet. 8, 1557–1560 (1999).

Moran, J. V. et al. High frequency retrotransposition in cultured mammalian cells. Cell 87, 917–927 (1996).

Moran, J. V., DeBerardinis, R. J. & Kazazian, H. H. Jr Exon shuffling by L1 retrotransposition. Science 283, 1530–1534 (1999).

Pickeral, O. K., Makalowski, W., Boguski, M. S. & Boeke, J. D. Frequent human genomic DNA transduction driven by LINE-1 retrotransposition. Genome Res. 10, 411–415 (2000).

Miki, Y. et al. Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 52, 643–645 (1992).

Branciforte, D. & Martin, S. L. Developmental and cell type specificity of LINE-1 expression in mouse testis: implications for transposition. Mol. Cell. Biol. 14, 2584–2592 (1994).

Trelogan, S. A. & Martin, S. L. Tightly regulated, developmentally specific expression of the first open reading frame from LINE-1 during mouse embryogenesis. Proc. Natl Acad. Sci. USA 92, 1520–1524 (1995).

Jurka, J. & Kapitonov, V. V. Sectorial mutagenesis by transposable elements. Genetica 107, 239–248 (1999).

Fraser, M. J., Ciszczon, T., Elick, T. & Bauser, C. Precise excision of TTAA-specific lepidopteran transposons piggyBac (IFP2) and tagalong (TFP3) from the baculovirus genome in cell lines from two species of Lepidoptera. Insect Mol. Biol. 5, 141–151 (1996).

Brosius, J. Genomes were forged by massive bombardments with retroelements and retrosequences. Genetica 107, 209–238 (1999).

Kruglyak, S., Durrett, R. T., Schug, M. D. & Aquadro, C. F. Equilibrium distribution of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl Acad. Sci. USA 95, 10774–10778 (1998).

Toth, G., Gaspari, Z. & Jurka, J. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10, 967–981 (2000).

Ellegren, H. Heterogeneous mutation processes in human microsatellite DNA sequences. Nature Genet. 24, 400–402 (2000).

Ji, Y., Eichler, E. E., Schwartz, S. & Nicholls, R. D. Structure of chromosomal duplicons and their role in mediating human genomic disorders. Genome Res. 10, 597–610 (2000).

Eichler, E. E. Masquerading repeats: paralogous pitfalls of the human genome. Genome Res. 8, 758–762 (1998).

Mazzarella, R. & Schlessinger, D. Pathological consequences of sequence duplications in the human genome. Genome Res. 8, 1007–1021 (1998).

Eichler, E. E. et al. Interchromosomal duplications of the adrenoleukodystrophy locus: a phenomenon of pericentromeric plasticity. Hum. Mol. Genet. 6, 991–1002 (1997).

Horvath, J. E., Schwartz, S. & Eichler, E. E. The mosaic structure of human pericentromeric DNA: a strategy for characterizing complex regions of the human genome. Genome Res. 10, 839–852 (2000).

Brand-Arpon, V. et al. A genomic region encompassing a cluster of olfactory receptor genes and a myosin light chain kinase (MYLK) gene is duplicated on human chromosome regions 3q13-q21 and 3p13. Genomics 56, 98–110 (1999).

Arnold, N., Wienberg, J., Ermert, K. & Zachau, H. G. Comparative mapping of DNA probes derived from the V kappa immunoglobulin gene regions on human and great ape chromosomes by fluorescence in situ hybridization. Genomics 26, 147–150 (1995).

Eichler, E. E. et al. Duplication of a gene-rich cluster between 16p11.1 and Xq28: a novel pericentromeric-directed mechanism for paralogous genome evolution. Hum. Mol. Genet. 5, 899–912 (1996).

Potier, M. et al. Two sequence-ready contigs spanning the two copies of a 200-kb duplication on human 21q: partial sequence and polymorphisms. Genomics 51, 417–426 (1998).

Regnier, V. et al. Emergence and scattering of multiple neurofibromatosis (NF1)-related sequences during hominoid evolution suggest a process of pericentromeric interchromosomal transposition. Hum. Mol. Genet. 6, 9–16 (1997).

Ritchie, R. J., Mattei, M. G. & Lalande, M. A large polymorphic repeat in the pericentromeric region of human chromosome 15q contains three partial gene duplications. Hum. Mol. Genet. 7, 1253–1260 (1998).

Trask, B. J. et al. Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes. Hum. Mol. Genet. 7, 13–26 (1998).

Trask, B. J. et al. Large multi-chromosomal duplications encompass many members of the olfactory receptor gene family in the human genome. Hum. Mol. Genet. 7, 2007–2020 (1998).

van Deutekom, J. C. et al. Identification of the first gene (FRG1) from the FSHD region on human chromosome 4q35. Hum. Mol. Genet. 5, 581–590 (1996).

Zachau, H. G. The immunoglobulin kappa locus—or—what has been learned from looking closely at one-tenth of a percent of the human genome. Gene 135, 167–173 (1993).

Zimonjic, D. B., Kelley, M. J., Rubin, J. S., Aaronson, S. A. & Popescu, N. C. Fluorescence in situ hybridization analysis of keratinocyte growth factor gene amplification and dispersion in evolution of great apes and humans. Proc. Natl Acad. Sci. USA 94, 11461–11465 (1997).

van Geel, M. et al. The FSHD region on human chromosome 4q35 contains potential coding regions among pseudogenes and a high density of repeat elements. Genomics 61, 55–65 (1999).

Horvath, J. E. et al. Molecular structure and evolution of an alpha satellite/non-alpha satellite junction at 16p11. Hum. Mol. Genet. 9, 113–123 (2000).

Guy, J. et al. Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10q. Hum. Mol. Genet. 9, 2029–2042 (2000).

Reiter, L. T., Murakami, T., Koeuth, T., Gibbs, R. A. & Lupski, J. R. The human COX10 gene is disrupted during homologous recombination between the 24 kb proximal and distal CMT1A-REPs. Hum. Mol. Genet. 6, 1595–1603 (1997).

Amos-Landgraf, J. M. et al. Chromosome breakage in the Prader-Willi and Angelman syndromes involves recombination between large, transcribed repeats at proximal and distal breakpoints. Am. J. Hum. Genet. 65, 370–386 (1999).

Christian, S. L., Fantes, J. A., Mewborn, S. K., Huang, B. & Ledbetter, D. H. Large genomic duplicons map to sites of instability in the Prader-Willi/Angelman syndrome chromosome region (15q11-q13). Hum. Mol. Genet. 8, 1025–1037 (1999).

Edelmann, L., Pandita, R. K. & Morrow, B. E. Low-copy repeats mediate the common 3-Mb deletion in patients with velo-cardio-facial syndrome. Am. J. Hum. Genet. 64, 1076–1086 (1999).

Shaikh, T. H. et al. Chromosome 22-specific low copy repeats and the 22q11.2 deletion syndrome: genomic organization and deletion endpoint analysis. Hum. Mol. Genet. 9, 489–501 (2000).

Francke, U. Williams-Beuren syndrome: genes and mechanisms. Hum. Mol. Genet. 8, 1947–1954 (1999).

Peoples, R. et al. A physical map, including a BAC/PAC clone contig, of the Williams-Beuren syndrome-deletion region at 7q11.23. Am. J. Hum. Genet. 66, 47–68 (2000).

Eichler, E. E., Archidiacono, N. & Rocchi, M. CAGGG repeats and the pericentromeric duplication of the hominoid genome. Genome Res. 9, 1048–1058 (1999).

O'Keefe, C. & Eichler, E. in Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and the Evolution of Gene Families (eds Sankoff, D. & Nadeau, J.) 29–46 (Kluwer Academic, Dordrecht, 2000).

Lander, E. S. The new genomics: Global views of biology. Science 274, 536–539 (1996).

Eddy, S. R. Noncoding RNA genes. Curr. Opin. Genet. Dev. 9, 695–699 (1999).

Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. The complete atomic structure of the large ribosomal subunit at 2.4 angstrom resolution. Science 289, 905–920 (2000).

Nissen, P., Hansen, J., Ban, N., Moore, P. B. & Steitz, T. A. The structural basis of ribosome activity in peptide bond synthesis. Science 289, 920–930 (2000).

Weinstein, L. B. & Steitz, J. A. Guided tours: from precursor snoRNA to functional snoRNP. Curr. Opin. Cell Biol. 11, 378–384 (1999).

Bachellerie, J.-P. & Cavaille, J. in Modification and Editing of RNA (eds Grosjean, H. & Benne, R.) 255–272 (ASM, Washington DC, 1998).

Burge, C. & Sharp, P. A. Classification of introns: U2-type or U12-type. Cell 91, 875–879 (1997).

Brown, C. J. et al. The Human Xist gene—analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71, 527–542 (1992).

Kickhoefer, V. A., Vasu, S. K. & Rome, L. H. Vaults are the answer, what is the question? Trends Cell Biol. 6, 174–178 (1996).

Hatlen, L. & Attardi, G. Proportion of the HeLa cell genome complementary to the transfer RNA and 5S RNA. J. Mol. Biol. 56, 535–553 (1971).

Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A. & Steinberg, S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 26, 148–153 (1998).

Long, E. O. & Dawid, I. B. Repeated genes in eukaryotes. Annu. Rev. Biochem. 49, 727–764 (1980).

Crick, F. H. Codon–anticodon pairing: the wobble hypothesis. J. Mol. Biol. 19, 548–555 (1966).

Guthrie, C. & Abelson, J. in The Molecular Biology of the Yeast Saccharomyces: Metabolism and Gene Expression (eds Strathern, J. & Broach, J.) 487–528 (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1982).

Soll, D. & RajBhandary, U. (eds) tRNA: Structure, Biosynthesis, and Function (ASM, Washington DC, 1995).

Ikemura, T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34 (1985).

Bulmer, M. Coevolution of codon usage and transfer-RNA abundance. Nature 325, 728–730 (1987).

Duret, L. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 16, 287–289 (2000).

Sharp, P. M. & Matassi, G. Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4, 851–860 (1994).

Buckland, R. A. A primate transfer-RNA gene cluster and the evolution of human chromosome 1. Cytogenet. Cell Genet. 61, 1–4 (1992).

Gonos, E. S. & Goddard, J. P. Human tRNA-Glu genes: their copy number and organization. FEBS Lett. 276, 138–142 (1990).

Sylvester, J. E. et al. The human ribosomal RNA genes: structure and organization of the complete repeating unit. Hum. Genet. 73, 193–198 (1986).

Sorensen, P. D. & Frederiksen, S. Characterization of human 5S ribosomal RNA genes. Nucleic Acids Res. 19, 4147–4151 (1991).

Timofeeva, M. et al. [Organization of a 5S ribosomal RNA gene cluster in the human genome]. Mol. Biol. (Mosk.) 27, 861–868 (1993).

Little, R. D. & Braaten, D. C. Genomic organization of human 5S rDNA and sequence of one tandem repeat. Genomics 4, 376–383 (1989).

Maden, B. E. H. The numerous modified nucleotides in eukaryotic ribosomal RNA. Prog. Nucleic Acid Res. Mol. Biol. 39, 241–303 (1990).

Tycowski, K. T., You, Z. H., Graham, P. J. & Steitz, J. A. Modification of U6 spliceosomal RNA is guided by other small RNAs. Mol. Cell 2, 629–638 (1998).

Pavelitz, T., Liao, D. Q. & Weiner, A. M. Concerted evolution of the tandem array encoding primate U2 snRNA (the RNU2 locus) is accompanied by dramatic remodeling of the junctions with flanking chromosomal sequences. EMBO J. 18, 3783–3792 (1999).

Lindgren, V., Ares, M. Jr, Weiner, A. M. & Francke, U. Human genes for U2 small nuclear RNA map to a major adenovirus 12 modification site on chromosome 17. Nature 314, 115–116 (1985).

Van Arsdell, S. W. & Weiner, A. M. Human genes for U2 small nuclear RNA are tandemly repeated. Mol. Cell. Biol. 4, 492–499 (1984).

Gao, L. I., Frey, M. R. & Matera, A. G. Human genes encoding U3 snRNA associate with coiled bodies in interphase cells and are clustered on chromosome 17p11.2 in a complex inverted repeat structure. Nucleic Acids Res. 25, 4740–4747 (1997).

Hawkins, J. D. A survey on intron and exon lengths. Nucleic Acids Res. 16, 9893–9908 (1988).

Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).

Labeit, S. & Kolmerer, B. Titins: giant proteins in charge of muscle ultrastructure and elasticity. Science 270, 293–296 (1995).

Sterner, D. A., Carlo, T. & Berget, S. M. Architectural limits on split genes. Proc. Natl Acad. Sci. USA 93, 15081–15085 (1996).

Sun, Q., Mayeda, A., Hampson, R. K., Krainer, A. R. & Rottman, F. M. General splicing factor SF2/ASF promotes alternative splicing by binding to an exonic splicing enhancer. Genes Dev. 7, 2598–2608 (1993).

Tanaka, K., Watakabe, A. & Shimura, Y. Polypurine sequences within a downstream exon function as a splicing enhancer. Mol. Cell. Biol. 14, 1347–1354 (1994).

Carlo, T., Sterner, D. A. & Berget, S. M. An intron splicing enhancer containing a G-rich repeat facilitates inclusion of a vertebrate micro-exon. RNA 2, 342–353 (1996).

Burset, M., Seledtsov, I. A. & Solovyev, V. V. Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res. 28, 4364–4375 (2000).

Burge, C. B., Padgett, R. A. & Sharp, P. A. Evolutionary fates and origins of U12-type introns. Mol. Cell 2, 773–785 (1998).

Mironov, A. A., Fickett, J. W. & Gelfand, M. S. Frequent alternative splicing of human genes. Genome Res. 9, 1288–1293 (1999).

Hanke, J. et al. Alternative splicing of human genes: more the rule than the exception? Trends Genet. 15, 389–390 (1999).

Brett, D. et al. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 474, 83–86 (2000).

Dunham, I. The gene guessing game. Yeast 17, 218–224 (2000).

Lewin, B. Gene Expression (Wiley, New York, 1980).

Lewin, B. Genes IV 466–481 (Oxford Univ. Press, Oxford, 1990).

Smaglik, P. Researchers take a gamble on the human genome. Nature 405, 264 (2000).

Fields, C., Adams, M. D., White, O. & Venter, J. C. How many genes in the human genome? Nature Genet. 7, 345–346 (1994).

Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nature Genet. 25, 239–240 (2000).

Roest Crollius, H. et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nature Genet. 25, 235–238 (2000).

The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: A platform for investigating biology. Science 282, 2012–2018 (1998).

Rubin, G. M. et al. Comparative genomics of the eukaryotes. Science 287, 2204–2215 (2000).

Green, P. et al. Ancient conserved regions in new gene sequences and the protein databases. Science 259, 1711–1716 (1993).

Fraser, A. G. et al. Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408, 325–330 (2000).

Mott, R. EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comput. Appl. Biosci. 13, 477–478 (1997).

Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M. & Miller, W. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8, 967–974 (1998).

Bailey, L. C. Jr, Searls, D. B. & Overton, G. C. Analysis of EST-driven gene annotation in human genomic sequence. Genome Res. 8, 362–376 (1998).

Birney, E., Thompson, J. D. & Gibson, T. J. PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res. 24, 2730–2739 (1996).

Gelfand, M. S., Mironov, A. A. & Pevzner, P. A. Gene recognition via spliced sequence alignment. Proc. Natl Acad. Sci. USA 93, 9061–9066 (1996).

Kulp, D., Haussler, D., Reese, M. G. & Eeckman, F. H. A generalized hidden Markov model for the recognition of human genes in DNA. ISMB 4, 134–142 (1996).

Reese, M. G., Kulp, D., Tammana, H. & Haussler, D. Genie—gene finding in Drosophila melanogaster. Genome Res. 10, 529–538 (2000).

Solovyev, V. & Salamov, A. The Gene-Finder computer tools for analysis of human and model organisms genome sequences. ISMB 5, 294–302 (1997).

Guigo, R., Agarwal, P., Abril, J. F., Burset, M. & Fickett, J. W. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000).

Hubbard, T. & Birney, E. Open annotation offers a democratic solution to genome sequencing. Nature 403, 825 (2000).

Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 28, 263–266 (2000).

Birney, E. & Durbin, R. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10, 547–548 (2000).

The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium. Functional annotation of a full-length mouse cDNA collection. Nature 409, 685–690 (2001).

Basrai, M. A., Hieter, P. & Boeke, J. D. Small open reading frames: beautiful needles in the haystack. Genome Res. 7, 768–771 (1997).

Janin, J. & Chothia, C. Domains in proteins: definitions, location, and structural principles. Methods Enzymol. 115, 420–430 (1985).

Ponting, C. P., Schultz, J., Copley, R. R., Andrade, M. A. & Bork, P. Evolution of domain families. Adv. Protein Chem. 54, 185–244 (2000).

Doolittle, R. F. The multiplicity of domains in proteins. Annu. Rev. Biochem. 64, 287–314 (1995).

Bateman, A. & Birney, E. Searching databases to find protein domain organization. Adv. Protein Chem. 54, 137–157 (2000).

Futreal, P. A. et al. Cancer and genomics. Nature 409, 850–852 (2001).

Nestler, E. J. & Landsman, D. Learning about addiction from the human draft genome. Nature 409, 834–835 (2001).

Tupler, R., Perini, G. & Green, M. R. Expressing the human genome. Nature 409, 832–833 (2001).

Fahrer, A. M., Bazan, J. F., Papathanasiou, P., Nelms, K. A. & Goodnow, C. C. A genomic view of immunology. Nature 409, 836–838 (2001).

Li, W.-H., Gu, Z., Wang, H. & Nekrutenko, A. Evolutionary analyses of the human genome. Nature 409, 847–849 (2001).

Bock, J. B., Matern, H. T., Peden, A. A. & Scheller, R. H. A genomic perspective on membrane compartment organization. Nature 409, 839–841 (2001).

Pollard, T. D. Genomics, the cytoskeleton and motility. Nature 409, 842–843 (2001).

Murray, A. W. & Marks, D. Can sequencing shed light on cell cycling? Nature 409, 844–846 (2001).

Clayton, J. D., Kyriacou, C. P. & Reppert, S. M. Keeping time with the human genome. Nature 409, 829–831 (2001).

Chervitz, S. A. et al. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282, 2022–2028 (1998).

Aravind, L. & Subramanian, G. Origin of multicellular eukaryotes—insights from proteome comparisons. Curr. Opin. Genet. Dev. 9, 688–694 (1999).

Attwood, T. K. et al. PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res. 28, 225–227 (2000).

Hofmann, K., Bucher, P., Falquet, L. & Bairoch, A. The PROSITE database, its status in 1999. Nucleic Acids Res. 27, 215–219 (1999).

Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

Wolf, Y. I., Kondrashov, F. A. & Koonin, E. V. No footprints of primordial introns in a eukaryotic genome. Trends Genet. 16, 333–334 (2000).

Brunner, H. G., Nelen, M., Breakefield, X. O., Ropers, H. H. & van Oost, B. A. Abnormal behavior associated with a point mutation in the structural gene for monoamine oxidase A. Science 262, 578–580 (1993).

Cases, O. et al. Aggressive behavior and altered amounts of brain serotonin and norepinephrine in mice lacking MAOA. Science 268, 1763–1766 (1995).

Brunner, H. G. et al. X-linked borderline mental retardation with prominent behavioral disturbance: phenotype, genetic localization, and evidence for disturbed monoamine metabolism. Am. J. Hum. Genet. 52, 1032–1039 (1993).

Deckert, J. et al. Excess of high activity monoamine oxidase A gene promoter alleles in female patients with panic disorder. Hum. Mol. Genet. 8, 621–624 (1999).

Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).

Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997).

Ponting, C. P., Aravind, L., Schultz, J., Bork, P. & Koonin, E. V. Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J. Mol. Biol. 289, 729–745 (1999).

Zhang, J., Dyer, K. D. & Rosenberg, H. F. Evolution of the rodent eosinophil-associated RNase gene family by rapid gene sorting and positive selection. Proc. Natl Acad. Sci. USA 97, 4701–4706 (2000).

Shashoua, V. E. Ependymin, a brain extracellular glycoprotein, and CNS plasticity. Ann. NY Acad. Sci. 627, 94–114 (1991).

Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P. & Bork, P. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234 (2000).

Koonin, E. V., Aravind, L. & Kondrashov, A. S. The impact of comparative genomics on our understanding of evolution. Cell 101, 573–576 (2000).

Bateman, A., Eddy, S. R. & Chothia, C. Members of the immunoglobulin superfamily in bacteria. Protein Sci. 5, 1939–1941 (1996).

Sutherland, D., Samakovlis, C. & Krasnow, M. A. Branchless encodes a Drosophila FGF homolog that controls tracheal cell migration and the pattern of branching. Cell 87, 1091–1101 (1996).

Warburton, D. et al. The molecular basis of lung morphogenesis. Mech. Dev. 92, 55–81 (2000).

Fuchs, T., Glusman, G., Horn-Saban, S., Lancet, D. & Pilpel, Y. The human olfactory subgenome: from sequence to structure to evolution. Hum. Genet. 108, 1–13 (2001).

Glusman, G. et al. The olfactory receptor gene family: data mining, classification and nomenclature. Mamm. Genome 11, 1016–1023 (2000).

Rouquier, S. et al. Distribution of olfactory receptor genes in the human genome. Nature Genet. 18, 243–250 (1998).

Sharon, D. et al. Primate evolution of an olfactory receptor cluster: Diversification by gene conversion and recent emergence of a pseudogene. Genomics 61, 24–36 (1999).

Gilad, Y. et al. Dichotomy of single-nucleotide polymorphism haplotypes in olfactory receptor genes and pseudogenes. Nature Genet. 26, 221–224 (2000).

Gearhart, J. & Kirschner, M. Cells, Embryos, and Evolution (Blackwell Science, Malden, Massachusetts, 1997).

Barbazuk, W. B. et al. The syntenic relationship of the zebrafish and human genomes. Genome Res. 10, 1351–1358 (2000).

McLysaght, A., Enright, A. J., Skrabanek, L. & Wolfe, K. H. Estimation of synteny conservation and genome compaction between pufferfish (Fugu) and human. Yeast 17, 22–36 (2000).

Trachtulec, Z. et al. Linkage of TATA-binding protein and proteasome subunit C5 genes in mice and humans reveals synteny conserved between mammals and invertebrates. Genomics 44, 1–7 (1997).

Nadeau, J. H. Maps of linkage and synteny homologies between mouse and man. Trends Genet. 5, 82–86 (1989).

Nadeau, J. H. & Taylor, B. A. Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl Acad. Sci. USA 81, 814–818 (1984).

Copeland, N. G. et al. A genetic linkage map of the mouse: current applications and future prospects. Science 262, 57–66 (1993).

DeBry, R. W. & Seldin, M. F. Human/mouse homology relationships. Genomics 33, 337–351 (1996).

Nadeau, J. H. & Sankoff, D. The lengths of undiscovered conserved segments in comparative maps. Mamm. Genome 9, 491–495 (1998).

Thomas, J. W. et al. Comparative genome mapping in the sequence-based era: early experience with human chromosome 7. Genome Res. 10, 624–633 (2000).

Pletcher, M. T. et al. Chromosome evolution: The junction of mammalian chromosomes in the formation of mouse chromosome 10. Genome Res. 10, 1463–1467 (2000).

Novacek, M. J. Mammalian phylogeny: shaking the tree. Nature 356, 121–125 (1992).

O'Brien, S. J. et al. Genome maps 10. Comparative genomics. Mammalian radiations. Wall chart. Science 286, 463–478 (1999).

Romer, A. S. Vertebrate Paleontology (Univ. Chicago Press, Chicago and New York, 1966).

Paterson, A. H. et al. Toward a unified genetic map of higher plants, transcending the monocot-dicot divergence. Nature Genet. 14, 380–382 (1996).

Jenczewski, E., Prosperi, J. M. & Ronfort, J. Differentiation between natural and cultivated populations of Medicago sativa (Leguminosae) from Spain: analysis with random amplified polymorphic DNA (RAPD) markers and comparison to allozymes. Mol. Ecol. 8, 1317–1330 (1999).

Ohno, S. Evolution by Gene Duplication (George Allen and Unwin, London, 1970).

Wolfe, K. H. & Shields, D. C. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387, 708–713 (1997).

Blanc, G., Barakat, A., Guyot, R., Cooke, R. & Delseny, M. Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12, 1093–1102 (2000).

Paterson, A. H. et al. Comparative genomics of plant chromosomes. Plant Cell 12, 1523–1540 (2000).

Vision, T., Brown, D. & Tanksley, S. The origins of genome duplications in Arabidopsis. Science 290, 2114–2117 (2000).

Sidow, A. & Bowman, B. H. Molecular phylogeny. Curr. Opin. Genet. Dev. 1, 451–456 (1991).

Sidow, A. & Thomas, W. K. A molecular evolutionary framework for eukaryotic model organisms. Curr. Biol. 4, 596–603 (1994).

Sidow, A. Gen(om)e duplications in the evolution of early vertebrates. Curr. Opin. Genet. Dev. 6, 715–722 (1996).

Spring, J. Vertebrate evolution by interspecific hybridisation—are we polyploid? FEBS Lett. 400, 2–8 (1997).

Skrabanek, L. & Wolfe, K. H. Eukaryote genome duplication—where's the evidence? Curr. Opin. Genet. Dev. 8, 694–700 (1998).

Hughes, A. L. Phylogenies of developmentally important proteins do not support the hypothesis of two rounds of genome duplication early in vertebrate history. J. Mol. Evol. 48, 565–576 (1999).

Lander, E. S. & Schork, N. J. Genetic dissection of complex traits. Science 265, 2037–2048 (1994).

Horikawa, Y. et al. Genetic variability in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genet. 26, 163–175 (2000).

Hastbacka, J. et al. The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78, 1073–1087 (1994).

Tischkoff, S. A. et al. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271, 1380–1387 (1996).

Kidd, J. R. et al. Haplotypes and linkage disequilibrium at the phenylalanine hydroxylase locus PAH, in a global representation of populations. Am. J. Hum. Genet. 66, 1882–1899 (2000).

Mateu, E. et al. Worldwide genetic analysis of the CFTR region. Am. J. Hum. Genet. 68, 103–117 (2001).

Abecasis, G. R. et al. Extent and distribution of linkage disequilibrium in three genomic regions. Am. J. Hum. Genet. 68, 191–197 (2001).

Taillon-Miller, P. et al. Juxtaposed regions of extensive and minimal linkage disequilibrium in Xq25 and Xq28. Nature Genet. 25, 324–328 (2000).

Martin, E. R. et al. SNPing away at complex diseases: analysis of single-nucleotide polymorphisms around APOE in Alzheimer disease. Am. J. Hum. Genet. 67, 383–394 (2000).

Collins, A., Lonjou, C. & Morton, N. E. Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl Acad. Sci. USA 96, 15173–15177 (1999).

Dunning, A. M. et al. The extent of linkage disequilibrium in four populations with distinct demographic histories. Am. J. Hum. Genet. 67, 1544–1554 (2000).

Rieder, M. J., Taylor, S. L., Clark, A. G. & Nickerson, D. A. Sequence variation in the human angiotensin converting enzyme. Nature Genet. 22, 59–62 (1999).

Collins, F. S. Positional cloning moves from perditional to traditional. Nature Genet. 9, 347–350 (1995).

Nagamine, K. et al. Positional cloning of the APECED gene. Nature Genet. 17, 393–398 (1997).

Reuber, B. E. et al. Mutations in PEX1 are the most common cause of peroxisome biogenesis disorders. Nature Genet. 17, 445–448 (1997).

Portsteffen, H. et al. Human PEX1 is mutated in complementation group 1 of the peroxisome biogenesis disorders. Nature Genet. 17, 449–452 (1997).

Everett, L. A. et al. Pendred syndrome is caused by mutations in a putative sulphate transporter gene (PDS). Nature Genet. 17, 411–422 (1997).

Coffey, A. J. et al. Host response to EBV infection in X-linked lymphoproliferative disease results from mutations in an SH2-domain encoding gene. Nature Genet. 20, 129–135 (1998).

Van Laer, L. et al. Nonsyndromic hearing impairment is associated with a mutation in DFNA5. Nature Genet. 20, 194–197 (1998).

Sakuntabhai, A. et al. Mutations in ATP2A2, encoding a Ca2+ pump, cause Darier disease. Nature Genet. 21, 271–277 (1999).

Gedeon, A. K. et al. Identification of the gene (SEDL) causing X-linked spondyloepiphyseal dysplasia tarda. Nature Genet. 22, 400–404 (1999).

Hurvitz, J. R. et al. Mutations in the CCN gene family member WISP3 cause progressive pseudorheumatoid dysplasia. Nature Genet. 23, 94–98 (1999).

Laberge-le Couteulx, S. et al. Truncating mutations in CCM1, encoding KRIT1, cause hereditary cavernous angiomas. Nature Genet. 23, 189–193 (1999).

Sahoo, T. et al. Mutations in the gene encoding KRIT1, a Krev-1/rap1a binding protein, cause cerebral cavernous malformations (CCM1). Hum. Mol. Genet. 8, 2325–2333 (1999).

McGuirt, W. T. et al. Mutations in COL11A2 cause non-syndromic hearing loss (DFNA13). Nature Genet. 23, 413–419 (1999).

Moreira, E. S. et al. Limb-girdle muscular dystrophy type 2G is caused by mutations in the gene encoding the sarcomeric protein telethonin. Nature Genet. 24, 163–166 (2000).

Ruiz-Perez, V. L. et al. Mutations in a new gene in Ellis-van Creveld syndrome and Weyers acrodental dysostosis. Nature Genet. 24, 283–286 (2000).

Kaplan, J. M. et al. Mutations in ACTN4, encoding alpha-actinin-4, cause familial focal segmental glomerulosclerosis. Nature Genet. 24, 251–256 (2000).

Escayg, A. et al. Mutations of SCN1A, encoding a neuronal sodium channel, in two families with GEFS+2. Nature Genet. 24, 343–345 (2000).

Sacksteder, K. A. et al. Identification of the alpha-aminoadipic semialdehyde synthase gene, which is defective in familial hyperlysinemia. Am. J. Hum. Genet. 66, 1736–1743 (2000).

Kalaydjieva, L. et al. N-myc downstream-regulated gene 1 is mutated in hereditary motor and sensory neuropathy-Lom. Am. J. Hum. Genet. 67, 47–58 (2000).

Sundin, O. H. et al. Genetic basis of total colourblindness among the Pingelapese islanders. Nature Genet. 25, 289–293 (2000).

Kohl, S. et al. Mutations in the CNGB3 gene encoding the beta-subunit of the cone photoreceptor cGMP-gated channel are responsible for achromatopsia (ACHM3) linked to chromosome 8q21. Hum. Mol. Genet. 9, 2107–2116 (2000).

Avela, K. et al. Gene encoding a new RING-B-box-coiled-coil protein is mutated in mulibrey nanism. Nature Genet. 25, 298–301 (2000).

Verpy, E. et al. A defect in harmonin, a PDZ domain-containing protein expressed in the inner ear sensory hair cells, underlies Usher syndrome type 1C. Nature Genet. 26, 51–55 (2000).

Bitner-Glindzicz, M. et al. A recessive contiguous gene deletion causing infantile hyperinsulinism, enteropathy and deafness identifies the Usher type 1C gene. Nature Genet. 26, 56–60 (2000).

The May-Hegglin/Fechtner Syndrome Consortium. Mutations in MYH9 result in the May-Hegglin anomaly, and Fechtner and Sebastian syndromes. Nature Genet. 26, 103–105 (2000).

Kelley, M. J., Jawien, W., Ortel, T. L. & Korczak, J. F. Mutation of MYH9, encoding non-muscle myosin heavy chain A, in May-Hegglin anomaly. Nature Genet. 26, 106–108 (2000).

Kirschner, L. S. et al. Mutations of the gene encoding the protein kinase A type I-α regulatory subunit in patients with the Carney complex. Nature Genet. 26, 89–92 (2000).

Lalwani, A. K. et al. Human nonsyndromic hereditary deafness DFNA17 is due to a mutation in non-muscle myosin MYH9. Am. J. Hum. Genet. 67, 1121–1128 (2000).

Matsuura, T. et al. Large expansion of the ATTCT pentanucleotide repeat in spinocerebellar ataxia type 10. Nature Genet. 26, 191–194 (2000).

Delettre, C. et al. Nuclear gene OPA1, encoding a mitochondrial dynamin-related protein, is mutated in dominant optic atrophy. Nature Genet. 26, 207–210 (2000).

Pusch, C. M. et al. The complete form of X-linked congenital stationary night blindness is caused by mutations in a gene encoding a leucine-rich repeat protein. Nature Genet. 26, 324–327 (2000).

The ADHR Consortium. Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23. Nature Genet. 26, 345–348 (2000).

Bomont, P. et al. The gene encoding gigaxonin, a new member of the cytoskeletal BTB/kelch repeat family, is mutated in giant axonal neuropathy. Nature Genet. 26, 370–374 (2000).

Tullio-Pelet, A. et al. Mutant WD-repeat protein in triple-A syndrome. Nature Genet. 26, 332–335 (2000).

Nicole, S. et al. Perlecan, the major proteoglycan of basement membranes, is altered in patients with Schwartz-Jampel syndrome (chondrodystrophic myotonia). Nature Genet. 26, 480–483 (2000).

Rogaev, E. I. et al. Familial Alzheimer's disease in kindreds with missense mutations in a gene on chromosome 1 related to the Alzheimer's disease type 3 gene. Nature 376, 775–778 (1995).

Sherrington, R. et al. Cloning of a gene bearing missense mutations in early-onset familial Alzheimer's disease. Nature 375, 754–760 (1995).

Olivieri, N. F. & Weatherall, D. J. The therapeutic reactivation of fetal haemoglobin. Hum. Mol. Genet. 7, 1655–1658 (1998).

Drews, J. Research & development. Basic science and pharmaceutical innovation. Nature Biotechnol. 17, 406 (1999).

Drews, J. Drug discovery: a historical perspective. Science 287, 1960–1964 (2000).

Davies, P. A. et al. The 5-HT3B subunit is a major determinant of serotonin-receptor function. Nature 397, 359–363 (1999).

Heise, C. E. et al. Characterization of the human cysteinyl leukotriene 2 receptor. J. Biol. Chem. 275, 30531–30536 (2000).

Fan, W. et al. BACE maps to chromosome 11 and a BACE homolog, BACE2, reside in the obligate Down Syndrome region of chromosome 21. Science 286, 1255a (1999).

Saunders, A. J., Kim, T.-W. & Tanzi, R. E. BACE maps to chromosome 11 and a BACE homolog, BACE2, reside in the obligate Down Syndrome region of chromosome 21. Science 286, 1255a (1999).

Firestein, S. The good taste of genomics. Nature 404, 552–553 (2000).

Matsunami, H., Montmayeur, J. P. & Buck, L. B. A family of candidate taste receptors in human and mouse. Nature 404, 601–604 (2000).

Adler, E. et al. A novel family of mammalian taste receptors. Cell 100, 693–702 (2000).

Chandrashekar, J. et al. T2Rs function as bitter taste receptors. Cell 100, 703–711 (2000).

Hardison, R. C. Conserved non-coding sequences are reliable guides to regulatory elements. Trends Genet. 16, 369–372 (2000).

Onyango, P. et al. Sequence and comparative analysis of the mouse 1-megabase region orthologous to the human 11p15 imprinted domain. Genome Res. 10, 1697–1710 (2000).

Bouck, J. B., Metzker, M. L. & Gibbs, R. A. Shotgun sample sequence comparisons between mouse and human genomes. Nature Genet. 25, 31–33 (2000).

Marshall, E. Public-private project to deliver mouse genome in 6 months. Science 290, 242–243 (2000).

Wasserman, W. W., Palumbo, M., Thompson, W., Fickett, J. W. & Lawrence, C. E. Human-mouse genome comparisons to locate regulatory sites. Nature Genet. 26, 225–228 (2000).

Tagle, D. A. et al. Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203, 439–455 (1988).

McGuire, A. M., Hughes, J. D. & Church, G. M. Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res. 10, 744–757 (2000).

Roth, F. P., Hughes, J. D., Estep, P. W. & Church, G. M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnol. 16, 939–945 (1998).

Cheng, Y. & Church, G. M. Biclustering of expression data. ISMB 8, 93–103 (2000).

Cohen, B. A., Mitra, R. D., Hughes, J. D. & Church, G. M. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nature Genet. 26, 183–186 (2000).

Feil, R. & Khosla, S. Genomic imprinting in mammals: an interplay between chromatin and DNA methylation? Trends Genet. 15, 431–434 (1999).

Robertson, K. D. & Wolffe, A. P. DNA methylation in health and disease. Nature Rev. Genet. 1, 11–19 (2000).

Beck, S., Olek, A. & Walter, J. From genomics to epigenomics: a loftier view of life. Nature Biotechnol. 17, 1144 (1999).

Hagmann, M. Mapping a subtext in our genetic book. Science 288, 945–946 (2000).

Eliot, T. S. in T. S. Eliot. Collected Poems 1909–1962 (Harcourt Brace, New York, 1963).

Soderlund, C., Longden, I. & Mott, R. FPC: a system for building contigs from restriction fingerprinted clones. Comput. Appl. Biosci. 13, 523–535 (1997).

Mott, R. & Tribe, R. Approximate statistics of gapped alignments. J. Comput. Biol. 6, 91–112 (1999).