Genome Research
Featured scientific publications
* Data are for reference only
We consider statistics for analyzing a variety of family-based and nonfamily-based designs for detecting linkage disequilibrium of a marker with a disease susceptibility locus. These designs include sibships with parents, sibships without parents, and use of unrelated controls. We also provide formulas for and evaluate the relative power of different study designs using these statistics. In this first paper in the series, we derive statistical tests based on data derived from DNA pooling experiments and describe their characteristics. Although designs based on affected and unaffected sibs without parents are usually robust to population stratification, they suffer a loss of power compared with designs using parents or unrelateds as controls. Although increasing the number of unaffected sibs improves power, the increase is generally not substantial. Designs including sibships with multiple affected sibs are typically the most powerful, with any of these control groups, when the disease allele frequency is low. When the allele frequency is high, however, designs with unaffected sibs as controls do not retain this advantage. In designs with parents, having an affected parent has little impact on the power, except for rare dominant alleles, where the power is increased compared with families with no affected parents. Finally, we also demonstrate that for sibships with parents, only the parents require individual genotyping to derive the TDT statistic, whereas all the offspring can be pooled. This can potentially lead to considerable savings in genotyping, especially for multiplex sibships. The formulas and tables we derive should provide some guidance to investigators designing nuclear family-based linkage disequilibrium studies for complex diseases.
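As an illustration of the TDT statistic mentioned in the abstract above, here is a minimal sketch of the classic individual-genotype TDT in its McNemar form, chi-square = (b - c)^2 / (b + c) with 1 degree of freedom. It is not the pooled-DNA version the authors derive. The function name `tdt_statistic` and the counts in the usage note are hypothetical, chosen only for illustration.

```python
from math import erfc, sqrt
from typing import Tuple


def tdt_statistic(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Classic TDT (McNemar form) from heterozygous-parent transmissions.

    transmitted   -- b: times heterozygous parents passed the candidate
                     allele to an affected child
    untransmitted -- c: times they passed the other allele instead
    Returns (chi-square with 1 df, two-sided p-value).
    """
    if transmitted < 0 or untransmitted < 0:
        raise ValueError("counts must be non-negative")
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

With hypothetical counts of 60 transmissions and 40 non-transmissions, `tdt_statistic(60, 40)` gives a chi-square of 4.0. Only heterozygous parents contribute to the counts, which fits the abstract's point that the parents are the individuals who need individual genotyping.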
To investigate geographic structure within U.S. ethnic populations, we analyzed 1705 haplotypes on the basis of 9 short tandem repeat (STR) loci on the Y-chromosome from 9–11 groups each of African-Americans, European-Americans, and Hispanics. There were no significant differences in the distribution of Y-STR haplotypes among African-American groups, whereas European-American and Hispanic groups did exhibit significant geographic heterogeneity. However, the significant heterogeneity resulted from one sample; removal of that sample in each case eliminated the significant heterogeneity. Multidimensional scaling analysis of RST values indicated that African-American groups formed a distinct cluster, whereas there was some intermingling of European-American and Hispanic groups. MtDNA data exist for many of these same groups; estimates of the European-American genetic contribution to the African-American gene pool were 27.5%–33.6% for the Y-STR haplotypes and 9%–15.4% for the mtDNA types. The lack of significant geographic heterogeneity among Y-STR and mtDNA haplotypes in U.S. ethnic groups means that forensic DNA databases do not need to be constructed for separate geographic regions of the U.S. Moreover, absence of significant geographic heterogeneity for these two loci means that regional variation in disease susceptibility within ethnic groups is more likely to reflect cultural/environmental factors, rather than any underlying genetic heterogeneity.
Despite our growing knowledge that many mammalian genes generate multiple transcript variants that may encode functionally distinct protein isoforms, the transcriptomes of various tissues and their developmental stages are poorly defined. Identifying the transcriptome and its regulation in a cell/tissue is the key to deciphering the cell/tissue-specific functions of a gene. We built a genome-wide inventory of noncoding and protein-coding transcripts (transcriptomes), their promoters (promoteromes), and histone modification states (epigenomes) for developing and adult cerebella using an integrative massively parallel sequencing and bioinformatics approach. The data consist of 61,525 (12,796 novel) distinct mRNAs transcribed by 29,589 (4792 novel) promoters corresponding to 15,669 protein-coding and 7624 noncoding genes. Importantly, our results show that the transcript variants from a gene are predominantly generated using alternative transcriptional rather than splicing mechanisms, highlighting alternative promoters and transcriptional terminations as major sources of transcriptome diversity. Moreover, H3K4me3, and not H3K27me3, defined the use of alternative promoters, and we identified a combinatorial role of H3K4me3 and H3K27me3 in regulating the expression of transcripts, including transcript variants of a gene during development. We observed a strong bias of both H3K4me3 and H3K27me3 for CpG-rich promoters and an exponential relationship between their enrichment and corresponding transcript expression. Furthermore, the majority of genes associated with neurological diseases expressed multiple transcripts through alternative promoters, and we demonstrated aberrant use of alternative promoters in medulloblastoma, a cancer arising in the cerebellum. The transcriptomes of developing and adult cerebella presented in this study emphasize the importance of analyzing gene regulation and function at the isoform level.
The epigenome changes that underlie cellular differentiation in developing organisms are poorly understood. To gain insights into how pancreatic beta-cells are programmed, we profiled key histone methylations and transcripts in embryonic stem cells, multipotent progenitors of the nascent embryonic pancreas, purified beta-cells, and 10 differentiated tissues. We report that despite their endodermal origin, beta-cells show a transcriptional and active chromatin signature that is most similar to ectoderm-derived neural tissues. In contrast, the beta-cell signature of trimethylated H3K27, a mark of Polycomb-mediated repression, clusters with pancreatic progenitors, acinar cells and liver, consistent with the epigenetic transmission of this mark from endoderm progenitors to their differentiated cellular progeny. We also identified two H3K27 methylation events that arise in the beta-cell lineage after the pancreatic progenitor stage. One is a wave of cell-selective de novo H3K27 trimethylation in non-CpG island genes. Another is the loss of bivalent and H3K27me3-repressed chromatin in a core program of neural developmental regulators that enables a convergence of the gene activity state of beta-cells with that of neural cells. These findings reveal a dynamic regulation of Polycomb repression programs that shape the identity of differentiated beta-cells.
Human and mouse genomes contain a similar number of CpG islands (CGIs), which are discrete CpG-rich DNA sequences associated with transcription start sites. In both species, ∼50% of all CGIs are remote from annotated promoters but, nevertheless, often have promoter-like features. To determine the role of CGI methylation in cell differentiation, we analyzed DNA methylation at a comprehensive CGI set in cells of the mouse hematopoietic lineage. Using a method that potentially detects ∼33% of genomic CpGs in the methylated state, we found that large differences in gene expression were accompanied by surprisingly few DNA methylation changes. There were, however, many DNA methylation differences between hematopoietic cells and a distantly related tissue, brain. Altered DNA methylation in the immune system occurred predominantly at CGIs within gene bodies, which have the properties of cell type–restricted promoters, but infrequently at annotated gene promoters or CGI flanking sequences (CGI “shores”). Unexpectedly, elevated intragenic CGI methylation correlated with silencing of the associated gene. Differentially methylated intragenic CGIs tended to lack H3K4me3 and associate with a transcriptionally repressive environment regardless of methylation state. Our results indicate that DNA methylation changes play a relatively minor role in the late stages of differentiation and suggest that intragenic CGIs represent regulatory sites of differential gene expression during the early stages of lineage specification.
Transcription factors (TFs) bind specifically to discrete regions of mammalian genomes called
Cross-talk between DNA methylation and histone modifications drives the establishment of composite epigenetic signatures and is traditionally studied using correlative rather than direct approaches. Here, we present sequential ChIP-bisulfite-sequencing (ChIP-BS-seq) as an approach to quantitatively assess DNA methylation patterns associated with chromatin modifications or chromatin-associated factors directly. A chromatin-immunoprecipitation (ChIP)-capturing step is used to obtain a restricted representation of the genome occupied by the epigenetic feature of interest, for which a single-base resolution DNA methylation map is then generated. When applied to H3 lysine 27 trimethylation (H3K27me3), we found that H3K27me3 and DNA methylation are compatible throughout most of the genome, except for CpG islands, where these two marks are mutually exclusive. Further ChIP-BS-seq-based analysis in
An essential component of functional genomics studies is the sequence of DNA expressed in tissues of interest. To provide a resource of bovine-specific expressed sequence data and facilitate this powerful approach in cattle research, four normalized cDNA libraries were produced and arrayed for high-throughput sequencing. The libraries were made with RNA pooled from multiple tissues to increase the efficiency of normalization and maximize the number of independent genes for which sequence data were obtained. Target tissues included those most likely to affect the production parameters of animal health, growth, reproductive efficiency, and carcass merit. Success of normalization and inter- and intralibrary redundancy were assessed by collecting 6000–23,000 sequences from each of the libraries (68,520 total sequences deposited in GenBank). Sequence comparison and assembly of these sequences were performed in combination with 56,500 other bovine EST sequences present in the GenBank dbEST database to construct a cattle Gene Index (available from The Institute for Genomic Research at
An estimated 15% or more of the cancer burden worldwide is attributable to known infectious agents. We screened colorectal carcinoma and matched normal tissue specimens using RNA-seq followed by host sequence subtraction and found marked over-representation of