An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding

Genome Biology - Tập 22 - Trang 1-25 - 2021
Divyanshi Srivastava1, Begüm Aydin2, Esteban O. Mazzoni2, Shaun Mahony1
1Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, USA
2Department of Biology, New York University, New York, USA

Tóm tắt

Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor’s DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes. Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding; incorporating preexisting chromatin features improves our ability to explain the binding specificity of some transcription factors substantially, but not others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences. Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.

Tài liệu tham khảo

Bulyk ML. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003;5:201. Gordân R, et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 2013;3:1093–104. Rohs R, et al. The role of DNA shape in protein-DNA recognition. Nature. 2009;461:1248–53. Wasserman WW, Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 2004;5:276–87. Slattery M, et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci. 2014;39:381–99. Arvey A, Agius P, Noble WS, Leslie C. Sequence and chromatin determinants of cell-type-specific transcription factor binding. Genome Res. 2012;22:1723–34. Wang J, et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012;22:1798–812. Spitz F, Furlong EEM. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 2012;13:613–26. Guertin MJ, Lis JT. Chromatin landscape dictates HSF binding to target DNA elements. PLoS Genet. 2010;6:e1001114. Jiang C, Pugh BF. Nucleosome positioning and gene regulation: advances through genomics. Nat Rev Genet. 2009;10:161–72. John S, et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat Genet. 2011;43:264–8. Gertz J, et al. Distinct properties of cell-type-specific and shared transcription factor binding sites. Mol Cell. 2013;52:25–36. Li X-Y, et al. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biol. 2011;12:R34. Zhu F, et al. The interaction landscape between transcription factors and the nucleosome. Nature. 2018;562:76–81. Iwafuchi-Doi M, Zaret KS. Cell fate control by pioneer transcription factors. Development. 2016;143:1833–7. Hu G, et al. H2A.Z facilitates access of active and repressive complexes to chromatin in embryonic stem cell self-renewal and differentiation. Cell Stem Cell. 2013;12:180–92. Wu W, et al. Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis. Genome Res. 2014;24:1945–62. Slattery M, et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell. 2011;147:1270–82. Yamada N, Lai WKM, Farrell N, Pugh BF, Mahony S. Characterizing protein–DNA binding event subtypes in ChIP-exo data. Bioinformatics. 2019;35:903–13. Lupien M, et al. FoxA1 translates epigenetic signatures into enhancer-driven lineage-specific transcription. Cell. 2008;132:958–70. Cernilogar FM, et al. Pre-marked chromatin and transcription factor co-binding shape the pioneering activity of Foxa2. Nucleic Acids Res. 2019;47:9069–86. Swinstead EE, et al. Steroid receptors reprogram FoxA1 occupancy through dynamic chromatin transitions. Cell. 2016;165:593–605. Donaghey J, et al. Genetic determinants and epigenetic effects of pioneer-factor occupancy. Nat Genet. 2018;50:250–8. Quang D, Xie X. FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods. 2019;166:40–7. Keilwagen J, Posch S, Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol. 2019;20:9. Li H, Quang D, Guan Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res. 2019;29:281–92. Xin B, Rohs R. Relationship between histone modifications and transcription factor binding is protein family specific. Genome Res. 2018;28:321–33. Karimzadeh, M. & Hoffman, M. M. Virtual ChIP-seq: Predicting transcription factor binding by learning from the transcriptome. bioRxiv 168419 (2018). https://doi.org/10.1101/168419. Tsai ZT-Y, Shiu S-H, Tsai H-K. Contribution of sequence motif, chromatin state, and DNA structure features to predictive models of transcription factor binding in yeast. PLoS Comput Biol. 2015;11:e1004418. Kumar S, Bucher P. Predicting transcription factor site occupancy using DNA sequence intrinsic and cell-type specific chromatin features. BMC Bioinformatics. 2016;17:S4. Li B, Carey M, Workman JL. The role of chromatin during transcription. Cell. 2007;128:707–19. Adams CC, Workman JL. Binding of disparate transcriptional activators to nucleosomal DNA is inherently cooperative. Mol Cell Biol. 1995;15:1405–21. Mahony S, et al. Ligand-dependent dynamics of retinoic acid receptor binding during early neurogenesis. Genome Biol. 2011;12:R2. Robertson AG, et al. Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding. Genome Res. 2008;18:1906–17. Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33:831–8. Ngiam J, et al. Multimodal deep learning. In: Proceedings of the 28th international conference on machine learning (ICML-11); 2011. p. 689–96. Cheerla A, Gevaert O. Deep learning with multimodal representation for pancancer prognosis prediction. In: Bioinformatics 35; 2019. p. i446–54. Zhang D, Kabuka M. Multimodal deep representation learning for protein interaction identification and protein family classification. BMC Bioinformatics. 2019;20:531. Chen X, Yu B, Carriero N, Silva C, Bonneau R. Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility. Nucleic Acids Res. 2017;45:4315–29. Pique-Regi R, et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 2011;21:447–55. Zhang Y, et al. Primary sequence and epigenetic determinants of in vivo occupancy of genomic DNA by GATA1. Nucleic Acids Res. 2009;37:7024–38. Aydin B, et al. Proneural factors Ascl1 and Neurog2 contribute to neuronal subtype identities by establishing distinct chromatin landscapes. Nat Neurosci. 2019;22:897–908. Chanda S, et al. Generation of induced neuronal cells by the single reprogramming factor ASCL1. Stem Cell Reports. 2014;3:282–96. Wapinski OL, et al. Hierarchical mechanisms for direct reprogramming of fibroblasts to neurons. Cell. 2013;155:621–35. Orly Wapinski AL, et al. Rapid chromatin switch in the direct reprogramming of fibroblasts to neurons. Cell Rep. 2017;20:3236–47. Casey BH, Kollipara RK, Pozo K, Johnson JE. Intrinsic DNA binding properties demonstrated for lineage-specifying basic helix-loop-helix transcription factors. Genome Res. 2018;28:484–96. Raccaud M, et al. Mitotic chromosome binding predicts transcription factor properties in interphase. Nat Commun. 2019;10:487. Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 2016;44:e107. Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70; 2017. p. 3319–28. Zhou T, et al. Quantitative modeling of transcription factor binding specificities using DNA shape. Proc Natl Acad Sci U S A. 2015;112:4654–9. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9:215–6. Rhee HS, et al. Expression of terminal effector genes in mammalian neurons is maintained by a dynamic relay of transient enhancers. Neuron. 2016;92:1252–65. Van Der Raadt J, Van Gestel SHCC, Kasri NN, Albers CA. ONECUT transcription factors induce neuronal characteristics and remodel chromatin accessibility. Nucleic Acids Res. 2019;47:5587–602. Velasco S, et al. A multi-step transcriptional and chromatin state cascade underlies motor neuron programming from embryonic stem cells. Cell Stem Cell. 2017;20:205–17.e8. Iwafuchi-Doi M, Zaret KS. Pioneer transcription factors in cell reprogramming. Genes Dev. 2014;28:2679–92. Yu X, Buck MJ. Defining TP53 pioneering capabilities with competitive nucleosome binding assays. Genome Res. 2019;29:107–15. Soufi A, et al. Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell. 2015;161:555–68. Wreczycka K, et al. Hot or not: examining the basis of high-occupancy target regions. Nucleic Acids Res. 2019;47:5735–45. Mazzoni EO, et al. Embryonic stem cell–based mapping of developmental transcriptional programs. Nat Methods. 2011;8:1056–8. Liang K, Keleş S. Normalization of ChIP-seq data with control. BMC Bioinformatics. 2012;13:199. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. Mahony S, et al. An integrated model of multiple-condition ChIP-Seq data reveals predeterminants of Cdx2 binding. PLoS Comput Biol. 2014;10:e1003501. Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9:357–9. Agarwal, R., Frosst, N., Zhang, X., Caruana, R. & Hinton, G. E. Neural additive models: interpretable machine learning with neural nets. arXiv 2020. Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? In: Advances in neural information processing systems. 3328; 2014. p. 3320. Kakumanu A, Velasco S, Mazzoni E, Mahony S. Deconvolving sequence features that discriminate between overlapping regulatory annotations. PLoS Comput Biol. 2017;13:e1005795. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994;2:28–36. Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27:1696–7. Grant CE, Bailey TL, Noble WS. Sequence analysis FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–8. Brodersen, K. H., Ong, C. S., Stephan, K. E. & Buhmann, J. M. The balanced accuracy and its posterior distribution. in 2010 20th International Conference on Pattern Recognition 3121–3124 (IEEE, 2010). https://doi.org/10.1109/ICPR.2010.764. Ku M, et al. H2A.Z landscapes and dual modifications in pluripotent and multipotent stem cells underlie complex genome regulatory functions. Genome Biol. 2012;13:R85. Yue F, et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014;515:355–64. Mikkelsen TS, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007;448:553–60. Srivastava, D., Aydin, B., Mazzoni, E. O. & Mahony, S. An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding. https://github.com/seqcode/Bichrom. Github. 2020. Srivastava, D., Aydin, B., Mazzoni, E. O. & Mahony, S. An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding https://doi.org/10.5281/zenodo.4271912. Zenodo. 2020.