On an enhancement of RNA probing data using information theory

Springer Science and Business Media LLC - Volume 15 - Pages 1-22 - 2020
Thomas J. X. Li (1), Christian M. Reidys (1, 2)
(1) Biocomplexity Institute & Initiative, University of Virginia, Charlottesville, USA
(2) Department of Mathematics, University of Virginia, Charlottesville, USA

Abstract

Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures by means of chemical probing data. We take an information-theoretic approach, considering a variant of the Rényi-Ulam game. Our framework is centered on the ensemble tree, a hierarchical bi-partition of the input ensemble that is constructed by recursively querying whether or not a base pair of maximum information entropy is contained in the target. These queries are answered by relating local to global probing data, exploiting the modularity of RNA secondary structures. We show that the leaves of the tree consist of sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability that our framework correctly identifies the target in the leaf is greater than 90%.
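The recursive querying described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each sampled structure is a frozenset of base pairs (i, j), that the sample is a plain list in which repeats stand for Boltzmann weight, and that a hypothetical error-free `oracle` answers each query, whereas the paper derives the answers from probing data. The names `binary_entropy`, `max_entropy_pair` and `descend` are illustrative only.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Shannon entropy (in bits) of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_entropy_pair(sample):
    """Return (pair, entropy) for the base pair whose presence is most uncertain."""
    counts = Counter(bp for structure in sample for bp in structure)
    n = len(sample)
    best_pair, best_h = None, 0.0
    for bp in sorted(counts):  # sorted so that ties are broken deterministically
        h = binary_entropy(counts[bp] / n)
        if h > best_h:
            best_pair, best_h = bp, h
    return best_pair, best_h


def descend(sample, oracle, max_depth=10):
    """Follow one root-to-leaf path of the bipartition, guided by the oracle."""
    path = []
    for _ in range(max_depth):
        bp, _h = max_entropy_pair(sample)
        if bp is None:  # every structure left agrees on every base pair
            break
        answer = oracle(bp)  # assumption: the answer is always correct
        sample = [s for s in sample if (bp in s) == answer]
        path.append((bp, answer))
    return sample, path


# Toy sample of four structures; the third one plays the role of the target.
sample = [
    frozenset({(1, 10), (2, 9)}),
    frozenset({(1, 10), (2, 9)}),
    frozenset({(1, 10), (3, 8)}),
    frozenset({(4, 12)}),
]
target = frozenset({(1, 10), (3, 8)})
leaf, path = descend(sample, lambda bp: bp in target)
print(path)  # queried base pairs with the oracle's answers
print(leaf)  # sub-sample remaining at the leaf
```

In this toy run the first query is the base pair (2, 9), which occurs in half of the sample and therefore carries one full bit of entropy. Each answer keeps only the part of the sample consistent with it, so the sub-sample at the leaf is dominated by a single structure.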
