Codeword design and information encoding in DNA ensembles

Springer Science and Business Media LLC - Tập 3 - Trang 253-292 - 2004
Max H. Garzon1, Russell J. Deaton2
1Computer Science, The University of Memphis, Memphis, USA
2Computer Science and Engineering, The University of Arkansas, Fayetteville, USA

Tóm tắt

Encoding of information in DNA-, RNA- and other biomolecules is animportant area of research in fields such as DNA computing,bioinformatics, and, conceivably, microbiology and genetics. This surveyfocuses on two fundamental problems, the codeword design problemand the representation problem of abiotic information, formassively parallel processing with DNA molecules. The first problemrequires libraries of DNA sequences to be designed so that specificduplexes are formed during annealing while simultaneously preventingother undesirable hybridizations from occurring in the course of acomputation in the tube. The second involves a search for efficient andcost-effective methods of representing non-biological information in DNAsequences for storage and retrieval of large amouns of data (tera- andpeta-byte scales). Two approaches are treated, namely thermodynamic andcombinatoric-computational. Both experimental and theoretical resultsare described. A reference list of major works in the area is given.Finally, some open problems deemed important for their possible impacton encoding of abiotic information representation and processing arediscussed.

Tài liệu tham khảo

Adleman L (1994) Molecular computation of solutions of combinatorial problems. Science 266: 1021–1024 Allawi HT and SantaLucia J Jr (1997) Thermodynamics and NMR of internal G·Cmis matches in DNA. Biochemistry 36: 10581–10594 Andronescu M, Dees D, Slaybaugh L, Zhao Y, Condon AE, Cohen B and Skiena S (2002) Algorithms for testing that DNA word designs avoid secondary structures. In: Ohuchi Hagiya A. (ed), pp. 182–195 Arita M and Kobayashi S (2002) DNA sequence design using templates. New Generation Computing 20(3): 263–277 Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA, Struhl K, Wang-Iverson P and Bonitz SG (1993) Current Protocols in Molecular Biology. Greene Publishing Associates and Wiley-Interscience, New York Bahnzhaf W, Daida J, Eiben AE, Garzon MH, Hanovar V, Jakiela M and Smith RE (1999) Proc. of GECCO-99, the Genetic and Evolutionary Computation Conference, Orlando, Florida, Morgan Kaufmann, San Mateo, CA Baum E (1995) Building an associative memory vastly larger than the brain. Science 268: 583–585 Baum E (1996) DNA sequences useful for computation. In: Landweber and Baum (eds), pp. 122–127 Ben-Dor A, Karp R, Schwikowski B and Yakhini Z (2000) Universal DNA tag systems: A combinatorial design scheme. In: Proceedings of the Fourth International Conference on Computational Molecular Biology, ACM. Tokyo, Japan, April 8–11 Bi H, Chen J, Deaton R, Garzon M, Rubin H and Wood D (2003) A PCR-based protocol for in vitroselection of non-crosshybridizing oligonucleotides. J. of Natural Computing 2(3): 417–426 Bloomfield VA, Crothers DM and Tinoco I Jr (2000) Nucleic Acids: Structures, Properties, and Functions. University Science Books, Sausalito, CA Boneh D, Dunworth C, Lipton RJ and Sgall J (1996) Making DNA computers error-resistant. In: Landweber and Baum (eds), pp. 163–170 Borer PN, Dengler B, Tinoco I Jr. and Uhlenbeck OC (1974) Stability of ribonucleic acid double-stranded helices. J. Mol. Biol. 86: 843–853 Brenneman B and Condon A (2001) Sequence Design for Biomolecular Computation. In press. Available at http://www.cs.ubc.edu/~condon/papers/wordsurvey.ps Cantor CR and Schimmel PR (1980) Biophysical Chemistry, Part III: The Behavior of Biological Macromolecules. Freeman, New York Carbone A and Seeman N (2002) Circuits and programmable self-assembling DNA structures. Proc. of the National Academy of Science USA 99: 12577–12582 Condon A and Rozenberg G eds (2000) Proc. 6th Int. Workshop on DNA-Based Computers DNA 2000, Leiden, The Netherlands. Lecture Notes in Computer Science LNCS 2054, Springer-Verlag Cukras A, Faulhammer D, Lipton R and Landweber L (1998) Chess games: A model for RNA-based computation. In: Kari et al. (eds), pp. 35–45 Deaton R, Kim JW and Chen J (2003) Design and test of non-crosshybridizing oligonucleotide building blocks for DNA computers and nanostructures. Appl. Phys. Lett. 82: 1305–1307 Deaton RJ, Chen J, Bi H, Garzon M, Rubin H and Wood DH (2002a) A PCR-based protocol for in vitroselection of non-crosshybridizing oligonucleotides. In: Ohuchi Hagiya A (eds), pp. 105–114 Deaton RJ, Chen J, Bi H and Rose JA (2002b) A software tool for generating noncrosshybridizing libraries of DNA oligonucleotides. In: Ohuchi Hagiya A (eds), pp. 211–220 Deaton R, Murphy R, Rose J, Garzon M, Franceschetti D and Stevens SE Jr (1997a) Good encodings for DNA solution to combinatorial problems. Proc. IEEE Conference on Evolutionary Computation, IEEE Computer Society Press (1997), pp. 267–271 Deaton R, Garzon M and Rose JA (1997b) A DNA based artificial immune system for self-nonself discrimination. Proc. of the IEEE Int. Conference on Systems, Man and Cybernetics, Orlando. IEEE Press, Piscataway, NJ, pp. 369–374 Deaton R, Murphy R, Garzon M, Franceschetti D and Stevens E Jr (1996) Good encodings for DNA solution to combinatorial problems. Second Princeton Conference on DNA-based Computers, June 1996. DIMACS series of the American Mathematical Society 44 (1998), pp. 247–258 Faulhammer D, Cukras A, Lipton R and Landweber L (1999) When the knight falls: On constructing an RNA computer. In: Winfree D. Gifford (ed.), pp. 1–7 Feldkamp U, Saghafi S, Banzhaf W and Rouche H (2001) DNA sequence generator: A program for the construction of DNA sequences. In: Jonoska and Seeman, 2002, pp. 23–32 Fraenkel AS (1997) Protein folding, spin glass and computational complexity. In: Rubin and Wood (eds), pp. 101–121 Garey MR and Johnson DS (1979) Computers and Intractability. Freeman, New York Garzon M (ed) (2003a) Biomolecular machines and artificial evolution. Special Issue of the Journal of Genetic Programming and Evolvable Machines 4(2). Kluwer Academic Publishers Garzon M, Blain D, Bobba K, Neel A and West M (2003b) Self-assembly of DNA-like structures in-silico. In: Garzon (ed.), pp. 185–200 Garzon M, Neel A and Chen H (2003c) Efficiency and reliability of DNA-based memories. In: E. Cantu-Paz et al. (eds), Proc. GECCO-2003, The Genetic and Evolutionary Programming Conference. Springer-Verlag Lecture Notes in Computer Science 2723, pp. 379–389 Garzon M, Bobba K and Hyde B (2003d) Digital information encoding on DNA. In: Jonoska N, Paun G, Rozenberg G (eds), Aspects of Molecular Computing. Springer-Verlag, Lecture Notes in Computer Science 2950, pp. 152–166 Garzon M, Neel A and Bobba K (2003e) Efficiency and reliability of semantic retrieval in DNA-based memories. In: J. Reif and J. Chen (eds) Proc. DNA9-2003, Int. Workshop on DNA-based Computers. Springer-Verlag Lecture Notes in Computer Science, in press Garzon M and Oehmen C (2001) Biomolecular computing in virtual test tubes. In: Jonoska and Seeman, 2002, pp. 117–128 Garzon M and Deaton RJ (2000) Biomolecular computing: A definition. Kunstliche Intelligenz 1/00: 63–72 Garzon M and Deaton RJ (1999) Biomolecular computing and programming. IEEE Trans. on Evolutionary Comp. 3(2): 36–50 Garzon M, Deaton R, Rose JA and Franceschetti DR (1999b) Soft molecular computing. In: Winfree and Gifford (eds), pp. 89–98 Garzon M, Neathery PI, Deaton R, Murphy RC, Franceschetti DR and Stevens SE Jr (1997a) A new metric for DNA computing. In: Koza et al., 1997, pp. 472–478 Garzon M, Deaton R, Neathery P, Murphy RC, Franceschetti DR and Stevens E Jr (1997b) On the encoding problem for DNA computing. Poster at The Third DIMACS Workshop on DNA-based Computing, U of Pennsylvania. Preliminary Proceedings, pp. 230–237 Gray DM(1997) Derivation of nearest-neighbor properties from data on nucleic acid oligomer. I. Simple sets of independent sequences and the influence of absent nearest-neighbors. Biopolymers 42: 783–793 Gray DM and Tinoco I Jr (1970) A new approach to the study of sequence-dependent properties of polynucleotides. Biopolymers 9: 223–244 Hagiya M and Ohuchi A eds (2002) Proc. 8th Int. Meeting on DNA-Based Computers. Springer-Verlag Lecture Notes in Computer Science LNCS 2568. Springer-Verlag Hartemink AJ and Gifford DK (1997) Thermodynamic simulation of deoxyoligonucleotide hybridization of DNA computation. In: Rubin and Wood (eds), pp. 25–37 Head T and Yamamura M and Gal S (2001) Relativized code concepts and multi-tube DNA dictionaries, in press Head T, Yamamura M and Gal S (1999) Aqueous computing: Writing on molecules. Proceedings of the Congress on Evolutionary Computing (CEC'99) Heitsch CE, Condon AE and Hood HH (2002) From RNA secondary structure to coding theory: A combinatorial approach. In: Hagiya and Ohuchi, pp. 125–136 Hinze T, Hatnik U and Sturm M (2001) An object oriented simulation of real occurring molecular biological processes for DNA computing and its experimental verification. In: Jonoska and Seeman, 2002, pp. 1–13 Holland JH (1992) Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA Hopcroft, J and Ullman J (1979). Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Reading, MA Hussini S, Kari L, Rubin H and Konstantinidis S (2001) Coding properties of DNA languages. In: Jonoska and Seeman, 2002, pp. 57–69 Jonoska, N and Seeman N eds (2001) Proc. 7th Int. Workshop on DNA-Based Computers DNA 2000, Tampa, Florida. Lecture Notes in Computer Science LNCS 2340, Springer-Verlag, 2002 Kari L, Kitto R and Thierrin G (2000) Codes, envolutions and DNA encoding. In: Proc. Workshop on Coding Theory, London, Ontario, in press Kari L, Rubin H and Wood D eds (1998) Proc. 4th DIMACS workshop on DNA Computers, University of Pennsylvania, 1998. Special issue of Biosystems 52: 1–3 (1999) Karp R, Kenyon C and Waarts O (1996) Error-resilient DNA Computation. Proc. 7th Annual Symposium on Discrete Algorithms SODA, pp. 458–467 Karp RM and Wigderson A (1985) A fast parallel algorithm for the maximal independent set problem. Journal of the Association for Computing Machinery 32: 762–773 Klein JP, Leete TH and Rubin H (1997) A biomolecular implementation of logically reversible computation with minimal energy dissipation. In: Rubin and Wood, pp. 15–23 Kool ET (1999) New molecular strategies for joining, pairing, and amplifying. In: Winfree and Gifford, p. 62 Koza JR, Deb K, Dorigo M, Fogel DB, Garzon M, Iba H and Riolo RL (eds) (1998) Proc. 3rd Annual Genetic Programming Conference. Morgan Kaufmann, San Mateo, California Koza JR, Deb K, Dorigo M, Fogel DB, Garzon M, Iba H and Riolo RL (eds) (1997) Proc. 2nd Annual Genetic Programming Conference. Morgan Kaufmann, San Mateo, California Landweber, L and Kari L (1999) The evolution of cellular computing: Nature's solution to a computational problem. In: Kari et al., 1998, pp. 3–13 Landweber LF and Baum EB (eds) (1996) DNA Based Computers II. Proc. of the Second DIMACS Workshop on DNA-Based Computers, Princeton University. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Providence, RI: American Mathematical Society, vol. 44 (1998) Luby M (1986) A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing 15: 1036–1053 Mao C, LaBean TH, Reif JH and Seeman NC (2000) Logical computation using algorithmic self-assembly of DNA triple-crossover molecules. Nature 407: 493–496 Marathe A, Condon AE and Corn RM(1999) On combinatorial DNA word design. In:Winfree and Gifford, 1999, pp. 75–88 McCaskill JS (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29: 1105–1119 Mezard M, Parisi G and Virasoro MA (1987) Spin Glass Theory and Beyond.World Scientific, Singapore Mitchell J and Yurke B (2001) DNA scissors. In: Jonoska and Seeman, 2002, pp. 263–268 Monasson R, Zecchina R, Kirkpatrick S, Selman B and Troyansky L (1999) Determining computational complexity from characteristic phase transitions. Nature 400: 133–137 Mount D (2001) Bioinformatics: Sequence and Genome Analysis. Spring Harbor Lab Press MD Nusser M and Deaton RJ (2003) DNA computing with in vitroselection. In: Garzon, pp. 173–183 Poland D and Scheraga HA (1970) Theory of Helix-Coil Transitions in Biopolymers. Academic Press, Inc., New York Reif JH, LaBean TH, Pirrung M, Rana VS, Guo B, Kingsford C and Wickman GS (2001) Experimental construction of very-large scale databases with associative search capability. In: Jonoska and Seeman, 2002, pp. 231–247 Roman J (1995) The Theory of Error-Correcting Codes. Springer-Verlag, Berlin Rose JA, Deaton RJ, Hagiya M and Suyama A (2000) The fidelity of the tag-antitag system. In: Jonoska and Seeman, 2001, pp. 138–149 Rose JA, Deaton R, Franceschetti DR, Garzon MH, Stevens SE Jr (1999) A statistical mechanical treatment of error in the annealing biostep of DNA computation. Banzhaf et al., pp. 1829–1834 Rose JA, Deaton R, Garzon M, Murphy RC, Franceschetti DR and Stevens SE Jr (1997) The effect of uniform melting temperatures on the efficiency of DNA computing. In: Rubin and Wood, pp. 35–42 Rubin H and Wood D (eds) (1997) Third DIMACS Workshop on DNA-Based Computers, The University of Pennsylvania, 1997. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Providence, RI. American Mathematical Society 48 (1999) SantaLucia J (2001) DNA hybridization is not digital. Plenary address at the Seventh International Meeting on DNA Based Computers SantaLucia J Jr (1998) A unified view of polymer, dumbell, and oligonucleotide nearestneighbor thermodynamics. Proc. Natl. Acad. Sci. 95: 1460–1465 SantaLucia J Jr, Allawi HT and Seneviratne PA (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry 35: 3555–3562 Shin SY, Kim DM, Lee IH and Zhang BT (2002) Evolutionary Sequence Generation for Reliable DNA Computing. In IEEE Congress on Evolutionary Computation 2002, in press Shor PW (1996) Fault-tolerant quantum computation. Proc. IEEE Foundations of Computer Science FOCS, 56–65 Smith TF and Waterman MS (1981) The identification of common molecular subsequences. J. Mol. Biol. 147: 195–197 Wartell RM and Benight AS (1985) Thermal denaturation of DNA molecules. Physics Reports 126: 67–107 Watson JD, Hopkins NH, Roberts JW, Steitz JA and Weiner AM (1987) Molecular Biology of the Gene, 4th ed. The Benjamin/Cummings Publishing Co., Menlo Park, CA Weigt M and Hartmann AK (2000) Number of guards needed by a museum: A phase transition in vertex covering of random graphs. Phys. Rev. Lett. 84: 6118–6121 West M, Garzon MH and Blain D (2003) DNA-like genomes for evolution in silico. In: E. Cantu-Paz et al. (eds), Proc. GECCO-2003, The Genetic and Evolutionary Programming Conference. Springer-Verlag Lecture Notes in Computer Science 2723, pp. 413–424 Wetmur JG (1997) Physical chemistry of nucleic acid hybridization. In: Rubin and Wood, pp. 1–23 Winfree E and Gifford D eds. (1999) Fifth International Metting on DNA Based Computers, MIT, Boston, MA. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Providence, RI. American Mathematical Society, 44 (2000) Winfree E, Liu F, Wenzler LA and Seeman NC (1998) Design and self-assembly of two-dimensional DNA crystals. Nature 394: 539–544 Yamamoto M, Yamashita J, Shiba T, Hirayama T, Takiya S, Suzuki K, Munekata M and Ohuchi A (1999) A study on the hybridization process in DNA computing. In: Winfree and Gifford, pp. 101–110 Yurke B and Mills AP Jr (2003) Using DNA to power Nanostructures. In: Garzon, 2003a, pp. 111–122 Yurke B, Tauberfield AJ, Mills AP Jr, Simmel FC and Neumann JL (2000) A DNA-fuelled molecular machine made of DNA. Nature 406: 605–608 Zhang BT and Shin SY (1998) Molecular algorithms for efficient and reliable DNA computing. In: Koza et al., 1998, pp. 735–742