K-spin Hamiltonian for quantum-resolvable Markov decision processes

Springer Science and Business Media LLC - Volume 2 - Pages 1-11 - 2020

Eric B. Jones¹,², Peter Graf¹, Eliot Kapit², Wesley Jones¹

¹National Renewable Energy Laboratory, Golden, USA

²Department of Physics, Colorado School of Mines, Golden, USA

Abstract

The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms, such as adiabatic quantum annealing and the quantum approximate optimization algorithm, on near-term quantum hardware. In arguing that the variational minimization of our Hamiltonian is approximately equivalent to the Bellman optimality condition for a prevalent class of environments, we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations that corroborate our formulation by comparing simulated and quantum annealing against classical Q-learning, we analyze the scaling of physical resources required to solve our Hamiltonian on quantum hardware.
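
The idea of recasting policy search as minimization over binary variables can be illustrated with a small classical sketch. The code below is not the K-spin Hamiltonian derived in the paper: it assumes a hypothetical 3-state, 2-action environment with made-up transition probabilities and rewards, encodes a deterministic policy as one-hot bits x[s, a], and uses a stand-in cost (the negative summed discounted value, plus a flat penalty for bit strings that are not one-hot). The cost is evaluated procedurally rather than as a closed-form polynomial, and exhaustive enumeration takes the place of an annealer. Its minimizer is then compared with the policy obtained from the Bellman optimality condition by value iteration.

```python
from itertools import product

GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical 3-state, 2-action MDP; all numbers are illustrative only.
# P[s][a] lists (next_state, probability); R[s][a] is the expected reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 0.8), (0, 0.2)]},
    1: {0: [(0, 1.0)], 1: [(2, 0.9), (1, 0.1)]},
    2: {0: [(2, 1.0)], 1: [(0, 1.0)]},
}
R = {
    0: {0: 0.0, 1: -1.0},
    1: {0: 0.0, 1: -1.0},
    2: {0: 2.0, 1: 0.0},
}
STATES = sorted(P)
ACTIONS = [0, 1]


def q_value(s, a, V):
    """One-step lookahead: r(s, a) + gamma * E[V(s')]."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])


def fixed_point(backup, tol=1e-10):
    """Iterate V <- backup(V) until the largest change falls below tol."""
    V = {s: 0.0 for s in STATES}
    while True:
        V_new = {s: backup(s, V) for s in STATES}
        if max(abs(V_new[s] - V[s]) for s in STATES) < tol:
            return V_new
        V = V_new


def value_iteration():
    """Solve the Bellman optimality condition; return (V*, greedy policy)."""
    V = fixed_point(lambda s, V: max(q_value(s, a, V) for a in ACTIONS))
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
    return V, policy


def decode(bits):
    """Map one-hot bits x[s, a] to a policy dict, or None if not one-hot."""
    policy = {}
    for i, s in enumerate(STATES):
        row = bits[i * len(ACTIONS):(i + 1) * len(ACTIONS)]
        if sum(row) != 1:
            return None
        policy[s] = ACTIONS[row.index(1)]
    return policy


def cost(bits, penalty=1e3):
    """Stand-in cost: penalty if infeasible, else minus the summed value."""
    policy = decode(bits)
    if policy is None:
        return penalty
    V = fixed_point(lambda s, V: q_value(s, policy[s], V))
    return -sum(V.values())


def brute_force_minimum():
    """Enumerate all 2^(|S||A|) bit strings; stands in for an annealer."""
    n = len(STATES) * len(ACTIONS)
    return min(product((0, 1), repeat=n), key=cost)


if __name__ == "__main__":
    _, pi_star = value_iteration()
    print("value iteration policy:", pi_star)
    print("cost-minimizing policy:", decode(brute_force_minimum()))
```

Run as a script, the two printed policies should agree for this toy environment. Enumeration grows as 2^(|S||A|), which is the reason heuristic solvers such as simulated or quantum annealing become relevant for larger problems.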
