K-spin Hamiltonian for quantum-resolvable Markov decision processes

Springer Science and Business Media LLC, Volume 2, pages 1-11, 2020
Eric B. Jones (1, 2), Peter Graf (1), Eliot Kapit (2), Wesley Jones (1)
(1) National Renewable Energy Laboratory, Golden, USA
(2) Department of Physics, Colorado School of Mines, Golden, USA

Abstract

The Markov decision process (MDP) is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms, such as adiabatic quantum annealing and the quantum approximate optimization algorithm (QAOA), on near-term quantum hardware. In arguing that the variational minimization of our Hamiltonian is approximately equivalent to the Bellman optimality condition for a prevalent class of environments, we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations that corroborate our formulation by comparing simulated and quantum annealing against classical Q-learning, we analyze the scaling of the physical resources required to solve our Hamiltonian on quantum hardware.
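To make the idea of "policy search as minimization of a pseudo-Boolean cost" concrete, here is a minimal sketch. It is not the authors' K-spin Hamiltonian. It assumes a one-hot binary encoding x[(s, a)] of a deterministic policy, a hypothetical two-state, two-action toy MDP, and an assumed penalty weight for the one-hot constraint. The cost is evaluated as a black box rather than expanded into explicit K-spin terms. Because any real-valued function of binary variables is pseudo-Boolean, the cost below qualifies in that general sense. Exhaustive search over the 16 bit strings stands in for a heuristic sampler such as simulated or quantum annealing, and classical value iteration serves as a Bellman-optimality reference.

```python
import itertools

# Hypothetical toy MDP for illustration only (not from the paper): 2 states, 2 actions.
# P[s][a] lists (next_state, probability); R[s][a] is the expected immediate reward.
GAMMA = 0.9
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}
STATES = sorted(P)
ACTIONS = [0, 1]
PENALTY = 100.0  # assumed weight of the one-hot constraint penalty


def evaluate_policy(policy, tol=1e-10):
    """Iterative policy evaluation: V(s) = R(s, pi(s)) + gamma * sum_s' P(s'|s, pi(s)) V(s')."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {
            s: R[s][policy[s]] + GAMMA * sum(p * v[t] for t, p in P[s][policy[s]])
            for s in STATES
        }
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v


def cost(bits):
    """Black-box pseudo-Boolean cost of bits[(s, a)] in {0, 1} (assumed encoding).

    Feasible strings (exactly one action per state) cost -sum_s V^pi(s);
    infeasible strings pay a quadratic one-hot penalty instead.
    """
    violation = sum((sum(bits[(s, a)] for a in ACTIONS) - 1) ** 2 for s in STATES)
    if violation:
        return PENALTY * violation
    policy = {s: next(a for a in ACTIONS if bits[(s, a)]) for s in STATES}
    return -sum(evaluate_policy(policy).values())


def brute_force_minimum():
    """Exhaustive search over all bit strings; an annealer would replace this step."""
    keys = [(s, a) for s in STATES for a in ACTIONS]
    candidates = (
        dict(zip(keys, combo))
        for combo in itertools.product((0, 1), repeat=len(keys))
    )
    best = min(candidates, key=cost)
    return {s: next(a for a in ACTIONS if best[(s, a)]) for s in STATES}


def value_iteration(tol=1e-10):
    """Classical Bellman optimality iteration, used as a reference solution."""
    v = {s: 0.0 for s in STATES}
    while True:
        q = {
            s: {a: R[s][a] + GAMMA * sum(p * v[t] for t, p in P[s][a]) for a in ACTIONS}
            for s in STATES
        }
        new_v = {s: max(q[s].values()) for s in STATES}
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            # Greedy policy with respect to the converged action values.
            return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in STATES}
        v = new_v


if __name__ == "__main__":
    print(brute_force_minimum())  # {0: 1, 1: 1}
    print(value_iteration())      # {0: 1, 1: 1}
```

In this toy case the minimizer of the cost and the greedy policy from value iteration coincide (move to state 1, then stay there), which mirrors, on a very small scale, the abstract's point that minimizing the cost tracks the Bellman optimality condition. The exhaustive search scales as 2^(|S||A|), which is why heuristic samplers are the intended solvers.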
