Machine learning prediction of accurate atomization energies of organic molecules from low-fidelity quantum chemical calculations

Springer Science and Business Media LLC - Volume 9 - Pages 891-899 - 2019
Logan Ward1,2, Ben Blaiszik1,3, Ian Foster1,2,3, Rajeev S. Assary4,5, Badri Narayanan5,6, Larry Curtiss4,5
1Data Science and Learning Division, Argonne National Laboratory, Lemont, USA
2Department of Computer Science, University of Chicago, Chicago, USA
3Globus, University of Chicago, Chicago, USA
4Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, USA
5Materials Science Division, Argonne National Laboratory, Lemont, USA
6Department of Mechanical Engineering, University of Louisville, Louisville, USA

Abstract

Recent studies illustrate how machine learning (ML) can be used to bypass a core challenge of molecular modeling: the trade-off between accuracy and computational cost. Here, we assess multiple ML approaches for predicting the atomization energy of organic molecules. Our resulting models learn the difference between low-fidelity (B3LYP) and high-accuracy (G4MP2) atomization energies, and predict the G4MP2 atomization energy to within 0.005 eV (mean absolute error) for molecules with fewer than nine heavy atoms (training set of 117,232 entries; test set of 13,026) and to within 0.012 eV for a small set of 66 molecules with between 10 and 14 heavy atoms. Our two best models, which have different accuracy/speed trade-offs, enable the efficient prediction of G4MP2-level energies for large molecules and are available through a simple web interface.
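The Δ-learning idea described in the abstract can be sketched in a few lines: rather than predicting the total energy, a model is trained on the difference between low-fidelity (B3LYP) and high-accuracy (G4MP2) energies, and the predicted correction is added to a cheap calculation. The sketch below uses a minimal numpy-only kernel ridge regression on synthetic placeholder data; the feature vectors, energies, and hyperparameters are illustrative assumptions, not the representations or values used in the paper.

```python
import numpy as np

# Synthetic placeholder data standing in for molecular features, B3LYP
# energies, and the systematic B3LYP -> G4MP2 correction (NOT paper data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # hypothetical feature vectors
e_b3lyp = X @ rng.normal(size=8)                   # "low-fidelity" energies
delta_true = 0.05 * np.tanh(X[:, 0]) + 0.02 * X[:, 1]
e_g4mp2 = e_b3lyp + delta_true                     # "high-accuracy" energies


def krr_fit(X_train, y, gamma=0.1, lam=1e-6):
    """Fit Gaussian-kernel ridge regression; returns the dual coefficients."""
    d2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y)


def krr_predict(X_train, alpha, X_new, gamma=0.1):
    """Predict at new points from the fitted dual coefficients."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2) @ alpha


# Key step of Delta-ML: the regression target is the *difference*,
# not the total energy.
train, test = slice(0, 150), slice(150, 200)
alpha = krr_fit(X[train], e_g4mp2[train] - e_b3lyp[train])
delta_pred = krr_predict(X[train], alpha, X[test])

# Predicted high-fidelity energy = cheap calculation + learned correction.
e_pred = e_b3lyp[test] + delta_pred
mae = np.abs(e_pred - e_g4mp2[test]).mean()
```

Because the correction varies far more smoothly with structure than the total energy does, the Δ-model needs much less training data than a model of the G4MP2 energy itself, which is what makes the approach attractive for large molecules.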
