Ilm-NMR-P31: an open-access 31P nuclear magnetic resonance database and data-driven prediction of 31P NMR shifts

Springer Science and Business Media LLC - Volume 15 - Pages 1-12 - 2023
Jasmin Hack¹, Moritz Jordan¹, Alina Schmitt¹, Melissa Raru¹, Hannes Sönke Zorn¹, Alex Seyfarth¹, Isabel Eulenberger¹, Robert Geitner¹
¹ Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Ilmenau, Germany

Abstract

This publication introduces a novel open-access 31P Nuclear Magnetic Resonance (NMR) shift database. With 14,250 entries covering 13,730 distinct molecules from 3,648 references, the database offers a comprehensive repository of organic and inorganic compounds. It focuses on compounds containing a single phosphorus atom and supports data mining and machine learning, particularly for signal prediction and Computer-Assisted Structure Elucidation (CASE) systems. The article also compares different models for 31P NMR shift prediction to showcase the database’s potential utility. Models based on Hierarchically Ordered Spherical Environment (HOSE) codes and Graph Neural Networks (GNNs) perform exceptionally well, with mean squared errors of 11.9 and 11.4 ppm, respectively, achieving accuracy comparable to quantum chemical calculations.
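
The error metric named in the abstract can be made concrete with a minimal sketch. The function names and the example shift values in the usage note are hypothetical and chosen only for illustration; they are not taken from the database or from the article's code. A mean of squared ppm differences is formally in ppm², while its square root is in ppm, so the sketch provides both.

```python
import math


def mean_squared_error(experimental, predicted):
    """Mean of squared differences between experimental and predicted shifts.

    Both arguments are sequences of 31P chemical shifts in ppm
    (hypothetical helper, not part of the published workflow).
    """
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    if not experimental:
        raise ValueError("at least one shift is required")
    squared = [(p - e) ** 2 for e, p in zip(experimental, predicted)]
    return sum(squared) / len(squared)


def root_mean_squared_error(experimental, predicted):
    """Square root of the mean squared error, expressed in ppm."""
    return math.sqrt(mean_squared_error(experimental, predicted))
```

As a usage example with made-up values, `mean_squared_error([10.0, -5.0, 30.0], [12.0, -4.0, 27.0])` averages the squared deviations 4, 1, and 9, and `root_mean_squared_error` returns the square root of that average.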
