Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models

Journal of Chemical Information and Modeling - Vol. 62, No. 5 - pp. 1199-1206 - 2022
Michaël Moret (1), Francesca Grisoni (2, 3), Paul Katzberger (1), Gisbert Schneider (1, 4)
(1) Department of Chemistry and Applied Biosciences, ETH Zurich, RETHINK, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
(2) Center for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, Utrecht 3584 CB, The Netherlands
(3) Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Groene Loper 7, Eindhoven 5612AZ, Netherlands
(4) ETH Singapore SEC Ltd., 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
