Large-Scale Distributed Training of Transformers for Chemical Fingerprinting
Abstract
Keywords