Towards reproducible computational drug discovery
Abstract
The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use in data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. This review explores the following topics: (1) the current state of the art in reproducible research, (2) research documentation (e.g. electronic laboratory notebooks, Jupyter notebooks), (3) the science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues in model development and deployment, and (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share the data and code used for numerical calculations, not only to facilitate reproducibility but also to foster collaboration (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design would adopt an open approach towards the collection, curation and sharing of data and code.
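As a minimal illustration of what sharing data and code for reproducibility can involve in practice, the Python sketch below records a SHA-256 checksum of a dataset (so others can confirm they hold byte-identical data), uses a fixed random seed for the train/test split, and notes the interpreter version and platform. This is an illustrative assumption, not a protocol taken from the review: the function names `sha256_of_bytes`, `seeded_split` and `provenance_record`, the seed value and the toy SMILES data are all hypothetical, and only the Python standard library is used.

```python
import hashlib
import json
import platform
import random
import sys


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest, used to check that a shared dataset is byte-identical."""
    return hashlib.sha256(data).hexdigest()


def seeded_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` with a private, seeded RNG and split it into (train, test).

    random.Random(seed) is used instead of the global RNG, so the split depends only
    on the seed and the input order, not on other code that draws random numbers.
    """
    rng = random.Random(seed)
    shuffled = list(items)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def provenance_record(data: bytes, seed: int) -> dict:
    """Collect minimal facts another researcher would need to rerun the analysis (hypothetical schema)."""
    return {
        "dataset_sha256": sha256_of_bytes(data),
        "random_seed": seed,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Hypothetical toy dataset: SMILES strings with made-up activity labels.
    csv_text = "smiles,active\nCCO,0\nc1ccccc1,1\nCC(=O)O,0\nCCN,1\nCCCC,0\n"
    data = csv_text.encode("utf-8")
    rows = csv_text.strip().splitlines()[1:]  # drop the header line

    train, test = seeded_split(rows, test_fraction=0.2, seed=42)
    record = provenance_record(data, seed=42)
    print(json.dumps(record, indent=2))
    print(f"{len(train)} training rows, {len(test)} test rows")
```

Publishing such a record alongside the data and code lets a reader check that they start from the same inputs. The Python documentation does not guarantee that `shuffle` yields the same order for a given seed across Python versions, which is one reason to store the interpreter version next to the seed.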
Tài liệu tham khảo
Mullard A (2016) Biotech R&D spend jumps by more than 15. Nat Rev Drug Discov 15(7):447. https://doi.org/10.1038/nrd.2016.135
Stratmann HG (2010) Bad medicine: when medical research goes wrong. Analog Sci Fict Fact CXXX(9):20–30
DiMasi JA, Grabowski HG, Hansen RW (2016) Innovation in the pharmaceutical industry: new estimates of R&D costs. J Health Econ 47:20–33. https://doi.org/10.1016/j.jhealeco.2016.01.012
Biotechnology Innovation Organisation (2016) Clinical Development Success Rates 2006–2015
Ogu CC, Maxa JL (2000) Drug interactions due to cytochrome p450. Baylor Univ Med Center Proc 13(4):421–423. https://doi.org/10.1080/08998280.2000.11927719
Fox S, Farr-Jones S, Sopchak L, Boggs A, Nicely HW, Khoury R, Biros M (2006) High-throughput screening: update on practices and success. J Biomol Screen 11(7):864–869. https://doi.org/10.1177/1087057106292473
Hughes JP, Rees S, Kalindjian SB, Philpott KL (2011) Principles of early drug discovery. Br J Pharmacol 162(6):1239–1249. https://doi.org/10.1111/j.1476-5381.2010.01127.x
Ruddigkeit L, van Deursen R, Blum LC, Reymond J-L (2012) Enumeration of 166 billion organic small molecules in the chemical universe database gdb-17. J Chem Inform Model 52(11):2864–2875. https://doi.org/10.1021/ci300415d
Villoutreix BO, Renault N, Lagorce D, Sperandio O, Montes M, Miteva MA (2007) Free resources to assist structure-based virtual ligand screening experiments. Curr Protein Pept Sci 8(4):381–411
Nantasenamat C, Prachayasittikul V (2015) Maximizing computational tools for successful drug discovery. Expert Opin Drug Discov 10(4):321–329. https://doi.org/10.1517/17460441.2015.1016497
Feng BY, Simeonov A, Jadhav A, Babaoglu K, Inglese J, Shoichet BK, Austin CP (2007) A high-throughput screen for aggregation-based inhibition in a large compound library. J Med Chem 50(10):2385–2390. https://doi.org/10.1021/jm061317y
Soares KM, Blackmon N, Shun TY, Shinde SN, Takyi HK, Wipf P, Lazo JS, Johnston PA (2010) Profiling the nih small molecule repository for compounds that generate H2O2 by redox cycling in reducing environments. Assay Drug Dev Technol 8(2):152–174. https://doi.org/10.1089/adt.2009.0247
Young D, Martin T, Venkatapathy R, Harten P (2008) Are the chemical structures in your QSAR correct? QSAR Combinatorial Sci 27(11–12):1337–1345. https://doi.org/10.1002/qsar.200810084
Zhao L, Wang W, Sedykh A, Zhu H (2017) Experimental errors in QSAR modeling sets: what we can do and what we cannot do. ACS Omega 2(6):2805–2812. https://doi.org/10.1021/acsomega.7b00274
Clark RD (2019) A path to next-generation reproducibility in cheminformatics. J Cheminform 11:62. https://doi.org/10.1186/s13321-019-0385-0
Walters P (2019) Where’s the code? http://practicalcheminformatics.blogspot.com/2019/05/wheres-code.html. Accessed 1 Nov 2019
Garabedian TE (1997) Laboratory record keeping. Nat Biotechnol 15(8):799–800. https://doi.org/10.1038/nbt0897-799
Plavén-Sigray P, Matheson GJ, Schiffler BC, Thompson WH (2017) The readability of scientific texts is decreasing over time. eLife. https://doi.org/10.7554/eLife.27725
Dirnagl U, Przesdzing I (2016) A pocket guide to electronic laboratory notebooks in the academic life sciences. F1000 Res 5:2 https://doi.org/10.12688/f1000research.7628.1
Rubacha M, Rattan AK, Hosselet SC (2011) A review of electronic laboratory notebooks available in the market today. J Lab Autom 16(1):90–98. https://doi.org/10.1016/j.jala.2009.01.002
Mascarelli A (2014) Research tools: jump off the page. Nature 507(7493):523–525. https://doi.org/10.1038/nj7493-523a
Schnell S (2015) Ten simple rules for a computational biologist’s laboratory notebook. PLoS Comput Biol 11(9):1004385. https://doi.org/10.1371/journal.pcbi.1004385
Bradley J-C, Neylon C (2008) Data on display. Interview by Katherine Sanderson. Nature 455(7211):273. https://doi.org/10.1038/455273a
Butler D (2005) Electronic notebooks: a new leaf. Nature 436(7047):20–21. https://doi.org/10.1038/436020a
Project Jupyter (2019) The Jupyter Notebook. http://www.jupyter.org/. Accessed 9 Jan 2019
Project Jupyter (2019) nbviewer. http://nbviewer.jupyter.org/. Accessed 9 Jan 2019
Freeman Lab (2019) Binder. http://mybinder.org/. Accessed 9 Jan 2019
Google (2019) Colaboratory. https://colab.research.google.com/. Accessed 9 Jan 2019
Baker M (2016) 1,500 scientists lift the lid on reproducibility. Nature 533(7604):452–454. https://doi.org/10.1038/533452a
Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD (2015) The extent and consequences of p-hacking in science. PLoS Biol 13(3):1002106. https://doi.org/10.1371/journal.pbio.1002106
Simonsohn U, Nelson LD, Simmons JP (2014) P-curve: a key to the file-drawer. J Exp Psychol Gen 143(2):534–547. https://doi.org/10.1037/a0033242
Ioannidis JPA (2008) Effect of formal statistical significance on the credibility of observational associations. Am J Epidemiol 168(4):374–83384. https://doi.org/10.1093/aje/kwn156
Risch NJ (2000) Searching for genetic determinants in the new millennium. Nature 405(6788):847–856. https://doi.org/10.1038/35015718
Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N (2004) Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 96(6):434–442
Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018. https://doi.org/10.1038/sdata.2016.18
Guha R, Willighagen E (2017) Helping to improve the practice of cheminformatics. J Cheminform 9(1):40. https://doi.org/10.1186/s13321-017-0217-z
Collin’s English Dictionary (2019) Reproduce. http://www.dictionary.com/browse/reproducibility. Accessed 9 Jan 2019
Schwab M, Karrenbach M, Claerbout J (2000) Making scientific computations reproducible. Comput Sci Eng 2:61–67
Casadevall A, Fang FC (2010) Reproducible science. Infect Immun 78(12):4972–4975. https://doi.org/10.1128/IAI.00908-10
Kerr Bernal S (2006) A massive snowball of fraud and deceit. J Androl 27(3):313–315. https://doi.org/10.2164/jandrol.06007
Joint Committee for Guides in Metrology (2008) Evaluation of measurement data — Guide to the expression of uncertainty in measurement. https://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf. Accessed 1 Nov 2019
Oudeyer P-Y, Merrick K (2016) Computational modelling across disciplines. IEEE Cogn Dev Syst Newslett 13(2):1
Taylor CF, Field D, Sansone SA, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy NW, Hermjakob H, Julian RK, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper M, Le Novere N, Leebens-Mack J, Lewis SE, Lord P, Mallon AM, Marthandan N, Masuya H, McNally R, Mehrle A, Morrison N, Orchard S, Quackenbush J, Reecy JM, Robertson DG, Rocca-Serra P, Rodriguez H, Rosenfelder H, Santoyo-Lopez J, Scheuermann RH, Schober D, Smith B, Snape J, Stoeckert CJ, Tipton K, Sterk P, Untergasser A, Vandesompele J, Wiemann S (2008) Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 26(8):889–896. https://doi.org/10.1038/nbt.1411
Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, Roechert B, Poux S, Jung E, Mersch H, Kersey P, Lappe M, Li Y, Zeng R, Rana D, Nikolski M, Husi H, Brun C, Shanker K, Grant SG, Sander C, Bork P, Zhu W, Pandey A, Brazma A, Jacq B, Vidal M, Sherman D, Legrain P, Cesareni G, Xenarios I, Eisenberg D, Steipe B, Hogue C, Apweiler R (2004) The HUPO PSI’s molecular interaction format—a community standard for the representation of protein interaction data. Nat Biotechnol 22(2):177–183. https://doi.org/10.1038/nbt926
Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D’Eustachio P, Schaefer C, Luciano J, Schacherer F, Martinez-Flores I, Hu Z, Jimenez-Jacinto V, Joshi-Tope G, Kandasamy K, Lopez-Fuentes AC, Mi H, Pichler E, Rodchenkov I, Splendiani A, Tkachev S, Zucker J, Gopinath G, Rajasimha H, Ramakrishnan R, Shah I, Syed M, Anwar N, Babur O, Blinov M, Brauner E, Corwin D, Donaldson S, Gibbons F, Goldberg R, Hornbeck P, Luna A, Murray-Rust P, Neumann E, Ruebenacker O, Reubenacker O, Samwald M, van Iersel M, Wimalaratne S, Allen K, Braun B, Whirl-Carrillo M, Cheung KH, Dahlquist K, Finney A, Gillespie M, Glass E, Gong L, Haw R, Honig M, Hubaut O, Kane D, Krupa S, Kutmon M, Leonard J, Marks D, Merberg D, Petri V, Pico A, Ravenscroft D, Ren L, Shah N, Sunshine M, Tang R, Whaley R, Letovksy S, Buetow KH, Rzhetsky A, Schachter V, Sobral BS, Dogrusoz U, McWeeney S, Aladjem M, Birney E, Collado-Vides J, Goto S, Hucka M, Le Novere N, Maltsev N, Pandey A, Thomas P, Wingender E, Karp PD, Sander C, Bader GD (2010) The BioPAX community standard for pathway data sharing. Nat Biotechnol 28(9):935–942. https://doi.org/10.1038/nbt.1666
Wf4Ever Project (2019) Wf4Ever github repository. http://wf4ever.github.io/. Accessed 9 Jan 2019
Cooper J, Vik JO, Waltemath D (2015) A call for virtual experiments: accelerating the scientific process. Progr Biophys Mol Biol 117(1):99–106. https://doi.org/10.1016/j.pbiomolbio.2014.10.001
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5(10):80. https://doi.org/10.1186/gb-2004-5-10-r80
Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J (2010) Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapt 19:19–10121. https://doi.org/10.1002/0471142727.mb1910s89
Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J et al (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15(10):1451–1455. https://doi.org/10.1101/gr.4086505
Goecks J, Nekrutenko A, Taylor J (2010) Galaxy Team: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11(8):86. https://doi.org/10.1186/gb-2010-11-8-r86
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) Beast 2: a software platform for bayesian evolutionary analysis. PLoS Comput Biol 10(4):1003537. https://doi.org/10.1371/journal.pcbi.1003537
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, Matschiner M, Mendes FK, Müller NF, Ogilvie HA, du Plessis L, Popinga A, Rambaut A, Rasmussen D, Siveroni I, Suchard MA, Wu C-H, Xie D, Zhang C, Stadler T, Drummond AJ (2019) Beast 2.5: An advanced software platform for bayesian evolutionary analysis. PLoS Comput Biol 15(4):1006650. https://doi.org/10.1371/journal.pcbi.1006650
Teytelman L protocols.io - the #1 science methods repository
High Level Expert Group on Scientific Data (2010) Riding the Wave—how Europe can gain from the rising tide of scientific data. https://www.fosteropenscience.eu/content/riding-wave-how-europe-can-gain-rising-tide-scientific-data/. Accessed 9 Jan 2019
National Institutes of Health (2019) NIH Grants Policy Statement. https://grants.nih.gov/policy/nihgps/index.htm. Accessed 9 Jan 2019
NordForsk (2019) Open Access to Research Data - Status, Issues and Outlook. https://www.nordforsk.org/en/publications/publications_container/open-access-to-research-data-2013-status-issues-and-outlook/. Accessed 9 Jan 2019
Borgman CL (2015) Big data, little data, no data: scholarship in the networked world. MIT Press, Cambridge
Margolis R, Derr L, Dunn M, Huerta M, Larkin J, Sheehan J, Guyer M, Green ED (2014) The national institutes of health’s big data to knowledge (bd2k) initiative: capitalizing on biomedical big data. J Am Med Inform Assoc 21(6):957–958. https://doi.org/10.1136/amiajnl-2014-002974
Pasquetto IV, Randles BM, Borgman CL (2017) On the reuse of scientific data. Data Sci J. https://doi.org/10.5334/dsj-2017-008
Wallis JC, Rolando E, Borgman CL (2013) If we share data, will anyone use them? data sharing and reuse in the long tail of science and technology. PLoS ONE 8(7):67332. https://doi.org/10.1371/journal.pone.0067332
Chavan V, Penev L (2011) The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinform 12 Suppl 15:2. https://doi.org/10.1186/1471-2105-12-S15-S2
Gorgolewski KJ, Margulies DS, Milham MP (2013) Making data sharing count: a publication-based solution. Front Neurosci 7:9. https://doi.org/10.3389/fnins.2013.00009
Searls DB (2010) The roots of bioinformatics. PLoS Comput Biol 6(6):1000809. https://doi.org/10.1371/journal.pcbi.1000809
Kanwal S, Khan FZ, Lonie A, Sinnott RO (2017) Investigating reproducibility and tracking provenance—a genomic workflow case study. BMC Bioinform 18(1):337. https://doi.org/10.1186/s12859-017-1747-0
Kim Y-M, Poline J-B, Dumas G (2017) Experimenting with reproducibility in bioinformatics. BioRxiv. https://doi.org/10.1101/143503
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten simple rules for reproducible computational research. PLoS Comput Biol 9(10):1003285. https://doi.org/10.1371/journal.pcbi.1003285
Van Neste C, Gansemans Y, De Coninck D, Van Hoofstat D, Van Criekinge W, Deforce D, Van Nieuwerburgh F (2015) Forensic massively parallel sequencing data analysis tool: implementation of MyFLq as a standalone web- and Illumina BaseSpace®-application. Forensic Sci Int Genet 15:2–7. https://doi.org/10.1016/j.fsigen.2014.10.006
Dove ES, Joly Y, Tassé A-M (2015) Public Population Project in Genomics and Society (P3G) International Steering Committee and International Cancer Genome Consortium (ICGC) Ethics and Policy Committee, Knoppers, B.M.: genomic cloud computing: legal and ethical points to consider. Eur J Human Genet 23(10):1271–1278. https://doi.org/10.1038/ejhg.2014.196
Docker Inc. (2019) Docker. https://www.docker.com/. Accessed 9 Jan 2019
da Veiga Leprevost F, Gruning BA, Alves Aflitos S, Rost HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Vera Alvarez R, Griss J, Nesvizhskii AI, Perez-Riverol Y (2017) BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics 33(16):2580–2582. https://doi.org/10.1093/bioinformatics/btx192
Kim B, Ali T, Lijeron C, Afgan E, Krampis K (2017) Bio-docklets: virtualization containers for single-step execution of ngs pipelines. GigaScience 6(8):1–7. https://doi.org/10.1093/gigascience/gix048
Menegidio FB, Jabes DL, de Oliveira R Costa, Nunes LR (2018) Dugong: a Docker image, based on Ubuntu Linux, focused on reproducibility and replicability for bioinformatics analyses. Bioinformatics 34(3):514–515. https://doi.org/10.1093/bioinformatics/btx554
Kulkarni N, Alessandri L, Panero R, Arigoni M, Olivero M, Ferrero G, Cordero F, Beccuti M, Calogero RA (2018) Reproducible bioinformatics project: a community for reproducible bioinformatics analysis pipelines. BMC Bioinform 19(Suppl 10):349. https://doi.org/10.1186/s12859-018-2296-x
Rozenblatt-Rosen O, Stubbington MJT, Regev A, Teichmann SA (2017) The Human Cell Atlas: from vision to reality. Nature 550(7677):451–453. https://doi.org/10.1038/550451a
Peng RD (2011) Reproducible research in computational science. Science 334(6060):1226–1227. https://doi.org/10.1126/science.1213847
Stodden V, Leisch F, Peng RD (2014) Implementing reproducible research. CRC Press/Taylor & Francis Group, Boca Raton
Scientific Data (2019) Recommended Data Repositories. https://www.nature.com/sdata/policies/repositories/. Accessed 9 Jan 2019
Dryad (2019) Dryad Digital Repository. https://datadryad.org/. Accessed 9 Jan 2019
Dryad (2019) DryadLab. http://datadryad.org/pages/dryadlab/. Accessed 9 Jan 2019
figshare (2019) figshare—credit for all your research. http://www.figshare.com/. Accessed 9 Jan 2019
Singh J (2011) Figshare. J Pharmacol Pharmacother 2(2):138–139. https://doi.org/10.4103/0976-500X.81919
Zenodo (2019) Zenodo—Research. Shared. https://zenodo.org/. Accessed 9 Jan 2019
Open Science Framework (2019) OSF Home. https://osf.io/. Accessed 9 Jan 2019
Center for Open Science (2019) Center for Open Science Website. https://cos.io/. Accessed 9 Jan 2019
Foster ED, Deardorff A (2017) Open science framework (osf). J Med Lib Assoc 105(2):203–206. https://doi.org/10.5195/JMLA.2017.88
Macmillan Publishers Limited (2019) Scientific Data. https://www.nature.com/sdata/. Accessed 9 Jan 2019
Elsevier (2019) Data in Brief. https://www.journals.elsevier.com/data-in-brief/. Accessed 9 Jan 2019
MDPI (2019) Data. http://www.mdpi.com/journal/data/. Accessed 9 Jan 2019
F1000Research (2019) F1000Research | Open Access Publishing Platform | Beyond a Research Journal. https://f1000research.com/. Accessed 9 Jan 2019
arXiv (2019) arXiv.org e-Print archive. https://arxiv.org/. Accessed 9 Jan 2019
bioRxiv (2019) bioRxiv.org—the preprint server for Biology. https://www.biorxiv.org/. Accessed 9 Jan 2019
ChemRxiv (2019) ChemRxiv: the Preprint Server for Chemistry. https://chemrxiv.org/. Accessed 9 Jan 2019
PeerJ (2019) PeerJ Preprints. https://peerj.com/preprints/. Accessed 9 Jan 2019
Bitbucket (2019) Bitbucket - The Git solution for professional teams. https://bitbucket.org/. Accessed 9 Jan 2019
GitLab (2019) GitLab. https://about.gitlab.com/. Accessed 9 Jan 2019
Assembla (2019) Assembla: Secure Git, Secure Software Development in the Cloud. https://www.assembla.com/. Accessed 9 Jan 2019
Google (2019) Cloud Source Repositories. https://cloud.google.com/source-repositories/. Accessed 9 Jan 2019
Sofroniew NJ, Vlasov YA, Hires SA, Freeman J, Svoboda K (2015) Neural coding in barrel cortex during whisker-guided locomotion. eLife. https://doi.org/10.7554/eLife.12559
Li N, Daie K, Svoboda K, Druckmann S (2016) Robust neuronal dynamics in premotor cortex during motor planning. Nature 532(7600):459–464. https://doi.org/10.1038/nature17643
Code Ocean (2019) Code Ocean—Professional tools for researchers. https://codeocean.com/. Accessed 9 Jan 2019
Cornell Tech (2019) Code Ocean: Tackling Reproducibility and Transparency in Scientific Research. https://tech.cornell.edu/news/code-ocean-tackling-reproducibility-and-transparency-in- scientific-research. Accessed 9 Jan 2019
Perkel J (2019) TechBlog: C. Titus Brown: Predicting the paper of the future. http://blogs.nature.com/naturejobs/2017/06/01/techblog-c-titus-brown-predicting-the-paper-of-the-future/. Accessed 9 Jan 2019
Software Carpentry (2019) Software Carpentry—Teaching basic lab skills for research computing. https://software-carpentry.org/. Accessed 9 Jan 2019
Data Carpentry (2019) Data Carpentry—Building communities teaching universal data literacy. http://www.datacarpentry.org/. Accessed 9 Jan 2019
Birney E, Hudson TJ, Green ED, Gunter C, Eddy S, Rogers J, Harris JR, Ehrlich SD, Apweiler R, Austin CP, Berglund L, Bobrow M, Bountra C, Brookes AJ, Cambon-Thomsen A, Carter NP, Chisholm RL, Contreras JL, Cooke RM, Crosby WL, Dewar K, Durbin R, Dyke SO, Ecker JR, El Emam K, Feuk L, Gabriel SB, Gallacher J, Gelbart WM, Granell A, Guarner F, Hubbard T, Jackson SA, Jennings JL, Joly Y, Jones SM, Kaye J, Kennedy KL, Knoppers BM, Kyrpides NC, Lowrance WW, Luo J, MacKay JJ, Martin-Rivera L, McCombie WR, McPherson JD, Miller L, Miller W, Moerman D, Mooser V, Morton CC, Ostell JM, Ouellette BF, Parkhill J, Raina PS, Rawlings C, Scherer SE, Scherer SW, Schofield PN, Sensen CW, Stodden VC, Sussman MR, Tanaka T, Thornton J, Tsunoda T, Valle D, Vuorio EI, Walker NM, Wallace S, Weinstock G, Whitman WB, Worley KC, Wu C, Wu J, Yu J (2009) Prepublication data sharing. Nature 461(7261):168–170. https://doi.org/10.1038/461168a
González-Medina M, Naveja JJ, Sánchez-Cruz N, Medina-Franco JL (2017) Open chemoinformatic resources to explore the structure, properties and chemical space of molecules. RSC Adv 7(85):54153–54163. https://doi.org/10.1039/C7RA11831G
Hasegawa K, Funatsu K (2014) Data mining of chemogenomics data using bi-modal PLS methods and chemical interpretation for molecular design. Mol Inform 33(11–12):749–756. https://doi.org/10.1002/minf.201400061
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47(D1):930–940. https://doi.org/10.1093/nar/gky1075
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 47(D1):1102–1109. https://doi.org/10.1093/nar/gky1033
Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J (2016) BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 44(D1):1045–53. https://doi.org/10.1093/nar/gkv1072
Gilson MK (2019) BindingDB. https://www.bindingdb.org. Accessed 9 Jan 2019
Ursu O, Holmes J, Knockel J, Bologa CG, Yang JJ, Mathias SL, Nelson SJ, Oprea TI (2017) DrugCentral: online drug compendium. Nucleic Acids Res 45(D1):932–939. https://doi.org/10.1093/nar/gkw993
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res 42(Database issue):1091–1097. https://doi.org/10.1093/nar/gkt1068
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46(D1):1074–1082. https://doi.org/10.1093/nar/gkx1037
Mathias SL, Hines-Kay J, Yang JJ, Zahoransky-Kohalmi G, Bologa CG, Ursu O, Oprea TI (2013) The CARLSBAD database: a confederated database of chemical bioactivities. Database 2013:044. https://doi.org/10.1093/database/bat044
Placzek S, Schomburg I, Chang A, Jeske L, Ulbrich M, Tillack J, Schomburg D (2017) Brenda in 2017: new perspectives and new tools in brenda. Nucleic Acids Res 45(D1):380–388. https://doi.org/10.1093/nar/gkw952
Sun J, Jeliazkova N, Chupakin V, Golib-Dzib J-F, Engkvist O, Carlsson L, Wegner J, Ceulemans H, Georgiev I, Jeliazkov V, Kochev N, Ashby TJ, Chen H (2017) ExCAPE-DB: an integrated large scale dataset facilitating big data analysis in chemogenomics. J Cheminform 9:17. https://doi.org/10.1186/s13321-017-0203-5
Güner OF (2002) History and evolution of the pharmacophore concept in computer-aided drug design. Curr Top Med Chem 2(12):1321–1332. https://doi.org/10.2174/1568026023392940
Patel Y, Gillet VJ, Bravi G, Leach AR (2002) A comparison of the pharmacophore identification programs: catalyst, disco and gasp. J Comput Aided Mol Des 16(8–9):653–681. https://doi.org/10.1023/a:1021954728347
Sliwoski G, Kothiwale S, Meiler J, Lowe EW (2014) Computational methods in drug discovery. Pharmacol Rev 66(1):334–395. https://doi.org/10.1124/pr.112.007336
Kolossov E, Lemon A (2006) Medicinal chemistry tools: making sense of hts data. Eur J Med Chem 41(2):166–175. https://doi.org/10.1016/j.ejmech.2005.10.005
Doke SK, Dhawale SC (2015) Alternatives to animal testing: a review. Saudi Pharm J 23(3):223–229. https://doi.org/10.1016/j.jsps.2013.11.002
Cronin MT, Jaworska JS, Walker JD, Comber MH, Watts CD, Worth AP (2003) Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ Health Perspect 111(10):1391–1401. https://doi.org/10.1289/ehp.5760
Hofer T, Gerner I, Gundert-Remy U, Liebsch M, Schulte A, Spielmann H, Vogel R, Wettig K (2004) Animal testing and alternative approaches for the human health risk assessment under the proposed new European chemicals regulation. Arch Toxicol 78(10):549–564. https://doi.org/10.1007/s00204-004-0577-9
Ashby J (1985) Fundamental structural alerts to potential carcinogenicity or noncarcinogenicity. Environ Mutagen 7(6):919–921. https://doi.org/10.1002/em.2860070613
Ashby J, Tennant RW (1991) Definitive relationships among chemical structure, carcinogenicity and mutagenicity for 301 chemicals tested by the U.S. NTP. Mutation Res 257(3):229–306. https://doi.org/10.1016/0165-1110(91)90003-e
Devillers J, Mombelli E, Samsera R (2011) Structural alerts for estimating the carcinogenicity of pesticides and biocides. SAR QSAR Environ Res 22(1–2):89–106. https://doi.org/10.1080/1062936X.2010.548349
Aptula AO, Patlewicz G, Roberts DW (2005) Skin sensitization: reaction mechanistic applicability domains for structure-activity relationships. Chem Res Toxicol 18(9):1420–1426. https://doi.org/10.1021/tx050075m
Roberts DW, Patlewicz G, Kern PS, Gerberick F, Kimber I, Dearman RJ, Ryan CA, Basketter DA, Aptula AO (2007) Mechanistic applicability domain classification of a local lymph node assay dataset for skin sensitization. Chem Res Toxicol 20(7):1019–1030. https://doi.org/10.1021/tx700024w
Blake JF (2005) Identification and evaluation of molecular properties related to preclinical optimization and clinical fate. Med Chem 1(6):649–655. https://doi.org/10.2174/157340605774598081
Hann M, Hudson B, Lewell X, Lifely R, Miller L, Ramsden N (1999) Strategic pooling of compounds for high-throughput screening. J Chem Inform Comput Sci 39(5):897–902. https://doi.org/10.1021/ci990423o
Pearce BC, Sofia MJ, Good AC, Drexler DM, Stock DA (2006) An empirical process for the design of high-throughput screening deck filters. J Chem Inform Model 46(3):1060–1068. https://doi.org/10.1021/ci050504m
Alves V, Muratov E, Capuzzi S, Politi R, Low Y, Braga R, Zakharov AV, Sedykh A, Mokshyna E, Farag S, Andrade CH, Kuz’min VE, Fourchesh D, Tropsha A (2016) Alarms about structural alerts. Green Chem 18(16):4348–4360. https://doi.org/10.1039/C6GC01492E
Labute P (2000) A widely applicable set of descriptors. J Mol Graph Model 18(4–5):464–477. https://doi.org/10.1016/s1093-3263(00)00068-1
Nantasenamat C, Isarankura-Na-Ayudhya C, Naenna T, Prachayasittikul V (2009) A practical overview of quantitative structure–activity relationship. EXCLI J 8:74–88. https://doi.org/10.17877/DE290R-690
Nantasenamat C, Isarankura-Na-Ayudhya C, Prachayasittikul V (2010) Advances in computational methods to predict the biological activity of compounds. Expert Opin Drug Discov 5(7):633–654. https://doi.org/10.1517/17460441.2010.492827
Randić M (2001) Novel shape descriptors for molecular graphs. J Chem Inform Comput Sci 41(3):607–613. https://doi.org/10.1021/ci0001031
Senese CL, Duca J, Pan D, Hopfinger AJ, Tseng YJ (2004) 4D-fingerprints, universal QSAR and QSPR descriptors. J Chem Inform Comput Sci 44(5):1526–1539. https://doi.org/10.1021/ci049898s
Shoombuatong W, Prathipati P, Owasirikul W, Worachartcheewan A, Simeon S, Anuwongcharoen N, Wikberg JES, Nantasenamat C (2017) Towards the revival of interpretable QSAR models. In: Roy K (ed) Advances in QSAR modeling challenges and advances in computational chemistry and physics, vol 24. Springer, Cham, pp 3–55. https://doi.org/10.1007/978-3-319-56850-8_1
Hawkins DM, Basak SC, Shi X (2001) QSAR with few compounds and many features. J Chem Inform Comput Sci 41(3):663–670. https://doi.org/10.1021/ci0001177
Rücker C, Rücker G, Meringer M (2007) y-randomization and its variants in QSPR/QSAR. J Chem Inform Model 47(6):2345–2357. https://doi.org/10.1021/ci700157b
Weaver S, Gleeson MP (2008) The importance of the domain of applicability in QSAR modeling. J Mol Graph Model 26(8):1315–1326. https://doi.org/10.1016/j.jmgm.2008.01.002
Gleeson MP, Modi S, Bender A, Robinson RLM, Kirchmair J, Promkatkaew M, Hannongbua S, Glen RC (2012) The challenges involved in modeling toxicity data in silico: a review. Curr Pharm Des 18(9):1266–1291. https://doi.org/10.2174/138161212799436359
Konovalov DA, Llewellyn LE, Vander Heyden Y, Coomans D (2008) Robust cross-validation of linear regression QSAR models. J Chem Inform Model 48(10):2081–2094. https://doi.org/10.1021/ci800209k
Eklund M, Norinder U, Boyer S, Carlsson L (2012) Application of conformal prediction in QSAR. IFIP Adv Inform Commun Technol 382:166–175. https://doi.org/10.1007/978-3-642-33412-2_17
Bosc N, Atkinson F, Felix E, Gaulton A, Hersey A, Leach AR (2019) Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J Cheminform 11(1):4. https://doi.org/10.1186/s13321-018-0325-4
Gleeson MP, Montanari D (2012) Strategies for the generation, validation and application of in silico ADMET models in lead generation and optimization. Exp Opin Drug Metab Toxicol 8(11):1435–1446. https://doi.org/10.1517/17425255.2012.711317
Topliss JG, Edwards RP (1979) Chance factors in studies of quantitative structure–activity relationships. J Med Chem 22(10):1238–1244. https://doi.org/10.1021/jm00196a017
Lombardo F, Gifford E, Shalaeva MY (2003) In silico ADME prediction: data, models, facts and myths. Mini Rev Med Chem 3(8):861–875. https://doi.org/10.2174/1389557033487629
Wood DJ, Buttar D, Cumming JG, Davis AM, Norinder U, Rodgers SL (2011) Automated QSAR with a hierarchy of global and local models. Mol Inform 30(11–12):960–972. https://doi.org/10.1002/minf.201100107
Tetko IV, Bruneau P, Mewes H-W, Rohrer DC, Poda GI (2006) Can we estimate the accuracy of adme-tox predictions? Drug Disc Today 11(15–16):700–707. https://doi.org/10.1016/j.drudis.2006.06.013
37th Joint Meeting of the Chemicals Committee (2004) OECD principles for the validation, for regulatory purposes, of (quantitative) structure–activity relationship models. https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf. Accessed 9 Jan 2019
Judson PN, Barber C, Canipa SJ, Poignant G, Williams R (2015) Establishing good computer modelling practice (GCMP) in the prediction of chemical toxicity. Mol Inform 34(5):276–283. https://doi.org/10.1002/minf.201400137
Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inform 29(6–7):476–488. https://doi.org/10.1002/minf.201000061
Patel M, Chilton ML, Sartini A, Gibson L, Barber C, Covey-Crump L, Przybylak KR, Cronin MTD, Madden JC (2018) Assessment and reproducibility of quantitative structure–activity relationship models by the nonexpert. J Chem Inform Model 58(3):673–682. https://doi.org/10.1021/acs.jcim.7b00523
Arora PK, Patil VM, Gupta SP (2010) A QSAR study on some series of anti-hepatitis B virus (HBV) agents. Bioinformation 4(9):417–420. https://doi.org/10.6026/97320630004417
Kurdekar V, Jadhav HR (2015) A new open source data analysis Python script for QSAR study and its validation. Med Chem Res 24(4):1617–1625. https://doi.org/10.1007/s00044-014-1240-5
Research Collaboratory for Structural Bioinformatics (2019) The Protein Data Bank (PDB). http://www.rcsb.org/pdb/. Accessed 9 Jan 2019
Fiser A, Sali A (2003) Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol 374:461–491. https://doi.org/10.1016/S0076-6879(03)74020-8
Ewing TJ, Makino S, Skillman AG, Kuntz ID (2001) DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des 15(5):411–428. https://doi.org/10.1023/a:1011115820450
Goodsell DS, Olson AJ (1990) Automated docking of substrates to proteins by simulated annealing. Proteins 8(3):195–202. https://doi.org/10.1002/prot.340080302
Warren GL, Andrews CW, Capelli AM, Clarke B, LaLonde J, Lambert MH, Lindvall M, Nevins N, Semus SF, Senger S, Tedesco G, Wall ID, Woolven JM, Peishoff CE, Head MS (2006) A critical assessment of docking programs and scoring functions. J Med Chem 49(20):5912–5931. https://doi.org/10.1021/jm050362n
Kubinyi H (1997) QSAR and 3D QSAR in drug design Part 2: applications and problems. Drug Discov Today 2:538–546. https://doi.org/10.1016/S1359-6446(97)01084-2
Kubinyi H (1997) QSAR and 3D QSAR in drug design Part 1: methodology. Drug Discov Today 2(11):457–467. https://doi.org/10.1016/S1359-6446(97)01079-9
Cramer RD, Wendt B (2007) Pushing the boundaries of 3D-QSAR. J Comput Aided Mol Des 21(1–3):23–32. https://doi.org/10.1007/s10822-006-9100-0
Leach AR (2001) Molecular modelling: principles and applications, 2nd edn. Pearson Education, Harlow
Menikarachchi LC, Gascón JA (2010) QM/MM approaches in medicinal chemistry research. Curr Top Med Chem 10(1):46–54. https://doi.org/10.2174/156802610790232297
Mulholland AJ (2007) Chemical accuracy in QM/MM calculations on enzyme-catalysed reactions. Chem Cent J 1:19. https://doi.org/10.1186/1752-153X-1-19
Senn HM, Thiel W (2007) QM/MM studies of enzymes. Curr Opin Chem Biol 11(2):182–187. https://doi.org/10.1016/j.cbpa.2007.01.684
Senn HM, Thiel W (2009) QM/MM methods for biomolecular systems. Angewandte Chemie 48(7):1198–1229. https://doi.org/10.1002/anie.200802019
Walker RC, Crowley MF, Case DA (2008) The implementation of a fast and accurate QM/MM potential method in amber. J Comput Chem 29(7):1019–1031. https://doi.org/10.1002/jcc.20857
Butcher EC, Berg EL, Kunkel EJ (2004) Systems biology in drug discovery. Nat Biotechnol 22(10):1253–1259. https://doi.org/10.1038/nbt1017
Pujol A, Mosca R, Farres J, Aloy P (2010) Unveiling the role of network and systems biology in drug discovery. Trends Pharmacol Sci 31(3):115–123. https://doi.org/10.1016/j.tips.2009.11.006
Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB, Whaley R, Glennon RA, Hert J, Thomas KL, Edwards DD, Shoichet BK, Roth BL (2009) Predicting new molecular targets for known drugs. Nature 462(7270):175–181. https://doi.org/10.1038/nature08506
Ye H, Wei J, Tang K, Feuers R, Hong H (2016) Drug repositioning through network pharmacology. Curr Top Med Chem 16(30):3646–3656. https://doi.org/10.2174/1568026616666160530181328
Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK (2007) Relating protein pharmacology by ligand chemistry. Nat Biotechnol 25(2):197–206. https://doi.org/10.1038/nbt1284
Wu W, Zhang R, Salahub DR (2009) Nelfinavir: a magic bullet to annihilate cancer cells? Cancer Biol Ther 8(3):233–235. https://doi.org/10.4161/cbt.8.3.7789
Dakshanamurthy S, Issa NT, Assefnia S, Seshasayee A, Peters OJ, Madhavan S, Uren A, Brown ML, Byers SW (2012) Predicting new indications for approved drugs using a proteochemometric method. J Med Chem 55(15):6832–6848. https://doi.org/10.1021/jm300576q
Schaduangrat N, Anuwongcharoen N, Phanus-umporn C, Sriwanichpoom N, Wikberg JES, Nantasenamat C (2019) Chapter 10—Proteochemometric modeling for drug repositioning. In: Roy K (ed) In Silico Drug Design. Academic Press, London, pp 281–302. https://doi.org/10.1016/B978-0-12-816125-8.00010-9
Waltemath D, Wolkenhauer O (2016) How modeling standards, software, and initiatives support reproducibility in systems biology and systems medicine. IEEE Trans Biomed Eng 63(10):1999–2006. https://doi.org/10.1109/TBME.2016.2555481
Medley JK, Goldberg AP, Karr JR (2016) Guidelines for reproducibly building and simulating systems biology models. IEEE Trans Biomed Eng 63(10):2015–2020. https://doi.org/10.1109/TBME.2016.2591960
Waltemath D, Henkel R, Winter F, Wolkenhauer O (2013) Reproducibility of model-based results in systems biology. In: Prokop A, Csukás B (eds) Syst Biol. Springer, Dordrecht, pp 301–320. https://doi.org/10.1007/978-94-007-6803-1_10
Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, Hucka M (2006) BioModels database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res 34:689–691. https://doi.org/10.1093/nar/gkj092
Kirouac DC, Cicali B, Schmidt S (2019) Reproducibility of quantitative systems pharmacology models: current challenges and future opportunities. CPT Pharmacometrics Syst Pharmacol 8(4):205–210. https://doi.org/10.1002/psp4.12390
Watanabe L, Barhak J, Myers C (2019) Toward reproducible disease models using the systems biology markup language. Simulation 95(10):895–930. https://doi.org/10.1177/0037549718793214
Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novere N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4):524–531. https://doi.org/10.1093/bioinformatics/btg015
Swat MJ, Moodie S, Wimalaratne SM, Kristensen NR, Lavielle M, Mari A, Magni P, Smith MK, Bizzotto R, Pasotti L, Mezzalana E, Comets E, Sarr C, Terranova N, Blaudez E, Chan P, Chard J, Chatel K, Chenel M, Edwards D, Franklin C, Giorgino T, Glont M, Girard P, Grenon P, Harling K, Hooker AC, Kaye R, Keizer R, Kloft C, Kok JN, Kokash N, Laibe C, Laveille C, Lestini G, Mentre F, Munafo A, Nordgren R, Nyberg HB, Parra-Guillen ZP, Plan E, Ribba B, Smith G, Troconiz IF, Yvon F, Milligan PA, Harnisch L, Karlsson M, Hermjakob H, Le Novere N (2015) Pharmacometrics Markup Language (PharmML): opening new perspectives for model exchange in drug development. CPT Pharmacometrics Syst Pharmacol 4(6):316–319. https://doi.org/10.1002/psp4.57
Barhak J (2019) MIST: Micro-simulation tool to support disease modeling. https://github.com/scipy-conference/scipy2013_talks/tree/master/talks/jacob_barhak. Accessed 1 Nov 2019
Hedley WJ, Nelson MR, Bullivant DP, Nielsen PF (2001) A short introduction to CellML. Philos Trans R Soc A 359(1783):1073–1089. https://doi.org/10.1098/rsta.2001.0817
Medley JK, Choi K, Konig M, Smith L, Gu S, Hellerstein J, Sealfon SC, Sauro HM (2018) Tellurium notebooks—an environment for reproducible dynamical modeling in systems biology. PLoS Comput Biol 14(6):1006220. https://doi.org/10.1371/journal.pcbi.1006220
Choi K, Medley JK, Konig M, Stocking K, Smith L, Gu S, Sauro HM (2018) Tellurium: an extensible python-based modeling environment for systems and synthetic biology. BioSystems 171:74–79. https://doi.org/10.1016/j.biosystems.2018.07.006
Kolpakov F, Akberdin I, Kashapov T, Kiselev L, Kolmykov S, Kondrakhin Y, Kutumova E, Mandrik N, Pintus S, Ryabova A, Sharipov R, Yevshin I, Kel A (2019) BioUML: an integrated environment for systems biology and collaborative analysis of biomedical data. Nucleic Acids Res 47(W1):225–233. https://doi.org/10.1093/nar/gkz440
Drawert B, Trogdon M, Toor S, Petzold L, Hellander A (2016) MOLNs: A cloud platform for interactive, reproducible, and scalable spatial stochastic computational experiments in systems biology using PyURDME. SIAM J Sci Comput 38(3):179–202. https://doi.org/10.1137/15M1014784
Schadt EE, Linderman MD, Sorenson J, Lee L, Nolan GP (2010) Computational solutions to large-scale data management and analysis. Nat Rev Genet 11(9):647–657. https://doi.org/10.1038/nrg2857
Noble WS (2009) A quick guide to organizing computational biology projects. PLoS Comput Biol 5(7):1000424. https://doi.org/10.1371/journal.pcbi.1000424
Hassan M, Brown RD, Varma-O’Brien S, Rogers D (2006) Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 10(3):283–299. https://doi.org/10.1007/s11030-006-9041-5
Berthold MR, Cebron N, Dill F, Gabriel TR, Kötter T, Meinl T, Ohl P, Thiel K, Wiswedel B (2009) KNIME—the Konstanz information miner. ACM SIGKDD Explor Newslett 11(1):26. https://doi.org/10.1145/1656274.1656280
Cox R, Green DVS, Luscombe CN, Malcolm N, Pickett SD (2013) QSAR workbench: automating QSAR modeling to drive compound design. J Comput Aided Mol Des 27(4):321–336. https://doi.org/10.1007/s10822-013-9648-4
Steinmetz FP, Mellor CL, Meinl T, Cronin MTD (2015) Screening chemicals for receptor-mediated toxicological and pharmacological endpoints: using public data to build screening tools within a KNIME workflow. Mol Inform 34(2–3):171–178. https://doi.org/10.1002/minf.201400188
Nicola G, Berthold MR, Hedrick MP, Gilson MK (2015) Connecting proteins with drug-like compounds: open source drug discovery workflows with BindingDB and KNIME. Database. https://doi.org/10.1093/database/bav087
Mazanetz MP, Marmon RJ, Reisser CBT, Morao I (2012) Drug discovery applications for KNIME: an open source data mining platform. Curr Top Med Chem 12(18):1965–1979. https://doi.org/10.2174/156802612804910331
Kuhn T, Willighagen EL, Zielesny A, Steinbeck C (2010) CDK-Taverna: an open workflow environment for cheminformatics. BMC Bioinform 11:159. https://doi.org/10.1186/1471-2105-11-159
Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The Chemistry Development Kit (CDK): an open-source Java Library for Chemo- and Bioinformatics. J Chem Inform Comput Sci 43(2):493–500. https://doi.org/10.1021/ci025584y
Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Chertó M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C (2017) The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform 9:33. https://doi.org/10.1186/s13321-017-0220-4
Lucas X, Grüning BA, Günther S (2014) ChemicalToolBoX and its application on the study of the drug like and purchasable space. J Cheminform 6(Suppl 1):51. https://doi.org/10.1186/1758-2946-6-S1-P51
Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C (2017) Nextflow enables reproducible computational workflows. Nat Biotechnol 35(4):316–319. https://doi.org/10.1038/nbt.3820
Köster J, Rahmann S (2012) Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 28(19):2520–2522. https://doi.org/10.1093/bioinformatics/bts480
Goodstadt L (2010) Ruffus: a lightweight python library for computational pipelines. Bioinformatics 26(21):2778–2779. https://doi.org/10.1093/bioinformatics/btq524
Sadedin SP, Pope B, Oshlack A (2012) Bpipe: a tool for running and managing bioinformatics pipelines. Bioinformatics 28(11):1525–1526. https://doi.org/10.1093/bioinformatics/bts167
Brandt J, Reisig W, Leser U (2017) Computation semantics of the functional scientific workflow language Cuneiform. J Funct Program. https://doi.org/10.1017/S0956796817000119
Bernhardsson E, Freider E, Rouhani A (2012) Luigi GitHub repository. https://github.com/spotify/luigi
Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, Guy RT, Haddock SH, Huff KD, Mitchell IM, Plumbley MD, Waugh B, White EP, Wilson P (2014) Best practices for scientific computing. PLoS Biol 12(1):1001745. https://doi.org/10.1371/journal.pbio.1001745
Taschuk M, Wilson G (2017) Ten simple rules for making research software more robust. PLoS Comput Biol 13(4):1005412. https://doi.org/10.1371/journal.pcbi.1005412
Nowotka MM, Gaulton A, Mendez D, Bento AP, Hersey A, Leach A (2017) Using ChEMBL web services for building applications and data processing workflows relevant to drug discovery. Exp Opin Drug Discov 12(8):757–767. https://doi.org/10.1080/17460441.2017.1339032
Alvarsson J, Lampa S, Schaal W, Andersson C, Wikberg JES, Spjuth O (2016) Large-scale ligand-based predictive modelling using support vector machines. J Cheminform 8:39. https://doi.org/10.1186/s13321-016-0151-5
Lampa S, Alvarsson J, Spjuth O (2016) Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. J Cheminform 8:67. https://doi.org/10.1186/s13321-016-0179-6
Yoo AB, Jette MA, Grondona M (2003) SLURM: Simple Linux Utility for Resource Management. In: Feitelson D, Rudolph L, Schwiegelshohn U (eds) Job scheduling strategies for parallel processing. Lecture notes in computer science, vol 2862. Springer, Berlin, pp 44–60
Amstutz P, Crusoe MR, Tijanić N, Chapman B, Chilton J, Heuer M, Kartashov A, Leehr D, Ménager H, Nedeljkovich M, Scales M, Soiland-Reyes S, Stojanovic L (2019) Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2. Accessed 9 Jan 2019
Chapman B, Gentry J, Lin M, Magee P, O’Connor B, Prabhakaran A, Van der Auwera G (2019) OpenWDL. http://www.openwdl.org/. Accessed 9 Jan 2019
Davie P (2010) Cloud computing: a drug discovery game changer? Innov Pharm Technol 33:34–36
Dudley JT, Butte AJ (2010) In silico research in the era of cloud computing. Nat Biotechnol 28(11):1181–1185. https://doi.org/10.1038/nbt1110-1181
Garg V, Arora S, Gupta C (2011) Cloud computing approaches to accelerate drug discovery value chain. Comb Chem High Throughput Screen 14(10):861–871. https://doi.org/10.2174/138620711797537085
Moghadam BT, Alvarsson J, Holm M, Eklund M, Carlsson L, Spjuth O (2015) Scaling predictive modeling in drug development with cloud computing. J Chem Inform Model 55(1):19–25. https://doi.org/10.1021/ci500580y
Hurley DG, Budden DM, Crampin EJ (2015) Virtual reference environments: a simple way to make research reproducible. Brief Bioinform 16(5):901–903. https://doi.org/10.1093/bib/bbu043
Piccolo SR, Frampton MB (2016) Tools and techniques for computational reproducibility. GigaScience 5(1):30. https://doi.org/10.1186/s13742-016-0135-4
Jaghoori MM, Bleijlevens B, Olabarriaga SD (2016) 1001 ways to run AutoDock Vina for virtual screening. J Comput Aided Mol Des 30(3):237–249. https://doi.org/10.1007/s10822-016-9900-9
McGuire R, Verhoeven S, Vass M, Vriend G, de Esch IJ, Lusher SJ, Leurs R, Ridder L, Kooistra AJ, Ritschel T, de Graaf C (2017) 3D-e-Chem-VM: structural cheminformatics research infrastructure in a freely available virtual machine. J Chem Inform Model 57(2):115–121. https://doi.org/10.1021/acs.jcim.6b00686
Alvim-Gaston M, Grese T, Mahoui A, Palkowitz AD, Pineiro-Nunez M, Watson I (2014) Open Innovation Drug Discovery (OIDD): a potential path to novel therapeutic chemical space. Curr Top Med Chem 14(3):294–303. https://doi.org/10.2174/1568026613666131127125858
Ochoa R, Davies M, Papadatos G, Atkinson F, Overington JP (2014) myChEMBL: a virtual machine implementation of open data and cheminformatics tools. Bioinformatics 30(2):298–300. https://doi.org/10.1093/bioinformatics/btt666
Ellingson SR, Baudry J (2011) High-throughput virtual molecular docking: Hadoop implementation of AutoDock4 on a private cloud. In: Proceedings of the second international workshop on emerging computational methods for the life sciences - ECMLS’11. ACM Press, New York, pp 33–38. https://doi.org/10.1145/1996023.1996028
Capuccini M, Ahmed L, Schaal W, Laure E, Spjuth O (2017) Large-scale virtual screening on public cloud resources with Apache Spark. J Cheminform 9:15. https://doi.org/10.1186/s13321-017-0204-4
Georgieva P, Lapins M, Spjuth O, Wikberg J (2019) Pharmaceutical bioinformatics: A free internet course for international and Swedish students offered by the University of Uppsala. http://www.pharmbio.org/. Accessed 1 Nov 2019
Dahlö M, Haziza F, Kallio A, Korpelainen E, Bongcam-Rudloff E, Spjuth O (2015) BioImg.org: a catalog of virtual machine images for the life sciences. Bioinform Biol Insights 9:125–128. https://doi.org/10.4137/BBI.S28636
Cito J, Gall HC (2016) Using Docker containers to improve reproducibility in software engineering research. In: Proceedings of the 38th international conference on software engineering companion—ICSE ’16. ACM Press, New York, pp 906–907
Kurtzer GM, Sochat V, Bauer MW (2017) Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5):0177459. https://doi.org/10.1371/journal.pone.0177459
Gomes J, Campos I, Bagnaschi E, David M, Alves L, Martins J, Pina J, Lopez-Garcia A, Orviz P (2017) Enabling rootless Linux containers in multi-user environments: the udocker tool. Comput Phys Commun 232:84–97. https://doi.org/10.1016/j.cpc.2018.05.021
Warr WA (2012) Scientific workflow systems: Pipeline Pilot and KNIME. J Comput Aided Mol Des 26(7):801–804. https://doi.org/10.1007/s10822-012-9577-7
Suhartanto H, Pasaribu AP, Siddiq MF, Fadhila MI, Hilman MH, Yanuar A (2017) A preliminary study on shifting from virtual machine to Docker container for in silico drug discovery in the cloud. Int J Technol 8(4):611. https://doi.org/10.14716/ijtech.v8i4.9478
Fong J (2019) How GlaxoSmithKline is Accelerating Science with Docker Enterprise Edition. https://blog.docker.com/2017/10/how-gsk-is-accelerating-science-with-dockeree/. Accessed 9 Jan 2019
Altae-Tran H, Ramsundar B, Pappu AS, Pande V (2017) Low data drug discovery with one-shot learning. ACS Cent Sci 3(4):283–293. https://doi.org/10.1021/acscentsci.6b00367
OpenRiskNet (2019) Open e-infrastructure to support data sharing, knowledge integration and in silico analysis and modelling in predictive toxicology and risk assessment. http://www.openrisknet.org/. Accessed 9 Jan 2019
Belmann P, Dröge J, Bremges A, McHardy AC, Sczyrba A, Barton MD (2015) Bioboxes: standardised containers for interchangeable bioinformatics software. GigaScience 4:47. https://doi.org/10.1186/s13742-015-0087-0
Li W, Kanso A (2015) Comparing containers versus virtual machines for achieving high availability. In: 2015 IEEE international conference on cloud engineering. IEEE, New Jersey, pp 353–358. https://doi.org/10.1109/IC2E.2015.79
Spjuth O, Willighagen EL, Guha R, Eklund M, Wikberg JE (2010) Towards interoperable and reproducible QSAR analyses: exchange of datasets. J Cheminform 2(1):5. https://doi.org/10.1186/1758-2946-2-5
Ruusmann V, Sild S, Maran U (2014) QSAR databank—an approach for the digital organization and archiving of QSAR model information. J Cheminform 6:25. https://doi.org/10.1186/1758-2946-6-25
Ruusmann V, Sild S, Maran U (2015) QSAR databank repository: open and linked qualitative and quantitative structure-activity relationship models. J Cheminform 7(1):32. https://doi.org/10.1186/s13321-015-0082-6
Joint Research Centre, The European Commission’s science and knowledge service (2019) (Q)SAR Model Reporting Format Database. https://qsardb.jrc.ec.europa.eu/qmrf/. Accessed 1 Nov 2019
Hastings J, Jeliazkova N, Owen G, Tsiliki G, Munteanu CR, Steinbeck C, Willighagen E (2015) eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment. J Biomed Semant 6(1):10
Guazzelli A, Zeller M, Lin W-C, Williams G et al (2009) PMML: an open standard for sharing models. R J 1(1):60–65
Center for Computational Science Research, Inc. (2019) Data Mining Group. http://dmg.org/. Accessed 1 Nov 2019
Fillbrunn A (2019) PMML integration in KNIME. https://www.knime.com/blog/pmml-integration-in-knime/. Accessed 1 Nov 2019
ONNX Project Contributors (2019) Open Neural Network Exchange Format: The open ecosystem for interchangeable AI models. https://onnx.ai/. Accessed 1 Nov 2019
Stålring JC, Carlsson LA, Almeida P, Boyer S (2011) AZOrange—high performance open source machine learning for QSAR modeling in a graphical programming environment. J Cheminform 3:28. https://doi.org/10.1186/1758-2946-3-28
Dixon SL, Duan J, Smith E, Von Bargen CD, Sherman W, Repasky MP (2016) AutoQSAR: an automated machine learning tool for best-practice quantitative structure-activity relationship modeling. Future Med Chem 8(15):1825–1839. https://doi.org/10.4155/fmc-2016-0093
Nantasenamat C, Worachartcheewan A, Jamsak S, Preeyanon L, Shoombuatong W, Simeon S, Mandi P, Isarankura-Na-Ayudhya C, Prachayasittikul V (2015) AutoWeka: toward an automated data mining software for QSAR and QSPR studies. Methods Mol Biol 1260:119–147. https://doi.org/10.1007/978-1-4939-2239-0_8
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software. ACM SIGKDD Explor Newslett 11(1):10. https://doi.org/10.1145/1656274.1656278
Kausar S, Falcao AO (2018) An automated framework for QSAR model building. J Cheminform 10(1):1. https://doi.org/10.1186/s13321-017-0256-5
Dong J, Yao Z-J, Zhu M-F, Wang N-N, Lu B, Chen AF, Lu A-P, Miao H, Zeng W-B, Cao D-S (2017) ChemSAR: an online pipelining platform for molecular SAR modeling. J Cheminform 9(1):27. https://doi.org/10.1186/s13321-017-0215-1
Tsiliki G, Munteanu CR, Seoane JA, Fernandez-Lozano C, Sarimveis H, Willighagen EL (2015) RRegrs: an R package for computer-aided model selection with multiple regression models. J Cheminform 7:46. https://doi.org/10.1186/s13321-015-0094-2
Murrell DS, Cortes-Ciriano I, van Westen GJP, Stott IP, Bender A, Malliavin TE, Glen RC (2015) Chemically aware model builder (camb): an R package for property and bioactivity modelling of small molecules. J Cheminform 7:45. https://doi.org/10.1186/s13321-015-0086-2
Shamsara J (2017) Ezqsar: an R package for developing QSAR models directly from structures. Open Med Chem J 11:212–221. https://doi.org/10.2174/1874104501711010212
Nantasenamat C (2020) Best practices for constructing reproducible QSAR models. In: Roy K (ed) Ecotoxicological QSARs. Humana Press, New Jersey
Rule A, Birmingham A, Zuniga C, Altintas I, Huang S-C, Knight R, Moshiri N, Nguyen MH, Rosenthal SB, Pérez F, Rose PW (2019) Ten simple rules for writing and sharing computational analyses in jupyter notebooks. PLoS Comput Biol 15(7):1007007
Landrum G (2019) RDKit tutorials. https://github.com/greglandrum/. Accessed 1 Nov 2019
RDKit (2019) RDKit: Open-Source Cheminformatics Software. https://www.rdkit.org/. Accessed 1 Nov 2019
RDKit GitHub (2019) RDKit. https://github.com/rdkit/rdkit-tutorials/. Accessed 1 Nov 2019
OpenEye Scientific Software, Inc (2019) OpenEye Python Cookbook. https://docs.eyesopen.com/toolkits/cookbook/python/. Accessed 1 Nov 2019
Informatics Matters Ltd (2019) Squonk Computational Notebook. https://squonk.it/. Accessed 1 Nov 2019
CDK (2019) Chemistry Development Kit: Open Source modular Java libraries for Cheminformatics. https://cdk.github.io/. Accessed 1 Nov 2019
Jansen JM, Cornell W, Tseng YJ, Amaro RE (2012) Teach-Discover-Treat (TDT): collaborative computational drug discovery for neglected diseases. J Mol Graph Model 38:360–362. https://doi.org/10.1016/j.jmgm.2012.07.007
Riniker S, Landrum GA, Montanari F, Villalba SD, Maier J, Jansen JM, Walters WP, Shelat AA (2017) Virtual-screening workflow tutorials and prospective results from the Teach-Discover-Treat competition 2014 against malaria. F1000 Res 6:1136. https://doi.org/10.12688/f1000research.11905.2
Riniker S, Landrum GA, Montanari F, Villalba SD, Maier J, Jansen, JM, Walters WP, Shelat AA (2019) Tutorial for the Teach-Discover-Treat (TDT) competition 2014-Challenge 1: anti-malaria hit finding using classifier-fusion boosted predictive models. https://github.com/sriniker/TDT-tutorial-2014/. Accessed 1 Nov 2019
Sydow D, Morger A, Driller M, Volkamer A (2019) TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data. J Cheminform 11:29. https://doi.org/10.1186/s13321-019-0351-x
Kluyver T, Ragan-Kelley B, Pérez F, Granger B, Bussonnier M, Frederic J, Kelley K, Hamrick J, Grout J, Corlay S, Ivanov P, Avila D, Abdalla S, Willing C, Jupyter development team (2016) Jupyter notebooks: a publishing format for reproducible computational workflows. In: Loizides F, Schmidt B (eds) Positioning and power in Academic Publishing: players, agents and agendas. IOS Press, Amsterdam, pp 87–90. https://eprints.soton.ac.uk/403913/
Grünberg R, Nilges M, Leckner J (2007) Biskit-a software platform for structural bioinformatics. Bioinformatics 23(6):769–770. https://doi.org/10.1093/bioinformatics/btl655
Daniluk P, Wilczyński B, Lesyng B (2015) WeBIAS: a web server for publishing bioinformatics applications. BMC Res Notes 8:628. https://doi.org/10.1186/s13104-015-1622-x
Ősz Á, Pongor LS, Szirmai D, Győrffy B (2017) A snapshot of 3649 web-based services published between 1994 and 2017 shows a decrease in availability after 2 years. Brief Bioinform. https://doi.org/10.1093/bib/bbx159
RStudio Inc. (2018) Shiny. https://shiny.rstudio.com/
Plotly (2019) Dash. https://plot.ly/products/dash/. Accessed 9 Jan 2019
Plotly (2019) Plotly: Modern analytic apps for the enterprise. https://plot.ly/. Accessed 9 Jan 2019
Nantasenamat C (2019) Conceptual map of computational drug discovery [CC-BY]. https://doi.org/10.6084/m9.figshare.5979400
Synergy Research Group (2019) The leading cloud providers continue to run away with the market. https://www.srgresearch.com/articles/leading-cloud-providers-continue-run-away-market/. Accessed 9 Jan 2019
Dong J, Yao Z-J, Wen M, Zhu M-F, Wang N-N, Miao H-Y, Lu A-P, Zeng W-B, Cao D-S (2016) BioTriangle: a web-accessible platform for generating various molecular representations for chemicals, proteins, DNAs/RNAs and their interactions. J Cheminform 8:34. https://doi.org/10.1186/s13321-016-0146-2
Dong J, Cao D-S, Miao H-Y, Liu S, Deng B-C, Yun Y-H, Wang N-N, Lu A-P, Zeng W-B, Chen AF (2015) ChemDes: an integrated web-based platform for molecular descriptor and fingerprint computation. J Cheminform 7:60. https://doi.org/10.1186/s13321-015-0109-z
Walker T, Grulke CM, Pozefsky D, Tropsha A (2010) Chembench: a cheminformatics workbench. Bioinformatics 26(23):3000–3001. https://doi.org/10.1093/bioinformatics/btq556
Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY et al (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25(6):533–554. https://doi.org/10.1007/s10822-011-9440-2
González-Medina M, Medina-Franco JL (2017) Platform for unified molecular analysis: PUMA. J Chem Inform Model 57(8):1735–1740. https://doi.org/10.1021/acs.jcim.7b00253
van Zundert GCP, Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis PL, Karaca E, Melquiond ASJ, van Dijk M, de Vries SJ, Bonvin AMJJ (2016) The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes. J Mol Biol 428(4):720–725. https://doi.org/10.1016/j.jmb.2015.09.014
Camps J, Carrillo O, Emperador A, Orellana L, Hospital A, Rueda M, Cicin-Sain D, D’Abramo M, Gelpí JL, Orozco M (2009) FlexServ: an integrated tool for the analysis of protein flexibility. Bioinformatics 25(13):1709–1710. https://doi.org/10.1093/bioinformatics/btp304
Hospital A, Andrio P, Fenollosa C, Cicin-Sain D, Orozco M, Gelpí JL (2012) MDWeb and MDMoby: an integrated web-based platform for molecular dynamics simulations. Bioinformatics 28(9):1278–1279. https://doi.org/10.1093/bioinformatics/bts139
Stierand K, Maass PC, Rarey M (2006) Molecular complexes at a glance: automated generation of two-dimensional complex diagrams. Bioinformatics 22(14):1710–1716. https://doi.org/10.1093/bioinformatics/btl150
Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T (2014) SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res 42(Web Server issue):252–258. https://doi.org/10.1093/nar/gku340