Springer Science and Business Media LLC
1758-2946
Publisher: BMC, Chemistry Central
Subject areas:
Physical and Theoretical Chemistry; Computer Science Applications; Library and Information Sciences; Computer Graphics and Computer-Aided Design
Featured articles
On InChI and evaluating the quality of cross-reference links
Volume 6 - Pages 1-15 - 2014
There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for example, on automatically generated cross-reference links between entries. Another approach is to use the manually curated links stored directly in databases. This study employs well-established InChI identifiers to measure the consistency and completeness of the manually curated links by comparing them with the automatically generated ones. We used two different tools to generate InChI identifiers and observed some ambiguities in their outputs. In part, these ambiguities were caused by indistinctness in interpretation of the structural data used. InChI identifiers were used successfully to find duplicate entries in databases. We found that the InChI inconsistencies in the manually curated links are very high (28.85% in the worst case). Even using a weaker definition of consistency, the measured values were very high in general. The completeness of the manually curated links was also very poor (only 93.8% in the best case) compared with that of the automatically generated links. We observed several problems with the InChI tools and the files used as their inputs. There are large gaps in the consistency and completeness of manually curated links if they are measured using InChI identifiers. However, inconsistency can be caused both by errors in manually curated links and the inherent limitations of the InChI method.
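The comparison of curated and InChI-derived links described in this abstract can be illustrated with a minimal Python sketch. It assumes each database has already been reduced to a mapping from entry ID to InChI string. The function name `link_quality`, the entry IDs, and the exact way the two ratios are defined are illustrative assumptions, not the definitions used in the paper.

```python
def link_quality(inchi_a, inchi_b, curated_links):
    """Compare curated cross-reference links with InChI-derived links.

    inchi_a, inchi_b: dict mapping entry ID -> InChI string (hypothetical input).
    curated_links: set of (id_a, id_b) pairs stored in the databases.
    Returns (inconsistency, completeness) as fractions.
    """
    # Automatically generated links: pairs of entries that share an InChI.
    by_inchi = {}
    for id_b, inchi in inchi_b.items():
        by_inchi.setdefault(inchi, set()).add(id_b)
    auto_links = {
        (id_a, id_b)
        for id_a, inchi in inchi_a.items()
        for id_b in by_inchi.get(inchi, ())
    }

    # A curated link counts as inconsistent when the two InChIs differ.
    checkable = [(a, b) for a, b in curated_links if a in inchi_a and b in inchi_b]
    inconsistent = [(a, b) for a, b in checkable if inchi_a[a] != inchi_b[b]]
    inconsistency = len(inconsistent) / len(checkable) if checkable else 0.0

    # Completeness (assumed definition): share of InChI-derived links
    # that also appear among the curated links.
    if auto_links:
        completeness = len(auto_links & set(curated_links)) / len(auto_links)
    else:
        completeness = 1.0
    return inconsistency, completeness


# Toy data: water, methane and ethanol in two hypothetical databases.
db_a = {"A1": "InChI=1S/H2O/h1H2", "A2": "InChI=1S/CH4/h1H4",
        "A3": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"}
db_b = {"B1": "InChI=1S/H2O/h1H2", "B2": "InChI=1S/CH4/h1H4",
        "B3": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"}
curated = {("A1", "B1"), ("A2", "B3")}  # the second link is wrong on purpose
inconsistency, completeness = link_quality(db_a, db_b, curated)
```

In this toy case one of the two curated links joins entries with different InChIs, and only one of the three InChI-derived links is present among the curated ones. As the abstract notes, a mismatch of this kind may reflect either a curation error or a limitation of the InChI method itself.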
Exploration and augmentation of pharmacological space via adversarial auto-encoder model for facilitating kinase-centric drug development
Volume 13 - Pages 1-15 - 2021
Predicting compound–protein interactions (CPIs) is of great importance for drug discovery and repositioning, yet still challenging mainly due to the sparse nature of CPI matrixes, resulting in poor generalization performance. Hence, unlike typical CPI prediction models focused on representation learning or model selection, we propose a deep neural network-based strategy, PCM-AAE, that re-explores and augments the pharmacological space of kinase inhibitors by introducing the adversarial auto-encoder model (AAE) to improve the generalization of the prediction model. To complete the data space, we constructed Ensemble of PCM-AAE (EPA), an ensemble model that quickly and accurately yields quantitative predictions of binding affinity between any human kinase and inhibitor. In rigorous internal validation, EPA showed excellent performance, consistently outperforming the model trained with the imbalanced set, especially for targets with relatively fewer training data points. Improved prediction accuracy of EPA for external datasets enhances its generalization ability, making it possible to gracefully handle previously unseen kinases and inhibitors. EPA showed promising potential when directly applied to virtual screening and off-target prediction, exhibiting its practicality in hit prediction. Our strategy is expected to facilitate kinase-centric drug development, as well as to solve more challenging prediction problems with insufficient data points.
Neural network based classification of acute toxicity of phthalate esters to fathead minnow
Volume 5 - Pages 1-1 - 2013
Avoiding hERG-liability in drug design via synergetic combinations of different (Q)SAR methodologies and data sources: a case study in an industrial setting
Volume 11 - Pages 1-13 - 2019
In this paper, we explore the impact of combining different in silico prediction approaches and data sources on the predictive performance of the resulting system. We use inhibition of the hERG ion channel target as the endpoint for this study, as it constitutes a key safety concern in drug development and a potential cause of attrition. We show that combining data sources can improve the relevance of the training set with regard to the target chemical space, leading to improved performance. Similarly, we demonstrate that combining multiple statistical models with one another, and with expert systems, can lead to positive synergistic effects when the confidence in the predictions of the merged systems is taken into account. The best combinations analyzed display good hERG predictivity. Finally, this work demonstrates the suitability of the SOHN methodology for building models in the context of receptor-based endpoints like hERG inhibition when using the appropriate pharmacophoric descriptors.
Data-driven identification of structural alerts for mitigating the risk of drug-induced human liver injuries
Volume 7 - Pages 1-8 - 2015
The use of structural alerts to de-prioritize compounds with undesirable features as drug candidates has been gaining in popularity. Hundreds of molecular structural moieties have been proposed as structural alerts. An emerging issue is that strict application of these alerts will result in a significant reduction of the chemistry space for new drug discovery, as more than half of the oral drugs on the market match at least one of the alerts. To mitigate this issue, we propose to apply a rigorous statistical analysis to derive/validate structural alerts before use. To derive human liver toxicity structural alerts, we retrieved all small-molecule entries from LiverTox, a U.S. National Institutes of Health online resource for information on human liver injuries induced by prescription and over-the-counter drugs and dietary supplements. We classified the compounds into hepatotoxic, nonhepatotoxic, and possible hepatotoxic classes, and performed detailed statistical analyses to identify molecular structural fragments highly enriched in the hepatotoxic class beyond random distribution as structural alerts for human liver injuries. We identified 12 molecular fragments present in multiple marketed drugs that one can consider as common “drug-like” fragments, yet they are strongly associated with drug-induced human liver injuries. Thus, these fragments may be considered as robust hepatotoxicity structural alerts suitable for use in drug discovery screening programs. The use of structural alerts has contributed to the identification of many compounds with potential toxicity issues in modern drug discovery. However, with a large number of structural alerts published to date without proper validation, application of these alerts may restrict the chemistry space and prevent discovery of valuable drugs. To mitigate this issue, we showed how to use statistical analyses to develop a small, robust, and broadly applicable set of structural alerts.
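The idea of a fragment being "highly enriched in the hepatotoxic class beyond random distribution" can be sketched with a one-sided hypergeometric (Fisher-type) test. The abstract does not say which statistical test the authors used, so the choice of test, the function name `enrichment_p_value`, and the counts below are assumptions for illustration only.

```python
from math import comb


def enrichment_p_value(hits_toxic, n_toxic, hits_other, n_other):
    """One-sided hypergeometric p-value for fragment enrichment.

    hits_toxic: hepatotoxic compounds containing the fragment.
    n_toxic:    all hepatotoxic compounds.
    hits_other: non-hepatotoxic compounds containing the fragment.
    n_other:    all non-hepatotoxic compounds.
    Returns the probability of seeing at least `hits_toxic` fragment
    matches in the hepatotoxic class if matches were spread at random.
    """
    total = n_toxic + n_other
    hits = hits_toxic + hits_other
    upper = min(hits, n_toxic)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(hits, k) * comb(total - hits, n_toxic - k)
        for k in range(hits_toxic, upper + 1)
    )
    return tail / comb(total, n_toxic)


# Made-up counts: the fragment occurs in 2 of 2 toxic and 0 of 2 other compounds.
p_small_example = enrichment_p_value(2, 2, 0, 2)
```

A small p-value suggests the fragment appears in the hepatotoxic class more often than chance would explain. In a real screen over many fragments, a multiple-testing correction would also be needed before treating any fragment as an alert.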
Prediction of the partition coefficient between air and body compartments from the chemical structure
Volume 2 - Pages 1-1 - 2010
“Ask Ernö”: a self-learning tool for assignment and prediction of nuclear magnetic resonance spectra
Volume 8 - Pages 1-8 - 2016
We present “Ask Ernö”, a self-learning system for the automatic analysis of NMR spectra, consisting of integrated chemical shift assignment and prediction tools. The output of the automatic assignment component initializes and improves a database of assigned protons that is used by the chemical shift predictor. In turn, the predictions provided by the latter facilitate improvement of the assignment process. Iteration on these steps allows Ask Ernö to improve its ability to assign and predict spectra without any prior knowledge or assistance from human experts. This concept was tested by training such a system with a dataset of 2341 molecules and their 1H-NMR spectra, and evaluating the accuracy of chemical shift predictions on a test set of 298 partially assigned molecules (2007 assigned protons). After 10 iterations, Ask Ernö was able to decrease its prediction error by 17 %, reaching an average error of 0.265 ppm. Over 60 % of the test chemical shifts were predicted within 0.2 ppm, while only 5 % still presented a prediction error of more than 1 ppm.
Ask Ernö introduces an innovative approach to automatic NMR analysis that constantly learns and improves when provided with new data. Furthermore, it completely avoids the need for manually assigned spectra. This system has the potential to be turned into a fully autonomous tool able to compete with the best alternatives currently available.
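The figures quoted in this abstract (average error, share of shifts within 0.2 ppm, share above 1 ppm) are summary statistics over per-proton prediction errors. A minimal sketch of how such a summary could be computed follows. It assumes the "average error" is a mean absolute error and that "within 0.2 ppm" includes the boundary; the function name `shift_error_summary` and the sample values are hypothetical.

```python
def shift_error_summary(predicted, observed, tight=0.2, loose=1.0):
    """Summarize chemical shift prediction errors (all values in ppm)."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    n = len(errors)
    return {
        # Assumed reading of "average error": mean absolute error.
        "mean_abs_error": sum(errors) / n,
        # Fraction of shifts predicted within the tight tolerance.
        "within_tight": sum(e <= tight for e in errors) / n,
        # Fraction of shifts missed by more than the loose tolerance.
        "beyond_loose": sum(e > loose for e in errors) / n,
    }


# Made-up shifts for four protons, not data from the paper.
summary = shift_error_summary([1.0, 2.0, 3.0, 7.5], [1.1, 2.0, 3.5, 6.0])
```

Tracking these three numbers after each assignment and prediction cycle would show whether the iteration described in the abstract is in fact reducing the error.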
Combatting over-specialization bias in growing chemical databases
Volume 15 - Pages 1-17 - 2023
Predicting in advance the behavior of new chemical compounds can support the design process of new products by directing the research toward the most promising candidates and ruling out others. Such predictive models can be data-driven using Machine Learning or based on researchers’ experience and depend on the collection of past results. In either case, models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Therefore, consistent use of these predictive models shapes the dataset and causes a continuous specialization, shrinking the applicability domain of all models trained on this dataset in the future and increasingly harming model-based exploration of the space. In this paper, we propose cancels (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas in the space that fall short and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. cancels does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain. An extensive set of experiments on the use-case of biodegradation pathway prediction reveals not only that the bias spiral can indeed be observed but also that cancels produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial, as it can not only intervene in the continuous specialization process but also significantly improve a predictor’s performance while reducing the number of required experiments.
Overall, we believe that cancels can support researchers in their experimentation process to not only better understand their data and potential flaws, but also to grow the dataset in a sustainable way. All code is available under github.com/KatDost/Cancels.