HastaLaVista, a web-based user interface for NMR-based untargeted metabolic profiling analysis in biomedical sciences: towards a new publication standard
Springer Science and Business Media LLC - 2019
Julien Wist
Metabolic profiling has been shown to improve our understanding of complex metabolic processes. Shared data are key to the analysis and validation of metabolic profiling and untargeted spectral analysis and may increase the pace of new discoveries. Improving the existing portfolio of open software may increase the fraction of shared data by decreasing the effort required to publish them in a manner that is useful to others. However, a weakness of open software, compared with commercial software, is the lack of a user-friendly graphical interface, which may discourage inexperienced researchers. Here, a web-browser-oriented solution for metabolic profiling analysis is presented and demonstrated; it combines the power of R for back-end statistical analyses with that of JavaScript for front-end visualisations and user interactivity. This unique combination of statistical programming and web-browser visualisation brings enhanced data interoperability and interactivity into the open source realm. It is exemplified by characterizing the extent to which bariatric surgery perturbs the metabolism of rats, showing the value of the approach in iterative analysis by the end-user to establish a deeper understanding of the system perturbation. HastaLaVista is available at https://github.com/jwist/hastaLaVista (DOI: 10.5281/zenodo.3544800) under the MIT license. The approach described in this manuscript can be extended to connect the interface to other scripting languages such as Python, and to create interfaces for other types of data analysis.
PROTEOMAS: a workflow enabling harmonized proteomic meta-analysis and proteomic signature mapping
Springer Science and Business Media LLC - Volume 15 - Pages 1-17 - 2023
Aileen Bahl, Celine Ibrahim, Kristina Plate, Andrea Haase, Jörn Dengjel, Penny Nymark, Verónica I. Dumit
Toxicological evaluation of substances in regulation still often relies on animal experiments. Understanding the substances’ mode of action is crucial to developing alternative test strategies. Omics methods are promising tools to achieve this goal. Until now, most attention has focused on transcriptomics, while proteomics is not yet routinely applied in toxicology despite the large number of datasets available in public repositories. Exploiting the full potential of these datasets is hampered by differences in measurement procedures and follow-up data processing. Here we present the tool PROTEOMAS, which allows meta-analysis of publicly available proteomic data. The workflow was designed to analyze proteomic studies in a harmonized way and to ensure transparency in the analysis of proteomic data for regulatory purposes. It agrees with the Omics Reporting Framework guidelines of the OECD, with the intention of integrating proteomics with other omics methods in regulatory toxicology. The overarching aim is to contribute to the development of adverse outcome pathways (AOPs) and to understand the mode of action of substances. To demonstrate the robustness and reliability of our workflow, we compared our results to those of the original studies. As a case study, we performed a meta-analysis of 25 proteomic datasets to investigate the toxicological effects of nanomaterials at the lung level. PROTEOMAS is an important contribution to the development of alternative test strategies, enabling robust meta-analysis of proteomic data. This workflow commits to the FAIR principles (Findable, Accessible, Interoperable and Reusable) of computational protocols.
IDSL_MINT: a deep learning framework to predict molecular fingerprints from mass spectra
Springer Science and Business Media LLC - Volume 16 - Pages 1-8 - 2024
Sadjad Fakouri Baygi, Dinesh Kumar Barupal
The majority of tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies lack any annotation. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics - Mass INTerpreter (IDSL_MINT), can translate MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT allows users to leverage the power of the transformer model for mass spectrometry data, similar to the large language models. Models are trained on user-provided reference MS/MS libraries via any customizable molecular fingerprint descriptors. IDSL_MINT was benchmarked using the LipidMaps database and improved the annotation rate of a test study for MS/MS spectra that were not originally annotated using existing mass spectral libraries. IDSL_MINT may improve the overall annotation rates in untargeted metabolomics and exposomics studies. The IDSL_MINT framework and tutorials are available in the GitHub repository at https://github.com/idslme/IDSL_MINT.
Scientific contribution statement: Structural annotation of MS/MS spectra from untargeted metabolomics and exposomics datasets is a major bottleneck in gaining new biological insights. Machine learning models to convert spectra into molecular fingerprints can help in the annotation process. Here, we present IDSL_MINT, a new, easy-to-use and customizable deep-learning framework to train and utilize new models to predict molecular fingerprints from spectra for the compound annotation workflows.
Predicting the mutation effects of protein–ligand interactions via end-point binding free energy calculations: strategies and analyses
Springer Science and Business Media LLC - Volume 14, Issue 1
Yang Yu, Zhe Wang, Lingling Wang, Sheng Tian, Tingjun Hou, Huiyong Sun
Protein mutations occur frequently in biological systems and may affect, for example, the binding of drugs to their targets by impairing critical H-bonds or changing hydrophobic interactions. Accurately predicting the effects of mutations on biological systems is therefore of great interest to various fields. Unfortunately, large-scale wet-lab mutation experiments remain impractical because of their prohibitive time and financial costs. Alternatively, in silico computation can guide the experiments. Indeed, numerous pioneering works have been conducted, ranging from computationally cheaper machine-learning (ML) methods to more expensive alchemical methods, with the aim of accurately predicting mutation effects. However, these methods usually either do not yield a physically interpretable model (ML-based methods) or require huge computational resources (alchemical methods). Methods that offer a compromise, combining good physical characteristics with high computational efficiency, are therefore desirable. Here, we conducted a comprehensive investigation of mutation effects in biological systems using the well-known end-point binding free energy calculation methods MM/GBSA and MM/PBSA. Different computational strategies, considering different lengths of MD simulation, different values of the dielectric constant, and whether to incorporate entropy effects into the predicted total binding affinities, were investigated to provide a more accurate way of predicting the energetic change upon protein mutation. Overall, our results show that a relatively long MD simulation (e.g. 100 ns) benefits the prediction accuracy of both MM/GBSA and MM/PBSA (with the best Pearson correlation coefficient between the predicted ∆∆G and the experimental data of ~0.44 for a challenging dataset). Further analyses show that systems involving large perturbations (e.g. multiple mutations and a large change in the number of atoms at the mutation site) are much easier to predict accurately, since the algorithm is more sensitive to large changes in the system. In addition, system-specific investigation reveals that conformational adjustment is needed to refine the micro-environment of the manually mutated systems, which explains why longer MD simulations are necessary to improve the predictions. The proposed strategy is expected to be applicable to large-scale, interpretable investigations of mutation effects.
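The evaluation metric mentioned in this abstract, the Pearson correlation between predicted and experimental ∆∆G, can be illustrated with a minimal sketch. The sign convention (∆∆G = ∆G of the mutant minus ∆G of the wild type) and all numeric values below are assumptions chosen for illustration; they are not taken from the study, and the function names are hypothetical.

```python
import math


def ddg(dg_mutant, dg_wild_type):
    """Change in binding free energy upon mutation.

    Assumed convention: ddG = dG(mutant) - dG(wild type), same units as inputs.
    """
    return dg_mutant - dg_wild_type


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical end-point binding free energies (kcal/mol), for illustration only.
wild_type_dg = -42.0
mutant_dg = [-40.5, -43.1, -38.0, -41.2]
experimental_ddg = [0.9, -0.4, 2.1, 0.3]

predicted_ddg = [ddg(m, wild_type_dg) for m in mutant_dg]
r = pearson_r(predicted_ddg, experimental_ddg)
print(f"Pearson r between predicted and experimental ddG: {r:.2f}")
```

Because end-point methods tend to overestimate absolute energies, a correlation-based metric such as this one compares trends across mutations and does not test agreement in magnitude.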
Chaos-embedded particle swarm optimization approach for protein-ligand docking and virtual screening
Springer Science and Business Media LLC - 2018
Hio Kuan Tai, Siti Azma Jusoh, Shirley W. I. Siu
Protein-ligand docking programs are routinely used in structure-based drug design to find the optimal binding pose of a ligand in the protein’s active site. These programs are also used to identify potential drug candidates by ranking large sets of compounds. As more accurate and efficient docking programs are always desirable, constant efforts focus on developing better docking algorithms or improving the scoring function. Recently, chaotic maps have emerged as a promising approach to improve the search behavior of optimization algorithms in terms of search diversity and convergence speed. However, their effectiveness on docking applications has not been explored. Herein, we integrated five popular chaotic maps (logistic, Singer, sinusoidal, tent, and Zaslavskii maps) into PSOVina$^{\mathrm{2LS}}$, a recent variant of the popular AutoDock Vina program with enhanced global and local search capabilities, and evaluated their performances in ligand pose prediction and virtual screening using four docking benchmark datasets and two virtual screening datasets. Pose prediction experiments indicate that chaos-embedded algorithms outperform AutoDock Vina and PSOVina in ligand pose RMSD, success rate, and run time. In virtual screening experiments, Singer map-embedded PSOVina$^{\mathrm{2LS}}$ achieved a very significant five- to sixfold speedup with comparable screening performances to AutoDock Vina in terms of area under the receiver operating characteristic curve and enrichment factor. Therefore, our results suggest that chaos-embedded PSOVina methods might be a better option than AutoDock Vina for docking and virtual screening tasks. The success of chaotic maps in protein-ligand docking reveals their potential for improving optimization algorithms in other search problems, such as protein structure prediction and folding. The Singer map-embedded PSOVina$^{\mathrm{2LS}}$, which is named PSOVina-2.0, and all testing datasets are publicly available at https://cbbio.cis.umac.mo/software/psovina.
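The general idea of embedding a chaotic map into particle swarm optimization can be sketched as follows. This is a toy illustration, not the PSOVina implementation: it assumes the chaotic sequence simply replaces the uniform random coefficients in the velocity update, uses the logistic map (one of the five maps named above) with parameter 4, and minimizes a simple sphere function in place of a docking score. All function names and parameter values are hypothetical.

```python
import random


def logistic_map(x):
    """One step of the logistic map with r = 4 (chaotic regime)."""
    return 4.0 * x * (1.0 - x)


class ChaoticSequence:
    """Generates values in (0, 1) by iterating the logistic map."""

    def __init__(self, seed=0.7):
        # 0.25, 0.5 and 0.75 lead to fixed points or to 0, so they are excluded.
        if not 0.0 < seed < 1.0 or seed in (0.25, 0.5, 0.75):
            raise ValueError("seed must lie in (0, 1) and avoid 0.25, 0.5, 0.75")
        self._seed = seed
        self.x = seed

    def next(self):
        self.x = logistic_map(self.x)
        # Guard against floating-point collapse onto the boundary values.
        if not 0.0 < self.x < 1.0:
            self.x = self._seed
        return self.x


def sphere(point):
    """Toy objective standing in for a scoring function; minimum 0 at the origin."""
    return sum(v * v for v in point)


def chaotic_pso(objective, dim=2, n_particles=10, n_iters=100,
                bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0.7):
    """Minimal PSO whose stochastic coefficients come from a chaotic map."""
    chaos = ChaoticSequence(seed)
    rng = random.Random(0)  # used only to place the initial swarm
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                # Chaotic values replace the usual uniform random numbers.
                r1, r2 = chaos.next(), chaos.next()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_pos, best_val = chaotic_pso(sphere)
print("best position:", best_pos, "objective:", best_val)
```

Swapping `logistic_map` for another map (tent, Singer, and so on) only requires changing the update inside `ChaoticSequence`, which is why comparing several maps within one optimizer is straightforward.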
Harnessing Shannon entropy-based descriptors in machine learning models to enhance the prediction accuracy of molecular properties
Springer Science and Business Media LLC - Volume 15 - Pages 1-11 - 2023
Rajarshi Guha, Darrell Velegol
Accurate prediction of molecular properties is essential in the screening and development of drug molecules and other functional materials. Traditionally, property-specific molecular descriptors are used in machine learning models. This in turn requires the identification and development of target- or problem-specific descriptors. Additionally, an increase in the prediction accuracy of the model is not always feasible from the standpoint of targeted descriptor usage. We explored the accuracy and generalizability issues using a framework of Shannon entropies, based on SMILES, SMARTS and/or InChIKey strings of the respective molecules. Using various public databases of molecules, we showed that the prediction accuracy of machine learning models could be significantly enhanced simply by using Shannon entropy-based descriptors evaluated directly from SMILES. Analogous to partial pressures and total pressure of gases in a mixture, we used atom-wise fractional Shannon entropy in combination with total Shannon entropy from the respective tokens of the string representation to model the molecule efficiently. The proposed descriptor was competitive in performance with standard descriptors such as Morgan fingerprints and SHED in regression models. Additionally, we found that either a hybrid descriptor set containing the Shannon entropy-based descriptors or an optimized ensemble architecture of multilayer perceptrons and graph neural networks using the Shannon entropies acted synergistically to improve the prediction accuracy. This simple approach of coupling the Shannon entropy framework to other standard descriptors and/or using it in ensemble models could find applications in boosting the performance of molecular property predictions in chemistry and materials science.
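To make the idea of a string-based Shannon entropy descriptor concrete, the sketch below computes a total entropy over the tokens of a SMILES string together with per-token contributions that sum to that total. It is a simplified reading of the abstract, not the authors' implementation: the regular-expression tokenizer, the use of base-2 logarithms, and the definition of the fractional term as each token type's share of the total entropy are assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: bracket atoms, two-letter halogens, then single characters.
_TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into tokens (simplified, illustrative rule set)."""
    return _TOKEN_PATTERN.findall(smiles)


def shannon_entropy(tokens):
    """Total Shannon entropy (in bits) of the token frequency distribution."""
    n = len(tokens)
    if n == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fractional_entropies(tokens):
    """Contribution -p*log2(p) of each token type; the values sum to the total."""
    n = len(tokens)
    if n == 0:
        return {}
    counts = Counter(tokens)
    return {tok: -(c / n) * math.log2(c / n) for tok, c in counts.items()}


# Example: ethanol, written as the SMILES string "CCO".
tokens = tokenize("CCO")
total = shannon_entropy(tokens)
parts = fractional_entropies(tokens)
print("tokens:", tokens)
print("total entropy (bits):", round(total, 4))
print("per-token contributions:", {t: round(v, 4) for t, v in parts.items()})
```

In a regression setting, the total entropy and the per-token contributions for a fixed token vocabulary could be concatenated with other descriptors, which mirrors the hybrid descriptor set described in the abstract.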