Bioinformatics (Oxford, England)
1367-4811
Managing organization: N/A
Subject area:
Featured articles
Modelling cellular systems with PySCeS Abstract
Summary: The Python Simulator for Cellular Systems (PySCeS) is an extendable research tool for the numerical analysis and investigation of cellular systems.
Availability: PySCeS is distributed as Open Source Software under the GNU General Public Licence and is available for download from http://pysces.sourceforge.net
Contact: [email protected]
Volume 21, Issue 4 - Pages 560-561 - 2005
PySCeSToolbox: a collection of metabolic pathway analysis tools Abstract
Summary
PySCeSToolbox is an extension to the Python Simulator for Cellular Systems (PySCeS) that includes tools for performing generalized supply–demand analysis, symbolic metabolic control analysis, and a framework for investigating the kinetic and thermodynamic aspects of enzyme-catalyzed reactions. Each tool addresses a different aspect of metabolic behaviour, control, and regulation; the tools complement each other and can be used in conjunction to better understand higher level system behaviour.
Availability and implementation
PySCeSToolbox is available on Linux, Mac OS X and Windows. It is licensed under the BSD 3-clause licence. Code, setup instructions and a link to documentation can be found at https://github.com/PySCeS/PyscesToolbox.
Supplementary information
Supplementary data are available at Bioinformatics online.
Volume 34, Issue 1 - Pages 124-125 - 2018
APTANI: a computational tool to select aptamers through sequence-structure motif analysis of HT-SELEX data Abstract
Motivation: Aptamers are synthetic nucleic acid molecules that can bind biological targets by virtue of both their sequence and three-dimensional structure. Aptamers are selected using SELEX (Systematic Evolution of Ligands by EXponential enrichment), a technique that exploits aptamer-target binding affinity. The SELEX procedure, coupled with high-throughput sequencing (HT-SELEX), creates billions of random sequences capable of binding different epitopes on specific targets. Since this technique produces enormous amounts of data, computational analysis represents a critical step to screen and select the most biologically relevant sequences.
Results: Here, we present APTANI, a computational tool to identify target-specific aptamers from HT-SELEX data and secondary structure information. APTANI builds on the AptaMotif algorithm, originally implemented to analyze SELEX data. It extends the applicability of AptaMotif to HT-SELEX data and introduces new functionalities, such as the ability to identify binding motifs, to cluster aptamer families, or to compare output results from different HT-SELEX cycles. Tabular and graphical representations facilitate the downstream biological interpretation of results.
Availability and implementation: APTANI is available at http://aptani.unimore.it.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Volume 32, Issue 2 - Pages 161-164 - 2016
Predicting HIV drug resistance with neural networks Abstract
Motivation: Drug resistance is a very important factor influencing the failure of current HIV therapies. The ability to predict the drug resistance of HIV protease mutants may be useful in developing more effective and longer lasting treatment regimens.
Methods: HIV resistance to two current protease inhibitors, Indinavir and Saquinavir, is predicted. The problem was approached from two perspectives. First, a predictor was constructed based on the structural features of the HIV protease–drug inhibitor complex. A particular structure was represented by its list of contacts between the inhibitor and the protease. Next, a classifier was constructed based on the sequence data of various drug-resistant mutants. In both cases, self-organizing maps were first used to extract the important features and cluster the patterns in an unsupervised manner. The clusters were then labelled based on the known patterns in the training set.
Results: The prediction performance of the classifiers was measured by cross-validation. The classifier using the structure information correctly classified previously unseen mutants with an accuracy of between 60 and 70%. Several architectures were tested on the more abundant sequence data. The best single classifier provided an accuracy of 68% and a coverage of 69%. Multiple networks were then combined into various majority voting schemes. The best combination yielded an average of 85% coverage and 78% accuracy on previously unseen data. This is more than two times better than the 33% accuracy expected from a random classifier.
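The majority-voting step and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "coverage" means the fraction of samples for which the ensemble commits to a prediction, and that "accuracy" is measured only over those covered samples. The class names, the toy votes, and the strict-majority threshold are hypothetical.

```python
from collections import Counter

def majority_vote(votes, min_agreement=0.5):
    """Return the label backed by a strict majority of voters, or None to abstain."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None

def coverage_and_accuracy(all_votes, true_labels):
    """Coverage: share of samples with a prediction. Accuracy: share correct among those."""
    predictions = [majority_vote(votes) for votes in all_votes]
    covered = [(p, t) for p, t in zip(predictions, true_labels) if p is not None]
    coverage = len(covered) / len(true_labels)
    accuracy = sum(p == t for p, t in covered) / len(covered) if covered else 0.0
    return coverage, accuracy

# Hypothetical votes from three classifiers on four mutants (three resistance classes).
all_votes = [
    ["high", "high", "low"],        # majority: high
    ["low", "medium", "high"],      # no majority: abstain
    ["medium", "medium", "medium"], # majority: medium
    ["low", "low", "high"],         # majority: low
]
true_labels = ["high", "medium", "medium", "high"]

coverage, accuracy = coverage_and_accuracy(all_votes, true_labels)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Letting the ensemble abstain when the voters disagree is one way a voting scheme can trade coverage against accuracy, which is why the two figures are reported together.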
Contact: [email protected]
Volume 19, Issue 1 - Pages 98-107 - 2003
Data mining in bioinformatics using Weka Abstract
Summary: The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection—common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.
Availability: http://www.cs.waikato.ac.nz/ml/weka
Volume 20, Issue 15 - Pages 2479-2481 - 2004
13CFLUX2: high-performance software suite for 13C-metabolic flux analysis Abstract
Summary: 13C-based metabolic flux analysis (13C-MFA) is the state-of-the-art method to quantitatively determine in vivo metabolic reaction rates in microorganisms. 13CFLUX2 contains all tools for composing flexible computational 13C-MFA workflows to design and evaluate carbon labeling experiments. A specially developed XML language, FluxML, highly efficient data structures and simulation algorithms achieve a maximum of performance and effectiveness. Support of multicore CPUs, as well as compute clusters, enables scalable investigations. 13CFLUX2 outperforms existing tools in terms of universality, flexibility and built-in features. Therewith, 13CFLUX2 paves the way for next-generation high-resolution 13C-MFA applications on the large scale.
Availability and implementation: 13CFLUX2 is implemented in C++ (ISO/IEC 14882 standard) with Java and Python add-ons to run under Linux/Unix. A demo version and binaries are available at www.13cflux.net.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Volume 29, Issue 1 - Pages 143-145 - 2013
GOLD—Graphical Overview of Linkage Disequilibrium Abstract
Summary: We describe a software package that provides a graphical summary of linkage disequilibrium in human genetic data. It allows for the analysis of family data and is well suited to the analysis of dense genetic maps.
Availability: http://www.well.ox.ac.uk/asthma/GOLD
Contact: [email protected]
Volume 16, Issue 2 - Pages 182-183 - 2000
Develop machine learning-based regression predictive models for engineering protein solubility Abstract
Motivation
Protein activity is a significant characteristic for recombinant proteins which can be used as biocatalysts. High activity of proteins reduces the cost of biocatalysts. A model that can predict protein activity from amino acid sequence is highly desired, as it aids experimental improvement of proteins. However, only limited data for protein activity are currently available, which prevents the development of such models. Since protein activity and solubility are correlated for some proteins, the publicly available solubility dataset may be adopted to develop models that can predict protein solubility from sequence. The models could serve as a tool to indirectly predict protein activity from sequence. In the literature, predicting protein solubility from sequence has been intensively explored, but the developed models all represent predicted solubility as binary values, which are not suitable for guiding experimental designs to improve protein solubility. Here we propose new machine learning (ML) models for improving protein solubility in vivo.
Results
We first implemented a novel approach that predicted protein solubility in continuous numerical values instead of binary ones. After combining it with various ML algorithms, we achieved an R² of 0.4115 when the support vector machine algorithm was used. Continuous values of solubility are more meaningful in protein engineering, as they enable researchers to choose proteins with higher predicted solubility for experimental validation, while binary values fail to distinguish proteins with the same value, since there are only two possible values and many proteins share the same one.
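To make the metric concrete, the sketch below computes R² under the assumption that it denotes the usual coefficient of determination, 1 - SS_res / SS_tot, and then ranks candidates by predicted solubility, which is what continuous predictions allow and binary labels do not. The solubility values are invented for illustration and are not taken from the paper or its dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted solubility values (fractions between 0 and 1).
measured = [0.10, 0.40, 0.55, 0.90]
predicted = [0.20, 0.35, 0.60, 0.70]

score = r_squared(measured, predicted)

# Continuous predictions give a full ordering of candidates, best first.
ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)

print(f"R^2 = {score:.4f}")
print("candidates ranked by predicted solubility:", ranked)
```

With binary labels, every protein predicted "soluble" would tie, so no such ranking could be produced.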
Availability and implementation
We present the ML workflow as a series of IPython notebooks hosted on GitHub (https://github.com/xiaomizhou616/protein_solubility). The workflow can be used as a template for analysis of other expression and solubility datasets.
Supplementary information
Supplementary data are available at Bioinformatics online.
Volume 35, Issue 22 - Pages 4640-4646 - 2019
SOLpro: accurate sequence-based prediction of protein solubility Abstract
Motivation: Protein insolubility is a major obstacle for many experimental studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be soluble on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the solubility of insoluble proteins.
Results: Here, we first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, we extract and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM) architecture. The resulting predictor, SOLpro, is compared directly with existing methods and shows significant improvement according to standard evaluation metrics, with an overall accuracy of over 74% estimated using multiple runs of 10-fold cross-validation.
Availability: SOLpro is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at: http://scratch.proteomics.ics.uci.edu.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Volume 25, Issue 17 - Pages 2200-2207 - 2009
LocusZoom: regional visualization of genome-wide association scan results Abstract
Summary: Genome-wide association studies (GWAS) have revealed hundreds of loci associated with common human genetic diseases and traits. We have developed a web-based plotting tool that provides fast visual display of GWAS results in a publication-ready format. LocusZoom visually displays regional information such as the strength and extent of the association signal relative to genomic position, local linkage disequilibrium (LD) and recombination patterns and the positions of genes in the region.
Availability: LocusZoom can be accessed from a web interface at http://csg.sph.umich.edu/locuszoom. Users may generate a single plot using a web form, or many plots using batch mode. The software utilizes LD information from HapMap Phase II (CEU, YRI and JPT+CHB) or 1000 Genomes (CEU) and gene information from the UCSC browser, and will accept SNP identifiers in dbSNP or 1000 Genomes format. Single plots are generated in ∼20 s. Source code and associated databases are available for download and local installation, and full documentation is available online.
Contact: [email protected]
Volume 26, Issue 18 - Pages 2336-2337 - 2010