Springer Science and Business Media LLC

Featured scientific publications

* Data is for reference only

Using multi-step proposal distribution for improved MCMC convergence in Bayesian network structure learning
Springer Science and Business Media LLC - Volume 2015 - Pages 1-14 - 2015
Antti Larjo, Harri Lähdesmäki
Bayesian networks have become popular for modeling probabilistic relationships between entities. As their structure can also be given a causal interpretation about the studied system, they can be used to learn, for example, regulatory relationships of genes or proteins in biological networks and pathways. Inference of the Bayesian network structure is complicated by the size of the model structure space, necessitating the use of optimization methods or sampling techniques, such as Markov chain Monte Carlo (MCMC) methods. However, convergence of MCMC chains is in many cases slow and can become an even harder issue as the dataset size grows. We show here how to improve convergence in the Bayesian network structure space by using an adjustable proposal distribution with the possibility to propose a wide range of steps in the structure space, and demonstrate improved network structure inference by analyzing phosphoprotein data from the human primary T cell signaling network.
A novel cost function to estimate parameters of oscillatory biochemical systems
Springer Science and Business Media LLC - Volume 2012 - Pages 1-17 - 2012
Seyedbehzad Nabavi, Cranos M Williams
Oscillatory pathways are among the most important classes of biochemical systems, with examples ranging from circadian rhythms to cell cycle maintenance. Mathematical modeling of these highly interconnected biochemical networks is needed to meet numerous objectives such as investigating, predicting and controlling the dynamics of these systems. Identifying the kinetic rate parameters is essential for fully modeling these and other biological processes. These kinetic parameters, however, are not usually available from measurements and most of them have to be estimated by parameter fitting techniques. One of the issues with estimating kinetic parameters in oscillatory systems is the irregularities in the least-squares (LS) cost function surface used to estimate these parameters, which are caused by the periodicity of the measurements. These irregularities result in numerous local minima, which limit the performance of even some of the most robust global optimization algorithms. We propose a parameter estimation framework to address these issues that integrates temporal information with periodic information embedded in the measurements used to estimate these parameters. This periodic information is used to build a proposed cost function with better surface properties, leading to fewer local minima and better performance of global optimization algorithms. We verified for three oscillatory biochemical systems that our proposed cost function results in an increased ability to estimate accurate kinetic parameters as compared to the traditional LS cost function. We combine this cost function with an improved noise removal approach that leverages periodic characteristics embedded in the measurements to effectively reduce noise. The results provide strong evidence for the efficacy of this noise removal approach over the previous commonly used wavelet hard-thresholding noise removal methods. This proposed optimization framework results in more accurate kinetic parameters that will eventually lead to biochemical models that are more precise, predictable, and controllable.
A Hypothesis Test for Equality of Bayesian Network Models
Springer Science and Business Media LLC - Volume 2010 - Pages 1-11 - 2010
Anthony Almudevar
Bayesian network models are commonly used to model gene expression data. Some applications require a comparison of the network structure of a set of genes between varying phenotypes. In principle, separately fit models can be directly compared, but it is difficult to assign statistical significance to any observed differences. There would therefore be an advantage to the development of a rigorous hypothesis test for homogeneity of network structure. In this paper, a generalized likelihood ratio test based on Bayesian network models is developed, with significance level estimated using permutation replications. In order to be computationally feasible, a number of algorithms are introduced. First, a method for approximating multivariate distributions due to Chow and Liu (1968) is adapted, permitting the polynomial-time calculation of a maximum likelihood Bayesian network with maximum indegree of one. Second, sequential testing principles are applied to the permutation test, allowing significant reduction of computation time while preserving reported error rates used in multiple testing. The method is applied to gene-set analysis, using two sets of experimental data, and some advantage to a pathway modelling approach to this problem is reported.
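As a rough illustration of how a significance level can be estimated from permutation replications, the following minimal sketch shuffles phenotype labels and recomputes a test statistic. It is not the paper's procedure: the absolute difference of means stands in for the generalized likelihood ratio statistic, no sequential stopping rule is applied, and the names permutation_p_value and abs_mean_difference as well as the toy data are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, statistic, n_perm=200, seed=0):
    """Estimate a p-value by reshuffling group labels n_perm times."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Count permutations at least as extreme as the observed split.
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def abs_mean_difference(a, b):
    """Stand-in statistic (assumption): absolute difference of group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Toy data (assumption): two clearly separated phenotype groups.
group_a = [float(v) for v in range(10)]
group_b = [float(v) for v in range(100, 110)]
p_value = permutation_p_value(group_a, group_b, abs_mean_difference)
print(p_value)
```

Any statistic that takes two groups and returns a number can be passed in place of abs_mean_difference, including a likelihood ratio computed from fitted network models.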
Learning directed acyclic graphs from large-scale genomics data
Springer Science and Business Media LLC - Volume 2017 - Pages 1-16 - 2017
Fabio Nikolay, Marius Pesavento, George Kritikos, Nassos Typas
In this paper, we consider the problem of learning the genetic interaction map, i.e., the topology of a directed acyclic graph (DAG) of genetic interactions from noisy double-knockout (DK) data. Based on a set of well-established biological interaction models, we detect and classify the interactions between genes. We propose a novel linear integer optimization program called the Genetic-Interactions-Detector (GENIE) to identify the complex biological dependencies among genes and to compute the DAG topology that matches the DK measurements best. Furthermore, we extend the GENIE program by incorporating genetic interaction profile (GI-profile) data to further enhance the detection performance. In addition, we propose a sequential scalability technique for large sets of genes under study, in order to provide statistically significant results for real measurement data. Finally, we show via numeric simulations that the GENIE program and the GI-profile data extended GENIE (GI-GENIE) program clearly outperform the conventional techniques and present real data results for our proposed sequential scalability technique.
A visual analytics approach for models of heterogeneous cell populations
Springer Science and Business Media LLC - Volume 2012 - Pages 1-13 - 2012
Jan Hasenauer, Julian Heinrich, Malgorzata Doszczak, Peter Scheurich, Daniel Weiskopf, Frank Allgöwer
In recent years, cell population models have become increasingly common. In contrast to classic single cell models, population models allow for the study of cell-to-cell variability, a crucial phenomenon in most populations of primary cells, cancer cells, and stem cells. Unfortunately, tools for in-depth analysis of population models are still missing. This problem originates from the complexity of population models. Particularly important are methods to determine the source of heterogeneity (e.g., genetic or epigenetic differences) and to select potential (bio-)markers. We propose an analysis based on visual analytics to tackle this problem. Our approach combines parallel-coordinates plots, used for a visual assessment of the high-dimensional dependencies, and nonlinear support vector machines, for the quantification of effects. The method can be employed to study qualitative and quantitative differences among cells. To illustrate the different components, we perform a case study using the proapoptotic signal transduction pathway involved in cellular apoptosis.
Incorporating prior knowledge induced from stochastic differential equations in the classification of stochastic observations
Springer Science and Business Media LLC - Volume 2016 - Pages 1-14 - 2016
Amin Zollanvari, Edward R. Dougherty
In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical. The prior distribution is formed by mapping a set of mathematical relations among the features and labels, the prior knowledge, into a distribution governing the probability mass across the uncertainty class. In this paper, we consider prior knowledge in the form of stochastic differential equations (SDEs). We consider a vector SDE in integral form involving a drift vector and dispersion matrix. Having constructed the prior, we develop the optimal Bayesian classifier between two models and examine, via synthetic experiments, the effects of uncertainty in the drift vector and dispersion matrix. We apply the theory to a set of SDEs for the purpose of differentiating the evolutionary history between two species.
Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
Springer Science and Business Media LLC - Volume 2007, Issue 1 - Pages 1-11 - 2007
Hasan Metin Aktulga, Ioannis Kontoyiannis, L Alex Lyznik, Lukasz Szpankowski, Ananth Y Grama, Wojciech Szpankowski
Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, they are used for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's combined DNA index system (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats—an application of importance in genetic profiling.
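The mutual-information idea in the abstract above can be sketched with a simple plug-in estimator over two aligned symbol sequences. This is only an illustration under stated assumptions: the function name mutual_information and the toy segments are hypothetical, and the paper's threshold function and significance analysis are not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two aligned symbol sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of co-occurring symbol pairs
    px = Counter(xs)              # marginal counts for the first sequence
    py = Counter(ys)              # marginal counts for the second sequence
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy segments (assumption): identical segments versus a constant one.
seg_a = "ACGTACGTACGT"
seg_b = "ACGTACGTACGT"
seg_c = "AAAAAAAAAAAA"
mi_dependent = mutual_information(seg_a, seg_b)
mi_independent = mutual_information(seg_a, seg_c)
print(mi_dependent, mi_independent)
```

For identical segments the estimate equals the empirical entropy of the segment, and for a constant segment it is zero. On short sequences the plug-in estimate is biased upward, which is one reason a significance threshold is needed before a dependency is reported.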
Detecting Periodic Genes from Irregularly Sampled Gene Expressions: A Comparison Study
Springer Science and Business Media LLC - Volume 2008 - Pages 1-8 - 2008
Wentao Zhao, Kwadwo Agyepong, Erchin Serpedin, Edward R Dougherty
Time-series microarray measurements of gene expression have been used to detect genes involved in the cell cycle. Due to experimental constraints, most microarray observations are collected through irregular sampling. In this paper, three popular spectral analysis methods, namely Lomb-Scargle, Capon, and missing-data amplitude and phase estimation (MAPES), are compared in terms of their ability and efficiency in recovering periodically expressed genes. Based on in silico experiments for microarray measurements of Saccharomyces cerevisiae, Lomb-Scargle is found to be the most effective method. 149 genes are identified as periodically expressed in the Drosophila melanogaster data set.
#time series #microarray #gene expression #Lomb-Scargle #Capon #amplitude and phase estimation #periodic genes
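To make the Lomb-Scargle method named above more concrete, here is a minimal pure-Python sketch of the classical normalized periodogram applied to an unevenly sampled signal. The function name lomb_scargle, the synthetic sinusoid, the sampling times, and the frequency grid are all assumptions for illustration; this is not the evaluation pipeline used in the study.

```python
import math
import random


def lomb_scargle(times, values, freqs):
    """Classical normalized Lomb-Scargle power at each frequency (cycles per time unit)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    powers = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine terms orthogonal on these sample times.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum((v - mean) * c for v, c in zip(values, cos_terms))
        ys = sum((v - mean) * s for v, s in zip(values, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        # Guard against a degenerate zero denominator.
        cos_part = yc * yc / cc if cc > 0 else 0.0
        sin_part = ys * ys / ss if ss > 0 else 0.0
        powers.append((cos_part + sin_part) / (2.0 * var))
    return powers


# Synthetic example (assumption): 40 irregular samples of a 0.5-cycle-per-unit sinusoid.
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 20.0) for _ in range(40))
true_freq = 0.5
values = [math.sin(2.0 * math.pi * true_freq * t) for t in times]
freqs = [0.05 * k for k in range(1, 41)]  # 0.05 to 2.0; zero frequency is excluded
powers = lomb_scargle(times, values, freqs)
best_freq = freqs[powers.index(max(powers))]
print(best_freq)
```

Because the sample times never need to be evenly spaced, no interpolation or resampling step is required before the spectrum is computed, which is the property that makes the method attractive for irregular microarray time series.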
Origins of Stochasticity and Burstiness in High-Dimensional Biochemical Networks
Springer Science and Business Media LLC - Volume 2009 - Pages 1-14 - 2008
Simon Rosenfeld
Two major approaches are known in the field of stochastic dynamics of intracellular biochemical networks. The first one places the focus of attention on the fact that many biochemical constituents vitally important for the network functionality may be present only in small quantities within the cell, and therefore the regulatory process is essentially discrete and prone to relatively large fluctuations. The second approach treats the regulatory process as essentially continuous. Complex pseudostochastic behavior in such processes may occur due to multistability and oscillatory motions within limit cycles. In this paper we outline a third scenario of stochasticity in the regulatory process. This scenario is only conceivable in high-dimensional, highly nonlinear systems. In particular, we show that burstiness, a well-known phenomenon in the biology of gene expression, is a natural consequence of high dimensionality coupled with high nonlinearity. In mathematical terms, burstiness is associated with heavy-tailed probability distributions of stochastic processes describing the dynamics of the system. We demonstrate how the "shot" noise originates from purely deterministic behavior of the underlying dynamical system. We conclude that the limiting stochastic process may be accurately approximated by the "heavy-tailed" generalized Pareto process, which is a direct mathematical expression of burstiness.
Relations between the set-complexity and the structure of graphs and their sub-graphs
Springer Science and Business Media LLC - Volume 2012 - Pages 1-10 - 2012
Tomasz M Ignac, Nikita A Sakhanenko, David J Galas
We describe some new conceptual tools for the rigorous, mathematical description of the “set-complexity” of graphs. This set-complexity has been shown previously to be a useful measure for analyzing some biological networks, and in discussing biological information in a quantitative fashion. The advances described here allow us to define some significant relationships between the set-complexity measure and the structure of graphs, and of their component sub-graphs. We show here that modular graph structures tend to maximize the set-complexity of graphs. We point out the relationship between modularity and redundancy, and discuss the significance of set-complexity in this regard. We specifically discuss the relationship between complexity and entropy in the case of complete-bipartite graphs, and present a new method for constructing highly complex, binary graphs. These results can be extended to the case of ternary graphs, and to other multi-edge graphs, which are fundamentally more relevant to biological structures and systems. Finally, our results lead us to an approach for extracting high complexity modular graphs from large, noisy graphs with low information content. We illustrate this approach with two examples.
Total: 124