Explicit representation of protein activity states significantly improves causal discovery of protein phosphorylation networks

BMC Bioinformatics - Volume 21 - Pages 1-17 - 2020
Jinling Liu1,2, Xiaojun Ma1, Gregory F. Cooper1, Xinghua Lu1
1Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
2Present address: Department of Engineering Management and Systems Engineering and Department of Biological Sciences, Missouri University of Science and Technology, Rolla, USA

Abstract

Protein phosphorylation networks play an important role in cell signaling. In these networks, phosphorylation of a protein kinase usually leads to its activation, and the activated kinase in turn phosphorylates its downstream target proteins. A phosphorylation network is essentially a causal network, which can be learned by causal inference algorithms. Prior efforts have applied such algorithms to data measuring protein phosphorylation levels, assuming that the phosphorylation levels represent protein activity states. However, the phosphorylation status of a kinase does not always reflect its activity state, because interventions such as inhibitors or mutations can directly affect its activity state without changing its phosphorylation status. Thus, when cellular systems are subjected to extensive perturbations, the statistical relationships between phosphorylation states of proteins may be disrupted, making it difficult to reconstruct the true protein phosphorylation network. Here, we describe a novel framework to address this challenge.

We have developed a causal discovery framework that explicitly represents the activity state of each protein kinase as an unmeasured variable, together with a novel algorithm called “InferA” that infers the protein activity states by incorporating the protein phosphorylation levels, pharmacological interventions, and prior knowledge. We applied our framework to simulated datasets and to a real-world dataset. The simulation experiments demonstrated that explicitly representing the activity states of protein kinases makes it possible to represent the impact of interventions effectively, which enabled our framework to accurately recover the ground-truth causal network. Results from the real-world dataset showed that the explicit representation of protein activity states allowed an effective and data-driven integration of prior knowledge by InferA, which in turn led to the recovery of a phosphorylation network that is more consistent with experimental results.

Explicit representation of protein activity states by our novel framework significantly enhances causal discovery of protein phosphorylation networks.
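
The central point of the abstract, that an inhibitor can change a kinase's activity without changing its measured phosphorylation level, can be illustrated with a toy simulation. The sketch below is not InferA and is not taken from the paper. It assumes a hypothetical two-protein chain (kinase A phosphorylates substrate B), a hypothetical inhibitor that sets A's activity to zero in a fraction of samples, and made-up noise levels. Under those assumptions, the correlation between the phosphorylation levels of A and B weakens, while the correlation between A's (normally unmeasured) activity and B's phosphorylation stays strong.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_samples=2000, inhibited_fraction=0.5, seed=0):
    """Simulate a hypothetical chain: kinase A -> substrate B.

    Assumptions (illustrative only, not from the paper):
    - A's phosphorylation level is uniform on [0, 1].
    - Without inhibitor, A's activity equals its phosphorylation level.
    - With inhibitor, A's activity is 0 but its phosphorylation is unchanged.
    - B's phosphorylation is A's activity plus small Gaussian noise.
    """
    rng = random.Random(seed)
    phospho_a, activity_a, phospho_b = [], [], []
    for _ in range(n_samples):
        p_a = rng.uniform(0.0, 1.0)
        inhibited = rng.random() < inhibited_fraction
        act_a = 0.0 if inhibited else p_a
        p_b = act_a + rng.gauss(0.0, 0.1)
        phospho_a.append(p_a)
        activity_a.append(act_a)
        phospho_b.append(p_b)
    return phospho_a, activity_a, phospho_b


if __name__ == "__main__":
    pa, act, pb = simulate()
    print("corr(phospho A, phospho B): ", round(pearson(pa, pb), 3))
    print("corr(activity A, phospho B):", round(pearson(act, pb), 3))
```

In this toy setting, an algorithm that sees only the two phosphorylation levels would face a much weaker statistical signal for the A-to-B edge than one that also reasons about A's activity state, which is the motivation the abstract gives for modeling activity explicitly.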
