Robust and rigorous identification of tissue-specific genes by statistically extending tau score
Abstract
In this study, we aimed to identify tissue-specific genes for a range of human tissues and organs more robustly and rigorously by extending the tau score algorithm. Tissue-specific genes are genes whose function and expression are largely restricted to one or a few tissues. Identifying them is essential for understanding multicellular biological processes such as tissue-specific molecular regulation, tissue development, physiology, and the pathogenesis of tissue-associated diseases. Gene expression data from five large RNA sequencing (RNA-seq) projects, spanning 96 different human tissues, were retrieved from ArrayExpress and ExpressionAtlas. Genes were first categorized using significance filters, with the tau score serving as the specificity index. After tau was calculated for each gene in each dataset separately, the statistical distance from the maximum expression level was estimated using a new procedure. Integrating tau with this statistical distance estimate, an approach we call extended tau, determined whether a gene is specifically expressed in one or in several tissues. The tissue-specific genes obtained for the 96 human tissues were functionally annotated, and comparisons were carried out to demonstrate the effectiveness of the extended tau method. Genes were categorized by expression level, and tissue-specific genes were identified for a large number of tissues and organs. Unlike the original tau score, which can assign specificity to a single tissue only, the extended tau approach successfully assigned genes to multiple tissues.
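To make the starting point concrete, the sketch below computes the standard tau index for one gene, tau = sum_i (1 - x_i / x_max) / (n - 1), where x_i is the expression in tissue i and n is the number of tissues. The abstract does not spell out the statistical distance procedure, so the second function, `tissues_near_max`, is a purely hypothetical stand-in and not the extended tau method itself: it keeps every tissue whose log2 expression lies within `k` standard deviations of the gene's maximum. The function names, the `k` parameter, the log2(x + 1) transform, and the example values are all assumptions made for illustration.

```python
import numpy as np


def tau_score(expr):
    """Standard tau index: 0 for uniform expression, 1 for a single-tissue gene."""
    x = np.asarray(expr, dtype=float)
    if x.ndim != 1 or x.size < 2:
        raise ValueError("expr must be a 1-D profile with at least two tissues")
    x_max = x.max()
    if x_max <= 0:
        # Convention chosen for this sketch: an unexpressed gene is not specific.
        return 0.0
    return float(np.sum(1.0 - x / x_max) / (x.size - 1))


def tissues_near_max(expr, tissues, k=1.0):
    """HYPOTHETICAL rule (not the paper's procedure): keep tissues whose
    log2(x + 1) expression is within k standard deviations of the maximum."""
    x = np.log2(np.asarray(expr, dtype=float) + 1.0)
    cutoff = x.max() - k * x.std()
    return [t for t, v in zip(tissues, x) if v >= cutoff]


# Made-up expression values for a single gene across five tissues.
profile = [100.0, 90.0, 1.0, 2.0, 1.0]
names = ["liver", "kidney", "lung", "heart", "brain"]

print(round(tau_score(profile), 3))
print(tissues_near_max(profile, names))
```

In this toy profile, tau alone only says that the gene is fairly specific, while the distance-based rule names both high-expression tissues. That is the kind of multi-tissue assignment the abstract attributes to the extended approach, although the actual statistical procedure may differ substantially from this sketch.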