Integrative genomics and transcriptomics analysis of human embryonic and induced pluripotent stem cells

BioData Mining - Volume 7 - Pages 1-16 - 2014
Kirsti Laurila1, Reija Autio2,3, Lingjia Kong2,4, Elisa Närvä4, Samer Hussein5,6, Timo Otonkoski6, Riitta Lahesmaa4, Harri Lähdesmäki1,4
1Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland
2Department of Signal Processing, Tampere University of Technology, Tampere, Finland
3School of Health Sciences, University of Tampere, Tampere, Finland
4Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
5Samuel Lunenfeld Research Institute, Toronto, Canada
6Research Program Unit, Molecular Neurology, Biomedicum Stem Cell Center, University of Helsinki, Helsinki, Finland

Abstract

Human genomic variations, including single nucleotide polymorphisms (SNPs) and copy number variations (CNVs), are associated with several phenotypic traits ranging from mild features to hereditary diseases. Several genome-wide studies have reported genomic variants that correlate with gene expression levels in various tissue and cell types. We studied human embryonic stem cells (hESCs) and human induced pluripotent stem cells (hiPSCs), measuring SNPs and CNVs with Affymetrix SNP 6 microarrays and expression values with Affymetrix Exon microarrays. We computed the linear relationships between SNPs and the expression levels of exons, transcripts and genes, as well as the associations between gene CNVs and gene expression levels. For a few of the resulting genes, the expression value was associated with both CNVs and SNPs. In total, our results revealed 217 genes and 584 SNPs whose genomic alterations affect the transcriptome in the same cells. We analyzed the enriched pathways and gene ontologies within these groups of genes and found that terms related to alternative splicing and development were enriched. Our results show that, in human pluripotent stem cells, the expression values of several genes, transcripts and exons are affected by genomic variation.
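As an illustration of the kind of linear SNP-expression relationship described above, the following minimal sketch fits an ordinary least-squares line to the expression of one gene against one SNP genotype. It is not the authors' pipeline: the function name `fit_snp_expression`, the 0/1/2 allele-count coding of genotypes, and the toy values are all assumptions made for this example.

```python
from math import sqrt


def fit_snp_expression(genotypes, expression):
    """Least-squares fit of expression = intercept + slope * genotype.

    genotypes:  allele counts (0, 1 or 2) per cell line; this coding is
                an assumption of the sketch, not taken from the paper.
    expression: expression values (e.g. log2 intensities) for the same
                cell lines, in the same order.
    Returns (slope, intercept, pearson_r).
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n

    # Sums of squares and cross-products around the means.
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_ee = sum((e - mean_e) ** 2 for e in expression)
    s_ge = sum((g - mean_g) * (e - mean_e)
               for g, e in zip(genotypes, expression))

    if s_gg == 0 or s_ee == 0:
        raise ValueError("genotype or expression is constant; no fit possible")

    slope = s_ge / s_gg                  # expression change per allele copy
    intercept = mean_e - slope * mean_g  # fitted expression at genotype 0
    pearson_r = s_ge / sqrt(s_gg * s_ee)
    return slope, intercept, pearson_r


# Toy data: six hypothetical cell lines, one SNP, one gene.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
slope, intercept, pearson_r = fit_snp_expression(genotypes, expression)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={pearson_r:.3f}")
```

In a genome-wide setting, a fit like this would be repeated for many SNP-feature pairs, so the resulting statistics would need multiple-testing correction before any pair is reported as an association.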
