Whole-genome expression analysis: challenges beyond clustering

Current Opinion in Structural Biology - Volume 11, Issue 3 - Pages 340-347 - 2001
Russ B. Altman1, Soumya Raychaudhuri1
1Stanford Medical Informatics, 251 Campus Drive, MSOB X-215, Stanford, California 94305-5479, USA.

References

Michaels GS, Carr DB, Askenazi M, Fuhrman S, Wen X, Somogyi R: Cluster analysis and data visualization of large-scale gene expression data. Pac Symp Biocomput 1998:42-53.

Raychaudhuri S, Stuart JM, Altman RB: Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac Symp Biocomput 2000:455-466.

Koza JR, Mydlowec W, Lanza G, Yu J, Keane MA: Reverse engineering of metabolic pathways from observed data using genetic programming. Pac Symp Biocomput 2001:434-445. Genetic programming allows computer programs to evolve under selective pressure in order to maximize their performance on a given task. This paper is the first to apply these methods to genetic network reconstruction.

van Someren, 2000, Linear modeling of genetic networks from experimental data, ISMB, 8, 355

Hartemink AJ, Gifford DK, Jaakkola TS, Young RA: Using graphical models and genomic expression data to statistically validate models of regulatory networks. Pac Symp Biocomput 2001:422-433. Although large amounts of data are required to build a Bayesian network de novo, it is relatively easy to evaluate the compatibility of a network with a given set of data. The investigators encoded two models for galactose regulation and then scored them against experimental data. They were able to recover the correct network in yeast based on 52 expression arrays that were collected without this question in mind.

DeRisi, 1997, Exploring the metabolic and genetic control of gene expression on a genomic scale, Science, 278, 680, 10.1126/science.278.5338.680

Cho, 1998, A genome-wide transcriptional analysis of the mitotic cell cycle, Mol Cell, 2, 65, 10.1016/S1097-2765(00)80114-8

Spellman, 1998, Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Mol Biol Cell, 9, 3273, 10.1091/mbc.9.12.3273

Butte A, Ye J, Niederfellner G, Rett K, Häring H, White M, Kohane I: Determining significant fold differences in gene expression analysis. Pac Symp Biocomput 2001:6-17.

White, 1999, Microarray analysis of Drosophila development during metamorphosis, Science, 286, 2179, 10.1126/science.286.5447.2179

Holter, 2000, Fundamental patterns underlying gene expression profiles: simplicity from complexity, Proc Natl Acad Sci USA, 97, 8409, 10.1073/pnas.150242097

Jansen, 2000, Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins, Nucleic Acids Res, 28, 1481, 10.1093/nar/28.6.1481

Kane, 2000, Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays, Nucleic Acids Res, 28, 4552, 10.1093/nar/28.22.4552

Talaat, 2000, Genome-directed primers for selective labeling of bacterial transcripts for DNA microarray analysis, Nat Biotechnol, 18, 679, 10.1038/76543

Sengupta R, Tompa M: Quality control in manufacturing oligo arrays: a combinatorial design approach. Pac Symp Biocomput 2001:348-359.

Tsien CL, Libermann TA, Gu X, Kohane IS: On the reporting of fold differences. Pac Symp Biocomput 2001:496-507. This study addresses the issue of whether or not a particular expression value is truly meaningful, or just part of the noise. The authors created a tool to examine replicated data and then to mask insignificant fold differences in expression.

Claverie, 1999, Computational methods for the identification of differential and coordinated gene expression, Hum Mol Genet, 8, 1821, 10.1093/hmg/8.10.1821

Manduchi, 2000, Generation of patterns from gene expression data by assigning confidence to differentially expressed genes, Bioinformatics, 16, 685, 10.1093/bioinformatics/16.8.685

Park P, Pagano M, Bonetti M: A nonparametric scoring algorithm for identifying informative genes from microarray data. Pac Symp Biocomput 2001:52-63.

Raychaudhuri, 2000, Pattern recognition of genomic features with microarrays: site typing of Mycobacterium tuberculosis strains, ISMB, 8, 286

Klus G, Song A, Schick A, Wahde M, Szallasi Z: Mutual information analysis as a tool to assess the role of aneuploidy in the generation of cancer-associated differential gene expression patterns. Pac Symp Biocomput 2001:42-51.

Hughes, 2000, Widespread aneuploidy revealed by DNA microarray expression profiling, Nat Genet, 25, 333, 10.1038/77116

Golub, 1999, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286, 531, 10.1126/science.286.5439.531

Alizadeh, 2000, Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, 503, 10.1038/35000501

Bittner, 2000, Molecular classification of cutaneous malignant melanoma by gene expression profiling, Nature, 406, 536, 10.1038/35020115

Ross, 2000, Systematic variation in gene expression patterns in human cancer cell lines, Nat Genet, 24, 227, 10.1038/73432

Ben-Dor, 1999, Clustering gene expression patterns, J Comput Biol, 6, 281, 10.1089/106652799318274

Scherf, 2000, A gene expression database for the molecular pharmacology of cancer, Nat Genet, 24, 236, 10.1038/73439

Holstege, 1998, Dissecting the regulatory circuitry of a eukaryotic genome, Cell, 95, 717, 10.1016/S0092-8674(00)81641-4

Roberts, 2000, Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles, Science, 287, 873, 10.1126/science.287.5454.873

Hughes, 2000, Functional discovery via a compendium of expression profiles, Cell, 102, 109, 10.1016/S0092-8674(00)00015-5

Iyer, 1999, The transcriptional program in the response of human fibroblasts to serum, Science, 283, 83, 10.1126/science.283.5398.83

Coller, 2000, Expression analysis with oligonucleotide microarrays reveals that MYC regulates genes involved in growth, cell cycle, signaling, and adhesion, Proc Natl Acad Sci USA, 97, 3260, 10.1073/pnas.97.7.3260

Phillips, 2000, The genetic program of hematopoietic stem cells, Science, 288, 1635, 10.1126/science.288.5471.1635

Eisen, 1998, Cluster analysis and display of genome-wide expression patterns, Proc Natl Acad Sci USA, 95, 14863, 10.1073/pnas.95.25.14863

Tamayo, 1999, Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation, Proc Natl Acad Sci USA, 96, 2907, 10.1073/pnas.96.6.2907

Toronen, 1999, Analysis of gene expression data using self-organizing maps, FEBS Lett, 451, 142, 10.1016/S0014-5793(99)00524-4

Sharan, 2000, CLICK: a clustering algorithm with applications to gene expression analysis, ISMB, 8, 307

Sasik R, Hwa T, Iranfar N, Loomis W: Percolation clustering: a novel algorithm applied to the clustering of gene expression patterns in Dictyostelium development. Pac Symp Biocomput 2001:335-347.

Heyer, 1999, Exploring expression data: identification and analysis of coexpressed genes, Genome Res, 9, 1106, 10.1101/gr.9.11.1106

Alon, 1999, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc Natl Acad Sci USA, 96, 6745, 10.1073/pnas.96.12.6745

Cheng, 2000, Biclustering of expression data, ISMB, 8, 93

Califano, 2000, Analysis of gene expression microarrays for phenotype classification, ISMB, 8, 75

Tavazoie, 1999, Systematic determination of genetic network architecture, Nat Genet, 22, 281, 10.1038/10343

Mandel-Gutfreund Y, Baron A, Margalit H: A structure-based approach for prediction of protein binding sites in gene-upstream regions. Pac Symp Biocomput 2001:139-150. The investigators deviate from the more familiar multiple alignment sequence search approaches to identify novel binding sites. Rather, they use crystal structure information about the transcription factor and a knowledge-based potential to identify putative upstream binding regions. As more transcription factor crystal structures become available and our ability to predict unknown structures improves, approaches such as this one may become common.

Moskvina, 1998, A search in the genome of Saccharomyces cerevisiae for genes regulated via stress response elements, Yeast, 14, 1041, 10.1002/(SICI)1097-0061(199808)14:11<1041::AID-YEA296>3.0.CO;2-4

van Helden, 1998, Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies, J Mol Biol, 281, 827, 10.1006/jmbi.1998.1947

van Helden, 2000, A web site for the computational analysis of yeast regulatory sequences, Yeast, 16, 177, 10.1002/(SICI)1097-0061(20000130)16:2<177::AID-YEA516>3.0.CO;2-9

Zhu J, Zhang MQ: Cluster, function and promoter: analysis of yeast expression array. Pac Symp Biocomput 2000:479-490.

Bussemaker, 2000, Regulatory element detection using a probabilistic segmentation model, ISMB, 8, 67

Jacobs Anderson, 2000, Computational identification of cis-acting elements affecting post-transcriptional control of gene expression in Saccharomyces cerevisiae, Nucleic Acids Res, 28, 1604, 10.1093/nar/28.7.1604

Brazma, 1998, Predicting gene regulatory elements in silico on a genomic scale, Genome Res, 8, 1202, 10.1101/gr.8.11.1202

Brazma, 1997, Data mining for regulatory elements in yeast genome, ISMB, 5, 65

Vilo, 2000, Mining for putative regulatory elements in the yeast genome using gene expression data, ISMB, 8, 384

Juhl Jensen, 2000, Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation, Bioinformatics, 16, 326, 10.1093/bioinformatics/16.4.326

Lawrence, 1993, Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Science, 262, 208, 10.1126/science.8211139

Hertz, 1999, Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, Bioinformatics, 15, 563, 10.1093/bioinformatics/15.7.563

Schneider, 1986, Information content of binding sites on nucleotide sequences, J Mol Biol, 188, 415, 10.1016/0022-2836(86)90165-8

Workman CT, Stormo GD: ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pac Symp Biocomput 2000:467-478.

Liu X, Brutlag DL, Liu JS: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput 2001:127-138. BioProspector is a modified version of the Gibbs sampler that permits the user to enter both a positive set to look for motifs and a negative set to create the background statistics in order to improve performance. BioProspector also contains a variety of other modifications that make it particularly suitable for binding-site searching; for example, it can look for palindromic sites and can identify motifs with variable-length gaps between them.

Hughes, 2000, Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae, J Mol Biol, 296, 1205, 10.1006/jmbi.2000.3519

Holmes, 2000, Finding regulatory elements using joint likelihoods for sequence and expression profile data, ISMB, 8, 202

Marcotte, 1999, A combined algorithm for genome-wide prediction of protein function, Nature, 402, 83, 10.1038/47048

Brown, 2000, Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc Natl Acad Sci USA, 97, 262, 10.1073/pnas.97.1.262

Wilson, 2000, Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores, J Mol Biol, 297, 233, 10.1006/jmbi.2000.3550

Shatkay, 2000, Genes, themes and microarrays: using information retrieval for large-scale gene analysis, ISMB, 8, 317

Hvidsten T, Komorowski J, Sandvik A, Lægreid A: Predicting gene function from gene expressions and ontologies. Pac Symp Biocomput 2001:299-310.

Drawid, 2000, A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome, J Mol Biol, 301, 1059, 10.1006/jmbi.2000.3968

McAdams, 1995, Circuit simulation of genetic networks, Science, 269, 650, 10.1126/science.7624793

Maki Y, Tominaga D, Okamoto M, Watanabe S, Eguchi Y: Development of a system for the inference of large-scale genetic networks. Pac Symp Biocomput 2001:446-458.

Akutsu, 2000, Inferring qualitative relations in genetic networks and metabolic pathways, Bioinformatics, 16, 727, 10.1093/bioinformatics/16.8.727

D'Haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear modeling of mRNA expression levels during CNS development and injury. Pac Symp Biocomput 1999:41-52.

Friedman, 2000, Using Bayesian networks to analyze expression data, J Comput Biol, 7, 601, 10.1089/106652700750050961

Yeung, 2001, Validating clustering for gene expression data, Bioinformatics, 17, 309, 10.1093/bioinformatics/17.4.309

Herrero, 2001, A hierarchical unsupervised growing neural network for clustering gene expression patterns, Bioinformatics, 17, 126, 10.1093/bioinformatics/17.2.126

Bussemaker, 2001, Regulatory element detection using correlation with expression, Nat Genet, 27, 167, 10.1038/84792

Masys, 2001, Use of keyword hierarchies to interpret gene expression patterns, Bioinformatics, 17, 319, 10.1093/bioinformatics/17.4.319

Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman R: Missing value estimation methods for DNA microarrays. Bioinformatics 2001, in press.