The Shivplot: a graphical display for trend elucidation and exploratory analysis of microarray data
Abstract
High-throughput systems are powerful tools for the life science research community. The complexity and volume of the data they produce, however, demand special treatment. Graphical tools are needed to evaluate many aspects of the data throughout the analysis process because a single plot can support quality assessment of thousands of values simultaneously. The utility of a plot, in turn, depends on both its interpretability and its efficiency. We describe the shivplot, a graphical technique motivated by microarrays but applicable to any replicated high-throughput data set. The plot capitalizes on the strengths of three well-established graphics (a boxplot, a distribution density plot, and a variability versus intensity plot) by combining them into a single representation. The utility of the new display is illustrated with microarray data sets. The proposed graph retains all the information of its precursors while conserving space and minimizing redundancy, and it also highlights features of the data that would be difficult to appreciate from the individual display components. We recommend the shivplot both for exploratory data analysis and for the communication of experimental data in publications.
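The abstract names the three component graphics but does not say how the shivplot itself is drawn. As a rough illustration only, the sketch below computes the numerical ingredients behind those three components for a replicated data set: a five-number summary (boxplot), a Gaussian kernel density estimate (density plot), and per-gene mean and standard deviation (variability versus intensity). The function names, the toy data, and the choice of a fixed-bandwidth Gaussian kernel are assumptions made for this example, not the authors' method.

```python
import math
import statistics


def gene_summaries(replicates):
    """Return per-gene means and sample standard deviations.

    `replicates` is a list of rows, one row per gene and one value per
    replicate (hypothetical layout chosen for this sketch).
    """
    means = [statistics.mean(row) for row in replicates]
    sds = [statistics.stdev(row) for row in replicates]
    return means, sds


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a boxplot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian kernel density estimate on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


# Toy log-intensities: 5 genes x 3 replicates (made-up numbers).
toy_data = [
    [6.0, 6.2, 6.1],
    [7.0, 7.4, 7.2],
    [8.0, 8.1, 8.2],
    [9.0, 9.6, 9.3],
    [10.0, 10.2, 10.1],
]

means, sds = gene_summaries(toy_data)          # variability vs intensity
box = five_number_summary(means)               # boxplot ingredients
grid = [i * 0.05 for i in range(321)]          # 0.0 .. 16.0
density = gaussian_kde(means, grid, bandwidth=0.5)  # density ingredients
```

Plotting `sds` against `means`, with `box` and `density` drawn along the intensity axis, would place the three views on a shared scale. How the shivplot actually arranges them is described in the full paper, not in the abstract.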