A new non-linear normalization method for reducing variability in DNA microarray experiments

Genome Biology - Volume 3 - Pages 1-16 - 2002
Christopher Workman1,2, Lars Juhl Jensen2, Hanne Jarmer2, Randy Berka3, Laurent Gautier2, Henrik Bjørn Nielsen2, Hans-Henrik Saxild4, Claus Nielsen5, Søren Brunak2, Steen Knudsen2
1Genedata AG, Basel, Switzerland
2Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
3Novozymes Biotechnology, Davis, USA
4Center for Microbiology, Technical University of Denmark, DK-2800, Denmark
5Statens Serum Institut, Copenhagen, Denmark

Abstract

Microarray data are subject to multiple sources of variation, of which biological sources are of interest whereas most others are only confounding. Recent work has identified systematic sources of variation that are intensity-dependent and non-linear in nature. Systematic sources of variation are not limited to the differing properties of the cyanine dyes Cy5 and Cy3 as observed in cDNA arrays, but are the general case for both oligonucleotide microarray (Affymetrix GeneChips) and cDNA microarray data. Current normalization techniques are most often linear and therefore not capable of fully correcting for these effects. We present here a simple and robust non-linear method for normalization using array signal distribution analysis and cubic splines. These methods compared favorably to normalization using robust local-linear regression (lowess). The application of these methods to oligonucleotide arrays reduced the relative error between replicates by 5-10% compared with a standard global normalization method. Application to cDNA arrays showed improvements over the standard method and over Cy3-Cy5 normalization based on dye-swap replication. In addition, a set of known differentially regulated genes was ranked higher by the t-test. In either cDNA or Affymetrix technology, signal-dependent bias was more than ten times greater than the observed print-tip or spatial effects. Intensity-dependent normalization is important for both high-density oligonucleotide array and cDNA array data. Both the regression and spline-based methods described here performed better than existing linear methods when assessed on the variability of replicate arrays. Dye-swap normalization was less effective at Cy3-Cy5 normalization than either regression or spline-based methods alone.
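The idea of normalizing with signal distribution analysis and cubic splines can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that signals are on a log scale, that a single target array is given (how the target is chosen is not specified here), that 21 evenly spaced quantiles serve as knots, and that a natural cubic spline is an acceptable choice of spline. The function names `quantiles`, `natural_cubic_spline` and `qspline_normalize` are hypothetical, and the data are simulated.

```python
import bisect
import math
import random


def quantiles(values, probs):
    """Empirical quantiles, interpolating linearly between order statistics."""
    s = sorted(values)
    out = []
    for p in probs:
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out


def natural_cubic_spline(xs, ys):
    """Return f(x) interpolating the knots (xs, ys); xs strictly increasing."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; zero at both ends
    if n > 2:
        # Tridiagonal system for the interior second derivatives.
        a = [h[i - 1] for i in range(1, n - 1)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        c = [h[i] for i in range(1, n - 1)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n - 1)]
        for i in range(1, n - 2):  # forward elimination (Thomas algorithm)
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * (n - 2)
        sol[-1] = d[-1] / b[-1]
        for i in range(n - 4, -1, -1):  # back substitution
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        m[1:n - 1] = sol

    def f(x):
        # Locate the knot interval; values outside the knots reuse the end piece.
        i = max(0, min(bisect.bisect_right(xs, x) - 1, n - 2))
        left, right = xs[i + 1] - x, x - xs[i]
        return ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * left
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * right)

    return f


def qspline_normalize(array, target, n_knots=21):
    """Map array onto target by a spline fitted through matched quantiles."""
    probs = [k / (n_knots - 1) for k in range(n_knots)]
    fit = natural_cubic_spline(quantiles(array, probs), quantiles(target, probs))
    return [fit(v) for v in array]


def mean_abs_diff(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Simulated data (assumption): a target array of log2 signals and a second
# array carrying a smooth, intensity-dependent, non-linear bias.
random.seed(0)
target = [random.uniform(4.0, 14.0) for _ in range(2000)]
array = [v + 0.5 * math.sin(v / 3.0) for v in target]
normalized = qspline_normalize(array, target)

before = mean_abs_diff(array, target)
after = mean_abs_diff(normalized, target)
print(f"mean |difference| before: {before:.3f}, after: {after:.3f}")
```

Because the knots include the minimum and maximum of the array being normalized, every value falls inside the fitted range and no extrapolation is needed in this sketch. A global linear scaling could not remove the simulated bias, since the bias changes sign across the intensity range.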
