Orthogonal projections to latent structures as a strategy for microarray data normalization

BMC Bioinformatics - Volume 8 - Pages 1-10 - 2007
Max Bylesjö¹, Daniel Eriksson², Andreas Sjödin³, Stefan Jansson³, Thomas Moritz², Johan Trygg¹
¹Research Group for Chemometrics, Department of Chemistry, Umeå University, Umeå, Sweden
²Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
³Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden

Abstract

During the generation of microarray data, various forms of systematic bias are frequently introduced, which limit the accuracy and precision of the results. To estimate biological effects properly, these biases must be identified and removed. We introduce a normalization strategy for multi-channel microarray data based on orthogonal projections to latent structures (OPLS), a multivariate regression method. We illustrate the effect of applying the normalization methodology to single-channel Affymetrix data as well as dual-channel cDNA data. We compare it in parallel with a wide range of commonly employed normalization methods with diverse properties and strengths, using sensitivity and specificity estimated from external (spike-in) controls. On the illustrated data sets, the OPLS normalization strategy exhibits the highest average true negative and true positive rates among the evaluated methods. The OPLS methodology identifies joint variation within biological samples, which enables the removal of sources of variation that are uncorrelated (orthogonal) to the within-sample variation. This ensures that structured variation related to the underlying biological samples is separated from the remaining, bias-related sources of systematic variation. As a consequence, the methodology does not require any explicit knowledge of the presence or characteristics of specific biases. Furthermore, it does not assume that the majority of elements are non-differentially expressed, which makes it applicable to specialized boutique arrays.
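The abstract describes the core idea only in words: find the systematic variation in the data that is orthogonal to the biological-sample structure, then subtract it. The sketch below illustrates that idea with NumPy. It follows the multi-response O-PLS algorithm of Trygg and Wold (2002) and is not the authors' implementation. Several details are illustrative assumptions rather than the paper's exact pipeline:

- arrays (or channels) are the rows of X and genes are the columns;
- Y is a dummy matrix recording which biological sample each array came from;
- the function name `opls_filter` is hypothetical;
- the simulated "batch" bias is invented for the example;
- the number of orthogonal components is fixed by hand.

```python
import numpy as np


def opls_filter(X, Y, n_orth=1, tol=1e-10, max_iter=500):
    """Hypothetical helper: remove `n_orth` Y-orthogonal components from X,
    following the multi-Y O-PLS algorithm of Trygg & Wold (2002).

    X : (n_arrays, n_genes) expression matrix (log scale assumed).
    Y : (n_arrays, n_samples) dummy matrix of biological-sample membership.
    Returns the filtered X (column means restored) and the orthogonal scores.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean = X.mean(axis=0)
    E = X - x_mean                   # column-centred working copy of X
    Yc = Y - Y.mean(axis=0)          # column-centred Y

    # Y-predictive weight vectors, one per column of Y.
    W = E.T @ Yc / np.sum(Yc * Yc, axis=0)
    # Orthonormal basis of the Y-predictive subspace spanned by W (via SVD).
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    Tw = U[:, s > s.max() * 1e-10]

    scores = []
    for _ in range(n_orth):
        # NIPALS iterations for the predictive score vector t.
        u = Yc[:, [0]]
        for _ in range(max_iter):
            w = E.T @ u / (u.T @ u)
            w /= np.linalg.norm(w)
            t = E @ w
            c = Yc.T @ t / (t.T @ t)
            u_new = Yc @ c / (c.T @ c)
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)
        # Strip the Y-predictive part of the loading; what remains is Y-orthogonal.
        w_o = p - Tw @ (Tw.T @ p)
        w_o /= np.linalg.norm(w_o)
        t_o = E @ w_o                    # orthogonal score, uncorrelated with Y
        p_o = E.T @ t_o / (t_o.T @ t_o)  # orthogonal loading
        E = E - t_o @ p_o.T              # deflate: subtract the systematic variation
        scores.append(t_o)
    return E + x_mean, np.hstack(scores)


# Simulated example (assumed design): 3 biological samples x 2 arrays each.
rng = np.random.default_rng(0)
n_genes = 500
groups = np.repeat([0, 1, 2], 2)
Y = np.eye(3)[groups]
bio = rng.normal(0.0, 1.0, size=(3, n_genes))[groups]  # true sample profiles
batch = np.tile([1.0, -1.0], 3)                        # bias pattern orthogonal to Y
bias = np.outer(batch, rng.normal(0.0, 2.0, n_genes))
X = 8.0 + bio + bias + rng.normal(0.0, 0.05, size=(6, n_genes))

X_norm, T_o = opls_filter(X, Y, n_orth=1)
before = np.linalg.norm(batch @ X)
after = np.linalg.norm(batch @ X_norm)
print(f"batch contrast before: {before:.1f}, after: {after:.3f}")
```

The sketch never tells the filter about the batch pattern. The filter only sees which arrays are replicates of the same biological sample. It removes the dominant variation that is uncorrelated with that grouping, and this variation is the simulated bias. Because each orthogonal score is uncorrelated with Y, the sample-level sums `Y.T @ X` are left unchanged by the filtering. On real data, the number of orthogonal components would have to be chosen by a model-selection criterion rather than fixed at one.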
