Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm)

Springer Science and Business Media LLC - Volume 380 - Pages 419-429 - 2004
Lennart Eriksson¹, Henrik Antti²,³, Johan Gottfries⁴,³, Elaine Holmes², Erik Johansson¹, Fredrik Lindgren⁵, Ingrid Long⁶, Torbjörn Lundstedt⁶, Johan Trygg³, Svante Wold³
¹ Umetrics AB, Umeå, Sweden
² Biological Chemistry, Biomedical Sciences Division, Faculty of Medicine, Imperial College of Science Technology and Medicine, London, UK
³ Institute of Chemistry, Umeå University, Umeå, Sweden
⁴ AstraZeneca R&D Mölndal, Mölndal, Sweden
⁵ Umetrics AB, Malmö Office, Malmö, Sweden
⁶ Department of Pharmaceutical Chemistry, Uppsala University, Uppsala, Sweden

Abstract

This article describes how multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares projections to latent structures (PLS), can be applied to the large-volume, high-density data structures obtained in genomics, proteomics, and metabonomics. PCA, PLS, and their extensions derive their usefulness from their ability to analyze data with many noisy, collinear, and even incomplete variables in both X and Y. Three examples serve as illustrations. The first is a genomics data set and involves modeling microarray data of cell cycle-regulated genes in the microorganism Saccharomyces cerevisiae. The second contains NMR metabonomics data measured on urine samples from male rats treated with either chloroquine or amiodarone. The third describes sequence-function classification of a set of G-protein-coupled receptors using hierarchical PCA.
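The claim that PCA copes with noisy, collinear, and incomplete variables can be made concrete with the NIPALS algorithm, which extracts one component at a time through alternating least-squares steps and can simply leave missing cells out of each step. The sketch below is a minimal illustration under stated assumptions: it uses NumPy, mean-centring plus unit-variance scaling, and a hypothetical function name `nipals_pca`; it is not the authors' implementation and does not cover PLS or hierarchical PCA.

```python
import numpy as np


def nipals_pca(X, n_components=2, tol=1e-10, max_iter=500):
    """Illustrative NIPALS PCA that skips missing (NaN) cells.

    Columns are mean-centred and scaled to unit variance using observed values
    only. Returns scores T (n x A) and unit-length loadings P (k x A).
    Assumes no row or column of X is entirely missing.
    """
    X = np.asarray(X, dtype=float)
    W = (~np.isnan(X)).astype(float)            # 1.0 where observed, 0.0 where missing
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)
    std[std == 0] = 1.0                          # guard against constant columns
    E = np.where(W > 0, (X - mean) / std, 0.0)   # residual matrix, missing cells held at 0
    n, k = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((k, n_components))
    for a in range(n_components):
        # Start from the column with the largest remaining sum of squares
        t = E[:, np.argmax((E ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Regress each column on t, using only the rows observed in that column
            p = (E.T @ t) / (W.T @ t ** 2)
            p /= np.linalg.norm(p)
            # Regress each row on p, using only the columns observed in that row
            t_new = (E @ p) / (W @ p ** 2)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        # Deflate: remove this component from the observed cells only
        E = np.where(W > 0, E - np.outer(t, p), 0.0)
    return T, P


# Usage on synthetic data (one strong latent factor plus noise), then the same
# table with two values deleted to mimic an incomplete data set.
rng = np.random.default_rng(0)
X = np.outer(rng.normal(size=30), rng.normal(size=6)) + 0.1 * rng.normal(size=(30, 6))
T, P = nipals_pca(X, n_components=2)
X_missing = X.copy()
X_missing[3, 2] = np.nan
X_missing[10, 4] = np.nan
T_m, P_m = nipals_pca(X_missing, n_components=2)
```

With complete data, each inner loop reduces to power iteration, so the first loading vector matches the first right singular vector of the scaled matrix up to sign. With a few cells missing, the model is still estimated from the observed values, and the first loading stays close to the complete-data result.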
