Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm)
Abstract
This article describes the applicability of multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares projections to latent structures (PLS), to the large-volume, high-density data structures obtained within genomics, proteomics, and metabonomics. PCA and PLS, and their extensions, derive their usefulness from their ability to analyze data with many noisy, collinear, and even incomplete variables in both X and Y. Three examples are used as illustrations. The first is a genomics data set and involves modeling of microarray data of cell cycle-regulated genes in the microorganism Saccharomyces cerevisiae. The second contains NMR-metabonomics data, measured on urine samples of male rats treated with either chloroquine or amiodarone. The third describes sequence-function classification studies in a set of G-protein-coupled receptors using hierarchical PCA.
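As a rough illustration of why projection methods suit data with many collinear variables, the following minimal sketch runs PCA on a synthetic matrix with far more variables than observations. It is not the authors' implementation: the data, the helper name `pca_scores`, and the choice of two latent factors are assumptions made for the example. It uses a plain singular value decomposition from NumPy, so it does not handle missing values.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Hypothetical helper: PCA scores and loadings via SVD of centered X."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # T = U S
    loadings = Vt[:n_components].T                    # P (orthonormal columns)
    explained = (s ** 2) / np.sum(s ** 2)             # variance fraction per component
    return scores, loadings, explained[:n_components]

# Synthetic "wide" data (assumed for illustration): 20 observations,
# 500 noisy, collinear variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 2))
mixing = rng.normal(size=(2, 500))
X = latent @ mixing + 0.1 * rng.normal(size=(20, 500))

scores, loadings, explained = pca_scores(X, n_components=2)
print(scores.shape, loadings.shape)
print(explained.sum())
```

Because the 500 variables are all driven by the same two latent factors, two components capture nearly all of the variation. This is the sense in which collinearity helps rather than hinders a projection model.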