Metabolite profile analysis: from raw data to regression and classification

Physiologia Plantarum - Volume 132, Issue 2 - Pages 150-161 - 2008
Matthias Steinfath¹, Detlef Groth², Jan Lisec³, Joachim Selbig¹
¹Bioinformatics CRG, Cooperative Research Groups, Max Planck Institute of Molecular Plant Physiology, Max Planck Society
²Bioinformatics CIG, Infrastructure Groups and Service Units, Max Planck Institute of Molecular Plant Physiology, Max Planck Society
³Small Molecules, Department Willmitzer, Max Planck Institute of Molecular Plant Physiology, Max Planck Society

Abstract

Successful metabolic profile analysis will aid the fundamental understanding of physiology. Here, we present a possible analysis workflow. First, we describe the procedure for transforming raw data into a data matrix containing relative metabolite levels for each sample. Because limitations of the technical equipment mean that the levels of some metabolites cannot be determined in every sample, and because different experiments often need to be compared, missing value estimation and normalization are presented as helpful preprocessing steps. Regression methods are presented as tools to relate metabolite levels to other physiological properties such as biomass and gene expression. As the number of measured metabolites often exceeds the number of samples, dimensionality reduction methods are required; two of these methods are discussed in detail. Throughout the article, practical examples illustrate the application of these methods. We focus on uncovering the relationship between metabolism and growth-related properties.
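
The abstract names missing value estimation, normalization and regression under dimensionality reduction without fixing particular algorithms. The following sketch strings these steps together on synthetic data. It assumes choices that are illustrative and not necessarily the article's: column-median imputation for missing values, a log2 transform with autoscaling as normalization, and principal component regression (PCR) as one way to relate metabolite levels to biomass when metabolites outnumber samples. The function names (`impute_column_median`, `normalize`, `pcr_fit`, `pcr_predict`) and the simulated `biomass` response are hypothetical.

```python
import numpy as np


def impute_column_median(X):
    """Fill each NaN with the median of the observed values in its metabolite column.

    Hypothetical, deliberately simple stand-in; model-based estimators
    (e.g. kNN or probabilistic PCA) are often preferred for real data.
    """
    X = X.copy()
    medians = np.nanmedian(X, axis=0)      # one median per metabolite
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X


def normalize(X):
    """Log2-transform positive relative levels, then center and scale each metabolite."""
    L = np.log2(X)
    return (L - L.mean(axis=0)) / L.std(axis=0, ddof=1)


def pcr_fit(Z, y, k):
    """Principal component regression on the first k components of centered Z.

    Returns one coefficient per metabolite plus the mean of y (the intercept,
    since the columns of Z are already centered).
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = U[:, :k] * s[:k]                   # sample scores, shape (n_samples, k)
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = Vt[:k].T @ gamma                # back to metabolite space, shape (p,)
    return beta, y_mean


def pcr_predict(Z, beta, y_mean):
    """Predict the response from normalized metabolite levels."""
    return Z @ beta + y_mean


# Synthetic example: 20 samples, 100 metabolites (more variables than samples).
rng = np.random.default_rng(0)
n_samples, n_metabolites = 20, 100
X_complete = rng.lognormal(mean=0.0, sigma=0.5, size=(n_samples, n_metabolites))
# Hypothetical "biomass" driven by the first five metabolites plus noise.
biomass = np.log2(X_complete[:, :5]).sum(axis=1) + rng.normal(0.0, 0.1, n_samples)
X_raw = X_complete.copy()
X_raw[rng.random(X_raw.shape) < 0.05] = np.nan   # ~5% of values not detected

Z = normalize(impute_column_median(X_raw))
beta, y_mean = pcr_fit(Z, biomass, k=5)
print(beta.shape)                                  # (100,)
print(np.corrcoef(pcr_predict(Z, beta, y_mean), biomass)[0, 1])
```

With k set to n_samples - 1 the fit reproduces the training response exactly, because the centered matrix already spans every centered response vector. This is the overfitting that makes a small number of components, chosen for instance by cross-validation, necessary when metabolites outnumber samples.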
