Translational biomarker discovery in clinical metabolomics: an introductory tutorial

Jianguo Xia1, David Broadhurst2, Michael Wilson3, David S. Wishart3
1Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
2Department of Medicine, University of Alberta, Edmonton, AB, Canada
3Department of Computing Science, University of Alberta, Edmonton, AB, Canada

References

Ambroise, C., & McLachlan, G. J. (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6562–6566.

Arkin, C. F., & Wachtel, M. S. (1990). How many patients are necessary to assess test performance? JAMA: The Journal of the American Medical Association, 263(2), 275–278.

Atkinson, A. J., Colburn, W. A., DeGruttola, V. G., DeMets, D. L., Downing, G. J., Hoth, D. F., et al. (2001). Biomarkers and surrogate endpoints: Preferred definitions and conceptual framework. Clinical Pharmacology and Therapeutics, 69(3), 89–95.

Bahado-Singh, R. O., Akolekar, R., Mandal, R., Dong, E., Xia, J., Kruger, M., et al. (2012). Metabolomics and first-trimester prediction of early-onset preeclampsia. Journal of Maternal-Fetal and Neonatal Medicine, 25(10), 1840–1847.

Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387–415.

Barker, M., & Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17(3), 166–173.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), 57(1), 289–300.

Berrar, D., & Flach, P. (2010). Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). Briefings in Bioinformatics, 13(1), 83–97.

Bijlsma, S., Bobeldijk, I., Verheij, E. R., Ramaker, R., Kochhar, S., Macdonald, I. A., et al. (2006). Large-scale human metabolomics studies: A strategy for data (pre-) processing and validation. Analytical Chemistry, 78(2), 567–574.

Bolstad, B. M., Irizarry, R. A., Astrand, M., & Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2), 185–193.

Bourgon, R., Gentleman, R., & Huber, W. (2010). Independent filtering increases detection power for high-throughput experiments. Proceedings of the National Academy of Sciences of the United States of America, 107(21), 9546–9551.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

Broadhurst, D., & Kell, D. B. (2006). Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics, 2(4), 171–196.

Broadhurst, D., Goodacre, R., Jones, A., Rowland, J. J., & Kell, D. B. (1997). Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression, with applications to pyrolysis mass spectrometry. Analytica Chimica Acta, 348(1–3), 71–86.

Carpenter, J., & Bithell, J. (2000). Bootstrap confidence intervals: When, which, what? A practical guide for medical statisticians. Statistics in Medicine, 19(9), 1141–1164.

Chace, D. H. (2001). Mass spectrometry in the clinical laboratory. Chemical Reviews, 101(2), 445–477.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

Dieterle, F., Ross, A., Schlotterbeck, G., & Senn, H. (2006). Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabolomics. Analytical Chemistry, 78(13), 4281–4290.

Dodd, L. E., & Pepe, M. S. (2003). Partial AUC estimation and regression. Biometrics, 59(3), 614–623.

Dunn, W. B., Broadhurst, D. I., Atherton, H. J., Goodacre, R., & Griffin, J. L. (2011). Systems level studies of mammalian metabolomes: The roles of mass spectrometry and nuclear magnetic resonance spectroscopy. Chemical Society Reviews, 40(1), 387–426.

Dunn, W. B., Wilson, I. D., Nicholls, A. W., & Broadhurst, D. (2012). The importance of experimental design and QC samples in large-scale and MS-driven untargeted metabolomic studies of humans. Bioanalysis, 4(18), 2249–2264.

Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185.

Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92(438), 548–560.

Eng, J. (2003). Sample size estimation: How many individuals should be studied? Radiology, 227(2), 309–313.

Eng, J. (2004). Sample size estimation: A glimpse beyond simple formulas. Radiology, 230(3), 606–612.

Eriksson, L., Johansson, E., Kettaneh-Wold, N., & Wold, S. (2001). Multi- and Megavariate Data Analysis: Principles and Applications. Dublin: Umetrics Academy.

Filzmoser, P., Liebmann, B., & Varmuza, K. (2009). Repeated double cross validation. Journal of Chemometrics, 23(4), 160–171.

Gao, J., Tarcea, V. G., Karnovsky, A., Mirel, B. R., Weymouth, T. E., Beecher, C. W., et al. (2010). Metscape: A Cytoscape plug-in for visualizing and interpreting metabolomic data in the context of human metabolic networks. Bioinformatics, 26(7), 971–973.

Good, P. I. (2011). Permutation tests. In Analyzing the Large Number of Variables in Biomedical and Satellite Imagery (pp. 5–21). New York: Wiley.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.

Hackstadt, A. J., & Hess, A. M. (2009). Filtering for increased power for microarray data analysis. BMC Bioinformatics, 10, 11.

Handl, J., Kell, D. B., & Knowles, J. (2007). Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(2), 279–292.

Jarvis, R. M., & Goodacre, R. (2005). Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data. Bioinformatics, 21(7), 860–868.

Kankainen, M., Gopalacharyulu, P., Holm, L., & Oresic, M. (2011). MPEA–metabolite pathway enrichment analysis. Bioinformatics, 27(13), 1878–1879.

Knowles, J. D., Watson, R. A., & Corne, D. (2001). Reducing local optima in single-objective problems by multi-objectivization. In Proceedings of the 1st International Conference on Evolutionary Multi-Criterion Optimization.

Kohl, S. M., Klein, M. S., Hochrein, J., Oefner, P. J., Spang, R., & Gronwald, W. (2012). State-of-the art data normalization methods improve NMR-based metabolomic analysis. Metabolomics, 8(Suppl 1), 146–160.

Lasko, T. A., Bhagwat, J. G., Zou, K. H., & Ohno-Machado, L. (2005). The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics, 38(5), 404–415.

Liebmann, B., Filzmoser, P., & Varmuza, K. (2010). Robust and classical PLS regression compared. Journal of Chemometrics, 24(3–4), 111–120.

McClish, D. K. (1989). Analyzing a portion of the ROC curve. Medical Decision Making, 9(3), 190–195.

Miki, Y., Swensen, J., Shattuck-Eidens, D., Futreal, P. A., Harshman, K., Tavtigian, S., et al. (1994). A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science, 266(5182), 66–71.

Newby, L. K., Storrow, A. B., Gibler, W. B., Garvey, J. L., Tucker, J. F., Kaplan, A. L., et al. (2001). Bedside multimarker testing for risk stratification in chest pain units: The chest pain evaluation by creatine kinase-MB, myoglobin, and troponin I (CHECKMATE) study. Circulation, 103(14), 1832–1837.

Noble, W. S. (2009). How does multiple testing correction work? Nature Biotechnology, 27(12), 1135–1137.

Obuchowski, N. A., Lieber, M. L., & Wians, F. H. (2004). ROC curves in clinical chemistry: Uses, misuses, and possible solutions. Clinical Chemistry, 50(7), 1118–1125.

Pepe, M. S., Etzioni, R., Feng, Z. D., Potter, J. D., Thompson, M. L., Thornquist, M., et al. (2001). Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute, 93(14), 1054–1061.

Picard, R. R., & Cook, R. D. (1984). Cross-validation of regression models. Journal of the American Statistical Association, 79(387), 575–583.

Polascik, T. J., Oesterling, J. E., & Partin, A. W. (1999). Prostate specific antigen: A decade of discovery–what we have learned and where we are going. Journal of Urology, 162(2), 293–306.

Rothman, K. J., & Greenland, S. (1998). Modern Epidemiology (2nd ed.). Philadelphia: Lippincott Williams & Wilkins.

Saeys, Y., Inza, I., & Larranaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), 2507–2517.

Sansone, S. A., Rocca-Serra, P., Field, D., Maguire, E., Taylor, C., Hofmann, O., et al. (2012). Toward interoperable bioscience data. Nature Genetics, 44(2), 121–126.

Smit, S., van Breemen, M. L. J., Hoefsloot, H. C. J., Smilde, A. K., Aerts, J. M. F. G., & de Koster, C. G. (2007). Assessing the statistical validity of proteomics based biomarkers. Analytica Chimica Acta, 592(2), 210–217.

Soreide, K. (2009). Receiver-operating characteristic curve analysis in diagnostic, prognostic and predictive biomarker research. Journal of Clinical Pathology, 62(1), 1–5.

Szymanska, E., Saccenti, E., Smilde, A. K., & Westerhuis, J. A. (2012). Double-check: Validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics, 8(Suppl 1), 3–16.

Trygg, J., Holmes, E., & Lundstedt, T. (2007). Chemometrics in metabonomics. Journal of Proteome Research, 6(2), 469–479.

van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling, and transformations: Improving the biological information content of metabolomics data. BMC Genomics, 7, 142.

Walter, S. D. (2005). The partial area under the summary ROC curve. Statistics in Medicine, 24(13), 2025–2040.

Westerhuis, J. A., Hoefsloot, H. C. J., Smit, S., Vis, D. J., Smilde, A. K., van Velzen, E. J. J., et al. (2008). Assessment of PLSDA cross validation. Metabolomics, 4(1), 81–89.

Wilcken, B., Wiley, V., Hammond, J., & Carpenter, K. (2003). Screening newborns for inborn errors of metabolism by tandem mass spectrometry. New England Journal of Medicine, 348(23), 2304–2312.

Xia, J., & Wishart, D. S. (2010a). MetPA: A web-based metabolomics tool for pathway analysis and visualization. Bioinformatics, 26(18), 2342–2344.

Xia, J., & Wishart, D. S. (2010b). MSEA: A web-based tool to identify biologically meaningful patterns in quantitative metabolomic data. Nucleic Acids Research, 38, W71–W77.

Xia, J., & Wishart, D. S. (2011). Web-based inference of biological patterns, functions and pathways from metabolomic data using MetaboAnalyst. Nature Protocols, 6(6), 743–760.

Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3(1), 32–35.

Zou, K. H., Hall, W. J., & Shapiro, D. E. (1997). Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statistics in Medicine, 16(19), 2143–2156.

Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39(4), 561–577.