Statistical considerations and database limitations in NMR-based metabolic profiling studies

Metabolomics - Volume 19 - Pages 1-13 - 2023
Imani L. Ross (1), Julie A. Beardslee (2), Maria M. Steil (3), Tafadzwa Chihanga (4), Michael A. Kennedy (2)
(1) Department of Chemistry and Biochemistry, University of California, San Diego, USA
(2) Department of Chemistry and Biochemistry, Miami University, Oxford, USA
(3) Division of Plastic Surgery, University of Texas Medical Branch, Galveston, USA
(4) Division of Oncology, Cincinnati Children’s Hospital Medical Center, Cincinnati, USA

Abstract

Interpretation and analysis of NMR-based metabolic profiling studies are limited by substantially incomplete commercial and academic databases. Measures of statistical significance, including p-values, variable importance in projection (VIP) scores, area under the curve (AUC) values and fold change (FC) values, can be largely inconsistent with one another. Data normalization prior to statistical analysis can cause erroneous outcomes. The objectives were (1) to quantitatively assess consistency among p-values, VIP scores, AUC values and FC values in representative NMR-based metabolic profiling datasets, (2) to assess how data normalization can impact statistical significance outcomes, (3) to determine the resonance peak assignment completion potential of commonly used databases and (4) to analyze the intersection and uniqueness of metabolite space in these databases. P-values, VIP scores, AUC values and FC values, and their dependence on data normalization, were determined in an orthotopic mouse model of pancreatic cancer and in two human pancreatic cancer cell lines. Completeness of resonance assignments was evaluated using Chenomx, the Human Metabolome Database (HMDB) and the COLMAR database. The intersection and uniqueness of the databases were quantified. P-values and AUC values were strongly correlated with each other, more so than with VIP scores or FC values. Distributions of statistically significant bins depended strongly on whether or not datasets were normalized. Between 40 and 45% of peaks had either no database match or only ambiguous matches. Between 9 and 22% of metabolites were unique to each database. Lack of consistency in statistical analyses of metabolomics data can lead to misleading or inconsistent interpretation. Data normalization can have large effects on statistical analysis and should be justified. About 40% of peak assignments remain ambiguous or impossible with current databases. 1D and 2D databases should be made consistent to maximize metabolite assignment confidence and validation.
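
The sensitivity of per-bin statistics to normalization can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the toy intensities are invented, the function names (auc, fold_change, normalize_total, per_bin) are hypothetical, and normalization to total spectral intensity is an assumed choice, since the abstract does not state which normalization was used. AUC is computed here as the rank-based probability that a treated value exceeds a control value, and FC as the ratio of group means.

```python
from statistics import mean


def auc(control, treated):
    """Probability that a random treated value exceeds a random control
    value, counting ties as 0.5 (equivalent to the ROC AUC for one bin)."""
    wins = sum((t > c) + 0.5 * (t == c) for t in treated for c in control)
    return wins / (len(treated) * len(control))


def fold_change(control, treated):
    """Ratio of the treated group mean to the control group mean."""
    return mean(treated) / mean(control)


def normalize_total(spectrum):
    """Scale one spectrum so that its bin intensities sum to 1
    (assumed normalization scheme, for illustration only)."""
    total = sum(spectrum)
    return [x / total for x in spectrum]


def per_bin(control, treated):
    """Return a (fold change, AUC) pair for every bin (column)."""
    results = []
    for j in range(len(control[0])):
        c = [row[j] for row in control]
        t = [row[j] for row in treated]
        results.append((fold_change(c, t), auc(c, t)))
    return results


# Hypothetical toy data: rows are spectra, columns are three bins.
# Only the first bin differs between groups (doubled in treated).
control = [[10.0, 5.0, 1.0], [12.0, 5.5, 1.1], [11.0, 4.5, 0.9]]
treated = [[20.0, 5.0, 1.0], [24.0, 5.5, 1.1], [22.0, 4.5, 0.9]]

raw = per_bin(control, treated)
norm = per_bin([normalize_total(r) for r in control],
               [normalize_total(r) for r in treated])

for j, ((fc_r, auc_r), (fc_n, auc_n)) in enumerate(zip(raw, norm)):
    print(f"bin {j}: raw FC={fc_r:.2f} AUC={auc_r:.2f} | "
          f"normalized FC={fc_n:.2f} AUC={auc_n:.2f}")
```

In this toy case the second and third bins are identical between groups before normalization (FC of 1, AUC of 0.5), yet after total-intensity normalization they appear perfectly separated (AUC of 0), because the increase in the first bin inflates the denominator for every treated spectrum. This is one simple mechanism by which the set of apparently significant bins can depend on whether data are normalized.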
