Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments
Abstract
Quality assessment methods, which are commonplace in engineering and industrial production, are not yet widely used in large-scale proteomics experiments. Yet modern technologies such as multi-dimensional liquid chromatography coupled to mass spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important. We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis. We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
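To make the idea of distance-based outlier detection concrete, the following is a minimal sketch, not the authors' implementation. It assumes each LC-MS run has already been summarized as a row of numeric quality descriptors; the three descriptor columns, the simulated values, and the function names are hypothetical placeholders. It uses the classical sample mean and covariance rather than the robust estimators and robust PCA mentioned in the abstract, and the cutoff is the 97.5% quantile of a chi-square distribution with 3 degrees of freedom, a common but assumed choice.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # classical (non-robust) covariance estimate
    diff = X - center
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)


def flag_outlier_runs(X, threshold):
    """Return the indices of runs whose squared distance exceeds threshold."""
    d2 = mahalanobis_sq(X)
    return np.flatnonzero(d2 > threshold), d2


# Hypothetical data: 30 ordinary runs and one run of poor quality,
# each described by three made-up quality descriptors.
rng = np.random.default_rng(0)
good = rng.normal(loc=[100.0, 5.0, 0.9], scale=[5.0, 0.5, 0.02], size=(30, 3))
bad = np.array([[40.0, 9.0, 0.5]])
X = np.vstack([good, bad])

# Assumed cutoff: 97.5% quantile of chi-square with 3 degrees of freedom.
CHI2_975_DF3 = 9.348

flagged, d2 = flag_outlier_runs(X, CHI2_975_DF3)
print("flagged run indices:", flagged)
```

A classical covariance estimate is itself distorted by the outliers it is meant to find, which is one reason to prefer robust estimators when several poor runs may be present.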