Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

BMC Bioinformatics - Volume 13, Issue 1 - 2012
Steven M. Lewis1, Attila Csordás2, Sarah Killcoyne3, Henning Hermjakob2, Michael R. Hoopmann1, Robert L. Moritz1, Eric W. Deutsch1, John P. Boyle1
1Institute for Systems Biology, Seattle, WA, USA
2PRIDE Group Proteomics Services Team EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
3Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg

