‘Big data’, Hadoop and cloud computing in genomics

Journal of Biomedical Informatics - Volume 46, Issue 5 - Pages 774-781 - 2013
Aisling O’Driscoll¹, Jurate Daugelaite², Roy D. Sleator²
¹ Department of Computing, Cork Institute of Technology, Rossa Avenue, Bishopstown, Cork, Ireland
² Department of Biological Sciences, Cork Institute of Technology, Rossa Avenue, Bishopstown, Cork, Ireland

Abstract

Keywords


References

Quail, 2012, A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics, 13, 341, 10.1186/1471-2164-13-341

Pollack, 2011

Moore, 1965, Cramming more components into integrated circuits, Electronics, 38, 4

Walter, 2005, Kryder’s Law, Sci Am, 293, 32, 10.1038/scientificamerican0805-32

As We May Communicate. <http://www.tmcnet.com/articles/comsol/0100/0100pubout.htm>.

Loman, 2012, High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity, Nat Rev Microbiol, 10, 599, 10.1038/nrmicro2850

Davies, 2010

Mathe, 2002, Current methods of gene prediction, their strengths and weaknesses, Nucl Acids Res, 30, 4103, 10.1093/nar/gkf543

Stein, 2010, The case for cloud computing in genome informatics, Genome Biol, 11, 207

Mason, 2012, Faster sequencers, larger datasets, new challenges, Genome Biol, 13, 314, 10.1186/gb-2012-13-3-314

Managing and Analysing 1,000,000 Genomes. <http://rgrossman.com/2012/09/18/million-genomes-challeng/>.

Genomics Takes Flight….To the Cloud. <https://idc-insights-community.com/health/life-sciences/genomics-takes-flight-to-the-cloud>.

Gantz J, Reinsel D. The Digital Universe in 2020: big data, bigger digital shadows, and biggest growth in the far east. In: IDC iView: IDC Analyze the Future; 2012.

Social Media And The Big Data Explosion. <http://www.forbes.com/sites/onmarketing/2012/06/28/social-media-and-the-big-data-explosion/>.

Big Data Offers Big Opportunities for Retail, Financial, Web Companies. <http://www.eweek.com/enterprise-apps/big-data-offers-big-opportunities-for-retail-financial-web-companies/>.

Data Deluge and the Human Microbiome Project. <http://www.issues.org/28.4/sagoff.html>.

Chae, 2013, Bio and health informatics meets cloud: BioVLab as an example, Health Inform Sci Syst, 1, 6, 10.1186/2047-2501-1-6

Davenport, 2012, Data scientist: the sexiest job of the 21st century, Harvard Business Review, 90, 128

EMC Sitting In Sweet Spot Of $70 Billion Big Data Industry. <http://www.forbes.com/sites/greatspeculations/2011/11/18/emc-sitting-in-sweet-spot-of-70-billion-big-data-industry/>.

Yeh, 2001, Computational inference of homologous gene structures in the human genome, Genome Res, 11, 803, 10.1101/gr.175701

Obama Administration Unveils “Big Data” Initiative: Announces $200 Million In New R&D Investments. <http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf>.

Manyika, 2011

The Benefits Of Data Center Virtualization For Businesses. <http://www.cloudtweaks.com/2012/03/the-benefits-of-data-center-virtualization-for-businesses/>.

Big Data, Meet the Huge Data That Will Shape Your Future. <http://www.information-management.com/news/big-data-meet-the-huge-data-that-will-shape-your-future-10023324-1.html>.

Bridging the gap between HPC and IaaS clouds. <http://datasys.cs.iit.edu/seminar/2012-03-06_bogdan-nicolae.html>.

Dai, 2012, Bioinformatics clouds for big data manipulation, Biol Direct, 7, 43, 10.1186/1745-6150-7-43

What will happen to Amazon’s massive cloud business? <http://tech.fortune.cnn.com/2012/05/22/aws/>.

Fusaro, 2011, Biomedical cloud computing with Amazon Web Services, PLoS Comput Biol

Shachak, 2007, Barriers and enablers to the acceptance of bioinformatics tools: a qualitative study, J Med Libr Assoc, 95, 454, 10.3163/1536-5050.95.4.454

Krampis K, Booth T, Chapman B, Tiwari B, Bicak M, Field D, et al. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community. BMC Bioinform; 2012;13:42.

Giardine, 2005, Galaxy: a platform for interactive large-scale genome analysis, Genome Res, 15, 1451, 10.1101/gr.4086505

Angiuoli, 2011, CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing, BMC Bioinform, 12

Oinn, 2004, Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics, 20, 3045, 10.1093/bioinformatics/bth361

Hong, 2012, FX: an RNA-Seq analysis tool on the cloud, Bioinformatics, 28, 721, 10.1093/bioinformatics/bts023

O’Connor, 2010, SeqWare Query Engine: storing and searching sequence data in the cloud, BMC Bioinform, 11, 10.1186/1471-2105-11-S12-S2

Available at https://dnanexus.com/.

How Hadoop Makes Short Work of Big Data. <http://www.forbes.com/sites/netapp/2012/09/24/hadoop-big-data/>.

Taylor, 2010, An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics, BMC Bioinform, 11, S1, 10.1186/1471-2105-11-S12-S1

Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds. <http://developer.yahoo.com/blogs/hadoop/posts/2009/05/hadoop_sorts_a_petabyte_in_162/>.

Dai, 2012, Bioinformatics clouds for big data manipulation, Biol Direct, 7, 43, 10.1186/1745-6150-7-43

Cloudera and Mount Sinai: The structure of a Big Data Revolution? <http://www.zdnet.com/cloudera-and-mount-sinai-the-structure-of-a-big-data-revolution-7000000354/>.

Zou, 2013, Survey of MapReduce frame operation in bioinformatics, Brief Bioinform

McKenna, 2010, The genome analysis toolkit: a MapReduce framework for analysing next-generation DNA sequencing data, Genome Res, 20, 1297, 10.1101/gr.107524.110

Gurtowski, 2012, Genotyping in the cloud with Crossbow, Curr Protoc Bioinform, 10.1002/0471250953.bi1503s39

Schatz, 2010, Cloud computing and the DNA data race, Nat Biotechnol, 28, 691, 10.1038/nbt0710-691

Langmead, 2009, Searching for SNPs with cloud computing, Genome Biol, 10, R134, 10.1186/gb-2009-10-11-r134

Nguyen, 2011, CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping, BMC Res Notes, 4, 171, 10.1186/1756-0500-4-171

Langmead, 2010, Cloud-scale RNA-sequencing differential expression analysis with Myrna, Genome Biol, 11, 10.1186/gb-2010-11-8-r83

Helping accelerate treatment for pediatric cancer with Dell cloud technology. <http://content.dell.com/us/en/corp/d/corp-comm/pediatric-cancer>.

NextBio, Intel to collaborate on improving Hadoop Stack for Genomic Data Analysis. <http://www.genomeweb.com/informatics/nextbio-intel-collaborate-improving-hadoop-stack-genomic-data-analysis>.

Cloudera Chief Scientist Jeff Hammerbacher Teams with Mount Sinai School of Medicine to Solve Medical Challenges Using Big Data. <http://www.marketwire.com/press-release/Cloudera-Chief-Scientist-Jeff-Hammerbacher-Teams-With-Mount-Sinai-School-Medicine-1676135.htm>.

Schadt, 2010, Computational solutions to large-scale data management and analysis, Nat Rev Genet, 11, 647, 10.1038/nrg2857

Healthcare Cloud Computing (Clinical, EMR, SaaS, Private, Public, Hybrid) Market – Global Trends, Challenges, Opportunities & Forecasts (2012–2017). <http://www.reportlinker.com/p0924631-summary/Healthcare-Cloud-Computing-Clinical-EMR-SaaS-Private-Public-Hybrid-Market-Global-Trends-Challenges-Opportunities-Forecasts-.html>.

Schoenherr, 2012, Cloudgene: a graphical execution platform for MapReduce programs on private and public clouds, BMC Bioinform, 13, 200, 10.1186/1471-2105-13-200

Pennisi, 2011, Will computers crash genomics?, Science, 331, 666, 10.1126/science.331.6018.666

1,000 Genomes in the Cloud and NCBI Experiences. <https://respond.niaid.nih.gov/conferences/bioinformatics2012/Festival%20Proceedings/Preuss_1000_Genomes.pdf>.

Available at http://asperasoft.com/.

How “Cloud” Services Democratize DNA Sequencing. <http://techonomy.com/2012/08/how-cloud-services-democratize-dna-sequencing/>.

Schadt, 2012, The changing privacy landscape in the era of big data, Mol Syst Biol, 8, 612, 10.1038/msb.2012.47

Creating HIPAA-Compliant Medical Data Applications With AWS. <http://aws.amazon.com/about-aws/whats-new/2009/04/06/whitepaper-hipaa/>.

Managing data in the Cloud Age. <http://www.dddmag.com/articles/2012/10/managing-data-cloud-age>.

Robertson, 2003, The $1000 genome: ethical and legal issues in whole genome sequencing of individuals, Am J Bioeth, 3, 10.1162/152651603322874762

Klein, 2011, Cloudy confidentiality: clinical and legal implications of cloud computing in health care, J Am Acad Psychiatry Law, 39, 571

Sleator, 2008, Metagenomics, Lett Appl Microbiol, 47, 361, 10.1111/j.1472-765X.2008.02444.x

Sleator, 2010, An overview of the processes shaping protein evolution, Sci Prog, 93, 1, 10.3184/003685009X12605492662844

Sleator, 2012, Prediction of protein functions, Methods Mol Biol, 815, 15, 10.1007/978-1-61779-424-7_2

Sleator, 2012, Proteins: form and function, Bioeng Bugs, 3, 80

Marianayagam, 2005, Protein folding by distributed computing and the denatured state ensemble, Proc Natl Acad Sci USA, 102, 16684, 10.1073/pnas.0506388102

Cooper, 2010, Predicting protein structures with a multiplayer online game, Nature, 466, 756, 10.1038/nature09304

Murray, 2012, Personalized medicine: been there, done that, always needs work!, Am J Respir Crit Care Med, 185, 1251, 10.1164/rccm.201203-0523ED

Furusawa, 2012, A dynamical-systems view of stem cell biology, Science, 338, 215, 10.1126/science.1224311

Sleator, 2010, The human superorganism – of microbes and men, Med Hypotheses, 74, 214, 10.1016/j.mehy.2009.08.047

O’Driscoll, 2013, Synthetic DNA: the next generation of big data storage, Bioengineered, 4

Karr, 2012, A whole-cell computational model predicts phenotype from genotype, Cell, 150, 389, 10.1016/j.cell.2012.05.044

Sleator, 2012, Digital biology: a new era has begun, Bioengineered, 3, 311, 10.4161/bioe.22367

Schatz, 2009, CloudBurst: highly sensitive read mapping with MapReduce, Bioinformatics, 25, 1363, 10.1093/bioinformatics/btp236

Pireddu, 2011, SEAL: a distributed short read mapping and duplicate removal tool, Bioinformatics, 27, 2159, 10.1093/bioinformatics/btr325

Blastreduce: high performance short read mapping with mapreduce. <http://www.cbcb.umd.edu/software/blastreduce/>.

Schatz, 2010, De Novo assembly of large genomes with cloud computing, vol. 10

Chang, 2012, A de novo next generation genomic sequence assembler based on string graph and MapReduce cloud computing framework, BMC Genomics, 13, S28, 10.1186/1471-2164-13-S7-S28

Jourdren, 2012, Eoulsan: a cloud computing-based framework facilitating high throughput sequencing analyses, Bioinformatics, 28, 1542, 10.1093/bioinformatics/bts165

Niemenmaa, 2012, Hadoop-BAM: directly manipulating next generation sequencing data in the cloud, Bioinformatics, 28, 876, 10.1093/bioinformatics/bts054

Matthews, 2010, MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees, BMC Bioinform, 11, S15, 10.1186/1471-2105-11-S1-S15

Colosimo, 2011, Nephele: genotyping via complete composition vectors and MapReduce, Source Code Biol Med, 6, 13, 10.1186/1751-0473-6-13

Vouzis, 2011, GPU-BLAST: using graphics processors to accelerate protein sequence alignment, Bioinformatics, 27, 182, 10.1093/bioinformatics/btq644

Liu, 2012, SOAP3: ultra-fast GPU-based parallel alignment tool for short reads, Bioinformatics, 28, 878, 10.1093/bioinformatics/bts061

Lewis, 2012, Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework, BMC Bioinform, 13, 10.1186/1471-2105-13-324

Matsunaga A, Tsugawa M, Fortes J. CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications. In: IEEE Fourth International Conference on eScience, Indiana, USA; 2008. p. 222–9.

Leo S, Santoni F, Zanetti G. Biodoop: bioinformatics on hadoop. In: Parallel processing workshops, 2009. ICPPW ‘09. International Conference on; 2009. p. 415–22.

Huang, 2013, BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters, Bioinformatics, 29, 135, 10.1093/bioinformatics/bts647

Kelley, 2010, Quake: quality-aware detection and correction of sequencing errors, Genome Biol, 11, R116, 10.1186/gb-2010-11-11-r116

Zhang, 2012, Gene set analysis in the cloud, Bioinformatics, 28, 294, 10.1093/bioinformatics/btr630

Feng, 2011, PeakRanger: a cloud-enabled peak caller for ChIP-seq data, BMC Bioinform, 12, 139, 10.1186/1471-2105-12-139