‘Big data’, Hadoop and cloud computing in genomics