Scalability and Validation of Big Data Bioinformatics Software
References
Viceconti, 2015, Big data, big knowledge: big data for personalized healthcare, IEEE J Biomed Health Inform, 19, 1209, 10.1109/JBHI.2015.2406883
Baker, 2010, Next-generation sequencing: adjusting to data overload, Nat Methods, 7, 495, 10.1038/nmeth0710-495
Goodwin, 2016, Coming of age: ten years of next-generation sequencing technologies, Nat Rev Genet, 17, 333, 10.1038/nrg.2016.49
Yu, 2016, Single-cell transcriptome study as big data, Genomics Proteomics Bioinformatics, 14, 21, 10.1016/j.gpb.2016.01.005
Marx, 2013, Biology: the big challenges of big data, Nature, 498, 255, 10.1038/498255a
Dolinski, 2015, Implications of Big Data for cell biology, Mol Biol Cell, 26, 2575, 10.1091/mbc.e13-12-0756
Alyass, 2015, From big data analysis to personalized medicine for all: challenges and opportunities, BMC Med Genomics, 8, 10.1186/s12920-015-0108-y
Giannoulatou, 2014, Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie, BMC Bioinf, 15, S15, 10.1186/1471-2105-15-S16-S15
Baruzzo, 2016, Simulation-based comprehensive benchmarking of RNA-seq aligners, Nat Methods, 14, 135, 10.1038/nmeth.4106
Mattmann, 2013, Computing: a vision for data science, Nature, 493, 473, 10.1038/493473a
Schlosberg, 2016, Data security in genomics: a review of Australian privacy requirements and their relation to cryptography in data storage, J Pathol Inf, 7, 6, 10.4103/2153-3539.175793
2017
1994, Int J Supercomput Appl High Perform Eng, 8
Sunderam, 1990, PVM: a framework for parallel distributed computing, Concurr Pract Exp, 2, 315, 10.1002/cpe.4330020404
Darling, 2003, The design, implementation, and evaluation of mpiBLAST, Proc Clust, 2003, 13
Ebedes, 2004, Multiple sequence alignment in parallel on a workstation cluster, Bioinformatics, 20, 1193, 10.1093/bioinformatics/bth055
Foster, 2002, The Grid: a new infrastructure for 21st century science, Phys Today, 55, 42, 10.1063/1.1461327
Foster, 2005, Globus Toolkit Version 4: software for service-oriented systems, 3779/2005, 2
Krishnan, 2005, GridBLAST: a Globus-based high-throughput implementation of BLAST in a Grid computing framework, Concurr Comput Pract Exp, 17, 1607, 10.1002/cpe.906
Stevens, 2003, myGrid: personalised bioinformatics on the information grid, Bioinformatics, 19, i302, 10.1093/bioinformatics/btg1041
Carvalho, 2005, Squid – a simple bioinformatics grid, BMC Bioinf, 6, 197, 10.1186/1471-2105-6-197
Charalambous, 2005, Initial experiences porting a bioinformatics application to a graphics processor, 3746, 415
Buck, 2004, Brook for GPUs: stream computing on graphics hardware, ACM Trans Graph, 23, 777, 10.1145/1015706.1015800
Liu, 2009, CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units, BMC Res Notes, 2, 73, 10.1186/1756-0500-2-73
Nickolls, 2008, Scalable parallel programming with CUDA, Queue, 6, 40, 10.1145/1365490.1365500
Mell, 2011, The NIST definition of cloud computing, NIST Spec Publ, 145, 7
2017
2017
2017
2017
2017
Nguyen, 2011, CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping, BMC Res Notes, 4, 171, 10.1186/1756-0500-4-171
Abuín, 2016, SparkBWA: speeding up the alignment of high-throughput DNA sequencing data, PLoS One, 11, e0155461, 10.1371/journal.pone.0155461
Decap, 2015, Halvade: scalable sequence analysis with MapReduce, Bioinformatics, 31, 2482, 10.1093/bioinformatics/btv179
Kelly, 2015, Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics, Genome Biol, 16, 1, 10.1186/s13059-014-0577-x
Sreedharan, 2014, Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis, Bioinformatics, 30, 1300, 10.1093/bioinformatics/btt731
Yang, 2016, Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud, Bioinformatics, btw732, 10.1093/bioinformatics/btw732
Afgan, 2010, Galaxy CloudMan: delivering cloud compute clusters, BMC Bioinf, 11, S4, 10.1186/1471-2105-11-S12-S4
2017
Krampis, 2012, Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community, BMC Bioinf, 13, 42, 10.1186/1471-2105-13-42
Angiuoli, 2011, CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing, BMC Bioinf, 12, 356, 10.1186/1471-2105-12-356
Beaulieu-Jones, 2017, Reproducibility of computational workflows is automated using continuous analysis, Nat Biotechnol, 35, 342, 10.1038/nbt.3780
Dean, 2004, MapReduce: simplified data processing on large clusters
Langmead, 2010, Cloud-scale RNA-sequencing differential expression analysis with Myrna, Genome Biol, 11, R83
Zaharia, 2010, Spark: cluster computing with working sets, 10
O'Brien, 2015, VariantSpark: population scale clustering of genotype information, BMC Genomics, 16, 1052, 10.1186/s12864-015-2269-7
Blue, 2014, Targeted next-generation sequencing identifies pathogenic variants in familial congenital heart disease, J Am Coll Cardiol, 64, 2498, 10.1016/j.jacc.2014.09.048
Bennett, 2014, Next-generation sequencing in clinical oncology: next steps towards clinical validation, Cancers, 6, 2296, 10.3390/cancers6042296
O'Rawe, 2013, Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing, Genome Med, 5, 28, 10.1186/gm432
Ho, 2011, ChIP-chip versus ChIP-seq: lessons for experimental design and data analysis, BMC Genomics, 12, 10.1186/1471-2164-12-134
Jung, 2014, Impact of sequencing depth in ChIP-seq experiments, Nucleic Acids Res, 42, e74, 10.1093/nar/gku178
Wilbanks, 2010, Evaluation of algorithm performance in ChIP-Seq peak detection, PLoS One, 5, e11471, 10.1371/journal.pone.0011471
Yu, 2016, Comparing five statistical methods of differential methylation identification using bisulfite sequencing data, Stat Appl Genet Mol Biol, 15, 10.1515/sagmb-2015-0078
Allen, 2015, Variant calling assessment using Platinum Genomes, NIST Genome in a Bottle, and VCAT 2.0
Sanger, 1977, DNA sequencing with chain-terminating inhibitors, Proc Natl Acad Sci, 74, 5463, 10.1073/pnas.74.12.5463
Huang, 2012, ART: a next-generation sequencing read simulator, Bioinformatics, 28, 593, 10.1093/bioinformatics/btr708
Mu, 2015, VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications, Bioinformatics, 31, 1469, 10.1093/bioinformatics/btu828
Li, 2009, The Sequence Alignment/Map format and SAMtools, Bioinformatics, 25, 2078, 10.1093/bioinformatics/btp352
Myers, 2011
Weyuker, 1982, On testing non-testable programs, Comput J, 25, 465, 10.1093/comjnl/25.4.465
Kamali, 2015, How to test bioinformatics software?, Biophys Rev, 7, 343, 10.1007/s12551-015-0177-3
Chen, 1998, Metamorphic testing: a new approach for generating next test cases
Xie, 2011, Testing and validating machine learning classifiers by metamorphic testing, J Syst Softw, 84, 544, 10.1016/j.jss.2010.11.920
Liu, 2014, How effectively does metamorphic testing alleviate the Oracle problem?, IEEE Trans Softw Eng, 40, 4, 10.1109/TSE.2013.46
Sun, 2012, A metamorphic relation-based approach to testing web services without oracles, Int J Web Serv Res, 9, 51, 10.4018/jwsr.2012010103
Tao, 2010, An automatic testing approach for compiler based on metamorphic testing technique, 270
Segura, 2011, Automated metamorphic testing on the analyses of feature models, Inf Softw Technol, 53, 245, 10.1016/j.infsof.2010.11.002
Chen, 2002, Metamorphic testing of programs on partial differential equations: a case study, 327
Troup, 2016, A cloud-based framework for applying metamorphic testing to a bioinformatics pipeline, 33
Chen, 2009, An innovative approach for testing bioinformatics programs using metamorphic testing, BMC Bioinf, 10, 24, 10.1186/1471-2105-10-24
Heath, 2015, Single-cell analysis tools for drug discovery and development, Nat Rev Drug Discov, 15, 204, 10.1038/nrd.2015.16
Rotem, 2015, Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state, Nat Biotechnol, 33, 1165, 10.1038/nbt.3383
Ellingson, 2014, High-throughput virtual molecular docking with AutoDockCloud, Concurr Comput Pract Exp, 26, 907, 10.1002/cpe.2926
Feng, 2011, PeakRanger: a cloud-enabled peak caller for ChIP-seq data, BMC Bioinf, 12, 139, 10.1186/1471-2105-12-139