Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

IEEE Access - Volume 2 - Pages 652-687 - 2014
Han Hu1, Yonggang Wen2, Tat‐Seng Chua1, Xuelong Li3
1School of Computing, National University of Singapore, Singapore
2School of Computer Engineering, Nanyang Technological University, Singapore
3State Key Laboratory of Transient Optics and Photonics, Center for Optical Imagery Analysis and Learning, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, China
