Data-intensive applications, challenges, techniques and technologies: A survey on Big Data

Information Sciences - Tập 275 - Trang 314-347 - 2014
C.L. Philip Chen1, Chun-Yang Zhang1
1Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China

Tài liệu tham khảo

http://www.whitehouse.gov/sites/default/files/microsites/ostp/big-data-fact-sheet-final-1.pdf. http://quantumcomputers.com. Karmasphere Studio and Analyst, 2012. <http://www.karmasphere.com/>. Pentaho Business Analytics, 2012. <http://www.pentaho.com/explore/pentaho-business-analytics/>. Sqlstream, 2012. <http://www.sqlstream.com/products/server/>. Storm, 2012. <http://storm-project.net/>. Abzetdin Adamov. Distributed file system as a basis of data-intensive computing, in: 2012 6th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–3 (October). Divyakant Agrawal, Philip Bernstein, Elisa Bertino, Susan Davidson, Umeshwas Dayal, Michael Franklin, Johannes Gehrke, Laura Haas, Jiawei Han Alon Halevy, H.V. Jagadish, Alexandros Labrinidis, Sam Madden, Yannis Papakon stantinou, Jignesh Patel, Raghu Ramakrishnan, Kenneth Ross, Shahabi Cyrus, Dan Suciu, Shiv Vaithyanathan, Jennifer Widom, Challenges and Opportunities with Big Data, CYBER CENTER TECHNICAL REPORTS, Purdue University, 2011. Byungik Ahn, Neuron machine: Parallel and pipelined digital neurocomputing architecture, in: 2012 IEEE International Conference on Computational Intelligence and Cybernetics (CyberneticsCom), 2012, pp. 143–147. Ahrens, 2001, Large-scale data visualization using parallel data streaming, IEEE Comput. Graph. Appl., 21, 34, 10.1109/38.933522 Chris Anderson, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, 2008. <http://www.wired.com/science/discoveries/magazine/16-07/pb-theory>. Andrianantoandro, 2006, Synthetic biology: new engineering rules for an emerging discipline, Mol. Syst. Biol., 2, 10.1038/msb4100073 Arel, 2010, Deep machine learning – a new frontier in artificial intelligence research, IEEE Comput. Intell. Mag., 5, 13, 10.1109/MCI.2010.938364 Aditya Auradkar, Chavdar Botev, Shirshanka Das, Dave DeMaagd, Alex Feinberg, Phanindra Ganti, Bhaskar Ghosh Lei Gao, Kishore Gopalakrishna, Brendan Harris, Joel Koshy, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian, Oliver Seeliger, Adam Silberstein, Boris Shkolnik, Chinmay Soman, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Cuong Tran, Balaji Varadarajan, Jemiah Westerman, Zach White, David Zhang, Jason Zhang, Data infrastructure at linkedin, in: 2012 IEEE 28th International Conference on Data Engineering (ICDE), 2012, pp. 1370–1381. Bahga, 2012, Analyzing massive machine maintenance data in a computing cloud, IEEE Trans Parallel Distrib. Syst., 23, 1831, 10.1109/TPDS.2011.306 Barbarossa, 2009, Bio-inspired sensor network design, IEEE Signal Process. Mag., 24, 95 Bekkerman, 2012 Bell, 2009, Beyond the data deluge, Science, 323, 1297, 10.1126/science.1170411 Bencivenni, 2008, A comparison of data-access platforms for the computing of large hadron collider experiments, IEEE Trans. Nucl. Sci., 55, 1621, 10.1109/TNS.2008.924087 Bengio, 2009, Learning deep architectures for ai, Found. Trends Mach. Learn., 2, 1, 10.1561/2200000006 Bengio, 2013, Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell., 35, 1798, 10.1109/TPAMI.2013.50 Janine Bennett, Ray Grout, Philippe Pebay, Diana Roe, David Thompson, Numerically stable, single-pass, parallel statistics algorithms, in: IEEE International Conference on Cluster Computing and Workshops, 2009, CLUSTER ’09, 2009, pp. 1–8. Bertone, 2001, Integrative data mining: the new direction in bioinformatics, IEEE Eng. Med. Biol. Mag., 20, 33, 10.1109/51.940042 Bezdek, 1981 Bingham, 2001, Random projection in dimensionality reduction: applications to image and text data, 245 Bongard, 2009, Biologically inspired computing, Computer, 42, 95, 10.1109/MC.2009.104 Bringmann, 2010, Learning and predicting the evolution of social networks, IEEE Intell. Syst., 25, 26, 10.1109/MIS.2010.91 Jason Brooks, Review: Talend Open Studio Makes Quick etl Work of Large Data Sets, 2009. <http://www.eweek.com/c/a/Database/REVIEW-Talend-Open-Studio-Makes-Quick-ETL-Work-of-Large-Data-Sets-281473/>. Brumfiel, 2011, High-energy physics: down the petabyte highway, Nature, 282 Randal E. Bryant, Data Intensive supercomputing: The Case for Disc. Technical Report CMU-CS-07-128, 2007. Bryant, 2011, Data-intensive scalable computing for scientific applications, Comput. Sci. Eng., 13, 25, 10.1109/MCSE.2011.73 Pavel Bzoch, Jiri Safarik, State of the art in distributed file systems: Increasing performance, in: Engineering of Computer Based Systems (ECBS-EERC), 2011 2nd Eastern European Regional Conference on the, 2011, pp. 153–154. Cai, 2008, Srda: an efficient algorithm for large-scale discriminant analysis, IEEE Trans. Knowl. Data Eng., 20, 1, 10.1109/TKDE.2007.190669 Cannataro, 2004, Distributed data mining on grids: services, tools, and applications, IEEE Trans. Syst. Man Cyber. Part B: Cyber., 34, 2451, 10.1109/TSMCB.2004.836890 Cao, 2012, A parallel computing framework for large-scale air traffic flow optimization, IEEE Trans. Intell. Trans. Syst., 13, 1855, 10.1109/TITS.2012.2205145 Capriolo, 2011 Chang, 2008, Bigtable: a distributed storage system for structured data, ACM Trans. Comput. Syst., 26, 10.1145/1365815.1365816 Jagmohan Chauhan, Shaiful Alam Chowdhury, Dwight Makaroff, Performance evaluation of yahoo! s4: a first look, in: 2012 Seventh International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2012, pp. 58–65. Chen, 2011, A multiple-kernel fuzzy c-means algorithm for image segmentation, IEEE Trans. Syst. Man Cyber. Part B: Cyber., 41, 1263, 10.1109/TSMCB.2011.2124455 Chen, 2013, Distributed modeling in a mapreduce framework for data-driven traffic flow forecasting, IEEE Trans. Intell. Trans. Syst., 14, 22, 10.1109/TITS.2012.2205144 Ciaccio, 2012 Cireşan, 2012, Multi-column deep neural networks for image classification, IEEE Conf. Comput. Vision Pattern Recognit. Deam, 2008, Mapreduce: simplified data processing on large clusters, Commun. ACM, 51, 107, 10.1145/1327452.1327492 Valle, 2008, Particle swarm optimization: basic concepts, variants and applications in power systems, IEEE Trans. Evol. Comput., 12, 171, 10.1109/TEVC.2007.896686 Fey, 2008, Big data: the future of biocuration, Nature, 455, 47, 10.1038/455047a Rui Máximo Esteves, Chunming Rong, Using mahout for clustering wikipedia’s latest articles: a comparison between k-means and fuzzy c-means in the cloud, in: 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), 2011, pp. 565–569. Rui Máximo Esteves, Chunming Rong, Rui Pais, K-means clustering in the cloud – a mahout test, in: 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA), 2011, pp. 514–519. Ian Foster, Yong Zhao, Ioan Raicu, Shiyong Lu, Cloud computing and grid computing 360-degree compared, in: Grid Computing Environments Workshop, 2008, GCE’08, 2008, pp. 1–10. Fujimoto, 1992, Massively parallel architectures for large scale neural network simulations, IEEE Trans. Neural Networks, 3, 876, 10.1109/72.165590 Furht, 2011 Garber, 2012, Using in-memory analytics to quickly crunch big data, IEEE Comput. Soc., 45, 16, 10.1109/MC.2012.358 A.O. García, S. Bourov, A. Hammad, V. Hartmann, T. Jejkal, J.C. Otte, S. Pfeiffer, T. Schenker, C. Schmidt, P. Neuberger, R. Stotzka, J. van Wezel, B. Neumair, A. Streit, Data-intensive analysis for scientific experiments at the large scale data facility, in: 2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV), 2011, pp. 125–126. Geng, 2012, Parallel lasso for large-scale video concept detection, IEEE Trans. Multimedia, 14, 55, 10.1109/TMM.2011.2174781 Dan Gillick, Arlo Faria, John DeNero, Mapreduce: Distributed Computing for Machine Learning, 2006. Gokhale, 2008, Hardware technologies for high-performance data-intensive computing, Computer, 41, 60, 10.1109/MC.2008.125 Guan, 2012, Online nonnegative matrix factorization with robust stochastic approximation, IEEE Trans. Neural Networks Learning Syst., 23, 1087, 10.1109/TNNLS.2012.2197827 Gulisano, 2012, Streamcloud: an elastic and scalable data streaming system, IEEE Trans. Parallel Distrib. Syst., 23, 2351, 10.1109/TPDS.2012.24 Apache Hadoop, Words Count Example, 2012. <http://developer.yahoo.com/hadoop/tutorial/module4.html#wordcount>. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Diane Cerra, second ed., 2000. Jing Han, Haihong E, Guan Le, Jian Du, Survey on nosql database, in: 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), 2011, pp. 363–366. Han, 2012, Efficient skyline computation on big data, IEEE Trans. Knowl. Data Eng., PP, 1 Hassan, 1987, An incremental approach for the solution of quadratic problems, Math. Modell., 8, 34, 10.1016/0270-0255(87)90536-7 Hastie, 2009 Heer, 2008, Graphical histories for visualization: supporting analysis, communication, and evaluation, IEEE Trans. Visual. Comput. Graph., 14, 1189, 10.1109/TVCG.2008.137 Hey, 2009, The fourth paradigm: data-intensive scientific discovery, Microsoft Research Hey, 2002, The uk e-science core programme and the grid, Future Gener. Comput. Syst., 18, 1017, 10.1016/S0167-739X(02)00082-1 Hilbert, 2011, The world’s technological capacity to store, communicate, and compute information, Science, 332, 60, 10.1126/science.1200970 Hinton, 2007, Learning multiple layers of representation, Trends Cogn. Sci., 11, 428, 10.1016/j.tics.2007.09.004 Hinton, 2006, A fast learning algorithm for deep belief nets, Neural Comput., 18, 1527, 10.1162/neco.2006.18.7.1527 Hinton, 2006, Reducing the dimensionality of data with neural networks, Science, 313, 504, 10.1126/science.1127647 Hirota, 1999, Fuzzy computing for data mining, Proc. IEEE, 87, 575, 10.1109/5.784240 Hsiao, 2008, An incremental cluster-based approach to spam filtering, Expert Syst. Appl., 34, 1599, 10.1016/j.eswa.2007.01.018 Hutchinson, 2012, Solid-state revolution: in-depth on how ssds really work, Ars Technica Ingersoll, 2009, Introducing apache mahout: scalable, commercial-friendly machine learning for building intelligent applications, IBM Corporation Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly, Dryad: distributed data-parallel programs from sequential building blocks, in: EuroSys ’07 Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, vol. 41(3), 2007, pp. 59–72. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly, Dryad: Distributed data-parallel programs from sequential building blocks, in: Proceedings of the 2007 Eurosys Conference, 2007. Porfirio Ishii, 2011, An adaptive and historical approach to optimize data access in grid computing environments, INFOCOMP J. Comput. Sci., 10, 26 Porfirio Ishii, 2012, An online data access prediction and optimization approach for distributed systems, IEEE Trans. Parallel Distrib. Syst., 23, 1017, 10.1109/TPDS.2011.256 R.P. Ishii, R.F. de Mello, A history-based heuristic to optimize data access in distributed environments, in: Proc. 21st IASTED International Conf. Parallel and Distributed Computing and Systems, 2009. Jacob, 2005 Jacobs, 2009, The pathologies of big data, Commun. ACM, 52, 36, 10.1145/1536616.1536632 Mohsen Jamali, Hassan Abolhassani, Different aspects of social network analysis, in: IEEE/WIC/ACM International Conference on Web Intelligence, 2006, WI 2006, 2006, pp. 66–72. Jeon, 2006, Rough sets attributes reduction based expert system in interlaced video sequences, IEEE Trans. Consum. Electr., 52, 1348, 10.1109/TCE.2006.273155 Jiang, 2011, Map-join-reduce: toward scalable and efficient data analysis on large clusters, IEEE Trans. Knowl. Data Eng., 23, 1299, 10.1109/TKDE.2010.248 Wei Jiang, Eric Zavesky, Shih-Fu Chang, Alex Loui, Cross-domain learning methods for high-level visual concept classification, in: 15th IEEE International Conference on Image Processing, 2008, ICIP 2008, 2008, pp. 161–164. M. Tim Jones, Process Real-Time Big Data with Twitter Storm, 2012. <http://www.ibm.com/developerworks/opensource/library/os-twitterstorm/index.html?ca=drs->. Kasavajhala, 2012, Solid state drive vs. hard disk drive price and performance study, Dell PowerVault Tech. Mark. Keim, 2004, Visual data mining in large geospatial point sets, IEEE Comput. Graph. Appl., 24, 36, 10.1109/MCG.2004.41 Jeff Kelly, Apache drill brings sql-like, ad hoc query capabilities to big data, February 2013. <http://wikibon.org/wiki/v/Apache-Drill-Brings-SQL-Like-Ad-Hoc-Query-Capabilities-to-Big-Data>. Kim, 2009, Parallel clustering algorithms: survey Klemens, 2008 Kouzes, 2009, The changing paradigm of data-intensive computing, Computer, 42, 26, 10.1109/MC.2009.26 Stephan Kraft, Giuliano Casale, Alin Jula, Peter Kilpatrick, Des Greer, Wiq: work-intensive query scheduling for in-memory database systems, in: 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), 2012, pp. 33–40. K.P. Lakshmi, C.R.K. Reddy, A survey on different trends in data streams, in: 2010 International Conference on Networking and Information Technology (ICNIT), 2010, pp. 451–455. Lane, 2011, Exploiting social networks for large-scale human behavior modeling, IEEE Pervasive Comput., 10, 45, 10.1109/MPRV.2011.70 Laney, 2001, 3d Data managment: controlling data volume, velocity and variety, Appl. Delivery Strategies Meta Group Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng, Building high-level features using large scale unsupervised learning, in: Proceedings of the 29th International Conference on Machine Learning, 2012. Lee, 2007 Leong, 2009, A new revolution in enterprise storage architecture, IEEE Potentials, 28, 32, 10.1109/MPOT.2009.934894 Lesk, 2008 Hui Li, Geoffrey Fox, Judy Qiu, Performance model for parallel matrix multiplication with dryad: dataflow graph runtime, in: 2012 Second International Conference on Cloud and Green Computing, 2012, pp. 675–683. Li, 2008, Cooperatively coevolving particle swarms for large scale optimization, IEEE Trans. Evol. Comput., 16, 210 Zhong Liang, ChiTian He, Zhang Xin, Feature based visualization algorithm for large-scale flow data, in: Second International Conference on Computer Modeling and Simulation, 2010, ICCMS ’10, vol. 1, 2010, pp. 194–197. Lin, 2012, Social network analysis in enterprise, Proc. IEEE, 100, 2759, 10.1109/JPROC.2012.2203090 Liu, 2011, Adaptive neural output feedback tracking control for a class of uncertain discrete-time nonlinear systems, IEEE Trans. Neural Networks, 22, 1162, 10.1109/TNN.2011.2146788 Liu, 2011, Textual query of personal photos facilitated by large-scale web data, IEEE Trans. Pattern Anal. Mach. Intell., 33, 1022, 10.1109/TPAMI.2010.142 Lloyd, 1982, Least squares quantization in pcm, IEEE Trans. Inf. Theory, 28, 129, 10.1109/TIT.1982.1056489 Loughran, 2012, Dynamic cloud deployment of a mapreduce architecture, IEEE Internet Comput., 16, 40, 10.1109/MIC.2011.163 Lu, 2011, A survey of multilinear subspace learning for tensor data, Pattern Recogn., 44, 1540, 10.1016/j.patcog.2011.01.004 Lynch, 2008, Big data: how do your data grow?, Nature, 455, 28, 10.1038/455028a Ma, 2012, Mining web graphs for recommendations, IEEE Trans. Knowl. Data Eng., 24, 1051, 10.1109/TKDE.2011.18 Ma, 2004, Massively parallel software rendering for visualizing large-scale data sets, IEEE Comput. Graph. Appl., 24, 36 Mansour, 1997, Large scale dynamic security screening and ranking using neural networks, IEEE Trans. Power Syst., 12, 954, 10.1109/59.589789 Manyika, 2012 Mao, 2010, S4: Small state and small stretch compact routing protocol for large static wireless networks, IEEE/ACM Transactions on Networking, 18, 761, 10.1109/TNET.2010.2046645 Marz, 2012, Big data: principles and best practices of scalable realtime data systems, Manning McDermott, 2009 Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Dremel: interactive analysis of webscale datasets, in: Proc. of the 36th Int’l Conf. on Very Large Data Bases (2010), vol. 3(1), 2010, pp. 330–339. Tomáš Mikolov, Anoop Deoras, Daniel Povey, Lukáš Burget, Jan Černocký, Strategies for training large scale neural network language models, in: IEEE Workshop on Automatic Speech Recognition and Understanding, 2011. Mistry, 2012, Introducing microsoft SQL server 2012, Microsoft Mitra, 2004, A probabilistic active support vector learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell., 26, 603, 10.1109/TPAMI.2004.1262340 Molchanov, 2005 Christian Molinari, No One Size Fits all Strategy for Big Data, Says ibm, October 2012. <http://www.bnamericas.com/news/technology/no-one-size-fits-all-strategy-for-big-data-says-ibm>. Mühleisen, 2012, Large-scale storage and reasoning for semantic data using swarms, IEEE Comput. Intell. Mag., 7, 32, 10.1109/MCI.2012.2188586 Tadashi Nakano, Biological computing based on living cells and cell communication, in: 2010 13th International Conference on Network-Based Information Systems (NBiS), 2010, pp. 42–47. Nandi, 2012, Data cube materialization and mining over mapreduce, IEEE Trans. Knowl. Data Eng., 24, 1747, 10.1109/TKDE.2011.257 Henry, 2007, Nodetrix: a hybrid visualization of social network, IEEE Trans. Visual. Comput. Graph., 13, 1302, 10.1109/TVCG.2007.70582 Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari, S4: distributed stream computing platform, in: 2010 IEEE Data Mining Workshops (ICDMW), Sydney, Australia, 2010, pp. 170–177. Nielsen, 2009 Oehmen, 2006, Scalablast: a scalable implementation of blast for high-performance data-intensive bioinformatics analysis, IEEE Trans. Parallel Distrib. Syst., 17, 740, 10.1109/TPDS.2006.112 Oh, 2010, Large-scale pattern storage and retrieval using generalized brain-state-in-a-box neural networks, IEEE Trans. Neural Networks, 21, 633, 10.1109/TNN.2010.2040291 Oleg, 2011, A segmentation-based algorithm for large-scale partially ordered monotonic regression, Comput. Stat. Data Anal., 55, 2463, 10.1016/j.csda.2011.03.001 Simone Ferlin Oliveira, Karl Fürlinger, Dieter Kranzlmüller, Trends in computation, communication and storage and the consequences for data-intensive science, in: IEEE 14th International Conference on High Performance Computing and Communications, 2012. Alina Oprea, Michael K. Reiter, Ke Yang, Space efficient block storage integrity, in: Proc. 12th Ann. Network and Distributed System Security Symp. (NDSS 05), 2005. Oracle, Oracle information architecture: an architect’s guide to big data, An Oracle White Paper in Enterprise Architecture, 2012. Özsu, 2011 Ahrens, 2011, Data-intensive science in the us doe: case studies and future challenges, Comput. Sci. Eng., 13, 14, 10.1109/MCSE.2011.77 Palit, 2012, Scalable and parallel boosting with mapreduce, IEEE Trans. Knowl. Data Eng., 20, 1904, 10.1109/TKDE.2011.208 Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker, A comparison of approaches to large-scale data analysis, in: SIGMOD ’09 Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009, pp. 165–178. Pearson, 2007, Implementing spiking neural networks for real-time signal-processing and control applications: a model-validated fpga approach, IEEE Trans. Neural Networks, 18, 1472, 10.1109/TNN.2007.891203 Philippe Pébay, David Thompson, Janine Bennett, Ajith Mascarenhas, Design and performance of a scalable, parallel statistics toolkit, in: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011, pp. 1475–1484. Pedrycz, 2008 Peters, 2011, Granular box regression, IEEE Trans. Fuzzy Syst., 19, 1141, 10.1109/TFUZZ.2011.2162416 Pirovano, 2003, Scaling analysis of phase-change memory technology, IEEE Int. Electron Dev. Meeting, 29.6.1 Plugge, 2010 Brian Proffitt, Big Data Tools and Vendors, 2012. <http://www.itworld.com/big-datahadoop/251912/big-data-tools-and-vendors?page=0,0>. Radovanović, 2010, Hubs in space: popular nearest neighbors in high-dimensional data, J. Mach. Learn. Res., 11, 2487 William Yurcik Larry Brumbaugh Ragib Hasan, Zahid Anwar, Roy H. Campbell, A survey of peer-to-peer storage techniques for distributed file systems, in: International Conference on Information Technology: Coding and Computing, 2005, ITCC 2005, vol. 2, 2005, pp. 205–213. Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis, Evaluating mapreduce for multi-core and multiprocessor systems, in: IEEE 13th International Symposium on High Performance Computer Architecture, 2007, HPCA 2007, 2006, pp. 13–24. Ranka, 1991, Clustering on a hypercube multicomputer, IEEE Trans. Parallel Distrib. Syst., 2, 532, 10.1109/71.89059 Ratner, 2002 Raykar, 2008, A fast algorithm for learning a ranking function from large-scale data sets, IEEE Trans. Pattern Anal. Mach. Intell., 30, 1158, 10.1109/TPAMI.2007.70776 Sahimi, 2010, Efficient computational strategies for solving global optimization problems, Comput. Sci. Eng., 12, 74, 10.1109/MCSE.2010.85 Sakr, 2011, A survey of large scale data management approaches in cloud environments, IEEE Commun. Surv. Tutorials, 13, 311, 10.1109/SURV.2011.032211.00087 Ted Samson, Splunk Storm Brings Log Management to the Cloud, 2012. <http://www.infoworld.com/t/managed-services/splunk-storm-brings-log-management-the-cloud-201098?source=footer>. Diana Samuels, Skytree: Machine Learning Meets Big Data, February 2012. <http://www.bizjournals.com/sanjose/blog/2012/02/skytree-machine-learning-meets-big-data.html?page=all>. Eric Savitz, Gartner: 10 Critical Tech Trends for the Next Five Years, October 2012. <http://www.forbes.com/sites/ericsavitz/2012/10/22/gartner-10-critical-tech-trends-for-the-next-five-years/>. Eric Savitz, Gartner: Top 10 Strategic Technology Trends for 2013, October 2012. <http://www.forbes.com/sites/ericsavitz/2012/10/23/gartner-top-10-strategic-technology-trends-for-2013/>. Schadt, 2010, Computational solutions to large-scale data management and analysis, Nat. Rev. Genet., 11, 647, 10.1038/nrg2857 Seenumani, 2012, Real-time power management of integrated power systems in all electric ships leveraging multi time scale property, IEEE Trans. Control Syst. Technol., 20, 232 Udo Seiffert, Training of large-scale feed-forward neural networks, in: International Joint Conference on Neural Networks, IJCNN ’06, 2006, pp. 5324–5329. Shen, 2011, A distributed spatial-temporal similarity data storage scheme in wireless sensor networks, IEEE Trans. Mobile Comput., 10, 982, 10.1109/TMC.2010.214 Shen, 2003, A high-performance application data environment for large-scale scientific computations, IEEE Trans. Parallel Distrib. Syst., 14, 1262, 10.1109/TPDS.2003.1255638 Shen, 2006, Visual analysis of large heterogeneous social networks by semantic and structural abstraction, IEEE Trans. Visual. Comput. Graph., 12, 1427, 10.1109/TVCG.2006.107 Weiya Shi, Yue-Fei Guo, Cheng Jin, Xiangyang Xue, An improved generalized discriminant analysis for large-scale data set, in: Seventh International Conference on Machine Learning and Applications, 2008, 2008, pp. 769–772. Katsunari Shibata, Yusuke Ikeda, Effect of number of hidden neurons on learning in large-scale layered neural networks, in: ICROS-SICE International Joint Conference 2009, 2009, pp. 5008–5013. Andrew Horne Shvetank Shah, Jaime Capellá, Good Data won’t Guarantee Good Decisions, 2012. <http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions>. Simeonidou, 2005, Dynamic optical-network architectures and technologies for existing and emerging grid services, J. Lightwave Technol., 23, 3347, 10.1109/JLT.2005.856254 Simoff, 2008 Simon, 1994, On the power of quantum computation, SIAM J. Comput., 26, 116 Sipper, 1997, A phylogenetic, ontogenetic, and epigenetic view of bio-inspired hardware systems, IEEE Trans. Evol. Comput., 1, 83, 10.1109/4235.585894 Matthew Smith, Christian Szongott, Benjamin Henne, Gabriele von Voigt, Big data privacy issues in public social media, in: 2012 6th IEEE International Conference on Digital Ecosystems Technologies (DEST), 2012, pp. 1–6. Spiliopoulou, 1996, Parallel optimization of large join queries with set operators and aggregates in a parallel environment supporting pipeline, IEEE Trans. Knowl. Data Eng., 8, 429, 10.1109/69.506710 Sridhar, 2013, A comparative study on how big data is scaling business intelligence and analytics, Int. J. Enhanced Res. Sci. Technol. Eng., 2, 87 Stonebraker, 2005, The 8 requirements of real-time stream processing, SIGMOD Rec., 34, 42, 10.1145/1107499.1107504 Su, 2011, Radial basis function networks with linear interval regression weights for symbolic interval data, IEEE Trans. Syst. Man Cyber.–Part B: Cyber., 19, 1141 Sun, 2010, Sparse approximation through boosting for learning large scale kernel machines, IEEE Trans. Neural Networks, 21, 883, 10.1109/TNN.2010.2044244 Szalay, 2006, Science in an exponential world, Nature, 440, 23, 10.1038/440413a Szalay, 2011, Extreme data-intensive scientific computing, Comput. Sci. Eng., 13, 34, 10.1109/MCSE.2011.74 Tang, 2009, Selective negative correlation learning approach to incremental learning, Neurocomputing, 72, 2796, 10.1016/j.neucom.2008.09.022 David Taniar, High performance database processing, in: 2012 IEEE 26th International Conference on Advanced Information Networking and Applications (AINA), 2012, pp. 5–6. David Thompson, Joshua A. Levine, Janine C. Bennett, Peer-Timo Bremer, Attila Gyulassy, Valerio Pascucci, Philippe P. Pébay, Analysis of large-scale scalar data using hixels, in: 2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV), 2011, pp. 23–30. Tkacz, 2009 Vettiger, 2002, The millipede – nanotechnology entering data storage, IEEE Trans. Nanotechnol., 1, 39, 10.1109/TNANO.2002.1005425 Senthilkumar Vijayakumar, Anjani Bhargavi, Uma Praseeda, Syed Azar Ahamed, Optimizing sequence alignment in cloud using hadoop and mpp database, in: 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), 2012, pp. 819–827. Mladen A. Vouk, Cloud computing – issues, research and implementations, in: 30th International Conference on Information Technology Interfaces, 2008, ITI 2008, 2008, pp. 31–40. Wang, 2007, Social computing: from social informatics to social intelligence, IEEE Intell. Syst., 22, 79, 10.1109/MIS.2007.41 Lijuan Wang, Jun Shen, Towards bio-inspired cost minimisation for data-intensive service provision, in: 2012 IEEE First International Conference on Services Economics (SE), 2012, pp. 16–23. Qian Wang, Kui Ren, Wenjing Lou, Yanchao Zhang, Dependable and secure sensor data storage with dynamic integrity assurance, in: Proc. IEEE INFOCOM, 2009, pp. 954–962. Wang, 2011, Enabling public auditability and data dynamics for storage security in cloud computing, IEEE Trans. Parallel Distrib. Syst., 22, 847, 10.1109/TPDS.2010.183 Peter Wayner. 7 Top Tools for Taming Big Data, 2012. <http://www.networkworld.com/reviews/2012/041812-7-top-tools-for-taming-258398.html>. Weiss, 2003, Genetic circuit building blocks for cellular computation, communications, and signal processing, Natural Comput., 2, 47, 10.1023/A:1023307812034 Wilkinson, 2008, The future of statistical computing, Technometrics, 50, 418, 10.1198/004017008000000460 Worlton, 1971, Bulk storage requirements in large-scale scientific calculations, IEEE Trans. Magn., 7, 830, 10.1109/TMAG.1971.1067246 Wu, 2012, Visualizing flow of uncertainty through analytical processes, IEEE Trans. Visual. Comput. Graph., 18, 2526, 10.1109/TVCG.2012.285 Dong, 2005, Fast svm training algorithm with decomposition on very large data sets, IEEE Trans. Pattern Anal. Mach. Intell., 27, 603, 10.1109/TPAMI.2005.77 Yan, 2011, Trace-oriented feature analysis for large-scale text data dimension reduction, IEEE Trans. Knowl. Data Eng., 23, 1103, 10.1109/TKDE.2010.34 Yang, 2008, Large scale evolutionary optimization using cooperative coevolution, Inf. Sci., 178, 2985, 10.1016/j.ins.2008.02.017 Yao, 2012, Concurrent subspace width optimization method for rbf neural network modeling, IEEE Trans. Neural Networks Learn. Syst., 23, 247, 10.1109/TNNLS.2011.2178560 Yu, 2011, Deep learning and its applications to signal and information processing, IEEE Signal Process. Mag., 28, 145, 10.1109/MSP.2010.939038 Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey, Dryadlinq: a system for general-purpose distributed data-parallel computing using a high-level language, in: 8th USENIX Symposium on Operating Systems Design and Implementation, 2008. Jiawei Yuan, Shucheng Yu, Privacy Preserving Back-Propagation Neural Network Learning Made Practical with Cloud Computing, 2013. Zhang, 2011, Data-driven intelligent transportation systems: a survey, IEEE Trans. Intell. Trans. Syst., 12, 1624, 10.1109/TITS.2011.2158001 Zhang, 2012, Information production and link formation in social computing systems, IEEE J. Sel. Areas Commun., 30, 2136, 10.1109/JSAC.2012.121206 Zhou, 2013, A collaborative fuzzy clustering algorithm in distributed network environments, IEEE Trans. Fuzzy Syst., PP, 1 Zhou, 2012, Neural-network-based decentralized adaptive output-feedback control for large-scale stochastic nonlinear systems, IEEE Trans. Syst. Man Cyber Part B: Cyber, 46, 1608, 10.1109/TSMCB.2012.2196432 Zikopoulos, 2011