Social big data: Recent achievements and new challenges

Information Fusion - Volume 28 - Pages 45-59 - 2016
Gema Bello-Orgaz1, Jason J. Jung2, David Camacho1
1Computer Science Department, Universidad Autónoma de Madrid, Spain
2Department of Computer Engineering, Chung-Ang University, Seoul, Republic of Korea

Abstract

Keywords


References

IBM, Big Data and Analytics, 2015. URL http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html

Infographic, The Data Explosion in 2014 Minute by Minute, 2015. URL http://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic

Wu, 2014, Data mining with big data, IEEE Trans. Knowl. Data Eng., 26, 97, 10.1109/TKDE.2013.109

Cuzzocrea, 2011, Analytics over large-scale multidimensional data: the big data revolution!, 101

Laney, 2001, 3D Data Management: Controlling Data Volume, Velocity, and Variety

M.A. Beyer, D. Laney, The Importance of ‘Big Data’: A Definition, Gartner, Stamford, CT (2012).

Hashem, 2015, The rise of big data on cloud computing: review and open research issues, Inf. Syst., 47, 98, 10.1016/j.is.2014.07.006

Grossman, 2010, An overview of the open science data cloud, 377

Khan, 2014, Big data: survey, technologies, opportunities, and challenges, The Sci. World J., 2014, 1

Couldry, 2012

Correa, 2010, Who interacts on the web?: the intersection of users’ personality and social media use, Comput. Hum. Behav., 26, 247, 10.1016/j.chb.2009.09.003

Kaplan, 2010, Users of the world, unite! the challenges and opportunities of social media, Bus. Horizons, 53, 59, 10.1016/j.bushor.2009.09.003

Tess, 2013, The role of social media in higher education classes (real and virtual)–a literature review, Comput. Hum. Behav., 29, A60, 10.1016/j.chb.2012.12.032

Salathé, 2013, The dynamics of health behavior sentiments on a large online social network, EPJ Data Sci., 2, 1, 10.1140/epjds16

Cambria, 2013, Big social data analysis, Big Data Comput., 13, 401, 10.1201/b16014-19

Manovich, 2011, Trending: the promises and the challenges of big social data, Debates Digit. Hum., 460

Kaisler, 2013, Big data: Issues and challenges moving forward, 995

Chen, 2012, Business intelligence and analytics: from big data to big impact, MIS Q., 36, 1165, 10.2307/41703503

White, 2009

Zaharia, 2010, Spark: Cluster computing with working sets, 10

Owen, 2011

X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, et al., MLlib: machine learning in Apache Spark, 2015, pp. 1–7, arXiv:1505.06807.

Kraska, 2013, Mlbase: a distributed machine-learning system

Sparks, 2013, MLI: an API for distributed machine learning, 1187

Dean, 2004, Mapreduce: simplified data processing on large clusters

Dean, 2008, Mapreduce: simplified data processing on large clusters, Commun. ACM, 51, 107, 10.1145/1327452.1327492

Shim, 2012, Mapreduce algorithms for big data analysis, Proc. VLDB Endow., 5, 2016, 10.14778/2367502.2367563

Zaharia, 2008, Improving mapreduce performance in heterogeneous environments, 29

Xin, 2013, Shark: Sql and rich analytics at scale, 13

A. Mostosi, Useful stuff, 2015. URL http://blog.andreamostosi.name/big-data/

A. Mostosi, The big-data ecosystem table, 2015. URL http://bigdata.andreamostosi.name/

Emerick, 2011

Burrows, 2006, The chubby lock service for loosely-coupled distributed systems, 335

Alexandrov, 2014, The stratosphere platform for big data analytics, VLDB J., 23, 939, 10.1007/s00778-014-0357-y

Ghemawat, 2003, The google file system, 29

Chang, 2006, Bigtable: a distributed storage system for structured data, 15

Malewicz, 2010, Pregel: a system for large-scale graph processing, 135

Chodorow, 2013

Crawley, 2007

S. Bennett, Twitter now seeing 400 million tweets per day, increased mobile ad revenue, says CEO, 2012. URL http://www.adweek.com/socialtimes/twitter-400-million-tweets

Ott, 2001, 511

Elser, 2013, An evaluation study of bigdata frameworks for graph processing, 60

Valiant, 1990, A bridging model for parallel computation, Commun. ACM, 33, 103, 10.1145/79173.79181

Seo, 2010, Hama: an efficient matrix computation with the mapreduce framework, 721

Clauset, 2005, Finding local community structure in networks, Phys. Rev. E, 72, 026132, 10.1103/PhysRevE.72.026132

Fortunato, 2010, Community detection in graphs, Phys. Rep., 486, 75, 10.1016/j.physrep.2009.11.002

Kannan, 2000, On clusterings-good, bad and spectral, 367

Bomze, 1999, The maximum clique problem, 1

Girvan, 2002, Community structure in social and biological networks, Proc. Natl. Acad. Sci., 99, 7821, 10.1073/pnas.122653799

Newman, 2004, Fast algorithm for detecting community structure in networks, Phys. Rev. E, 69, 066133+, 10.1103/PhysRevE.69.066133

Clauset, 2004, Finding community structure in very large networks, Phys. Rev. E, 70, 066111, 10.1103/PhysRevE.70.066111

Newman, 2006, Modularity and community structure in networks, Proc. Natl. Acad. Sci., 103, 8577, 10.1073/pnas.0601602103

Richardson, 2009, Spectral tripartitioning of networks, Phys. Rev. E, 80, 036111, 10.1103/PhysRevE.80.036111

Wang, 2008, A vector partitioning approach to detecting community structure in complex networks, Comput. Math. Appl., 55, 2746, 10.1016/j.camwa.2007.10.028

Zhou, 2004, Network brownian motion: a new method to measure vertex-vertex proximity and to identify communities and subcommunities, 1062

Dong, 2006, A hierarchical clustering algorithm based on fuzzy graph connectedness, Fuzzy Sets Syst., 157, 1760, 10.1016/j.fss.2006.01.001

Bello-Orgaz, 2012, Adaptive k-means algorithm for overlapped graph clustering, Int. J. Neural Syst., 22, 1250018, 10.1142/S0129065712500189

Xie, 2013, Overlapping community detection in networks: the state-of-the-art and comparative study, ACM Comput. Surv. (CSUR), 45, 43, 10.1145/2501654.2501657

Zamir, 1998, Web document clustering: a feasibility demonstration, 46

1992

Manning, 2008

Hu, 2012, Text analytics in social media, 385

Wold, 1987, Principal component analysis, Chemom. Intell. Lab. Syst., 2, 37, 10.1016/0169-7439(87)80084-9

Deerwester, 1990, Indexing by latent semantic analysis, J. Am. Soc. Inf. Sci., 41, 391, 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

Blei, 2003, Latent Dirichlet allocation, J. Mach. Learn. Res., 3, 993

Yao, 2009, Efficient methods for topic model inference on streaming document collections, 937

Jain, 1999, Data clustering: a review, ACM Comput. Surv., 31, 264, 10.1145/331499.331504

Larsen, 1999, Fast and effective text mining using linear-time document clustering, 16

Zhao, 2002, Evaluation of hierarchical clustering algorithms for document datasets, 515

Zhao, 2004, Empirical and theoretical comparisons of selected criterion functions for document clustering, Mach. Learn., 55, 311, 10.1023/B:MACH.0000027785.44527.d6

Sebastiani, 2002, Machine learning in automated text categorization, ACM Comput. Surv. (CSUR), 34, 1, 10.1145/505282.505283

Pang, 2002, Thumbs up?: sentiment classification using machine learning techniques, 10, 79

Aggarwal, 2007, 31

Zhong, 2005, Efficient online spherical k-means clustering, 5, 3180

Chen, 2010, Scalable influence maximization for prevalent viral marketing in large-scale social networks, 1029

Nguyen, 2015, Real-time event detection on social data stream, Mobile Netw. Appl., 20, 475, 10.1007/s11036-014-0557-0

Guille, 2013, Information diffusion in online social networks: a survey, SIGMOD Rec., 42, 17, 10.1145/2503792.2503797

Gomez-Rodriguez, 2012, Inferring networks of diffusion and influence, ACM Trans. Knowl. Discov. Data, 5, 21, 10.1145/2086737.2086741

Sadikov, 2011, Correcting for missing data in information cascades, 55

Anshelevich, 2015, Seeding influential nodes in non-submodular models of information diffusion, Auton. Agents Multi-Agent Syst., 29, 131, 10.1007/s10458-014-9254-4

Jiang, 2014, Graphical evolutionary game for information diffusion over social networks, IEEE J. Sel. Top. Signal Process., 8, 524, 10.1109/JSTSP.2014.2313024

Jiang, 2014, Evolutionary dynamics of information diffusion over social networks, IEEE Trans. Signal Process., 62, 4573, 10.1109/TSP.2014.2339799

Fu, 2011, A review on time series data mining, Eng. Appl. Artif. Intell., 24, 164, 10.1016/j.engappai.2010.09.007

Lin, 2003, A symbolic representation of time series, with implications for streaming algorithms, 2

Cataldi, 2010, Emerging topic detection on twitter based on temporal and social terms evaluation, 1

Nguyen, 2014, Privacy-preserving discovery of topic-based events from social sensor signals: an experimental study on twitter, Sci. World J., 2014, 1

Jung, 2010, Integrating social networks for context fusion in mobile service platforms, J. Univers. Comput. Sci., 16, 2099

Hoang, 2014, Semantic information integration with linked data mashups approaches, Int. J. Distrib. Sens. Networks, 2014, 1, 10.1155/2014/813875

Long, 2015, Privacy-aware framework for matching online social identities in multiple social networking services, Cybern. Syst., 46, 69, 10.1080/01969722.2015.1007737

Caton, 2014, A social compute cloud: allocating and sharing infrastructure resources via social networks, IEEE Trans. Serv. Comput., 7, 359, 10.1109/TSC.2014.2303091

Davenport, 2007

Maurer, 2011

Trattner, 2013, Social stream marketing on Facebook: a case study, Int. J. Soc. Humanist. Comput., 2, 86, 10.1504/IJSHC.2013.053268

Jansen, 2009, Twitter power: tweets as electronic word of mouth, J. Am. Soc. Inf. Sci. Tech., 60, 2169, 10.1002/asi.21149

Asur, 2010, Predicting the future with social media, 1, 492

Ma, 2008, Mining social networks using heat diffusion processes for marketing candidates selection, 233

Wortley, 2013

Knutsson, 2012, Opportunities for improving egovernment: using language technology in workflow management, 495

Ku, 2014, A decision support system: automated crime report analysis and classification for e-government, Gov. Inf. Q., 31, 534, 10.1016/j.giq.2014.08.003

Phillips, 2012, Mining co-distribution patterns for large crime datasets, Expert Syst. Appl., 39, 11556, 10.1016/j.eswa.2012.03.071

Chainey, 2008, The utility of hotspot mapping for predicting spatial patterns of crime, Secur. J., 21, 4, 10.1057/palgrave.sj.8350066

Gerber, 2014, Predicting crime using twitter and Kernel density estimation, Decis. Support Syst., 61, 115, 10.1016/j.dss.2014.02.003

Kirkos, 2007, Data mining techniques for the detection of fraudulent financial statements, Expert Syst. Appl., 32, 995, 10.1016/j.eswa.2006.02.016

Quah, 2008, Real-time credit card fraud detection using computational intelligence, Expert Syst. Appl., 35, 1721, 10.1016/j.eswa.2007.08.093

Li, 2012, Identifying the signs of fraudulent accounts using data mining techniques, Comput. Hum. Behav., 28, 1002, 10.1016/j.chb.2012.01.002

Paquet, 2005, Epidemic intelligence: a new framework for strengthening disease surveillance in Europe, Euro surveillance: bulletin europeen sur les maladies transmissibles European communicable disease bulletin, 11, 212

Cohen, 2005, A survey of current work in biomedical text mining, Brief. Bioinform., 6, 57, 10.1093/bib/6.1.57

Lampos, 2012, Nowcasting events from the social web with statistical learning, ACM Trans. Intell. Syst. Technol. (TIST), 3, 72

Collier, 2010, An ontology-driven system for detecting global health events, 215

Culotta, 2010, Towards detecting influenza epidemics by analyzing twitter messages, 115

Aramaki, 2011, Twitter catches the flu: detecting influenza epidemics using twitter, 1568

Bodnar, 2013, Validating models for disease detection using twitter, 699

Fisichella, 2011, Detecting health events on the social web to enable epidemic intelligence, 87

Hartley, 2010, The landscape of international event-based biosurveillance, Emerg. Health Threat., 3

Mykhalovskiy, 2006, The global public health intelligence network and early warning outbreak detection, Can. J. Public Health, 97, 42, 10.1007/BF03405213

Collier, 2008, Biocaster: detecting public health rumors with a web-based text mining system, Bioinformatics, 24, 2940, 10.1093/bioinformatics/btn534

Brownstein, 2008, Surveillance sans frontieres: internet-based emerging infectious disease intelligence and the healthmap project, PLoS Med., 5, e151, 10.1371/journal.pmed.0050151

Keller, 2009, Use of unstructured event-based reports for global infectious disease surveillance, Emerg. Infect. Dis., 15, 689, 10.3201/eid1505.081114

Lyon, 2012, Comparison of web-based biosecurity intelligence systems: biocaster, epispider and healthmap, Transbound. Emerg. Dis., 59, 223, 10.1111/j.1865-1682.2011.01258.x

Keim, 2013, Big-data visualization, IEEE Comput. Gr. Appl., 33, 20, 10.1109/MCG.2013.54

Kotval, 2013, Visualization of entities within social media: toward understanding users’ needs, Bell Labs Tech. J., 17, 77, 10.1002/bltj.21576

Miroshnikov, 2014, parallelMCMCcombine: an R package for Bayesian methods for big data and analytics, PLOS One, 9, 10.1371/journal.pone.0108425

Swayne, 2003, GGobi: evolving from XGobi into an extensible framework for interactive data visualization, Comput. Stat. Data Anal., 43, 423, 10.1016/S0167-9473(02)00286-4

Ashok, 2008, A visualization framework for real time decision making in a multi-input multi-output system, IEEE Syst. J., 2, 129, 10.1109/JSYST.2008.916060

Gurrin, 2014, 8, pp.1

Blum, 2006, Insense: interest-based life logging, Multimed. IEEE, 13, 40, 10.1109/MMUL.2006.87

Viegas, 2007, Manyeyes: a site for visualization at internet scale, IEEE Trans. Vis. Comput. Gr., 13, 1121, 10.1109/TVCG.2007.70577

Hwang, 2014, Social data visualization system for understanding diffusion patterns on twitter: a case study on korean enterprises, Comput. Inform., 33, 591

Sweeney, 2002, K-anonymity: a model for protecting privacy, Int. J. Uncertain. Fuzziness Knowledge-based Syst., 10, 557, 10.1142/S0218488502001648

Dwork, 2008, Differential privacy: a survey of results, 4978, 1

Landau, 2014, Educating engineers: teaching privacy in a world of open doors, IEEE Secur. Priv., 12, 66, 10.1109/MSP.2014.43

Fiat, 1998, Online Algorithms: The State of the Art, 1442

Crammer, 2003, Ultraconservative online algorithms for multiclass problems, J. Mach. Learn. Res., 3, 951

Charikar, 2003, Better streaming algorithms for clustering problems, 30

Cheng, 2008, A survey on algorithms for mining frequent itemsets over data streams, Knowl. Inf. Syst., 16, 1, 10.1007/s10115-007-0092-4

Menéndez, 2013, A multi-objective genetic graph-based clustering algorithm with memory optimization, 3174

Zhao, 2009, Parallel k-means clustering based on mapreduce, 674

Chu, 2007, Map-reduce for machine learning on multicore, Adv. Neural Inf. Process. Syst., 19, 281

Chen, 2011, Parallel spectral clustering in distributed systems, IEEE Trans. Pattern Anal. Mach. Intell., 33, 568, 10.1109/TPAMI.2010.88

Menendez, 2014, A co-evolutionary multi-objective approach for a k-adaptive graph-based clustering algorithm, 2724

Menendez, 2015, Gany: a genetic spectral-based clustering algorithm for large data analysis, 640

Ng, 2001, On Spectral Clustering: Analysis and an algorithm, 849

Bach, 2006, Learning spectral clustering, with application to speech separation, J. Mach. Learn. Res., 7, 1963

Kumar, 2003, Dfuse: a framework for distributed data fusion, 114

Keim, 2008

Cui, 2010, Multiple feature fusion for social media applications, 435

Bakshy, 2012, The role of social networks in information diffusion, 519

Becker, 2009, Event identification in social media.

Becker, 2010, Learning similarity metrics for event identification in social media, 291

Wong, 2004, Visual analytics, IEEE Comput. Gr. Appl., 20, 10.1109/MCG.2004.39

Andrienko, 2007, Visual analytics tools for analysis of movement data, ACM SIGKDD Explor. Newsl., 9, 38, 10.1145/1345448.1345455

Andrienko, 2010, Space, time and visual analytics, Int. J. Geogr. Inf. Sci., 24, 1577, 10.1080/13658816.2010.508043