Social big data: Recent achievements and new challenges
References
IBM, Big Data and Analytics, 2015. URL http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
Infographic, The Data Explosion in 2014 Minute by Minute, 2015. URL http://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic
Cuzzocrea, 2011, Analytics over large-scale multidimensional data: the big data revolution!, 101
Laney, 2001, 3D Data Management: Controlling Data Volume, Velocity, and Variety
M.A. Beyer, D. Laney, The Importance of ‘Big Data’: A Definition, Gartner, Stamford, CT (2012).
Hashem, 2015, The rise of big data on cloud computing: review and open research issues, Inf. Syst., 47, 98, 10.1016/j.is.2014.07.006
Grossman, 2010, An overview of the open science data cloud, 377
Khan, 2014, Big data: survey, technologies, opportunities, and challenges, Sci. World J., 2014, 1
Couldry, 2012
Correa, 2010, Who interacts on the web?: the intersection of users’ personality and social media use, Comput. Hum. Behav., 26, 247, 10.1016/j.chb.2009.09.003
Kaplan, 2010, Users of the world, unite! The challenges and opportunities of social media, Bus. Horizons, 53, 59, 10.1016/j.bushor.2009.09.003
Tess, 2013, The role of social media in higher education classes (real and virtual)–a literature review, Comput. Hum. Behav., 29, A60, 10.1016/j.chb.2012.12.032
Salathé, 2013, The dynamics of health behavior sentiments on a large online social network, EPJ Data Sci., 2, 1, 10.1140/epjds16
Manovich, 2011, Trending: the promises and the challenges of big social data, Debates Digit. Hum., 460
Kaisler, 2013, Big data: Issues and challenges moving forward, 995
Chen, 2012, Business intelligence and analytics: from big data to big impact, MIS Q., 36, 1165, 10.2307/41703503
White, 2009
Zaharia, 2010, Spark: Cluster computing with working sets, 10
Owen, 2011
X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, et al., MLlib: machine learning in Apache Spark, 2015, pp. 1–7, arXiv:1505.06807.
Kraska, 2013, MLbase: a distributed machine-learning system
Sparks, 2013, MLI: an API for distributed machine learning, 1187
Dean, 2004, MapReduce: simplified data processing on large clusters
Dean, 2008, MapReduce: simplified data processing on large clusters, Commun. ACM, 51, 107, 10.1145/1327452.1327492
Shim, 2012, MapReduce algorithms for big data analysis, Proc. VLDB Endow., 5, 2016, 10.14778/2367502.2367563
Zaharia, 2008, Improving MapReduce performance in heterogeneous environments, 29
Xin, 2013, Shark: SQL and rich analytics at scale, 13
A. Mostosi, Useful stuff, 2015. http://blog.andreamostosi.name/big-data/
A. Mostosi, The big-data ecosystem table, 2015. URL http://bigdata.andreamostosi.name/
Emerick, 2011
Burrows, 2006, The chubby lock service for loosely-coupled distributed systems, 335
Alexandrov, 2014, The Stratosphere platform for big data analytics, VLDB J., 23, 939, 10.1007/s00778-014-0357-y
Ghemawat, 2003, The Google file system, 29
Chang, 2006, Bigtable: a distributed storage system for structured data, 15
Malewicz, 2010, Pregel: a system for large-scale graph processing, 135
Chodorow, 2013
Crawley, 2007
S. Bennett, Twitter now seeing 400 million tweets per day, increased mobile ad revenue, says CEO, 2012. URL http://www.adweek.com/socialtimes/twitter-400-million-tweets
Ott, 2001, 511
Elser, 2013, An evaluation study of bigdata frameworks for graph processing, 60
Seo, 2010, Hama: an efficient matrix computation with the MapReduce framework, 721
Clauset, 2005, Finding local community structure in networks, Phys. Rev. E, 72, 026132, 10.1103/PhysRevE.72.026132
Kannan, 2000, On clusterings: good, bad and spectral, 367
Bomze, 1999, The maximum clique problem, 1
Girvan, 2002, Community structure in social and biological networks, Proc. Natl. Acad. Sci., 99, 7821, 10.1073/pnas.122653799
Newman, 2004, Fast algorithm for detecting community structure in networks, Phys. Rev. E, 69, 066133, 10.1103/PhysRevE.69.066133
Clauset, 2004, Finding community structure in very large networks, Phys. Rev. E, 70, 066111, 10.1103/PhysRevE.70.066111
Newman, 2006, Modularity and community structure in networks, Proc. Natl. Acad. Sci., 103, 8577, 10.1073/pnas.0601602103
Richardson, 2009, Spectral tripartitioning of networks, Phys. Rev. E, 80, 036111, 10.1103/PhysRevE.80.036111
Wang, 2008, A vector partitioning approach to detecting community structure in complex networks, Comput. Math. Appl., 55, 2746, 10.1016/j.camwa.2007.10.028
Zhou, 2004, Network Brownian motion: a new method to measure vertex-vertex proximity and to identify communities and subcommunities, 1062
Dong, 2006, A hierarchical clustering algorithm based on fuzzy graph connectedness, Fuzzy Sets Syst., 157, 1760, 10.1016/j.fss.2006.01.001
Bello-Orgaz, 2012, Adaptive k-means algorithm for overlapped graph clustering, Int. J. Neural Syst., 22, 1250018, 10.1142/S0129065712500189
Xie, 2013, Overlapping community detection in networks: the state-of-the-art and comparative study, ACM Comput. Surv. (CSUR), 45, 43, 10.1145/2501654.2501657
Zamir, 1998, Web document clustering: a feasibility demonstration, 46
1992
Manning, 2008
Hu, 2012, Text analytics in social media, 385
Wold, 1987, Principal component analysis, Chemom. Intell. Lab. Syst., 2, 37, 10.1016/0169-7439(87)80084-9
Deerwester, 1990, Indexing by latent semantic analysis, J. Am. Soc. Inf. Sci., 41, 391, 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
Blei, 2003, Latent Dirichlet allocation, J. Mach. Learn. Res., 3, 993
Yao, 2009, Efficient methods for topic model inference on streaming document collections, 937
Larsen, 1999, Fast and effective text mining using linear-time document clustering, 16
Zhao, 2002, Evaluation of hierarchical clustering algorithms for document datasets, 515
Zhao, 2004, Empirical and theoretical comparisons of selected criterion functions for document clustering, Mach. Learn., 55, 311, 10.1023/B:MACH.0000027785.44527.d6
Sebastiani, 2002, Machine learning in automated text categorization, ACM Comput. Surv. (CSUR), 34, 1, 10.1145/505282.505283
Pang, 2002, Thumbs up?: sentiment classification using machine learning techniques, 10, 79
Aggarwal, 2007, 31
Zhong, 2005, Efficient online spherical k-means clustering, 5, 3180
Chen, 2010, Scalable influence maximization for prevalent viral marketing in large-scale social networks, 1029
Nguyen, 2015, Real-time event detection on social data stream, Mobile Netw. Appl., 20, 475, 10.1007/s11036-014-0557-0
Guille, 2013, Information diffusion in online social networks: a survey, SIGMOD Rec., 42, 17, 10.1145/2503792.2503797
Gomez-Rodriguez, 2012, Inferring networks of diffusion and influence, ACM Trans. Knowl. Discov. Data, 5, 21, 10.1145/2086737.2086741
Sadikov, 2011, Correcting for missing data in information cascades, 55
Anshelevich, 2015, Seeding influential nodes in non-submodular models of information diffusion, Auton. Agents Multi-Agent Syst., 29, 131, 10.1007/s10458-014-9254-4
Jiang, 2014, Graphical evolutionary game for information diffusion over social networks, IEEE J. Sel. Top. Signal Process., 8, 524, 10.1109/JSTSP.2014.2313024
Jiang, 2014, Evolutionary dynamics of information diffusion over social networks, IEEE Trans. Signal Process., 62, 4573, 10.1109/TSP.2014.2339799
Fu, 2011, A review on time series data mining, Eng. Appl. Artif. Intell., 24, 164, 10.1016/j.engappai.2010.09.007
Lin, 2003, A symbolic representation of time series, with implications for streaming algorithms, 2
Cataldi, 2010, Emerging topic detection on Twitter based on temporal and social terms evaluation, 1
Nguyen, 2014, Privacy-preserving discovery of topic-based events from social sensor signals: an experimental study on Twitter, Sci. World J., 2014, 1
Jung, 2010, Integrating social networks for context fusion in mobile service platforms, J. Univers. Comput. Sci., 16, 2099
Hoang, 2014, Semantic information integration with linked data mashups approaches, Int. J. Distrib. Sens. Networks, 2014, 1, 10.1155/2014/813875
Long, 2015, Privacy-aware framework for matching online social identities in multiple social networking services, Cybern. Syst., 46, 69, 10.1080/01969722.2015.1007737
Caton, 2014, A social compute cloud: allocating and sharing infrastructure resources via social networks, IEEE Trans. Serv. Comput., 7, 359, 10.1109/TSC.2014.2303091
Davenport, 2007
Maurer, 2011
Trattner, 2013, Social stream marketing on Facebook: a case study, Int. J. Soc. Humanist. Comput., 2, 86, 10.1504/IJSHC.2013.053268
Jansen, 2009, Twitter power: tweets as electronic word of mouth, J. Am. Soc. Inf. Sci. Tech., 60, 2169, 10.1002/asi.21149
Asur, 2010, Predicting the future with social media, 1, 492
Ma, 2008, Mining social networks using heat diffusion processes for marketing candidates selection, 233
Wortley, 2013
Knutsson, 2012, Opportunities for improving egovernment: using language technology in workflow management, 495
Ku, 2014, A decision support system: automated crime report analysis and classification for e-government, Gov. Inf. Q., 31, 534, 10.1016/j.giq.2014.08.003
Phillips, 2012, Mining co-distribution patterns for large crime datasets, Expert Syst. Appl., 39, 11556, 10.1016/j.eswa.2012.03.071
Chainey, 2008, The utility of hotspot mapping for predicting spatial patterns of crime, Secur. J., 21, 4, 10.1057/palgrave.sj.8350066
Gerber, 2014, Predicting crime using Twitter and kernel density estimation, Decis. Support Syst., 61, 115, 10.1016/j.dss.2014.02.003
Kirkos, 2007, Data mining techniques for the detection of fraudulent financial statements, Expert Syst. Appl., 32, 995, 10.1016/j.eswa.2006.02.016
Quah, 2008, Real-time credit card fraud detection using computational intelligence, Expert Syst. Appl., 35, 1721, 10.1016/j.eswa.2007.08.093
Li, 2012, Identifying the signs of fraudulent accounts using data mining techniques, Comput. Hum. Behav., 28, 1002, 10.1016/j.chb.2012.01.002
Paquet, 2005, Epidemic intelligence: a new framework for strengthening disease surveillance in Europe, Euro Surveill., 11, 212
Cohen, 2005, A survey of current work in biomedical text mining, Brief. Bioinform., 6, 57, 10.1093/bib/6.1.57
Lampos, 2012, Nowcasting events from the social web with statistical learning, ACM Trans. Intell. Syst. Technol. (TIST), 3, 72
Collier, 2010, An ontology-driven system for detecting global health events, 215
Culotta, 2010, Towards detecting influenza epidemics by analyzing Twitter messages, 115
Aramaki, 2011, Twitter catches the flu: detecting influenza epidemics using Twitter, 1568
Bodnar, 2013, Validating models for disease detection using Twitter, 699
Fisichella, 2011, Detecting health events on the social web to enable epidemic intelligence, 87
Hartley, 2010, The landscape of international event-based biosurveillance, Emerg. Health Threat., 3
Mykhalovskiy, 2006, The global public health intelligence network and early warning outbreak detection, Can. J. Public Health, 97, 42, 10.1007/BF03405213
Collier, 2008, BioCaster: detecting public health rumors with a web-based text mining system, Bioinformatics, 24, 2940, 10.1093/bioinformatics/btn534
Brownstein, 2008, Surveillance sans frontières: internet-based emerging infectious disease intelligence and the HealthMap project, PLoS Med., 5, e151, 10.1371/journal.pmed.0050151
Keller, 2009, Use of unstructured event-based reports for global infectious disease surveillance, Emerg. Infect. Dis., 15, 689, 10.3201/eid1505.081114
Lyon, 2012, Comparison of web-based biosecurity intelligence systems: BioCaster, EpiSPIDER and HealthMap, Transbound. Emerg. Dis., 59, 223, 10.1111/j.1865-1682.2011.01258.x
Kotval, 2013, Visualization of entities within social media: toward understanding users’ needs, Bell Labs Tech. J., 17, 77, 10.1002/bltj.21576
Miroshnikov, 2014, parallelMCMCcombine: an R package for Bayesian methods for big data and analytics, PLOS One, 9, 10.1371/journal.pone.0108425
Swayne, 2003, GGobi: evolving from XGobi into an extensible framework for interactive data visualization, Comput. Stat. Data Anal., 43, 423, 10.1016/S0167-9473(02)00286-4
Ashok, 2008, A visualization framework for real time decision making in a multi-input multi-output system, IEEE Syst. J., 2, 129, 10.1109/JSYST.2008.916060
Gurrin, 2014, 8, pp.1
Viegas, 2007, ManyEyes: a site for visualization at internet scale, IEEE Trans. Vis. Comput. Gr., 13, 1121, 10.1109/TVCG.2007.70577
Hwang, 2014, Social data visualization system for understanding diffusion patterns on Twitter: a case study on Korean enterprises, Comput. Inform., 33, 591
Sweeney, 2002, k-anonymity: a model for protecting privacy, Int. J. Uncertain. Fuzziness Knowledge-based Syst., 10, 557, 10.1142/S0218488502001648
Dwork, 2008, Differential privacy: a survey of results, 4978, 1
Landau, 2014, Educating engineers: teaching privacy in a world of open doors, IEEE Secur. Priv., 12, 66, 10.1109/MSP.2014.43
Fiat, 1998, Online Algorithms: The State of the Art, 1442
Crammer, 2003, Ultraconservative online algorithms for multiclass problems, J. Mach. Learn. Res., 3, 951
Charikar, 2003, Better streaming algorithms for clustering problems, 30
Cheng, 2008, A survey on algorithms for mining frequent itemsets over data streams, Knowl. Inf. Syst., 16, 1, 10.1007/s10115-007-0092-4
Menéndez, 2013, A multi-objective genetic graph-based clustering algorithm with memory optimization, 3174
Zhao, 2009, Parallel k-means clustering based on mapreduce, 674
Chu, 2007, Map-reduce for machine learning on multicore, Adv. Neural Inf. Process. Syst., 19, 281
Chen, 2011, Parallel spectral clustering in distributed systems, IEEE Trans. Pattern Anal. Mach. Intell., 33, 568, 10.1109/TPAMI.2010.88
Menéndez, 2014, A co-evolutionary multi-objective approach for a k-adaptive graph-based clustering algorithm, 2724
Menéndez, 2015, GANY: a genetic spectral-based clustering algorithm for large data analysis, 640
Ng, 2001, On spectral clustering: analysis and an algorithm, 849
Bach, 2006, Learning spectral clustering, with application to speech separation, J. Mach. Learn. Res., 7, 1963
Kumar, 2003, DFuse: a framework for distributed data fusion, 114
Keim, 2008
Cui, 2010, Multiple feature fusion for social media applications, 435
Bakshy, 2012, The role of social networks in information diffusion, 519
Becker, 2009, Event identification in social media
Becker, 2010, Learning similarity metrics for event identification in social media, 291
Andrienko, 2007, Visual analytics tools for analysis of movement data, ACM SIGKDD Explor. Newsl., 9, 38, 10.1145/1345448.1345455