Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance

Spatial Statistics - Tập 14 - Trang 491-504 - 2015
Nathaniel E. Helwig1,2, Yizhao Gao3, Shaowen Wang3,4, Ping Ma5
1Department of Psychology, University of Minnesota, Minneapolis, MN, 55455-0366, United States
2School of Statistics, University of Minnesota, Minneapolis, MN, 55455-0493, United States
3Department of Geography and Geographic Information Science, University of Illinois, Champaign, IL, 61820-6371, United States
4National Center for Supercomputing Applications, University of Illinois, Urbana, IL, 61801-2311, United States
5Department of Statistics, University of Georgia, Athens, GA, 30602-5029, United States

Tài liệu tham khảo

Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.H., Liu, B., 2011. Predicting flu trends using Twitter data, in: Computer Communications Workshops, INFOCOM WKSHPS, pp. 702–707. Akaike, 1974, A new look at the statistical model identification, IEEE Trans. Automat. Control, 19, 716, 10.1109/TAC.1974.1100705 Asur, S., Huberman, B.A., 2010. Predicting the future with social media, in: Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 492–499. Becker, R.A., Wilks, A.R., Brownrigg, R., Minka, T.P., 2013. maps: Draw geographical maps. R package version 2.3-6. URL: http://CRAN.R-project.org/package=maps. Bollen, 2011, Twitter mood predicts the stock market, J. Comput. Sci., 2, 1, 10.1016/j.jocs.2010.12.007 Cheng, 2014, Event detection using Twitter: A spatio-temporal approach, PLoS One, 9, e97807, 10.1371/journal.pone.0097807 Cho, E., Myers, S.A., Leskovec, J., 2011. Friendship and mobility: user movement in location-based social networks, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1082–1090. Corley, C., Mikler, A.R., Singh, K.P., Cook, D.J., 2009. Monitoring influenza trends through mining social media, in: BIOCOMP, pp. 340–346. Craven, 1979, Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation, Numer. Math., 31, 377, 10.1007/BF01404567 Culotta, A., 2010a. Detecting influenza outbreaks by analyzing Twitter messages. Culotta, A., 2010b. Towards detecting influenza epidemics by analyzing Twitter messages, in: Proceedings of the First Workshop on Social Media Analytics, pp. 115–122. Fan, 2014, Challenges of big data analysis, Nat. Sci. Rev., 10.1093/nsr/nwt032 Gu, 2013 Gu, 2002, Penalized likelihood regression: general formulation and efficient approximation, Canad. J. Statist., 30, 619, 10.2307/3316100 Gu, 2005, Generalized nonparametric mixed-effect models: Computation and smoothing parameter selection, J. Comput. Graph. Statist., 14, 485, 10.1198/106186005X47651 Gu, 2005, Optimal smoothing in nonparametric mixed-effect models, Ann. Statist., 33, 1357, 10.1214/009053605000000110 Gu, 1991, Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method, SIAM J. Sci. Stat. Comput., 12, 383, 10.1137/0912021 Gu, 1993, Smoothing spline ANOVA with component-wise Bayesian “confidence intervals”, J. Comput. Graph. Statist., 2, 97 Helwig, 2013 Helwig, N.E., 2015. bigsplines: Smoothing Splines for Large Samples. R package version 1.0-6. URL: http://CRAN.R-project.org/package=bigsplines. Helwig, 2015, Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples, J. Comput. Graph. Statist., 24, 715, 10.1080/10618600.2014.926819 Helwig, 2015, Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters, Stat. Interface Hormann, 2001, The point in polygon problem for arbitrary polygons, Comput. Geom., 20, 131, 10.1016/S0925-7721(01)00012-8 Kaplan, 2010, Users of the world unite! The challenges and opportunities of social media, Bus. Horiz., 53, 59, 10.1016/j.bushor.2009.09.003 Kim, 2004, Smoothing spline Gaussian regression: More scalable computation via efficient approximation, J. R. Stat. Soc. Ser. B, 66, 337, 10.1046/j.1369-7412.2003.05316.x Kimeldorf, 1970, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann. Math. Statist., 41, 495, 10.1214/aoms/1177697089 Lampos, V., 2012. Detecting events and patterns in large-scale user generated textual streams with statistical learning methods. Lampos, V., Cristianini, N., 2010. Tracking the flu pandemic by monitoring the social web, in: IAPR Cognitive Information Processing. Lampos, 2012, Nowcasting events from the social web with statistical learning, ACM Trans. Intell. Syst. Technol., 3, 72, 10.1145/2337542.2337557 Lee, R., Sumiya, K., 2010. Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection, in: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks, pp. 1–10. Li, 1987, Asymptotic optimality for Cp, CL, cross-validation and generalized cross-validation: Discrete index set, Ann. Statist., 15, 958, 10.1214/aos/1176350486 Lipman, V., 2014. Top Twitter trends: What countries are most active? Who’s most popular? http://www.forbes.com/sites/victorlipman/2014/05/24/top-twitter-trends-what-countries-are-most-active-whos-most-popular/. Ma, 2015, Efficient computation of smoothing splines via adaptive basis sampling, Biometrika, 102, 631, 10.1093/biomet/asv009 Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., 2014. e1071: Misc Functions of the Department of tatistics (e1071), TU Wien. R package version 1.6-4. http://CRAN.R-project.org/package=e1071. Moore, 1920, On the reciprocal of the general algebraic matrix, Bull. Amer. Math. Soc., 26, 394 Nychka, 1988, Bayesian confidence intervals for smoothing splines, J. Amer. Statist. Assoc., 83, 1134, 10.1080/01621459.1988.10478711 Padmanabhan, 2014, Flumapper: A cybergis application for interactive analysis of massive location-based social media, Concurr. Comput.: Pract. Exper., 26, 2253, 10.1002/cpe.3287 Penrose, 1950, A generalized inverse for matrices, Math. Proc. Camb. Phil. Soc., 51, 406, 10.1017/S0305004100030401 Sadilek, A., Kautz, H., Silenzio, V., 2012. Predicting disease transmission from geo-tagged micro-blog data, in: AAAI, pp. 136–142. Schwarz, 1978, Estimating the dimension of a model, Ann. Statist., 6, 461, 10.1214/aos/1176344136 Signorini, 2011, The use of Twitter to track levels of disease activity and public concern in the us during the influenza A H1N1 pandemic, PLoS One, 6, e19467, 10.1371/journal.pone.0019467 Therneau, T., Atkinson, B., Ripley, B., 2015. rpart: Recursive partitioning and regression trees. R package version 4.1-10. URL: http://CRAN.R-project.org/package=rpart. Tsou, 2013, Mapping social activities and concepts with social media (Twitter) and web search engines (Yahoo and Bing): a case study in 2012 US presidential election, Cart. Geog. Info. Sci., 40, 337, 10.1080/15230406.2013.799738 Twitter, 2015. Twitter usage (https://about.twitter.com/company). URL: https://about.twitter.com/company. Wahba, 1983, Bayesian confidence intervals for the cross-validated smoothing spline, J. R. Stat. Soc. Ser. B, 45, 133 Wahba, 1990 Wahba, 1995, Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological study of diabetic retinopathy, Ann. Statist., 23, 1865, 10.1214/aos/1034713638 Wang, 1998, Mixed effects smoothing spline analysis of variance, J. R. Stat. Soc. Ser. B, 60, 159, 10.1111/1467-9868.00115 Wang, 1998, Smoothing spline models with correlated random errors, J. Amer. Statist. Assoc., 93, 341, 10.1080/01621459.1998.10474115 Wang, 2013, A cybergis environment for analysis of location-based social media data, 187 Wood, 2003, Thin plate regression splines, J. R. Stat. Soc. Ser. B, 65, 95, 10.1111/1467-9868.00374 Wood, 2006 Zhang, 1998, Semiparametric stochastic mixed models for longitudinal data, J. Amer. Statist. Assoc., 93, 710, 10.1080/01621459.1998.10473723