Linking Twitter and survey data: asymmetry in quantity and its impact

Tarek Al Baghal¹, Alexander Wenz², Luke Sloan³, Curtis Jessop⁴
¹ University of Essex, Colchester, UK
² University of Mannheim, Mannheim, Germany
³ Cardiff University, Cardiff, UK
⁴ NatCen Social Research, London, UK

Abstract

Linked social media and survey data have the potential to be a unique source of information for social research. While the potential usefulness of this methodology is widely acknowledged, very few studies have explored the methodological aspects of such linkage. Respondents produce planned amounts of survey data, but highly variable amounts of social media data. This study explores this asymmetry by examining the amount of social media data available to link to surveys. Variation in the amount of data collected from social media could affect the ability to derive meaningful linked indicators and could introduce biases. Linked Twitter data from respondents to two longitudinal surveys representative of Great Britain, the Innovation Panel and the NatCen Panel, show that there is indeed substantial variation in the number of tweets posted and in the number of followers and friends respondents have. Multivariate analyses of both data sources show that only a few respondent characteristics have a statistically significant effect on the number of tweets posted: the number of followers is the strongest predictor of posting in both panels, women post less than men, and there is some evidence, in the Innovation Panel only, that people with higher education post less. We use sentiment analyses of tweets to provide an example of how the amount of Twitter data collected can affect outcomes derived from these linked data sources. Results show that the number of negatively coded tweets is related to general happiness, whereas the number of positively coded tweets is not. Taken together, the findings suggest that the amount of data collected from social media that can be linked to surveys is an important factor to consider, and they indicate the potential of such linked data sources for social research.
