Forecasting influenza epidemics by integrating internet search queries and traditional surveillance data with the support vector machine regression model in Liaoning, from 2011 to 2015

PeerJ - Volume 6 - Page e5134
Feng Liang¹, Peng Guan¹, Wei Wu¹, Desheng Huang¹,²
¹Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
²Department of Mathematics, School of Fundamental Sciences, China Medical University, Shenyang, Liaoning, China

Abstract

Background

Influenza epidemics pose significant social and economic challenges in China. Internet search query data have been identified as a valuable source for the detection of emerging influenza epidemics. However, the selection of the search queries and the adoption of prediction methods are crucial challenges when it comes to improving predictions. The purpose of this study was to explore the application of the Support Vector Machine (SVM) regression model in merging search engine query data and traditional influenza data.

Methods

The officially reported monthly number of influenza cases in Liaoning province, China, from January 2011 to December 2015 was obtained from the China National Scientific Data Center for Public Health. Search queries potentially related to influenza over the same period were identified from Baidu Index, a publicly available search engine database. An SVM regression model was built for prediction, and its three parameters (C, γ, ε) were selected by leave-one-out cross-validation (LOOCV) during model construction. Model performance was evaluated using Root Mean Square Error (RMSE), Root Mean Square Percentage Error (RMSPE) and Mean Absolute Percentage Error (MAPE).
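The evaluation metrics and the LOOCV-based parameter choice can be illustrated with a minimal, dependency-free sketch. The abstract does not give formulas, so the definitions below are the usual ones, with the percentage errors expressed as fractions; that is an assumption. The helper names (`loocv_predictions`, `select_by_loocv`, `make_scaled_mean`) are hypothetical, and the stand-in model is deliberately not an SVM: a real run would plug in an SVM regressor (for example scikit-learn's `SVR`) and search a grid over (C, γ, ε).

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error, in the units of the case counts."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def rmspe(actual, predicted):
    """Root Mean Square Percentage Error as a fraction (0.1 means 10%)."""
    n = len(actual)
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean Absolute Percentage Error as a fraction (0.1 means 10%)."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def loocv_predictions(X, y, fit):
    """Hold out each observation once, fit on the rest, predict the held-out one.

    `fit(X_train, y_train)` must return a callable `predict(x)`.
    """
    preds = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(X_train, y_train)
        preds.append(predict(X[i]))
    return preds


def select_by_loocv(X, y, candidates, make_fit):
    """Return (best_params, best_rmse) over a grid of candidate parameters."""
    best = None
    for params in candidates:
        score = rmse(y, loocv_predictions(X, y, make_fit(params)))
        if best is None or score < best[1]:
            best = (params, score)
    return best


def make_scaled_mean(scale):
    """Stand-in model (NOT an SVM): predicts `scale` times the training mean."""
    def fit(X_train, y_train):
        mean = sum(y_train) / len(y_train)
        return lambda x: scale * mean
    return fit


# Toy data, invented for illustration only.
X = [[0], [1], [2], [3]]
y = [10.0, 10.0, 10.0, 10.0]
best_scale, best_score = select_by_loocv(X, y, [0.5, 1.0, 1.5], make_scaled_mean)
print(best_scale, best_score)
```

With only 60 monthly observations, leave-one-out is a natural choice because every fit uses all but one data point; the cost is one model fit per observation per candidate parameter set.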

Results

In total, 17 influenza-related search queries were generated by the initial query selection approach and used to construct the SVM regression model: nine queries in the same month, three at a lag of one month, one at a lag of two months and four at a lag of three months. Based on the ensemble data integrating influenza surveillance data and Baidu search query data, the SVM model performed well with the parameters C = 2, γ = 0.005 and ε = 0.0001.
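To make the lag structure concrete, the following sketch shows one way a feature matrix could be assembled when each query enters the model at its own lag (0 to 3 months). The function name `build_lagged_features`, the query labels and the numbers are all hypothetical; the abstract does not list the actual queries or describe how the authors aligned the series.

```python
def build_lagged_features(series_by_query, lag_by_query):
    """Align monthly query series so each query is read at its own lag.

    Row t holds, for each query q, the search volume from month t - lag[q].
    The first `max_lag` months are dropped because lagged values are missing.
    Returns (query_names, rows, max_lag).
    """
    names = sorted(lag_by_query)
    n_months = len(series_by_query[names[0]])
    max_lag = max(lag_by_query.values())
    rows = []
    for t in range(max_lag, n_months):
        rows.append([series_by_query[q][t - lag_by_query[q]] for q in names])
    return names, rows, max_lag


# Invented monthly search volumes for three hypothetical queries.
series = {
    "query_a": [5, 6, 7, 8, 9, 10],
    "query_b": [50, 60, 70, 80, 90, 100],
    "query_c": [1, 2, 3, 4, 5, 6],
}
lags = {"query_a": 0, "query_b": 1, "query_c": 3}  # lag in months

names, rows, max_lag = build_lagged_features(series, lags)
cases = [30, 32, 35, 40, 44, 41]   # invented monthly case counts
targets = cases[max_lag:]          # keep targets aligned with the rows
print(names, rows, targets)
```

Under this alignment, and assuming no search data from before the study window is used, a maximum lag of three months would leave at most 57 of the 60 monthly observations available for fitting.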

Conclusions

The results demonstrated the feasibility of using internet search engine query data as a complementary data source for influenza surveillance and the efficiency of the SVM regression model in tracking influenza epidemics in Liaoning.
