Sensor data quality: a systematic review. Journal of Big Data, Vol. 7, pp. 1-49, 2020
Hui Yie Teh, Andreas W. Kempa-Liehr, Kevin I-Kai Wang
Sensor data quality plays a vital role in Internet of Things (IoT) applications as they are rendered useless if the data quality is bad. This systematic review aims to provide an introduction and guide for researchers who are interested in quality-related issues of physical sensor data. The process and results of the systematic review are presented which aims to answer the following research quest...
Impact of reviewer social interaction on online consumer review fraud detection. Journal of Big Data, Vol. 4, pp. 1-19, 2017
Kunal Goswami, Younghee Park, Chungsik Song
Online consumer reviews have become a baseline for new consumers to try out a business or a new product. The reviews provide a quick look into the application and experience of the business/product and market it to new customers. However, some businesses or reviewers use these reviews to spread fake information about the business/product. The fake information can be used to promote a relatively av...
An intelligent Alzheimer’s disease diagnosis method using unsupervised feature learning. Journal of Big Data, Vol. 6, pp. 1-16, 2019
Firouzeh Razavi, Mohammad Jafar Tarokh, Mahmood Alborzi
Today, the diagnosis of Alzheimer’s disease (AD) or mild cognitive impairment (MCI) has attracted the attention of researchers in this field owing to the increase in the occurrence of the diseases and the need for early diagnosis. Unfortunately, the nature of high dimension of neural data and few available samples led to the creation of a precise computer diagnostic system. Machine learning techni...
HaRD: a heterogeneity-aware replica deletion for HDFS. Journal of Big Data, Vol. 6, pp. 1-21, 2019
Hilmi Egemen Ciritoglu, John Murphy, Christina Thorpe
The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets reliably on clusters of commodity machines. The HDFS takes advantage of replication to serve data requested by clients with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose different data replication management frameworks that alte...
Rating prediction of peer-to-peer accommodation through attributes and topics from customer review. Journal of Big Data, Vol. 8, pp. 1-29, 2021
Athor Subroto, Marcel Christianis
This study aims to predict customers’ behavior in classifying their reviews as high rated or low rated using associated attributes and topics found in the review. Knowing customer reviewing action better can lead to a successful strategy implementation of the relevant parties related to this study such as policy to manage customer reviews by keeping their satisfaction high. We applied a big data a...
A graph-based big data optimization approach using hidden Markov model and constraint satisfaction problem. Journal of Big Data, Vol. 8, pp. 1-29, 2021
Imad Sassi, Samir Anter, Abdelkrim Bekkhoucha
To address the challenges of big data analytics, several works have focused on big data optimization using metaheuristics. The constraint satisfaction problem (CSP) is a fundamental concept of metaheuristics that has shown great efficiency in several fields. Hidden Markov models (HMMs) are powerful machine learning algorithms that are applied especially frequently in time series analysis. However,...
Using Big Data-machine learning models for diabetes prediction and flight delays analytics. Journal of Big Data, Vol. 7, pp. 1-18, 2020
Thérence Nibareke, Jalal Laassiri
Nowadays, large data volumes are generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, are increasing. The tools and models have to be optimized. In this paper we applied and compared Machine Learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes. Further ...
Summarizing large text collection using topic modeling and clustering based on MapReduce framework. Journal of Big Data, Vol. 2, pp. 1-18, 2015
N K Nagwani
Document summarization provides an instrument for faster understanding of a collection of text documents and has a number of real-life applications. Semantic similarity and clustering can be utilized efficiently for generating an effective summary of large text collections. Summarizing a large volume of text is a challenging and time-consuming problem, particularly while considering the semantic similari...
An enhanced random forest approach using CoClust clustering: MIMIC-III and SMS spam collection application. Journal of Big Data, Vol. 10, No. 1
Zeynep Ilhan Taskin, Kasırga Yıldırak, Çağdaş Hakan Aladağ
The random forest algorithm could be enhanced and produce better results with a well-designed and organized feature selection phase. The dependency structure between the variables is considered to be the most important criterion behind selecting the variables to be used in the algorithm during the feature selection phase. As the dependency structure is most...
Prediction of flight departure delays caused by weather conditions adopting data-driven approaches. Journal of Big Data
Seong‐Eun Kim, Eunil Park
In this study, we utilize data-driven approaches to predict flight departure delays. The growing demand for air travel is outpacing the capacity and infrastructure available to support it. In addition, abnormal weather patterns caused by climate change contribute to the frequent occurrence of flight delays. In light of the extensive network of international...