Sensor data quality: a systematic review
Journal of Big Data - Volume 7 - Pages 1-49 - 2020
Hui Yie Teh, Andreas W. Kempa-Liehr, Kevin I-Kai Wang
Sensor data quality plays a vital role in Internet of Things (IoT) applications
as they are rendered useless if the data quality is bad. This systematic review
aims to provide an introduction and guide for researchers who are interested in
quality-related issues of physical sensor data. The process and results of the
systematic review are presented which aims to answer the following research
quest...
Impact of reviewer social interaction on online consumer review fraud detection
Journal of Big Data - Volume 4 - Pages 1-19 - 2017
Kunal Goswami, Younghee Park, Chungsik Song
Online consumer reviews have become a baseline for new consumers to try out a
business or a new product. The reviews provide a quick look into the application
and experience of the business/product and market it to new customers. However,
some businesses or reviewers use these reviews to spread fake information about
the business/product. The fake information can be used to promote a relatively
av...
An intelligent Alzheimer’s disease diagnosis method using unsupervised feature learning
Journal of Big Data - Volume 6 - Pages 1-16 - 2019
Firouzeh Razavi, Mohammad Jafar Tarokh, Mahmood Alborzi
Today, the diagnosis of Alzheimer’s disease (AD) or mild cognitive impairment
(MCI) has attracted the attention of researchers in this field owing to the
increase in the occurrence of the diseases and the need for early diagnosis.
Unfortunately, the nature of high dimension of neural data and few available
samples led to the creation of a precise computer diagnostic system. Machine
learning techni...
HaRD: a heterogeneity-aware replica deletion for HDFS
Journal of Big Data - Volume 6 - Pages 1-21 - 2019
Hilmi Egemen Ciritoglu, John Murphy, Christina Thorpe
The Hadoop distributed file system (HDFS) is responsible for storing very large
data-sets reliably on clusters of commodity machines. The HDFS takes advantage
of replication to serve data requested by clients with high throughput. Data
replication is a trade-off between better data availability and higher disk
usage. Recent studies propose different data replication management frameworks
that alte...
Rating prediction of peer-to-peer accommodation through attributes and topics from customer review
Journal of Big Data - Volume 8 - Pages 1-29 - 2021
Athor Subroto, Marcel Christianis
This study aims to predict customers’ behavior in classifying their reviews as
high rated or low rated using associated attributes and topics found in the
review. Knowing customer reviewing action better can lead to a successful
strategy implementation of the relevant parties related to this study such as
policy to manage customer reviews by keeping their satisfaction high. We applied
a big data a...
A graph-based big data optimization approach using hidden Markov model and constraint satisfaction problem
Journal of Big Data - Volume 8 - Pages 1-29 - 2021
Imad Sassi, Samir Anter, Abdelkrim Bekkhoucha
To address the challenges of big data analytics, several works have focused on
big data optimization using metaheuristics. The constraint satisfaction problem
(CSP) is a fundamental concept of metaheuristics that has shown great efficiency
in several fields. Hidden Markov models (HMMs) are powerful machine learning
algorithms that are applied especially frequently in time series analysis.
However,...
Using Big Data-machine learning models for diabetes prediction and flight delays analytics
Journal of Big Data - Volume 7 - Pages 1-18 - 2020
Thérence Nibareke, Jalal Laassiri
Nowadays large data volumes are daily generated at a high rate. Data from health
system, social network, financial, government, marketing, bank transactions as
well as the sensors and smart devices are increasing. The tools and models have
to be optimized. In this paper we applied and compared Machine Learning
algorithms (Linear Regression, Naïve bayes, Decision Tree) to predict diabetes.
Further ...
Summarizing large text collection using topic modeling and clustering based on MapReduce framework
Journal of Big Data - Volume 2 - Pages 1-18 - 2015
N K Nagwani
Document summarization provides an instrument for faster understanding the
collection of text documents and has a number of real life applications.
Semantic similarity and clustering can be utilized efficiently for generating
effective summary of large text collections. Summarizing large volume of text is
a challenging and time consuming problem particularly while considering the
semantic similari...
An enhanced random forest approach using CoClust clustering: MIMIC-III and SMS spam collection application
Journal of Big Data - Volume 10, Issue 1
Zeynep Ilhan Taskin, Kasırga Yıldırak, Çağdaş Hakan Aladağ
The random forest algorithm could be enhanced and produce better results
with a well-designed and organized feature selection phase. The dependency
structure between the variables is considered to be the most important criterion
behind selecting the variables to be used in the algorithm during the feature
selection phase. As the dependency structure is mostly nonlinear, making use of
a too...
Prediction of flight departure delays caused by weather conditions adopting data-driven approaches
Journal of Big Data
Seong‐Eun Kim, Eunil Park
In this study, we utilize data-driven approaches to predict flight
departure delays. The growing demand for air travel is outpacing the capacity
and infrastructure available to support it. In addition, abnormal weather
patterns caused by climate change contribute to the frequent occurrence of
flight delays. In light of the extensive network of international flights
covering vast distances ...