BIG DATA PROCESSING WITH APACHE SPARK

Quy Quang Tran1, Binh Duc Nguyen1, Linh Thi Thuy Nguyen2, Oanh Thi Thu Nguyen3
1University of Information and Communication Technology (ICTU), Thai Nguyen University, Vietnam
2Lao Cai College, Vietnam

Abstract

With the exponential growth of information, it is no surprise that the current period of history is known as the Information Age. The rapid growth of data has created challenges for storage and processing technology. This article discusses Apache Spark, an ecosystem that integrates many technologies for Big Data processing, including machine learning libraries and data storage platforms. Apache Spark is an open-source distributed data processing engine that loads data in memory and runs analytical operations on data of any size, with efficient support for popular programming languages such as Java, Scala, R, and Python. The article aims to demonstrate the superior computing power of Spark compared to Hadoop and to show how to connect Spark with today's popular data processing tools such as the R language.

Keywords

Apache Spark; Big Data; distributed computing; R language

References