Nội dung được dịch bởi AI, chỉ mang tính chất tham khảo

Chiến lược tối ưu hóa quản lý bộ nhớ trong khung Spark dựa trên việc giảm thiểu cạnh tranh

Springer Science and Business Media LLC - Tập 79 - Trang 1504-1525 - 2022

Yixin Song^1,2, Junyang Yu^1,2, JinJiang Wang^1,2, Xin He^1,2

¹School of Software, Henan University, Kaifeng, China

²Intelligent Data Processing Engineering Research Center of Henan Province, Kaifeng, China

Tóm tắt

Khung tính toán song song Spark 2.x áp dụng một mô hình quản lý bộ nhớ thống nhất. Trong trường hợp xảy ra tắc nghẽn bộ nhớ, việc phân bổ bộ nhớ cho các tác vụ đang hoạt động và bộ nhớ cache RDD (Resilient Distributed Datasets) gây ra cạnh tranh bộ nhớ, có thể làm giảm hiệu suất sử dụng tài nguyên tính toán và hiệu ứng tăng tốc độ bền, từ đó ảnh hưởng đến hiệu quả thực thi của chương trình. Để giải quyết vấn đề này, chúng tôi đề xuất một chiến lược quản lý giảm cạnh tranh, viết tắt là MCM, nhằm giảm thiểu tác động tiêu cực của cạnh tranh bộ nhớ. MCM được chia thành hai bước: Đầu tiên, thuật toán đảm bảo ưu tiên bộ nhớ tối thiểu cho tác vụ nhằm đáp ứng các tài nguyên tối thiểu cần thiết cho việc thực thi, tối ưu hóa số lượng tác vụ đang hoạt động. Thứ hai, với việc xem xét chi phí cạnh tranh, thuật toán chọn lựa vị trí lưu trữ được duy trì sẽ chọn vị trí lưu trữ tốt nhất một cách động nhằm cải thiện hiệu quả của việc tăng tốc độ bền. Các kết quả thí nghiệm khẳng định rằng MCM có khả năng thích ứng và mở rộng tuyệt vời. Trong trường hợp có tắc nghẽn bộ nhớ nghiêm trọng, MCM rõ ràng giảm thời gian thực thi công việc. So với các công trình tương tự như only_memory, only_disk, memory_and_disk, DMAOM và SACM, MCM giảm thời gian thực thi xuống 28.3%.

Từ khóa

Tài liệu tham khảo

Mostafaeipour A, Rafsanjani AJ, Ahmadi M, Dhanraj JA (2020) Investigating the performance of hadoop and spark platforms on machine learning algorithms. J Supercomput 77(2):1–28 Apache. Apache Spark. https://spark.apache.org. Accessed 24 Oct 2021 Karau H, Konwinski A, Wendell P, Zaharia M (2015) Learning Spark. O’Reilly Media Inc. Sebastopol pp 1-30 Zhang XW, Li ZH, Liu GS, Xu JJ, Xie TK (2018) A spark scheduling strategy for heterogeneous cluster. Comput Mater Continua 55(3):405–417 Ahmed N, Barczak A, Susnjak T et al (2020) A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench. J Big Data 110(7):1–18 Hu ZY, Shi XH, Ke ZX et al (2020) Estimating the memory consumption of big data applications based on program analysis. Scientia Sinica Inf 50(8):1178–1196 Hong-tao M, Song-ping Y, Fang L, Nong XIAO et al (2017) Research on memory management and cache replacement policies in spark. Comput Sci 44(06):37–41 Apache.Spark memory management overview. http://Spark.apache.org/docs/latest/tuning.html#memory-management-overview. Accessed 24 Oct (2021) Zaharia M, Xin R, Wendell P et al (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65 Apache.Unified Memory Management in Spark 1.6. https://issues.apache.org/jira/secure/attachment/12765646/unified-memory-management-Spark-10000.pdf. Accessed 24 Dec (2019) Zhao Z, Zhang H, Geng X, Ma H (2019). Resource-aware cache management for in-memory data analytics frameworks. In: 2019 IEEE international conference on parallel & distributed processing with applications, big data & cloud computing, sustainable computing & communications, social computing & networking (ISPA/BDCloud/SocialCom/SustainCom), pp 365-371 Bian C (2017) Research on key technologies of memory computing framework performance optimization (Ph.D. Thesis). Xinjiang University, China Ying C T (2017) Research on storage layer fault tolerance and optimization strategy in memory computing environment (Ph.D. Thesis). Xinjiang University, China L Yuan (2018) Research and optimization on resource usage and allocation strategy for spark (M.S. Thesis). Huazhong University of Science and Technology, China Geng Yuanzhen et al (2017) LCS: an efficient data eviction strategy for spark. Int J Parallel Prog 45(6):1285–1297 Adinew DM, Shijie Z, Liao Y (2020) Spark performance optimization analysis in memory management with deploy mode in standalone cluster computing. In: 2020 IEEE 36th international conference on data engineering (ICDE), IEEE, pp 58-69 Wang SZ, Zhang YP, Zhang L, Cao N, Pang CY (2018) An improved memory cache management study based on spark. Comput Mater Continua 56(04):415–431 Yun W, Yuchen Ding (2020) Research on efficient RDD self-cache replacement strategy in Spark. Appl Res Comput 37(10):3043–3047 Wangjian L, Yongfeng H, Congkai Bao (2018) Memory optimization of Spark parallellel computing framework. Comput Eng Sci 40(04):587–593 Wang B, Tang J, Zhang R, Ding W, Qi D (2019) LCRC: a dependency-aware cache management policy for spark. In: 2018 IEEE international conference on parallel & distributed processing with applications, ubiquitous computing & communications, big data & cloud computing, social computing & networking, sustainable computing & communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). IEEE. pp 27-46 Young N (1994) The k-server dual and loose competitiveness for paging. Algorithmica 11(6):525–541 Li C, Cox AL (2015) GD-Wheel: a cost-aware replacement policy for key-value stores. In: the tenth European conference on computer systems,ACM, pp 1-15 Duan M, Li K, Tang Z, Xiao G, Li K (2016) Selection and replacement algorithms for memory performance improvement in Spark. Concurr Comput Pract Exp 28(8):2473–2486 Heng Liu, Liang Tan (2018) New RDD partition weight cache replacement algorithm in spark. J Chin Comput Syst 39(10):2279–2284 Bian C, Yu J, Ying CT et al (2017) Self-adaptive strategy for cache management in spark. Acta Electron Sin 45(2):278–284 Kang M, Lee JG (2020) Effect of garbage collection in iterative algorithms on Spark: an experimental analysis. J Supercomput 76(01):7204–7218 Bian C, Yu J, Xiu YR et al (2019) Parallelism deduction algorithm for spark. J Univ Electron Sci Technol China 48(04):567–574 Zhuo T, Az A, Xz A, Li YC, Kl A (2020) Dynamic memory-aware scheduling in Spark computing environment. J Parallel Distrib Comput 14(01):10–22 Xu L, Li M, Zhang L, Butt AR, Wang Y, Hu ZZ (2016) MEMTUNE: dynamic memory management for in-memory data analytic platforms. In: proceedings of IEEE international parallel and distributed processing symposium (IPDPS), pp 383–392 Wang Suzhen et al (2019) A dynamic memory allocation optimization mechanism based on spark. CMC Comput Mater Continua 61(02):739–757 Karau H, Warren R (2016) High performance Spark: best practices for scaling and optimizing Apache Spark. O’Reilly Media Inc, Sebastopol pp 1-10 Li C, Cai Q, Luo Y (2022) Data balancing-based intermediate data partitioning and check point-based cache recovery in spark environment. Supercomput 78(08):3561–3604 Elmeiligy MA, Desouky A, Elghamrawy SM (2021) An efficient parallel indexing structure for multi-dimensional big data using spark. Supercomput 77(03):11187–11214 Raj S, Ramesh D, Sethi KK (2021) A spark-based apriori algorithm with reduced shuffle overhead. Supercomput 77(03):133–151 Zhu Z, Shen Q, Yang Y, Wu Z (2017) MCS: memory constraint strategy for unified memory manager in spark. In: IEEE international conference on parallel & distributed systems, IEEE. pp 41-60 Jia D, Bhimani J, Nguyen SN, Sheng B, Mi N (2019) ATuMm: auto-tuning memory manager in apache spark. In: 2019 IEEE 38th international performance computing and communications conference (IPCCC), IEEE. pp 14-33 Hussain MA, Tsai TH (2021) Memory access optimization for on-chip transfer learning. IEEE Trans Circuits Syst 68(04):1507–1519 Kumari P, Saxena AS (2021) Advanced fusion ACO approach for memory optimization in cloud computing environment. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV) pp 168-172 Allen T, Ge R (2021) In-depth analyses of unified virtual memory system for GPU accelerated computing. In: the international conference for high performance computing, networking, storage and analysis pp 1-15 Bender MA (2021) External-memory dictionaries in the affine and PDAM models. ACM Trans Parallel Comput 8(03):1–20 Saha R, Pundir YP, Pal PK (2021) Design of an area and energy-efficient last-level cache memory using STT-MRAM. J Magn Magn Mater 529(03):167882 Chaudhuri M (2021) Zero directory eviction victim: unbounded coherence directory and core cache isolation. In: 2021 IEEE international symposium on high-performance computer architecture (HPCA) IEEE, pp 277-290 Apache. Apache Spark web interfaces. https://Spark.apache.org/docs/latest/monitoring.html. Accessed 24 Dec (2021)

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA