Performance and programmability of GrPPI for parallel stream processing on multi-cores

Adriano Marques Garcia1, Dalvan Griebler2, Cláudio Schepke3, Juan Carlos Figueroa García4, Javier Fernández Muñoz4, Luiz Gustavo Fernandes2
1Department of Computer Science, University of Turin, Turin, Italy
2School of Technology, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, RS, Brazil
3Laboratory of Advanced Studies in Computation (LEA), Federal University of Pampa, Alegrete, RS, Brazil
4Department of Computer Science, University Carlos III of Madrid, Madrid, Spain

Abstract

The GrPPI library aims to simplify the burdensome task of parallel programming. It provides a unified, abstract, and generic layer while promising minimal performance overhead. Although GrPPI supports stream parallelism, it has not yet been evaluated with the performance metrics that are representative of this domain, such as throughput and latency. This work evaluates GrPPI with a focus on parallel stream processing. We compare the throughput, latency, memory usage, and programmability of GrPPI against handwritten parallel code. To do so, we use the SPBench benchmarking framework to build custom GrPPI benchmarks, as well as benchmarks with handwritten parallel code that use the same backends supported by GrPPI. The benchmarks are based on real applications: Lane Detection, Bzip2, Face Recognizer, and Ferret. Experiments show that, while GrPPI's performance is often competitive with handwritten parallel code, the infeasibility of fine-tuning GrPPI is a crucial drawback for emerging applications. Despite this, programmability experiments estimate that GrPPI can potentially reduce the development time of parallel applications by a factor of about three.
