H-Storm: A Hybrid CPU-FPGA Architecture to Accelerate Apache Storm
Abstract
The era of big data has led to exponential growth in the amount of real-time data. Traditional centralized solutions and the parallelism techniques used in distributed systems can no longer satisfy the processing requirements of emerging applications. To overcome this limitation, distributed stream processing (DSP) frameworks have emerged to exploit parallelism and facilitate large-scale real-time data analytics. However, they are becoming impractical due to low processing throughput and inefficient resource utilization. In this paper, we design and implement H-Storm, a hybrid CPU-FPGA architecture based on Apache Storm, to improve processing throughput and average tuple processing time. H-Storm harnesses the computing power of FPGAs through easy-to-use interfaces while preserving all the strengths of Apache Storm. To utilize the FPGA resources, our architecture supports multiple accelerator interfaces so that different tasks can be accelerated simultaneously. An extensive evaluation of two applications, Matrix Multiplication and Edge Detection, shows that H-Storm improves throughput over the original Storm. For a fair comparison, we used the jBlas and OpenCV libraries as baselines for the full software implementations and the F-Storm framework for the hardware-accelerated implementation. Experimental results show that H-Storm achieves up to 3.2X throughput gain and 2.3X speedup for Matrix Multiplication. It also yields a 3.4X throughput gain and 2.2X speedup for the Edge Detection application. Furthermore, several experiments are designed to determine when it is beneficial to use an FPGA to accelerate the compute-intensive components of streaming applications.