A Bottom-Up Tree Based Storage Approach for Efficient IoT Data Analytics in Cloud Systems
Abstract
The Internet of Things (IoT) has been widely applied in domains such as environmental monitoring, intelligent transport systems, and video surveillance. In most IoT applications, data is generated by many sources rather than a single one. In addition, IoT data comes in various types with different processing requirements, and high-priority IoT data should be stored and processed more favorably than low-priority data. The objective of this paper is to propose an efficient cloud storage approach that accounts for these multi-aspect requirements of IoT data. In the approach, a lightweight data structure is used to depict the distribution and calculate the size of each IoT subset (type) across all data sources. We then form a number of storage-locality groups from cloud storage blocks. These groups differ in storage size and locality capability, and we would like to place high-priority IoT subsets in groups with strong locality capability. This gives rise to a combinatorial placement problem between IoT subsets and storage-locality groups. To solve this placement problem efficiently, we propose a bottom-up tree-based approach built on the solution of the well-known knapsack combinatorial problem. Because the knapsack problem is NP-hard, we also propose a heuristic placement approach.
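To make the knapsack connection concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a single storage-locality group with an integer capacity, and treats each IoT subset as an item whose weight is its size and whose value is its priority. The function names `knapsack_place` and `greedy_place`, the priority-per-size greedy rule, and the sample numbers are all hypothetical, chosen only to contrast an exact dynamic-programming solution with a cheaper heuristic.

```python
from typing import List, Tuple


def knapsack_place(sizes: List[int], priorities: List[int],
                   capacity: int) -> Tuple[int, List[int]]:
    """Exact 0/1 knapsack by dynamic programming (illustrative only).

    Picks the IoT subsets that maximize total priority while their
    combined size fits into one storage-locality group.
    Assumes sizes are positive integers (e.g. counted in storage blocks).
    """
    n = len(sizes)
    # best[i][c] = max total priority using the first i subsets within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        size, prio = sizes[i - 1], priorities[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip subset i-1
            if size <= c:                # option 2: place subset i-1
                best[i][c] = max(best[i][c], best[i - 1][c - size] + prio)

    # Walk back through the table to recover which subsets were placed.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= sizes[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


def greedy_place(sizes: List[int], priorities: List[int],
                 capacity: int) -> Tuple[int, List[int]]:
    """Hypothetical heuristic: take subsets in order of priority per unit size."""
    order = sorted(range(len(sizes)),
                   key=lambda i: priorities[i] / sizes[i], reverse=True)
    chosen, used, total = [], 0, 0
    for i in order:
        if used + sizes[i] <= capacity:
            chosen.append(i)
            used += sizes[i]
            total += priorities[i]
    return total, sorted(chosen)


if __name__ == "__main__":
    # Made-up example: four IoT subsets and one group holding 7 blocks.
    sizes = [4, 3, 2, 5]
    priorities = [9, 6, 5, 8]
    print(knapsack_place(sizes, priorities, 7))
    print(greedy_place(sizes, priorities, 7))
```

On this made-up input the exact solver places subsets 0 and 1 for a total priority of 15, while the greedy rule places subsets 0 and 2 for a total of 14. The dynamic program costs time proportional to the number of subsets times the capacity, which is pseudo-polynomial and one reason a heuristic is attractive when capacities are large.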