Developing an Efficient Pattern Discovery Method for CPU Utilizations of Computers

Zhuoer Gu1, Ligang He1, Cheng Chang2, Jianhua Sun2, Hao Chen2, Chenlin Huang3
1Department of Computer Science, University of Warwick, Coventry, UK
2College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
3School of Computer Science, National University of Defense Technology, Changsha, China

Abstract

Mining repeated patterns (often called motifs) in the CPU utilization of computers (also called CPU host load) is of fundamental importance. Many recently emerging applications running on high-performance computing systems rely on motif discovery for purposes such as efficient task scheduling and energy saving. In this paper, we propose an efficient motif discovery framework for CPU host load. The framework is designed to take into account the important properties of host load data. It supports on-line discovery and adapts to massive data. Experimental results show that the proposed method is effective and efficient.
