Smart predictive maintenance for high-performance computing systems: a literature review
Abstract
Predictive maintenance is an invaluable tool for preserving the health of mission-critical assets while minimizing the operational costs of scheduled interventions. Artificial intelligence techniques have proven effective at processing large volumes of data, such as the data collected by the sensors typically present in equipment. In this study, we aim to identify and summarize existing publications in the field of predictive maintenance that explore machine learning and deep learning algorithms to improve the performance of failure classification and detection. We show a significant upward trend in the use of deep learning methods on sensor data collected from mission-critical assets for early failure detection, in support of predictive maintenance scheduling. We also identify aspects that require further investigation in future work, namely the exploration of life support systems for supercomputing assets and the standardization of performance metrics.
Keywords
#predictive maintenance #artificial intelligence #machine learning #deep learning #failure detection #high-performance computing systems