A survey on the vulnerability of deep neural networks against adversarial attacks

Progress in Artificial Intelligence - Volume 11, Pages 131-141 - 2022
Andy Michel1, Sumit Kumar Jha2, Rickard Ewetz3
1Department of Computer Science, University of Central Florida, Orlando, United States
2Department of Computer Science, University of Texas at San Antonio, San Antonio, United States
3Department of Electrical and Computer Engineering, University of Central Florida, Orlando, United States

Abstract

With the advancement of accelerated hardware in recent years, there has been a surge in the development and application of intelligent systems. Deep learning systems, in particular, have shown exciting results in a wide range of tasks, including classification, detection, and recognition. Despite these remarkable achievements, one active research area remains critical to the safety of such systems: deep learning algorithms have proven brittle against adversarial attacks. That is, carefully crafted adversarial inputs can consistently trigger erroneous classification outputs from a network model. Motivated by this, we survey four different attacks and two adversarial defense methods on three benchmark datasets to gain a better understanding of how to protect those systems. We support our findings by achieving state-of-the-art accuracy and collecting empirical evidence of attack effectiveness against deep neural networks. Additionally, we leverage network explainability methods to investigate an alternative approach to defending deep neural networks.
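The brittleness described above is easy to make concrete with the fast gradient sign method (FGSM), one of the most widely studied attacks of the kind this survey covers. The following is a minimal, hypothetical PyTorch sketch, not the paper's implementation; the classifier `model`, the labeled batch `(x, y)`, and the perturbation budget `epsilon` are illustrative assumptions.

```python
# Minimal FGSM sketch, assuming `model` is a PyTorch classifier that
# returns logits and (x, y) is a correctly labeled input batch with
# pixel values in [0, 1].
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Return an adversarially perturbed copy of x."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that maximally increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Keep the perturbed input in the valid pixel range.
    return x_adv.clamp(0.0, 1.0).detach()
```

Even a small budget (e.g., epsilon = 0.03 on images normalized to [0, 1]) is often enough to flip the predicted class while leaving the perturbation imperceptible to a human, which is precisely the failure mode the surveyed attacks and defenses are concerned with.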
