Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition

EURASIP Journal on Information Security - Volume 2020 - Pages 1-15 - 2020
Cecilia Pasquini1, Rainer Böhme2
1Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
2Department of Computer Science, University of Innsbruck, Innsbruck, Austria

Abstract

Backdoor attacks against supervised machine learning methods seek to modify the training samples in such a way that, at inference time, the presence of a specific pattern (trigger) in the input data causes misclassifications to a target class chosen by the adversary. Successful backdoor attacks have been presented in particular for face recognition systems based on deep neural networks (DNNs). These attacks were evaluated for identical triggers at training and inference time. However, the vulnerability to backdoor attacks in practice crucially depends on the sensitivity of the backdoored classifier to approximate trigger inputs. To assess this, we study the response of a backdoored DNN for face recognition to trigger signals that have been transformed with typical image processing operators of varying strength. Results for different kinds of geometric and color transformations suggest that in particular geometric misplacements and partial occlusions of the trigger limit the effectiveness of the backdoor attacks considered. Moreover, our analysis reveals that the spatial interaction of the trigger with the subject’s face affects the success of the attack. Experiments with physical triggers inserted in live acquisitions validate the observed response of the DNN when triggers are inserted digitally.
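To make the evaluated setting concrete, the following minimal Python sketch illustrates the idea of probing a backdoored face classifier with approximate triggers: a trigger patch is digitally pasted onto face images after a geometric perturbation (rescaling and spatial misplacement), and the attack success rate toward the adversary's target class is measured as the perturbation strength grows. This is not the authors' code; the model API (`model.predict`), the target class, and the patch position are hypothetical assumptions for illustration only.

```python
# Sketch (assumed setup, not the paper's implementation): insert a trigger patch
# into face images with a controlled geometric perturbation and measure how often
# the backdoored model still outputs the adversary's target class.
import numpy as np
from PIL import Image

TARGET_CLASS = 0          # adversary-chosen target class (assumed)
PATCH_POS = (140, 160)    # nominal top-left corner of the trigger (assumed)

def insert_trigger(face_img: Image.Image, trigger: Image.Image,
                   scale: float = 1.0, shift: tuple = (0, 0)) -> Image.Image:
    """Paste a (possibly rescaled and misplaced) trigger patch onto a face image."""
    img = face_img.copy()
    w, h = trigger.size
    # Rescaling models a size mismatch between training and inference triggers;
    # the shift models a spatial misplacement of the trigger at inference time.
    patch = trigger.resize((max(1, int(w * scale)), max(1, int(h * scale))),
                           Image.BILINEAR)
    img.paste(patch, (PATCH_POS[0] + shift[0], PATCH_POS[1] + shift[1]))
    return img

def attack_success_rate(model, faces, trigger, scale, shift) -> float:
    """Fraction of triggered inputs classified as the target class (hypothetical model API)."""
    hits = 0
    for face in faces:
        x = np.asarray(insert_trigger(face, trigger, scale, shift), dtype=np.float32)
        pred = model.predict(x[np.newaxis])   # assumed: returns class scores
        hits += int(np.argmax(pred) == TARGET_CLASS)
    return hits / len(faces)

# Sweeping a transformation parameter (here, horizontal displacement in pixels)
# mirrors the paper's idea of testing sensitivity to approximate trigger inputs:
# for dx in range(0, 41, 5):
#     print(dx, attack_success_rate(model, faces, trigger, scale=1.0, shift=(dx, 0)))
```

Analogous sweeps can be run for color transformations (e.g., brightness or contrast changes applied to the patch before pasting) or partial occlusions, which the abstract identifies as the perturbations that most limit attack effectiveness.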
