Sharing Weights in Shallow Layers via Rotation Group Equivariant Convolutions

Springer Science and Business Media LLC - Volume 19 - Pages 115-126 - 2022
Zhiqiang Chen1, Ting-Bing Xu2, Jinpeng Li3, Huiguang He1,4
1Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
3Ningbo HwaMei Hospital, University of Chinese Academy of Sciences, Ningbo, China
4Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China

Abstract

The convolution operation is equivariant to the translation group. To obtain further group equivariances, rotation group equivariant convolutions (RGEC) have been proposed, which are equivariant to both translations and rotations. However, previous work focused mainly on the number of parameters and usually ignored other resource costs. In this paper, we construct our networks without introducing extra resource costs. Specifically, a single convolution kernel is rotated to different orientations to extract features for multiple channels. At the same time, we use far fewer kernels than previous work, so that the number of output channels does not increase. To further enhance the orthogonality of kernels in different orientations, we construct a non-maximum-suppression loss on the rotation dimension that suppresses every orientation except the most strongly activated one. Considering that low-level features benefit more from rotational symmetry, we share weights only in the shallow layers (SWSL) via RGEC. Extensive experiments on multiple datasets (ImageNet, CIFAR, and MNIST) show that SWSL benefits effectively from the higher degree of weight sharing and improves the performance of various networks, including plain and ResNet architectures. Meanwhile, the shallow layers use far fewer convolution kernels and parameters (e.g., 75% or 87.5% fewer), and no extra computation cost is introduced.
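The weight-sharing idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `rotated_bank`, the layer sizes, and the restriction to four 90-degree rotations of an input-layer kernel are assumptions chosen for illustration. The sketch only shows how rotating a reduced set of base kernels keeps the output channel count fixed while cutting the stored parameters.

```python
import numpy as np

def rotated_bank(base_kernels, n_rot=4):
    """Expand each base kernel into n_rot copies rotated by multiples of 90 degrees.

    base_kernels: array of shape (n_base, in_ch, k, k)
    returns:      array of shape (n_base * n_rot, in_ch, k, k)
    (hypothetical helper, for illustration only)
    """
    # Rotate the spatial axes of every base kernel by r * 90 degrees.
    rotated = [np.rot90(base_kernels, k=r, axes=(-2, -1)) for r in range(n_rot)]
    # Stack on a new axis 1 so the rotations of one base kernel end up adjacent.
    bank = np.stack(rotated, axis=1)
    return bank.reshape(-1, *base_kernels.shape[1:])

# Assumed layer sizes: 64 output channels, 3 input channels, 3x3 kernels.
out_channels, in_ch, k, n_rot = 64, 3, 3, 4
rng = np.random.default_rng(0)

# Only out_channels / n_rot kernels are learned; the rest are rotated copies.
base = rng.standard_normal((out_channels // n_rot, in_ch, k, k))
bank = rotated_bank(base, n_rot)

plain_params = out_channels * in_ch * k * k   # ordinary convolution layer
shared_params = base.size                     # learned parameters with sharing
saving = 1 - shared_params / plain_params

print(bank.shape)   # same number of output channels as the plain layer
print(saving)       # fraction of kernel parameters saved
```

With four orientations the stored kernels shrink to one quarter of the plain layer, which corresponds to the 75% figure in the abstract. The same arithmetic with eight orientations gives 1 - 1/8 = 87.5%, although eight orientations would need 45-degree rotations with interpolation, which this sketch does not cover.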

Keywords

#RGEC #weight sharing #orthogonality #deep neural networks #rotation group convolution
