P2T: Pyramid Pooling Transformer for Scene Understanding
Abstract
Keywords
References
Simonyan, 2015, Very deep convolutional networks for large-scale image recognition, Proc. Int. Conf. Learn. Represent.
Tan, 2019, EfficientNet: Rethinking model scaling for convolutional neural networks, Proc. Int. Conf. Mach. Learn., 6105
Vaswani, 2017, Attention is all you need, Proc. Adv. Neural Inform. Process. Syst., 6000
Zhu, 2020, Deformable DETR: Deformable transformers for end-to-end object detection
Dosovitskiy, 2021, An image is worth 16x16 words: Transformers for image recognition at scale, Proc. Int. Conf. Learn. Represent.
Liu, 2021, Transformer in convolutional neural networks
Chu, 2021, Twins: Revisiting the design of spatial attention in vision transformers, Proc. Adv. Neural Inform. Process. Syst., 9355
Han, 2021, Demystifying local vision transformer: Sparse connectivity, weight sharing, and dynamic weight
Howard, 2017, MobileNets: Efficient convolutional neural networks for mobile vision applications
Hu, 2021, ISTR: End-to-end instance segmentation with transformers
Touvron, 2021, Training data-efficient image transformers & distillation through attention, Proc. Int. Conf. Mach. Learn., 10347
Chu, 2021, Conditional positional encodings for vision transformers
Jiang, 2021, Token labeling: Training a 85.5% top-1 accuracy vision transformer with 56M parameters on ImageNet
Li, 2021, LocalViT: Bringing locality to vision transformers
Ba, 2016, Layer normalization
Dong, 2021, Attention is not all you need: Pure attention loses rank doubly exponentially with depth
Hendrycks, 2016, Gaussian error linear units (GELUs)
Loshchilov, 2019, Decoupled weight decay regularization, Proc. Int. Conf. Learn. Represent.
Glorot, 2010, Understanding the difficulty of training deep feedforward neural networks, Proc. Int. Conf. Artif. Intell. Statist., 249
MMSegmentation Contributors, 2020, MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark
Chen, 2019, MMDetection: Open MMLab detection toolbox and benchmark