Estimation of Near-Instance-Level Attribute Bottleneck for Zero-Shot Learning
Abstract
Zero-Shot Learning (ZSL) transfers knowledge from seen to unseen classes by establishing connections between the visual and semantic spaces. Traditional ZSL methods identify novel classes through class-level attribute vectors, which introduces an information bottleneck: these approaches fit class-level attribute vectors during training and thus disregard individual variation within a class. Moreover, the attributes used for training carry no location information and are prone to mismatch with local regions of the visual features. To this end, we introduce a Near-Instance-Level Attribute Bottleneck (IAB) that adjusts both class-level attribute vectors and visual features throughout the training phase to better reflect their natural correspondences. Specifically, our Near-Instance-Wise Attribute Adaptation (NAA) modifies class attribute vectors into multiple attribute basis vectors, generating a subspace that is more relevant to instance-level samples. Additionally, our Vision Attribute Relation Strengthening (VARS) module searches for attribute-related regions within the visual features, offering additional location information during training. Evaluated on four ZSL benchmarks, the proposed method is superior or competitive to state-of-the-art methods in both the ZSL and the more challenging Generalized Zero-Shot Learning (GZSL) settings. Extensive experiments corroborate that enhancing the visual-semantic relationships formed during training with a simple model structure remains one of the most promising directions for ZSL. Code is available at: https://github.com/LanchJL/IAB-GZSL.
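To make the near-instance-level idea concrete, the sketch below illustrates the shape of the NAA computation described above: a single class-level attribute vector is expanded into several attribute basis vectors, and each sample's semantic target becomes a point in the subspace they span. All names, dimensions, and the random perturbation/weighting are illustrative placeholders, not the authors' implementation (in the paper the bases and combination weights are learned and conditioned on the visual feature).

```python
import random

random.seed(0)

NUM_ATTRS, NUM_BASES = 85, 4   # e.g. AWA2-style setups use 85 attributes

# Class-level attribute vector (placeholder values).
class_attr = [random.random() for _ in range(NUM_ATTRS)]

# NAA-style step (sketch): derive several attribute basis vectors as small
# variations of the class vector; the paper learns these, here they are random.
bases = [[a + random.gauss(0.0, 0.05) for a in class_attr]
         for _ in range(NUM_BASES)]

# A sample's near-instance-level target is a convex combination of the bases.
# In the paper the weights would be predicted from the sample's visual
# feature; here they are random placeholders.
weights = [random.random() for _ in range(NUM_BASES)]
total = sum(weights)
weights = [w / total for w in weights]

instance_attr = [sum(weights[k] * bases[k][i] for k in range(NUM_BASES))
                 for i in range(NUM_ATTRS)]

# The instance-level target stays close to the class vector but is no longer
# identical for every sample of the class, relaxing the class-level bottleneck.
```

The point of the sketch is the shift of the fitting target: instead of every sample of a class regressing onto the same `class_attr`, each sample gets its own `instance_attr` drawn from a class-specific subspace.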