Classifier and Exemplar Synthesis for Zero-Shot Learning
Abstract
Zero-shot learning (ZSL) enables solving a task without seeing any examples of it. In this paper, we propose two ZSL frameworks that learn to synthesize parameters for novel unseen classes. First, we cast ZSL as the problem of learning manifold embeddings from graphs composed of object classes, which leads to a flexible approach that synthesizes “classifiers” for the unseen classes. Then, we define an auxiliary task of synthesizing “exemplars” for the unseen classes, which can serve either as an automatic denoising mechanism for any existing ZSL approach or as an effective ZSL model by itself. On five visual recognition benchmark datasets, we demonstrate the superior performance of our proposed frameworks in various scenarios of both conventional and generalized ZSL. Finally, we provide insights through a series of empirical analyses, including a comparison of semantic representations on the full ImageNet benchmark and a comparison of metrics used in generalized ZSL. Our code and data are publicly available at https://github.com/pujols/Zero-shot-learning-journal.
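To make the idea of synthesizing “exemplars” concrete, the following is a minimal sketch rather than the method of the paper. It assumes that an exemplar is the mean visual feature of a class, that a ridge regressor maps semantic vectors to exemplars, and that test samples are assigned to the nearest predicted exemplar. The toy data, the linear map `true_map`, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def class_means(features, labels, num_classes):
    """Average the visual features of each class to get one exemplar per class."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def fit_ridge(semantic, exemplars, reg=1e-3):
    """Closed-form ridge regression from semantic vectors to visual exemplars."""
    dim = semantic.shape[1]
    gram = semantic.T @ semantic + reg * np.eye(dim)
    return np.linalg.solve(gram, semantic.T @ exemplars)


def nearest_exemplar(features, exemplars):
    """Assign each feature vector to the class with the closest exemplar."""
    diffs = features[:, None, :] - exemplars[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)


rng = np.random.default_rng(0)

# Hypothetical toy setup: 3-d semantic vectors, 4-d visual features,
# four seen classes and two unseen classes.
seen_semantic = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])
unseen_semantic = np.array([[1.0, 0, 1], [0, 1, 1]])
# Assumed ground-truth link between the two spaces (unknown to the learner).
true_map = np.array([[1.0, 0, 0, 2], [0, 1, 0, -1], [0, 0, 1, 1]])


def sample(semantic, per_class):
    """Draw noisy visual features around each class's true visual mean."""
    labels = np.repeat(np.arange(len(semantic)), per_class)
    noise = 0.01 * rng.standard_normal((len(labels), true_map.shape[1]))
    return semantic[labels] @ true_map + noise, labels


seen_x, seen_y = sample(seen_semantic, 20)
unseen_x, unseen_y = sample(unseen_semantic, 20)

# Training uses seen-class images only.
seen_exemplars = class_means(seen_x, seen_y, len(seen_semantic))
weights = fit_ridge(seen_semantic, seen_exemplars)

# Synthesize exemplars for the unseen classes from their semantic vectors alone,
# then classify unseen-class samples by nearest synthesized exemplar.
unseen_exemplars = unseen_semantic @ weights
accuracy = (nearest_exemplar(unseen_x, unseen_exemplars) == unseen_y).mean()
print(f"unseen-class accuracy: {accuracy:.2f}")
```

The regressor never sees an image from the unseen classes; it only receives their semantic vectors, which is what makes this a zero-shot setting. A linear ridge map is used here purely for brevity, and any regressor from the semantic space to the visual space could take its place in this sketch.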