Learning to assess visual aesthetics of food images

Kekai Sheng¹, Weiming Dong², Haibin Huang³, Menglei Chai⁴, Yong Zhang⁵, Chongyang Ma³, Bao Gang Hu²
¹Youtu Lab, Tencent, Shanghai, 200233, China
²NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
³Kuaishou Technology, Beijing, 100085, China
⁴Snap Inc., Santa Monica, 90405, USA
⁵AI Lab, Tencent Inc., Shenzhen, 518000, China

Abstract

Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and for food-related ranking systems. Nevertheless, aesthetic assessment of food images remains challenging and relatively unexplored, largely because suitable food image datasets and practical domain knowledge are lacking. We therefore present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with binary aesthetic labels, covering a wide variety of foods and scenes. We also propose a non-stationary regularization method to combat over-fitting and improve the generalization ability of fine-tuned models. Quantitative results from extensive experiments, including a generalization test, show that neural networks trained on the GPD achieve performance comparable to that of human experts on aesthetic assessment. We also report several findings that support further research and applications in visual aesthetic analysis of food images. To encourage such work, we have made the GPD publicly available at https://github.com/Openning07/GPA.
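The abstract does not describe how the non-stationary regularization works. Purely as an illustrative assumption, and not as the GPD authors' method, the sketch below shows one generic form of a non-stationary regularizer: dropout whose rate changes over the course of training instead of staying fixed. The function names `dropout_rate` and `inverted_dropout` and the default rates are hypothetical.

```python
import random


def dropout_rate(epoch: int, total_epochs: int,
                 p_start: float = 0.1, p_end: float = 0.5) -> float:
    """Hypothetical linear schedule (an assumption, not the paper's method):
    the dropout probability moves from p_start to p_end over training."""
    if total_epochs <= 1:
        return p_end
    # Clamp the epoch into range, then map it to a fraction t in [0, 1].
    t = min(max(epoch, 0), total_epochs - 1) / (total_epochs - 1)
    return p_start + t * (p_end - p_start)


def inverted_dropout(activations, p, rng):
    """Zero each activation with probability p and rescale the survivors
    by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [1.0, 2.0, 3.0, 4.0]
    for epoch in range(5):
        p = dropout_rate(epoch, total_epochs=5)
        print(epoch, round(p, 2), inverted_dropout(feats, p, rng))
```

A schedule like this makes the regularization strength a function of training progress; whether it should grow, shrink, or adapt to validation performance is a design choice the abstract leaves open.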
