Effect of attention and triplet loss on chart classification: a study on noisy charts and confusing chart pairs
Abstract
Charts are powerful tools for visualizing and comparing data. As charts of many types become more common in electronic scientific documents, automatic chart classification is becoming an important task. Existing studies on chart classification do not address noise in charts or confusing chart class pairs. Motivated by these gaps, we propose a deep CNN framework based on attention and triplet loss. Experimental results on four datasets show that the proposed framework handles noisy charts and confusing chart samples effectively and outperforms its counterparts.