Improving multi-scale detection layers in the deep learning network for wheat spike detection based on interpretive analysis

Plant Methods - Volume 19 - Pages 1-13 - 2023
Jiawei Yan1,2, Jianqing Zhao1,2, Yucheng Cai1,2, Suwan Wang1,2, Xiaolei Qiu1,2, Xia Yao1,2,3, Yongchao Tian1, Yan Zhu1,2, Weixing Cao1,2, Xiaohu Zhang1,2,4
1National Engineering and Technology Center for Information Agriculture, Nanjing Agricultural University, Nanjing, China
2Key Laboratory for Crop System Analysis and Decision Making, Ministry of Agriculture and Rural Affairs, Nanjing, China
3Jiangsu Key Laboratory for Information Agriculture, Nanjing, China
4Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing, China

Abstract

Detecting and counting wheat spikes is essential for predicting and measuring wheat yield. However, current studies on wheat spike detection often apply a new network structure directly. Few of them use prior knowledge of wheat spike size characteristics to design a suitable detection model, and it remains unclear whether the complex detection layers of the network play their intended role. This study proposes an interpretive analysis method for quantitatively evaluating the role of the three-scale detection layers in a deep learning-based wheat spike detection model. The attention scores of each detection layer of the YOLOv5 network are calculated with the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm by comparing the previously labeled wheat spike bounding boxes with the attention areas of the network. Refining the multi-scale detection layers according to these attention scores yields a better wheat spike detection network. Experiments on the Global Wheat Head Detection (GWHD) dataset show that the large-scale detection layer performs poorly, while the medium-scale detection layer performs best among the three detection layers. Consequently, the large-scale detection layer is removed, a micro-scale detection layer is added, and the feature extraction ability of the medium-scale detection layer is enhanced. The refined model increases detection accuracy and reduces network complexity by decreasing the number of network parameters. The proposed interpretive analysis method evaluates the contribution of the different detection layers in the wheat spike detection network and provides a sound scheme for improving the network. The findings of this study offer a useful reference for future applications of deep network refinement in this field.
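The attention-score idea described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact score formula, so the metric below (the share of Grad-CAM heat that falls inside labeled boxes) is an assumption, and the function names `grad_cam` and `attention_score` are hypothetical. Only the Grad-CAM step itself (channel weights from globally averaged gradients, weighted sum of activations, then ReLU) follows the published algorithm.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map for one layer.

    activations, gradients: lists of K channels, each an H x W nested list.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weight = global average of that channel's gradients.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, channel in zip(weights, activations):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w * channel[y][x]
    # ReLU keeps only positive evidence for the target class.
    return [[max(v, 0.0) for v in row] for row in cam]


def attention_score(cam, boxes):
    """Share of total heat inside any labeled box (assumed metric, not the paper's).

    boxes: (x_min, y_min, x_max, y_max) in heatmap cells, max bounds exclusive.
    """
    total = sum(map(sum, cam))
    if total == 0:
        return 0.0
    inside = 0.0
    for y, row in enumerate(cam):
        for x, v in enumerate(row):
            if any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in boxes):
                inside += v
    return inside / total


# Toy 2 x 2 layer with two channels; the second channel has negative gradients.
acts = [[[1, 2], [3, 4]], [[4, 0], [0, 0]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
cam = grad_cam(acts, grads)
score = attention_score(cam, [(0, 0, 1, 2)])  # one box covering the left column
print(cam, score)
```

Under this assumed metric, a detection layer whose heat mostly falls outside the labeled boxes would receive a low score, which matches the kind of evidence the abstract cites for removing the large-scale layer.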
