Cross-validation of a semantic segmentation network for natural history collection specimens

Machine Vision and Applications, Volume 33, Issue 3, 2022

Abraham Nieva de la Hidalga¹, Paul L. Rosin², Xianfang Sun², Laurence Livermore³, James R. Durrant⁴, James Turner⁵, Mathias Dillen⁶, Alicia Musson⁷, Sarah Phillips⁷, Quentin Groom⁶, Alex Hardisty²

¹Cardiff University

²School of Computer Science and Informatics, Cardiff University, Cardiff, UK

³Natural History Museum, London, UK

⁴London, UK

⁵National Museum Wales, Cardiff, UK

⁶Meise Botanic Garden, Meise, Belgium

⁷Royal Botanic Gardens, Kew, Richmond, UK

Abstract

Semantic segmentation has been proposed as a tool to accelerate the processing of natural history collection images. However, developing a flexible and resilient segmentation network requires an adaptation approach that allows different datasets to be processed with minimal training and validation. This paper presents a cross-validation approach designed to determine whether a semantic segmentation network has the flexibility required for application across different collections and institutions. Specifically, the objectives of cross-validating the semantic segmentation network are to (a) evaluate the effectiveness of the network for segmenting image sets derived from collections other than the one on which it was initially trained; and (b) test the adaptability of the network for use with other types of collections. Both resilience to data variations between institutions and portability across collection types are required to confirm the network's general applicability. The proposed validation method is tested on the Natural History Museum semantic segmentation network, which was designed to process entomological microscope slides. The network is evaluated through a series of cross-validation experiments using data from two types of collections: microscope slides (from three institutions) and herbarium sheets (from seven institutions). The main contribution of this work is the method, software and ground-truth sets created for this cross-validation, as they can be reused to test similar segmentation proposals in the context of digitizing natural history collections. Cross-validation of segmentation methods should be a required step in integrating such methods into image-processing workflows for natural history collections.
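The abstract does not state which evaluation metrics the cross-validation uses. The sketch below is a minimal illustration that assumes pixel-wise, per-class precision, recall and intersection-over-union (IoU) computed against ground-truth label maps. It shows the general shape of such an experiment: one trained network is applied unchanged to the ground-truth set of each institution, and the scores are reported separately for each institution and class. The names `evaluate_collections`, `count_matches` and `identity_predict`, the `predict` callable and the toy data are all hypothetical. They do not come from the paper's software.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Mask = List[List[int]]  # 2-D grid holding one integer class label per pixel


def count_matches(pred: Mask, truth: Mask, cls: int) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one class."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("prediction and ground truth must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == cls and t == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif t == cls:
                fn += 1
    return tp, fp, fn


def ratio(num: int, den: int) -> float:
    """Safe division: the score is undefined (NaN) when the denominator is zero."""
    return num / den if den else math.nan


def evaluate_collections(
    predict: Callable[[Any], Mask],
    collections: Dict[str, Sequence[Tuple[Any, Mask]]],
    classes: Sequence[int],
) -> Dict[str, Dict[int, Dict[str, float]]]:
    """Apply one trained network, unchanged, to each institution's ground-truth set.

    Pixel counts are pooled over all images of a collection before the scores
    are computed, so larger images carry more weight than smaller ones.
    """
    report: Dict[str, Dict[int, Dict[str, float]]] = {}
    for name, samples in collections.items():
        totals = {c: [0, 0, 0] for c in classes}  # per class: [tp, fp, fn]
        for image, truth in samples:
            pred = predict(image)
            for c in classes:
                for i, n in enumerate(count_matches(pred, truth, c)):
                    totals[c][i] += n
        report[name] = {
            c: {
                "precision": ratio(tp, tp + fp),
                "recall": ratio(tp, tp + fn),
                "iou": ratio(tp, tp + fp + fn),
            }
            for c, (tp, fp, fn) in totals.items()
        }
    return report


# Toy usage: the "images" are already label maps, and the hypothetical predict
# function simply returns them, standing in for a network trained elsewhere.
def identity_predict(image: Mask) -> Mask:
    return image


toy = {"institution_A": [([[0, 1], [0, 1]], [[0, 1], [1, 1]])]}
# Class 1 here: precision 1.0, recall and IoU 2/3 (one labelled pixel is missed).
print(evaluate_collections(identity_predict, toy, classes=[0, 1]))
```

Pooling the pixel counts over a whole collection is only one possible design choice. Averaging the per-image scores instead would give every image equal weight, which can matter when image sizes differ between institutions.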
