Cross-validation of a semantic segmentation network for natural history collection specimens
Abstract
Semantic segmentation has been proposed as a tool to accelerate the processing of natural history collection images. However, a flexible and resilient segmentation network must be adaptable, so that it can process different datasets with minimal training and validation. This paper presents a cross-validation approach for determining whether a semantic segmentation network has the flexibility required for application across different collections and institutions. The specific objectives of cross-validating the network are to (a) evaluate how effectively it segments image sets from collections other than the one on which it was initially trained; and (b) test its adaptability for use with other types of collections. Resilience to data variations between institutions and portability across collection types are both required to confirm the network's general applicability. The proposed validation method is tested on the Natural History Museum semantic segmentation network, which was designed to process entomological microscope slides. The network is evaluated through a series of cross-validation experiments using data from two types of collections: microscope slides (from three institutions) and herbarium sheets (from seven institutions). The main contribution of this work is the method, software and ground truth sets created for this cross-validation, which can be reused to test similar segmentation proposals in the context of digitization of natural history collections. Cross-validation of segmentation methods should be a required step in integrating such methods into image processing workflows for natural history collections.
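As an illustration of what a per-institution evaluation of this kind might look like, the following is a minimal sketch, not the software described in this paper. It assumes that segmentation quality is scored with intersection over union (IoU), a metric the abstract does not name, and that each image is available as a predicted mask and a ground truth mask of integer class labels. The function names, institution names and toy masks are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def iou(pred, truth, label):
    """Intersection over union for one class label on two flat label sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
    return inter / union if union else None  # None: class absent from both masks


def scores_by_institution(samples, labels):
    """Average the per-image mean IoU separately for each contributing institution.

    `samples` is an iterable of (institution, predicted_mask, ground_truth_mask),
    where each mask is a flat sequence of integer class labels (an assumption).
    """
    per_institution = defaultdict(list)
    for institution, pred, truth in samples:
        class_scores = [s for s in (iou(pred, truth, c) for c in labels) if s is not None]
        if class_scores:
            per_institution[institution].append(mean(class_scores))
    return {name: mean(vals) for name, vals in per_institution.items()}


# Toy data: hypothetical institutions "A" and "B"; labels 0 = background, 1 = specimen.
samples = [
    ("A", [0, 1, 1, 0], [0, 1, 1, 0]),  # prediction matches ground truth exactly
    ("B", [0, 1, 0, 0], [0, 1, 1, 0]),  # one specimen pixel labelled as background
]
result = scores_by_institution(samples, labels=[0, 1])
```

Reporting the score per institution rather than as a single pooled figure makes it visible when a network trained on one institution's images degrades on another's, which is the kind of variation the cross-validation is meant to expose.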