Understanding the role of training sample size in the uncertainty of high-resolution LULC mapping using random forest

Springer Science and Business Media LLC - Volume 16 - Pages 3667-3677 - 2023

Kwanele Phinzi¹, Njoya Silas Ngetar², Quoc Bao Pham³, Gashaw Gismu Chakilu⁴, Szilárd Szabó⁵

¹Department of Geography and Environmental Studies, University of Zululand, KwaDlangezwa, South Africa

²School of Agricultural, Earth and Environmental Sciences, University of KwaZulu-Natal, Durban, South Africa

³Faculty of Natural Sciences, Institute of Earth Sciences, University of Silesia in Katowice, Sosnowiec, Poland

⁴Department of Natural Resources Management, Debark University, Debark, Ethiopia

⁵Department of Physical Geography and Geoinformatics, University of Debrecen, Debrecen, Hungary

Abstract

High-resolution sensors onboard satellites are generally reputed for rapidly producing land-use/land-cover (LULC) maps with improved spatial detail. However, such maps are subject to uncertainties due to several factors, including the training sample size. We investigated the effects of different training sample sizes (from 1000 to 12,000 pixels) on LULC classification accuracy using the random forest (RF) classifier. Then, we analyzed classification uncertainties by determining the median and the interquartile range (IQR) of the overall accuracy (OA) values through repeated k-fold cross-validation. Results showed that increasing training pixels significantly improved OA while minimizing model uncertainty. Specifically, larger training samples, ranging from 9000 to 12,000 pixels, exhibited narrower IQRs than smaller samples (1000–2000 pixels). Furthermore, there was a significant variation (χ² = 85.073; df = 11; p < 0.001) and a significant trend (J-T = 4641, p < 0.001) in OA values across various training sample sizes. Although larger training samples generally yielded high accuracies, this trend was not always consistent, as the lowest accuracy did not necessarily correspond to the smallest training sample. Nevertheless, models using 9000–11,000 pixels were effective (OA > 96%) and provided an accurate visual representation of LULC. Our findings emphasize the importance of selecting an appropriate training sample size to reduce uncertainties in high-resolution LULC classification.
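
The uncertainty summary described in the abstract (median and IQR of OA over repeated k-fold cross-validation, compared across training sample sizes) can be illustrated with a minimal sketch. It is not the authors' workflow: the pixels are synthetic, the sample sizes are scaled down, the fold and repeat counts are arbitrary, and a nearest-centroid classifier stands in for random forest so that the example needs only the Python standard library. All function names (`make_pixels`, `repeated_kfold`, `oa_summary`, and so on) are hypothetical.

```python
import random
import statistics


def make_pixels(n, seed=0):
    """Synthetic two-band 'pixels' for two made-up classes (illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for each repeat
        folds = [idx[i::k] for i in range(k)]
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


def fit_centroids(X, y):
    """Stand-in classifier (NOT random forest): per-class mean vectors."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def overall_accuracy(y_true, y_pred):
    """OA: the fraction of test pixels labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def oa_summary(X, y, k=5, repeats=3, seed=0):
    """Median and IQR of OA across all k * repeats held-out folds."""
    oas = []
    for train, test in repeated_kfold(len(X), k, repeats, seed):
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        oas.append(overall_accuracy([y[i] for i in test], preds))
    q1, _, q3 = statistics.quantiles(oas, n=4)
    return {"median": statistics.median(oas), "iqr": q3 - q1, "n_scores": len(oas)}


if __name__ == "__main__":
    # Scaled-down sample sizes; the study itself used 1000 to 12,000 pixels.
    for size in (100, 400, 1600):
        X, y = make_pixels(size, seed=1)
        print(size, oa_summary(X, y))
```

Comparing the `iqr` values printed for each size mirrors the comparison in the abstract: a narrower IQR means the OA estimate varies less from fold to fold. With real imagery, the stand-in classifier would be replaced by a random forest implementation and the synthetic pixels by labelled training pixels.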

