Two-Speed Deep-Learning Ensemble for Classification of Incremental Land-Cover Satellite Image Patches
Abstract
High-velocity data streams present a challenge to deep learning-based computer vision models because of the resources needed to retrain on new incremental data. This study presents a novel staggered training approach using an ensemble model comprising the following: (i) a resource-intensive, high-accuracy vision transformer; and (ii) a fast-training but less accurate convolutional neural network (CNN) with a low parameter count. The vision transformer provides a scalable and accurate base model, while the CNN quickly incorporates new data into the ensemble model. Incremental data are simulated by dividing the very large So2Sat LCZ42 satellite image dataset into four intervals. The CNN is trained every interval and the vision transformer trained every half interval. We call this combination of a complementary ensemble with staggered training a “two-speed” network. The novelty of this approach lies in a staggered training schedule that lets the ensemble model incorporate new data efficiently by retraining the high-speed CNN in advance of the resource-intensive vision transformer, thereby allowing stable, continuous improvement of the ensemble. Additionally, the ensemble model for each data increment outperforms each of its component models, with a best accuracy of 65% against a holdout test partition of the RGB version of the So2Sat dataset.
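The staggered schedule and ensemble described above can be illustrated with a minimal Python sketch. The abstract does not say how the two models' outputs are combined or give the exact retraining cadence, so several things here are assumptions made for illustration only: the function names `staggered_schedule` and `soft_vote` are hypothetical, the default periods (fast model every increment, slow model every second increment) are placeholders, and the combination rule is a plain weighted average of class probabilities (soft voting), which may differ from the authors' method.

```python
import numpy as np


def staggered_schedule(n_increments, fast_period=1, slow_period=2):
    """Return, for each data increment, which model is due for retraining.

    fast_period and slow_period are illustrative assumptions, not values
    taken from the paper.
    """
    plan = []
    for step in range(1, n_increments + 1):
        plan.append({
            "increment": step,
            "retrain_fast": step % fast_period == 0,  # e.g. the small CNN
            "retrain_slow": step % slow_period == 0,  # e.g. the vision transformer
        })
    return plan


def soft_vote(prob_fast, prob_slow, weight_slow=0.5):
    """Combine two sets of class probabilities by weighted averaging.

    Both inputs have shape (n_samples, n_classes). Returns the index of the
    highest combined probability for each sample. Soft voting is an assumed
    combination rule chosen for this sketch.
    """
    prob_fast = np.asarray(prob_fast, dtype=float)
    prob_slow = np.asarray(prob_slow, dtype=float)
    combined = (1.0 - weight_slow) * prob_fast + weight_slow * prob_slow
    return combined.argmax(axis=1)


if __name__ == "__main__":
    # Four simulated data increments, as in the abstract.
    for entry in staggered_schedule(4):
        print(entry)

    # Toy probabilities for two samples and two classes.
    fast = [[0.6, 0.4], [0.2, 0.8]]
    slow = [[0.3, 0.7], [0.1, 0.9]]
    print(soft_vote(fast, slow))
```

Under these assumptions, the fast model absorbs every new increment immediately, and the ensemble prediction leans on the slower model's most recent weights until that model is next retrained.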