Augmenting UAV-captured vessel image data for maritime surveillance using multimodal language models and diffusion models
Abstract
Keywords
Diffusion; Image synthesis; Data augmentation; Vessel detection.
