On the role of geometry in geo-localization
Abstract
Consider the geo-localization task of finding the pose of a camera in a large 3D scene from a single image. Most existing CNN-based methods take textured images as input. We experimentally explore whether texture and correlation between nearby images are necessary in a CNN-based solution to this task. To do so, we consider lean images: textureless projections of a simple 3D model of a city, which contain only information related to the geometry of the viewed scene (edges, faces, and relative depth). The main contributions of this paper are (i) demonstrating that CNNs can recover camera pose from lean images, and (ii) providing insight into the role of geometry in the CNN learning process.
