Scale-invariant localization using quasi-semantic object landmarks
Abstract
This work presents Object Landmarks, a new type of visual feature designed for visual localization over major changes in distance and scale. An Object Landmark consists of a bounding box $\mathbf{b}$ defining an object, a descriptor $\mathbf{q}$ of that object produced by a Convolutional Neural Network, and a set of classical point features within $\mathbf{b}$. We evaluate Object Landmarks on visual odometry and place-recognition tasks, and compare them against several modern approaches. We find that Object Landmarks enable superior localization over major scale changes, reducing error by as much as 18% and increasing robustness to failure by as much as 80% versus the state of the art. They allow localization under scale-change factors of up to 6, whereas state-of-the-art approaches break down at factors of 3 or more.
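To make the structure concrete, the following is a minimal Python sketch of an Object Landmark as described above. The class name, field names, and types are illustrative assumptions for exposition, not the paper's implementation.

# A minimal sketch of the Object Landmark structure described in the abstract.
# All names and types here are assumptions chosen for illustration.
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np


@dataclass
class ObjectLandmark:
    """One landmark: an object bounding box, a CNN descriptor of the object,
    and classical point features detected inside the box."""
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    descriptor: np.ndarray                   # CNN feature vector q for the cropped object
    keypoints: List[Tuple[float, float]] = field(default_factory=list)  # point-feature locations inside bbox
    point_descriptors: List[np.ndarray] = field(default_factory=list)   # e.g. SIFT/ORB descriptors, one per keypoint

In such a sketch, matching two landmarks could first compare the CNN descriptors for coarse, scale-tolerant association and then match the enclosed point features for precise geometry, which mirrors the two-level design the abstract describes.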