Adaptive submodular inverse reinforcement learning for spatial search and map exploration
Abstract
Finding optimal paths for spatial search and map exploration is NP-hard. Since spatial search and environmental exploration are central human activities, learning human behavior from data is one way to approach these problems. Exploiting the adaptive submodularity of both problems, this research proposes an adaptive submodular inverse reinforcement learning (ASIRL) algorithm to learn human behavior. ASIRL learns reward functions in the Fourier domain and then recovers them in the spatial domain; near-optimal paths can then be computed from the learned reward functions. Experiments demonstrate that ASIRL outperforms state-of-the-art approaches (e.g., REWARDAGG and QVALAGG).
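To make the two-step pipeline described above concrete, the following is a minimal Python sketch, not the authors' implementation: it fits a reward function that is sparse in the Fourier (Walsh-Hadamard) basis of set functions from demonstration data via LASSO (a stand-in for the paper's sparse-recovery step), recovers rewards in the spatial domain, and then plans greedily, the step whose near-optimality rests on (adaptive) submodularity (Golovin & Krause, 2011; Nemhauser et al., 1978). The grid size, the synthetic demonstrations, the budget, and the use of scikit-learn's Lasso are all illustrative assumptions.

```python
# Minimal sketch of the ASIRL idea in the abstract (assumptions labeled below):
# (1) recover a reward function that is sparse in the Fourier (Walsh-Hadamard)
# basis from observed (visited-set, reward) pairs via LASSO, then
# (2) plan a path greedily with the learned reward.
import numpy as np
from sklearn.linear_model import Lasso

n_cells = 8  # hypothetical: a tiny environment of 8 searchable cells

def fourier_row(subset_mask):
    """Walsh-Hadamard characters chi_B(A) = (-1)^{|A ∩ B|} over all B."""
    return np.array([(-1) ** bin(subset_mask & b).count("1")
                     for b in range(2 ** n_cells)])

# Hypothetical demonstrations: bitmasks of visited cells and observed rewards
# from a reward function with a few nonzero Fourier coefficients.
rng = np.random.default_rng(0)
true_w = np.zeros(2 ** n_cells)
true_w[rng.choice(2 ** n_cells, size=5, replace=False)] = rng.normal(size=5)
masks = rng.integers(0, 2 ** n_cells, size=300)
X = np.stack([fourier_row(m) for m in masks])
y = X @ true_w + 0.01 * rng.normal(size=len(masks))

# Step 1: sparse recovery of Fourier coefficients (compressive-sensing style).
w_hat = Lasso(alpha=0.01).fit(X, y).coef_

def reward(mask):
    """Recover the reward of a visited set in the spatial domain."""
    return fourier_row(mask) @ w_hat

# Step 2: greedy path construction under a budget of 4 cells; for (adaptive)
# submodular rewards the greedy policy is near-optimal.
path, visited = [], 0
for _ in range(4):
    gains = [(reward(visited | (1 << c)) - reward(visited), c)
             for c in range(n_cells) if not visited & (1 << c)]
    best_gain, best_c = max(gains)
    path.append(best_c)
    visited |= 1 << best_c
print("greedy path:", path)
```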
References
Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. Proceedings of the Twenty-First International Conference on Machine Learning, pp. 1–8.
Alger, M. (2016). Deep inverse reinforcement learning. Tech. Rep.
Audiffren, J., Valko, M., Lazaric, A., & Ghavamzadeh, M. (2015). Maximum entropy semi-supervised inverse reinforcement learning. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI).
Balcan, M. F., & Harvey, N. J. (2011). Learning submodular functions. Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pp. 793–802.
Baraniuk, R. G. (2007). Compressive sensing. IEEE Signal Processing Magazine, 24(4), 118–121.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.
Binney, J., Krause, A., & Sukhatme, G. S. (2010). Informative path planning for an autonomous underwater vehicle. IEEE International Conference on Robotics and Automation, pp. 4791–4796.
Binney, J., & Sukhatme, G. S. (2012). Branch and bound for informative path planning. IEEE International Conference on Robotics and Automation, pp. 2147–2154.
Candès, E. J., Romberg, J., & Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 489–509.
Choudhury, S., Bhardwaj, M., Arora, S., Kapoor, A., Ranade, G., Scherer, S., & Dey, D. (2018). Data-driven planning via imitation learning. The International Journal of Robotics Research, 37(13–14), 1632–1672.
Choudhury, S., Kapoor, A., Ranade, G., & Dey, D. (2017). Learning to gather information via imitation. IEEE International Conference on Robotics and Automation (ICRA), pp. 908–915.
Feige, U. (1998). A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4), 634–652.
Fu, J., Luo, K., & Levine, S. (2018). Learning robust rewards with adversarial inverse reinforcement learning. International Conference on Learning Representations (ICLR).
Gabillon, V., Kveton, B., Wen, Z., Eriksson, B., & Muthukrishnan, S. (2013). Adaptive submodular maximization in bandit setting. Advances in Neural Information Processing Systems, 26, 2697–2705.
Gabillon, V., Kveton, B., Wen, Z., Eriksson, B., & Muthukrishnan, S. (2014). Large-scale optimistic adaptive submodularity. AAAI Conference on Artificial Intelligence, pp. 1816–1823.
Golovin, D., & Krause, A. (2011). Adaptive submodularity: Theory and applications in active learning and stochastic optimization. The Journal of Artificial Intelligence Research, 42(1), 427–486.
Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. arXiv preprint arXiv:1606.03476
Hussein, A., Gaber, M. M., Elyan, E., & Jayne, C. (2017). Imitation learning: A survey of learning methods. ACM Computing Surveys (CSUR), 50(2), 1–35.
Isler, S., Sabzevari, R., Delmerico, J., & Scaramuzza, D. (2016). An information gain formulation for active volumetric 3D reconstruction. IEEE International Conference on Robotics and Automation (ICRA), pp. 3477–3484.
Khuller, S., Moss, A., & Naor, J. S. (1999). The budgeted maximum coverage problem. Information Processing Letters, 70(1), 39–45.
Kitani, K. M., Ziebart, B. D., Bagnell, J. A., & Hebert, M. (2012). Activity forecasting. European Conference on Computer Vision (ECCV), pp. 201–214.
Krause, A., & Guestrin, C. (2007). Near-optimal observation selection using submodular functions. AAAI Conference on Artificial Intelligence, 7, 1650–1654.
Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 9, 235–284.
Kretzschmar, H., Spies, M., Sprunk, C., & Burgard, W. (2016). Socially compliant mobile robot navigation via inverse reinforcement learning. The International Journal of Robotics Research, 35(11), 1289–1307.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
Lu, B. X., & Tseng, K. S. (2020). 3D map exploration via learning submodular functions in the Fourier domain. International Conference on Unmanned Aircraft Systems (ICUAS).
Lu, B. X., Wu, J. J., Tsai, Y. C., Jiang, W. T., & Tseng, K. S. (2020). A novel telerobotic search system using an unmanned aerial vehicle. IEEE International Conference on Robotic Computing.
Nemhauser, G. L., Wolsey, L. A., & Fisher, M. L. (1978). An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14, 265–294.
Qaisar, S., Bilal, R. M., Iqbal, W., Naureen, M., & Lee, S. (2013). Compressive sensing: From theory to applications, a survey. Journal of Communications and Networks, 15(5), 443–456.
Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Wheeler, R., & Ng, A. Y. (2009). ROS: An open-source robot operating system. ICRA Workshop on Open Source Software, 3, p. 5.
Sadeghi, F., & Levine, S. (2016). CAD2RL: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201
Schmidt, M. (2005). Least squares optimization with l1-norm regularization. CS542B Project Report 504, 195–221.
Shimosaka, M., Sato, J., Takenaka, K., & Hitomi, K. (2017). Fast inverse reinforcement learning with interval consistent graph for driving behavior prediction. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.
Singh, A., Krause, A., Guestrin, C., Kaiser, W. J., & Batalin, M. A. (2007). Efficient planning of informative paths for multiple robots. Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2204–2211.
Stobbe, P., & Krause, A. (2012). Learning Fourier sparse set functions. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, pp. 1125–1133.
Sukumar, N. (2004). Construction of polygonal interpolants: A maximum entropy approach. International Journal for Numerical Methods in Engineering, 61(12), 2159–2181.
Sun, L., Zhan, W., & Tomizuka, M. (2018). Probabilistic prediction of interactive driving behavior via hierarchical inverse reinforcement learning. IEEE 21st International Conference on Intelligent Transportation Systems (ITSC) pp. 2111–2117.
Szepesvári, C. (2009). Algorithms for reinforcement learning. Morgan and Claypool.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
Tsai, Y. C., Lu, B. X., & Tseng, K. S. (2019). Spatial search via adaptive submodularity and deep learning. IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR). https://doi.org/10.1109/SSRR.2019.8848962
Tsai, Y. C., & Tseng, K. S. (2020). Deep compressed sensing for learning submodular functions. Sensors, 20(9), 2591.
Tseng, K. S. (2016). Learning in human and robot search: Subgoal, submodularity, and sparsity. Ph.D. thesis, University of Minnesota.
Tseng, K. S. (2021). Transfer learning of coverage functions via invariant properties in the Fourier domain. Autonomous Robots, 45, 519–542.
Tseng, K. S., & Mettler, B. (2015). Near-optimal probabilistic search via submodularity and sparse regression. Autonomous Robots. https://doi.org/10.1007/s10514-015-9521-5
Tseng, K. S., & Mettler, B. (2017). Near-optimal probabilistic search using spatial Fourier sparse set. Autonomous Robots.
Tseng, K. S., & Mettler, B. (2018). Analysis of coordination patterns between gaze and control in human spatial search. 2nd IFAC Conference on Cyber-Physical and Human-Systems.
Tseng, K. S., & Mettler, B. (2020). Analysis and augmentation of human performance on telerobotic search problems. IEEE Access, 8, 56590–56606.
Wu, J. J., & Tseng, K. S. (2020). Learning spatial search using submodular inverse reinforcement learning. IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR).
Wulfmeier, M., Ondruska, P., & Posner, I. (2015). Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888
Wulfmeier, M., Wang, D. Z., & Posner, I. (2016). Watch this: Scalable cost-function learning for path planning in urban environments. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2089–2095.
Zhang, H., & Vorobeychik, Y. (2016). Submodular optimization with routing constraints. AAAI Conference on Artificial Intelligence, 16, 819–826.
Ziebart, B. D. (2010). Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Ph.D. thesis, Carnegie Mellon University.
Ziebart, B. D., Maas, A. L., Bagnell, J. A., & Dey, A. K. (2008). Maximum entropy inverse reinforcement learning. The Twenty-Third AAAI Conference on Artificial Intelligence.