Learning to search: Functional gradient techniques for imitation learning

Autonomous Robots - Volume 27, Issue 1 - Pages 25-53 - 2009
Nathan Ratliff1, David Silver2, J. Andrew Bagnell3
1Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
3Robotics Institute and Machine Learning, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Abstract

Keywords


References

Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In ICML ’04: Proceedings of the twenty-first international conference on machine learning.

Anderson, B. D. O., & Moore, J. B. (1990). Optimal control: linear quadratic methods. Englewood Cliffs: Prentice Hall.

Argall, B., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems.

Atkeson, C., Schaal, S., & Moore, A. (1995). Locally weighted learning. AI Review.

Bain, M., & Sammut, C. (1995). A framework for behavioral cloning. In Machine intelligence agents. London: Oxford University Press.

Boyd, S., Ghaoui, L. E., Feron, E., & Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory. Society for Industrial and Applied Mathematics (SIAM).

Calinon, S., Guenter, F., & Billard, A. (2007). On learning, representing and generalizing a task in a humanoid robot. In IEEE Transactions on Systems, Man and Cybernetics, Part B. Special issue on robot learning by observation, demonstration and imitation, 37, 286–298.

Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York: Cambridge University Press.

Chestnutt, J., Kuffner, J., Nishiwaki, K., & Kagami, S. (2003). Planning biped navigation strategies in complex environments. In Proceedings of the IEEE-RAS international conference on humanoid robots. Karlsruhe, Germany.

Chestnutt, J., Lau, M., Cheng, G., Kuffner, J., Hodgins, J., & Kanade, T. (2005). Footstep planning for the Honda ASIMO humanoid. In Proceedings of the IEEE international conference on robotics and automation.

Donoho, D. L., & Elad, M. (2003). Maximal sparsity representation via l1 minimization. Proceedings of the National Academy of Sciences, 100, 2197–2202.

Ferguson, D., & Stentz, A. (2006). Using interpolation to improve path planning: The field D* algorithm. Journal of Field Robotics, 23, 79–101.

Friedman, J. H. (1999a). Greedy function approximation: A gradient boosting machine. Annals of Statistics.

Gordon, G. (1999). Approximate solutions to Markov decision processes. Doctoral dissertation, Robotics Institute, Carnegie Mellon University.

Hersch, M., Guenter, F., Calinon, S., & Billard, A. (2008). Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics, 24, 1463–1467.

Jaynes, E. (2003). Probability: The logic of science. Cambridge: Cambridge University Press.

Kalman, R. (1964). When is a linear control system optimal? Transactions of the ASME, Journal of Basic Engineering, 86, 51–60.

Kelly, A., Amidi, O., Happold, M., Herman, H., Pilarski, T., Rander, P., Stentz, A., Vallidis, N., & Warner, R. (2004). Toward reliable autonomous vehicles operating in challenging environments. In Proceedings of the international symposium on experimental robotics (ISER). Singapore.

Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132.

Kolter, J. Z., Abbeel, P., & Ng, A. Y. (2008). Hierarchical apprenticeship learning with application to quadruped locomotion. Neural Information Processing Systems, 20.

Kulesza, A., & Pereira, F. (2008). Structured learning with approximate inference. In Advances in neural information processing systems. Cambridge: MIT.

LeCun, Y., Muller, U., Ben, J., Cosatto, E., & Flepp, B. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.

Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. In Advances in large margin classifiers. Cambridge: MIT.

Miller, A. T., Knoop, S., Allen, P. K., & Christensen, H. I. (2003). Automatic grasp planning using shape primitives. In Proceedings of the IEEE international conference on robotics and automation.

Munoz, D., Bagnell, J. A. D., Vandapel, N., & Hebert, M. (2009). Contextual classification with functional max-margin Markov networks. In IEEE computer society conference on computer vision and pattern recognition (CVPR).

Munoz, D., Vandapel, N., & Hebert, M. (2008). Directional associative Markov network for 3-d point cloud classification. In Fourth international symposium on 3D data processing, visualization and transmission.

Neu, G., & Szepesvari, C. (2007). Apprenticeship learning using inverse reinforcement learning and gradient methods. In Uncertainty in artificial intelligence (UAI).

Ng, A. Y., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In Proc. 17th international conf. on machine learning.

Pomerleau, D. (1989). ALVINN: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (Vol. 1).

Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

Ratliff, N., & Bagnell, J. A. (2009). Functional bundle methods. In The Learning workshop. Clearwater Beach, Florida.

Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2006a). Maximum margin planning. In Twenty-third international conference on machine learning (ICML06).

Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2007a). (Online) subgradient methods for structured prediction. In Artificial intelligence and statistics. San Juan, Puerto Rico.

Ratliff, N., Bradley, D., Bagnell, J. A., & Chestnutt, J. (2006b). Boosting structured prediction for imitation learning. In NIPS. Vancouver, B.C.

Ratliff, N., Srinivasa, S., & Bagnell, J. A. (2007b). Imitation learning for locomotion and manipulation. In IEEE-RAS international conference on humanoid robots.

Rifkin, R., Yeo, G., & Poggio, T. (2003). Regularized least squares classification. In Advances in learning theory: methods, models and applications. Amsterdam: IOS Press.

Rosset, S., Zhu, J., & Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5, 941–973.

Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. IEEE Control Systems Magazine, 14.

Shor, N. Z. (1985). Minimization methods for non-differentiable functions. Berlin: Springer.

Silver, D., Bagnell, J. A., & Stentz, A. (2008). High performance outdoor navigation from overhead data using imitation learning. In Proceedings of Robotics Science and Systems.

Silver, D., Sofman, B., Vandapel, N., Bagnell, J. A., & Stentz, A. (2006). Experimental analysis of overhead data processing to support long range navigation. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.

Stentz, A., Bares, J., Pilarski, T., & Stager, D. (2007). The crusher system for autonomous navigation. In AUVSI’s unmanned systems.

Taskar, B., Chatalbashev, V., Guestrin, C., & Koller, D. (2005). Learning structured prediction models: A large margin approach. In Twenty second international conference on machine learning (ICML05).

Taskar, B., Guestrin, C., & Koller, D. (2003). Max margin Markov networks. In Advances in neural information processing systems (NIPS-14).

Taskar, B., Lacoste-Julien, S., & Jordan, M. (2006). Structured prediction via the extragradient method. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.

Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50, 2231–2242.

Vandapel, N., Donamukkala, R. R., & Hebert, M. (2003). Quality assessment of traversability maps from aerial lidar data for an unmanned ground vehicle. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.

Ziebart, B., Bagnell, J. A., Maas, A., & Dey, A. (2008). Maximum entropy inverse reinforcement learning. In Twenty-third AAAI conference.

Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the twentieth international conference on machine learning.