Learning to search: Functional gradient techniques for imitation learning
Abstract
Keywords
References
Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In ICML ’04: Proceedings of the twenty-first international conference on machine learning.
Anderson, B. D. O., & Moore, J. B. (1990). Optimal control: linear quadratic methods. Englewood Cliffs: Prentice Hall.
Argall, B., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems.
Atkeson, C., Schaal, S., & Moore, A. (1995). Locally weighted learning. AI Review.
Bain, M., & Sammut, C. (1995). A framework for behavioral cloning. In Machine intelligence agents. London: Oxford University Press.
Boyd, S., Ghaoui, L. E., Feron, E., & Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory. Society for Industrial and Applied Mathematics (SIAM).
Calinon, S., Guenter, F., & Billard, A. (2007). On learning, representing and generalizing a task in a humanoid robot. In IEEE Transactions on Systems, Man and Cybernetics, Part B. Special issue on robot learning by observation, demonstration and imitation, 37, 286–298.
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York: Cambridge University Press.
Chestnutt, J., Kuffner, J., Nishiwaki, K., & Kagami, S. (2003). Planning biped navigation strategies in complex environments. In Proceedings of the IEEE-RAS international conference on humanoid robots. Karlsruhe, Germany.
Chestnutt, J., Lau, M., Cheng, G., Kuffner, J., Hodgins, J., & Kanade, T. (2005). Footstep planning for the Honda ASIMO humanoid. In Proceedings of the IEEE international conference on robotics and automation.
Donoho, D. L., & Elad, M. (2003). Maximal sparsity representation via l1 minimization. Proceedings of the National Academy of Sciences, 100, 2197–2202.
Ferguson, D., & Stentz, A. (2006). Using interpolation to improve path planning: The field D* algorithm. Journal of Field Robotics, 23, 79–101.
Friedman, J. H. (1999a). Greedy function approximation: A gradient boosting machine. Annals of Statistics.
Gordon, G. (1999). Approximate solutions to Markov decision processes. Doctoral dissertation, Robotics Institute, Carnegie Mellon University.
Hersch, M., Guenter, F., Calinon, S., & Billard, A. (2008). Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics, 24, 1463–1467.
Kalman, R. (1964). When is a linear control system optimal? Transactions of the ASME, Journal of Basic Engineering, 86, 51–60.
Kelly, A., Amidi, O., Happold, M., Herman, H., Pilarski, T., Rander, P., Stentz, A., Vallidis, N., & Warner, R. (2004). Toward reliable autonomous vehicles operating in challenging environments. In Proceedings of the international symposium on experimental robotics (ISER). Singapore.
Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132.
Kolter, J. Z., Abbeel, P., & Ng, A. Y. (2008). Hierarchical apprenticeship learning with application to quadruped locomotion. Neural Information Processing Systems, 20.
Kulesza, A., & Pereira, F. (2008). Structured learning with approximate inference. In Advances in neural information processing systems. Cambridge: MIT.
LeCun, Y., Muller, U., Ben, J., Cosatto, E., & Flepp, B. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.
Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. In Advances in large margin classifiers. Cambridge: MIT.
Miller, A. T., Knoop, S., Allen, P. K., & Christensen, H. I. (2003). Automatic grasp planning using shape primitives. In Proceedings of the IEEE international conference on robotics and automation.
Munoz, D., Bagnell, J. A. D., Vandapel, N., & Hebert, M. (2009). Contextual classification with functional max-margin Markov networks. In IEEE computer society conference on computer vision and pattern recognition (CVPR).
Munoz, D., Vandapel, N., & Hebert, M. (2008). Directional associative Markov network for 3-d point cloud classification. In Fourth international symposium on 3D data processing, visualization and transmission.
Neu, G., & Szepesvari, C. (2007). Apprenticeship learning using inverse reinforcement learning and gradient methods. In Uncertainty in artificial intelligence (UAI).
Ng, A. Y., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In Proc. 17th international conf. on machine learning.
Pomerleau, D. (1989). ALVINN: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (Vol. 1).
Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
Ratliff, N., & Bagnell, J. A. (2009). Functional bundle methods. In The Learning workshop. Clearwater Beach, Florida.
Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2006a). Maximum margin planning. In Twenty-third international conference on machine learning (ICML06).
Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2007a). (Online) subgradient methods for structured prediction. In Artificial intelligence and statistics. San Juan, Puerto Rico.
Ratliff, N., Bradley, D., Bagnell, J. A., & Chestnutt, J. (2006b). Boosting structured prediction for imitation learning. In NIPS. Vancouver, B.C.
Ratliff, N., Srinivasa, S., & Bagnell, J. A. (2007b). Imitation learning for locomotion and manipulation. In IEEE-RAS international conference on humanoid robots.
Rifkin, R., Yeo, G., & Poggio, T. (2003). Regularized least squares classification. In Advances in learning theory: methods, models and applications. Amsterdam: IOS Press.
Rosset, S., Zhu, J., & Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5, 941–973.
Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. IEEE Control Systems Magazine, 14.
Silver, D., Bagnell, J. A., & Stentz, A. (2008). High performance outdoor navigation from overhead data using imitation learning. In Proceedings of Robotics Science and Systems.
Silver, D., Sofman, B., Vandapel, N., Bagnell, J. A., & Stentz, A. (2006). Experimental analysis of overhead data processing to support long range navigation. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.
Stentz, A., Bares, J., Pilarski, T., & Stager, D. (2007). The crusher system for autonomous navigation. In AUVSI’s unmanned systems.
Taskar, B., Chatalbashev, V., Guestrin, C., & Koller, D. (2005). Learning structured prediction models: A large margin approach. In Twenty-second international conference on machine learning (ICML05).
Taskar, B., Guestrin, C., & Koller, D. (2003). Max margin Markov networks. In Advances in neural information processing systems (NIPS-14).
Taskar, B., Lacoste-Julien, S., & Jordan, M. (2006). Structured prediction via the extragradient method. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50, 2231–2242.
Vandapel, N., Donamukkala, R. R., & Hebert, M. (2003). Quality assessment of traversability maps from aerial lidar data for an unmanned ground vehicle. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.
Ziebart, B., Bagnell, J. A., Maas, A., & Dey, A. (2008). Maximum entropy inverse reinforcement learning. In Twenty-third AAAI conference.
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the twentieth international conference on machine learning.