Neuroevolutionary reinforcement learning for generalized control of simulated helicopters
Abstract
Keywords
References
Abbeel P, Coates A, Ng A (2010) Autonomous helicopter aerobatics through apprenticeship learning. Int J Robotics Res 29(13):1608–1639
Abbeel P, Coates A, Quigley M, Ng AY (2007) An application of reinforcement learning to aerobatic helicopter flight. In: Advances in neural information processing systems 19. MIT Press, Cambridge, pp 1–8
Abbeel P, Ganapathi V, Ng AY (2006) Learning vehicular dynamics with application to modeling helicopters. In: Proceedings of neural information processing systems (NIPS)
Abbeel P, Ng A (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the twenty-first international conference on machine learning
Abbeel P, Ng AY (2005) Exploration and apprenticeship learning in reinforcement learning. In: Proceedings of the twenty-second international conference on machine learning
Bagnell J, Schneider J (2001) Autonomous helicopter control using reinforcement learning policy search methods. In: Proceedings of the IEEE international conference on robotics and automation 2001
Beielstein T, Markon S (2002) Threshold selection, hypothesis tests and DOE methods. In: 2002 congress on evolutionary computation, pp 777–782
Bellman RE (1957) Dynamic programming. Princeton University Press, Princeton
Bellman RE (1957) A Markovian decision process. J Math Mech 6:679–684
Brafman R, Tennenholtz M, Schuurmans D (2003) R-max: a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3(2):213–231
Branke J, Schmidt C (2003) Selection in the presence of noise. In: Proceedings of the genetic and evolutionary computation conference (GECCO), pp 766–777
Branke J, Schmidt C (2004) Sequential sampling in noisy environments. In: Proceedings of the international conference on parallel problem solving from nature (PPSN), pp 202–211
Butz M, Goldberg D, Lanzi P (2005) Gradient descent methods in learning classifier systems: improving XCS performance in multistep problems. IEEE Trans Evolut Comput 9(5)
Cardamone L, Loiacono D, Lanzi P (2009) On-line neuroevolution applied to the open racing car simulator. In: Proceedings of the congress on evolutionary computation (CEC), pp 2622–2629
Cardamone L, Loiacono D, Lanzi PL (2010) Learning to drive in the open racing car simulator using online neuroevolution. IEEE Trans Comput Intell AI Games 2(3):176–190
Chen S, Wu Y, Luk B (2002) Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks. IEEE Trans Neural Netw 10(5):1239–1243
De Boer P, Kroese D, Mannor S, Rubinstein R (2005) A tutorial on the cross-entropy method. Ann Oper Res 134(1):19–67
De Nardi R, Holland O (2006) Ultraswarm: a further step towards a flock of miniature helicopters. In: Proceedings of the SAB workshop on swarm robotics. Springer, Berlin, pp 116–128
De Nardi R, Holland O (2008) Coevolutionary modelling of a miniature rotorcraft. In: IAS-10: intelligent autonomous systems conference, p 364
Floreano D, Mondada F (2002) Evolution of homing navigation in a real mobile robot. IEEE Trans Syst Man Cybern B 26(3):396–407
Gauci J, Stanley KO (2008) A case study on the critical role of geometric regularity in machine learning. In: Proceedings of the twenty-third AAAI conference on artificial intelligence
Gauci J, Stanley KO (2010) Autonomous evolution of topographic regularities in artificial neural networks. Neural Comput 22(7):1860–1898
Geibel P, Wysotzki F (2005) Risk-sensitive reinforcement learning applied to control under constraints. J Artif Intell Res 24(1):81–108
Goldberg DE, Deb K, Clark JH (1991) Genetic algorithms, noise, and the sizing of populations. Complex Syst 6:333–362
Goldberg D, Rudnick M (1991) Genetic algorithms and the variance of fitness. Complex Syst 5(3):265–278
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning, 1st edn. Addison-Wesley, Boston
Gomez F, Schmidhuber J, Miikkulainen R (2006) Efficient non-linear control through neuroevolution. In: Proceedings of the European conference on machine learning
Gruau F, Whitley D, Pyeatt L (1996) A comparison between cellular encoding and direct encoding for genetic neural networks. In: Genetic programming 1996: Proceedings of the first annual conference, pp 81–89
Hansen N, Müller S, Koumoutsakos P (2003) Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolut Comput 11(1):1–18
Harik G, Cantú-Paz E, Goldberg D, Miller B (1999) The gambler’s ruin problem, genetic algorithms, and the sizing of populations. Evolut Comput 7(3):231–253
Heidrich-Meisner V, Igel C (2009) Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search. In: Proceedings of the 26th annual international conference on machine learning, ACM, pp 401–408
Hernandez-Diaz A, Coello C, Perez F, Caballero R, Molina J, Santana-Quintero L (2008) Seeding the initial population of a multi-objective evolutionary algorithm using gradient-based information. In: IEEE congress on evolutionary computation (CEC 2008), IEEE world congress on computational intelligence, pp 1617–1624
Hoffmann G, Huang H, Waslander S, Tomlin C (2007) Quadrotor helicopter flight dynamics and control: theory and experiment. In: Proceedings of the AIAA guidance, navigation, and control conference, pp 1–20
Hurst J, Bull L (2006) A neural learning classifier system with self-adaptive constructivism for mobile robot control. Artif Life 12(3):353–380
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput Fusion Found Methodol Appl 9(1):3–12
Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evolut Comput 6(5):481–494
Julstrom BA (1994) Seeding the population: improved performance in a genetic algorithm for the rectilinear steiner problem. In: Proceedings of the 1994 ACM symposium on applied computing, SAC ’94, pp 222–226
Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285
Kalyanakrishnan S, Stone P (2009) An empirical analysis of value function-based and policy search reinforcement learning. In: Proceedings of the eighth international joint conference on autonomous agents and multi-agent systems (AAMAS 2009)
Kalyanakrishnan S, Stone P (2010) Efficient selection of multiple bandit arms: theory and practice. In: Proceedings of the twenty-seventh international conference on machine learning (ICML 2010) (to appear)
Kassahun Y, Sommer G (2005) Efficient reinforcement learning through evolutionary acquisition of neural topologies. In: 13th European symposium on artificial neural networks, Bruges, Belgium, pp 259–266
Kearns M, Singh S (1998) Near-optimal reinforcement learning in polynomial time. In: Proceedings of the 15th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 260–268
Kernbach S, Meister E, Scholz O, Humza R, Liedke J, Ricotti L, Jemai J, Havlik J, Liu W (2009) Evolutionary robotics: the next-generation-platform for on-line and on-board artificial evolution. In: CEC’09: IEEE congress on evolutionary computation, pp 1079–1086
Koppejan R (2009) Neuroevolutionary reinforcement learning for generalized helicopter control. Master’s thesis, Universiteit van Amsterdam
Koppejan R, Whiteson S (2009) Neuroevolutionary reinforcement learning for generalized helicopter control. In: GECCO 2009: Proceedings of the genetic and evolutionary computation conference, pp 145–152
Lanzi PL, Colombetti M (1999) An extension to the XCS classifier system for stochastic environments. In: GECCO-99: Proceedings of the genetic and evolutionary computation conference, pp 353–360
Lupashin S, Schollig A, Sherback M, D’Andrea R (2010) A simple learning strategy for high-speed quadrocopter multi-flips. In: ICRA-10: IEEE international conference on robotics and automation, pp 1642–1648
Maron O, Moore AW (1997) The racing algorithm: model selection for lazy learners. Artif Intell Rev 11(1–5):193–225
Martín HJA, de Lope J (2009) Learning autonomous helicopter flight with evolutionary reinforcement learning. In: 12th international conference on computer aided systems theory (EUROCAST), pp 75–82
Meyer J, Husbands P, Harvey I (1998) Evolutionary robotics: a survey of applications and problems. In: Evolutionary robotics. Springer, pp 1–21
Moore A, Atkeson C (1993) Prioritized sweeping: reinforcement learning with less data and less real time. Mach Learn 13:103–130
Moriarty DE, Schultz AC, Grefenstette JJ (1999) Evolutionary algorithms for reinforcement learning. J Artif Intell Res 11:199–229
Ng A, Jordan M (2000) PEGASUS: a policy search method for large MDPs and POMDPs. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, pp 406–415
Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B, Berger E, Liang E (2004) Inverted autonomous helicopter flight via reinforcement learning. In: Proceedings of the international symposium on experimental robotics
Nordin P, Banzhaf W (1997) An on-line method to evolve behavior and to control a miniature robot in real time with genetic programming. Adapt Behav 5(2):107
Ong Y, Nair P, Keane A (2003) Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA J 41(4):687–696
Oyekan J, Lu B, Li B, Gu D, Hu H (2010) A behavior based control system for surveillance UAVs. In: Liu H, Gu D, Howlett RJJ, Liu Y (eds) Robot intelligence, advanced information and knowledge processing. Springer, Berlin, pp 209–228
Poli R, Cagnoni S (1997) Genetic programming with user-driven selection: experiments on the evolution of algorithms for image enhancement. In: Proceedings of the second annual conference on genetic programming, pp 269–277
Ponterosso P, Fox DSJ (1999) Heuristically seeded genetic algorithms applied to truss optimisation. Eng Comput 15:345–355
Poupart P, Vlassis N, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: Proceedings of the twenty-third international conference on machine learning
Priesterjahn S, Weimer A, Eberling M (2008) Real-time imitation-based adaptation of gaming behaviour in modern computer games. In: Proceedings of the genetic and evolutionary computation conference, pp 1431–1432
Purwin O, D’Andrea R (2009) Performing aggressive maneuvers using iterative learning control. In: ICRA-09: IEEE international conference on robotics and automation, 2009, pp 1731–1736
Regis R, Shoemaker C (2004) Local function approximation in evolutionary algorithms for the optimization of costly functions. IEEE Trans Evolut Comput 8(5):490–505
Sastry K, Lima CF, Goldberg DE (2006) Evaluation relaxation using substructural information and linear estimation. In: Proceedings of the 8th annual conference on genetic and evolutionary computation, GECCO ’06, pp 419–426
Schmidt M, Lipson H (2006) Actively probing and modeling users in interactive coevolution. In: Proceedings of the 8th conference on genetic and evolutionary computation, pp 385–386
Schmidt M, Lipson H (2008) Coevolution of fitness predictors. IEEE Trans Evolut Comput 12(6):736–749
Schroder P, Green B, Grum N, Fleming P (2001) On-line evolution of robust control systems: an industrial active magnetic bearing application. Control Eng Pract 9(1):37–49
Siebel NT, Sommer G (2007) Evolutionary reinforcement learning of artificial neural networks. Int J Hybrid Intell Syst 4(3):171–183
Sigaud O, Wilson S (2007) Learning classifier systems: a survey. Soft Comput Fusion Found Methodol Appl 11(11):1065–1078
Stagge P (1998) Averaging efficiently in the presence of noise. Parallel Probl Solving Nat 5:188–197
Stanley KO, D’Ambrosio DB, Gauci J (2009) A hypercube-based indirect encoding for evolving large-scale neural networks. Artif Life 15(2):185–212
Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evolut Comput 10(2):99–127
Steels L (1994) Emergent functionality in robotic agents through on-line evolution. In: Artificial life IV: proceedings of the fourth international workshop on the synthesis and simulation of living systems, pp 8–16
Stone P, Sutton RS, Kuhlmann G (2005) Reinforcement learning in Robocup-soccer keepaway. Adapt Behav 13(3):165–188
Strehl AL, Li L, Wiewiora E, Langford J, Littman ML (2006) PAC model-free reinforcement learning. In: ICML-06: Proceedings of the 23rd international conference on machine learning, pp 881–888
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
Sutton RS (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the seventh international conference on machine learning, pp 216–224
Tan C, Ang J, Tan K, Tay A (2008) Online adaptive controller for simulated car racing. In: Congress on evolutionary computation (CEC), pp 2239–2245
Tang J, Singh A, Goehausen N, Abbeel P (2010) Parameterized maneuver learning for autonomous helicopter flight. In: International conference on robotics and automation (ICRA)
Tanner B, White A (2009) RL-Glue: language-independent software for reinforcement-learning experiments. J Mach Learn Res 10:2133–2136
Tesauro G (1995) Temporal difference learning and TD-gammon. Commun ACM 38(3):58–68. doi: 10.1145/203330.203343
Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University
Whiteson S, Kohl N, Miikkulainen R, Stone P (2005) Evolving keepaway soccer players through task decomposition. Mach Learn 59(1):5–30
Whiteson S, Stone P (2006) Evolutionary function approximation for reinforcement learning. J Mach Learn Res 7:877–917
Whiteson S, Stone P (2006) On-line evolutionary computation for reinforcement learning in stochastic domains. In: GECCO 2006: Proceedings of the genetic and evolutionary computation conference, pp 1577–1584
Whiteson S, Tanner B, Taylor ME, Stone P (2009) Generalized domains for empirical evaluations in reinforcement learning. In: ICML 2009: Proceedings of the twenty-sixth international conference on machine learning: workshop on evaluation methods for machine learning
Whiteson S, Tanner B, Taylor ME, Stone P (2011) Protecting against evaluation overfitting in empirical reinforcement learning. In: ADPRL 2011: Proceedings of the IEEE symposium on adaptive dynamic programming and reinforcement learning (to appear)
Whiteson S, Taylor ME, Stone P (2010) Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning. Auton Agents Multi-Agent Syst 21(1):1–27
Wilson A, Fern A, Ray S, Tadepalli P (2007) Multi-task reinforcement learning: a hierarchical Bayesian approach. In: Proceedings of the 24th international conference on machine learning, pp 1015–1022
Wilson S (2001) Function approximation with a classifier system. In: GECCO-2001: Proceedings of the genetic and evolutionary computation conference, p 974
Yang D, Flockton S (1995) Evolutionary algorithms with a coarse-to-fine function smoothing. In: IEEE international conference on evolutionary computation, vol 2, pp 657–662
Zufferey J-C, Floreano D, Van Leeuwen M, Merenda T (2002) Evolving vision-based flying robots. In: Lee B, Wallraven P (eds) Proceedings of the 2nd international workshop on biologically motivated computer vision (BMCV). Springer, Berlin, pp 592–600