Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

Evolutionary Intelligence - Volume 4 Issue 4 - Pages 219-241 - 2011
Rogier Koppejan¹, Shimon Whiteson¹
¹Informatics Institute, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

References

Abbeel P, Coates A, Ng A (2010) Autonomous helicopter aerobatics through apprenticeship learning. Int J Robotics Res 29(13):1608–1639

Abbeel P, Coates A, Quigley M, Ng AY (2007) An application of reinforcement learning to aerobatic helicopter flight. In: Advances in neural information processing systems 19. MIT Press, Cambridge, pp 1–8

Abbeel P, Ganapathi V, Ng AY (2006) Learning vehicular dynamics with application to modeling helicopters. In: Proceedings of neural information processing systems (NIPS)

Abbeel P, Ng A (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the twenty-first international conference on machine learning

Abbeel P, Ng AY (2005) Exploration and apprenticeship learning in reinforcement learning. In: Proceedings of the twenty-second international conference on machine learning

Bagnell J, Schneider J (2001) Autonomous helicopter control using reinforcement learning policy search methods. In: Proceedings of the IEEE international conference on robotics and automation 2001

Beielstein T, Markon S (2002) Threshold selection, hypothesis tests and DOE methods. In: 2002 congress on evolutionary computation, pp 777–782

Bellman RE (1957) Dynamic programming. Princeton University Press, Princeton

Bellman RE (1957) A Markovian decision process. J Math Mech 6:679–684

Brafman R, Tennenholtz M, Schuurmans D (2003) R-max: a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3(2):213–231

Branke J, Schmidt C (2003) Selection in the presence of noise. In: Proceedings of the genetic and evolutionary computation conference (GECCO), pp 766–777

Branke J, Schmidt C (2004) Sequential sampling in noisy environments. In: Proceedings of the international conference on parallel problem solving from nature (PPSN), pp 202–211

Butz M, Goldberg D, Lanzi P (2005) Gradient descent methods in learning classifier systems: improving XCS performance in multistep problems. IEEE Trans Evolut Comput 9(5)

Cardamone L, Loiacono D, Lanzi P (2009) On-line neuroevolution applied to the open racing car simulator. In: Proceedings of the congress on evolutionary computation (CEC), pp 2622–2629

Cardamone L, Loiacono D, Lanzi PL (2010) Learning to drive in the open racing car simulator using online neuroevolution. IEEE Trans Comput Intell AI Games 2(3):176–190

Chen S, Wu Y, Luk B (1999) Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks. IEEE Trans Neural Netw 10(5):1239–1243

De Boer P, Kroese D, Mannor S, Rubinstein R (2005) A tutorial on the cross-entropy method. Ann Oper Res 134(1):19–67

De Nardi R, Holland O (2006) Ultraswarm: a further step towards a flock of miniature helicopters. In: Proceedings of the SAB workshop on swarm robotics. Springer, Berlin, pp 116–128

De Nardi R, Holland O (2008) Coevolutionary modelling of a miniature rotorcraft. In: IAS-10: intelligent autonomous systems conference, p 364

Floreano D, Mondada F (1996) Evolution of homing navigation in a real mobile robot. IEEE Trans Syst Man Cybern B 26(3):396–407

Floreano D, Urzelai J (2001) Evolution of plastic control networks. Auton Robots 11(3):311–317

Gauci J, Stanley KO (2008) A case study on the critical role of geometric regularity in machine learning. In: Proceedings of the twenty-third AAAI conference on artificial intelligence

Gauci J, Stanley KO (2010) Autonomous evolution of topographic regularities in artificial neural networks. Neural Comput 22(7):1860–1898

Geibel P, Wysotzki F (2005) Risk-sensitive reinforcement learning applied to control under constraints. J Artif Intell Res 24(1):81–108

Goldberg DE, Deb K, Clark JH (1991) Genetic algorithms, noise, and the sizing of populations. Complex Syst 6:333–362

Goldberg D, Rudnick M (1991) Genetic algorithms and the variance of fitness. Complex Syst 5(3):265–278

Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning, 1st edn. Addison-Wesley, Boston

Gomez F, Schmidhuber J, Miikkulainen R (2006) Efficient non-linear control through neuroevolution. In: Proceedings of the European conference on machine learning

Gruau F, Whitley D, Pyeatt L (1996) A comparison between cellular encoding and direct encoding for genetic neural networks. In: Genetic programming 1996: Proceedings of the first annual conference, pp 81–89

Hansen N, Müller S, Koumoutsakos P (2003) Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolut Comput 11(1):1–18

Harik G, Cantú-Paz E, Goldberg D, Miller B (1999) The gambler’s ruin problem, genetic algorithms, and the sizing of populations. Evolut Comput 7(3):231–253

Heidrich-Meisner V, Igel C (2009) Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search. In: Proceedings of the 26th annual international conference on machine learning, ACM, pp 401–408

Hernandez-Diaz A, Coello C, Perez F, Caballero R, Molina J, Santana-Quintero L (2008) Seeding the initial population of a multi-objective evolutionary algorithm using gradient-based information. In: Proceedings of the IEEE congress on evolutionary computation (CEC 2008), IEEE world congress on computational intelligence, pp 1617–1624

Hoffmann G, Huang H, Waslander S, Tomlin C (2007) Quadrotor helicopter flight dynamics and control: theory and experiment. In: Proceedings of the AIAA guidance, navigation, and control conference, pp 1–20

Hurst J, Bull L (2006) A neural learning classifier system with self-adaptive constructivism for mobile robot control. Artif Life 12(3):353–380

Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput Fusion Found Methodol Appl 9(1):3–12

Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evolut Comput 6(5):481–494

Julstrom BA (1994) Seeding the population: improved performance in a genetic algorithm for the rectilinear Steiner problem. In: Proceedings of the 1994 ACM symposium on applied computing, SAC ’94, pp 222–226

Kaelbling LP (1993) Learning in embedded systems. MIT Press, Cambridge

Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285

Kalyanakrishnan S, Stone P (2009) An empirical analysis of value function-based and policy search reinforcement learning. In: Proceedings of the eighth international joint conference on autonomous agents and multi-agent systems (AAMAS 2009)

Kalyanakrishnan S, Stone P (2010) Efficient selection of multiple bandit arms: theory and practice. In: Proceedings of the twenty-seventh international conference on machine learning (ICML 2010) (to appear)

Kassahun Y, Sommer G (2005) Efficient reinforcement learning through evolutionary acquisition of neural topologies. In: 13th European symposium on artificial neural networks, Bruges, Belgium, pp 259–266

Kearns M, Singh S (1998) Near-optimal reinforcement learning in polynomial time. In: Proceedings of the 15th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 260–268

Kernbach S, Meister E, Scholz O, Humza R, Liedke J, Ricotti L, Jemai J, Havlik J, Liu W (2009) Evolutionary robotics: the next-generation-platform for on-line and on-board artificial evolution. In: CEC’09: IEEE congress on evolutionary computation, pp 1079–1086

Koppejan R (2009) Neuroevolutionary reinforcement learning for generalized helicopter control. Master’s thesis, Universiteit van Amsterdam

Koppejan R, Whiteson S (2009) Neuroevolutionary reinforcement learning for generalized helicopter control. In: GECCO 2009: Proceedings of the genetic and evolutionary computation conference, pp 145–152

Lanzi PL, Colombetti M (1999) An extension to the XCS classifier system for stochastic environments. In: GECCO-99: Proceedings of the genetic and evolutionary computation conference, pp 353–360

Lupashin S, Schollig A, Sherback M, D’Andrea R (2010) A simple learning strategy for high-speed quadrocopter multi-flips. In: ICRA-10: IEEE international conference on robotics and automation, pp 1642–1648

Maron O, Moore AW (1997) The racing algorithm: model selection for lazy learners. Artif Intell Rev 11(1–5):193–225

Martín HJA, de Lope J (2009) Learning autonomous helicopter flight with evolutionary reinforcement learning. In: 12th international conference on computer aided systems theory (EUROCAST), pp 75–82

Meyer J, Husbands P, Harvey I (1998) Evolutionary robotics: a survey of applications and problems. In: Evolutionary robotics. Springer, pp 1–21

Mihatsch O, Neuneier R (2002) Risk-sensitive reinforcement learning. Mach Learn 49(2):267–290

Moore A, Atkeson C (1993) Prioritized sweeping: reinforcement learning with less data and less real time. Mach Learn 13:103–130

Moriarty DE, Schultz AC, Grefenstette JJ (1999) Evolutionary algorithms for reinforcement learning. J Artif Intell Res 11:199–229

Ng A, Jordan M (2000) PEGASUS: a policy search method for large MDPs and POMDPs. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, pp 406–415

Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B, Berger E, Liang E (2004) Inverted autonomous helicopter flight via reinforcement learning. In: Proceedings of the international symposium on experimental robotics

Nordin P, Banzhaf W (1997) An on-line method to evolve behavior and to control a miniature robot in real time with genetic programming. Adapt Behav 5(2):107

Ong Y, Nair P, Keane A (2003) Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA J 41(4):687–696

Oyekan J, Lu B, Li B, Gu D, Hu H (2010) A behavior based control system for surveillance UAVs. In: Liu H, Gu D, Howlett RJJ, Liu Y (eds) Robot intelligence, advanced information and knowledge processing. Springer, Berlin, pp 209–228

Poli R, Cagnoni S (1997) Genetic programming with user-driven selection: experiments on the evolution of algorithms for image enhancement. In: Proceedings of the second annual conference on genetic programming, pp 269–277

Ponterosso P, Fox DSJ (1999) Heuristically seeded genetic algorithms applied to truss optimisation. Eng Comput 15:345–355

Poupart P, Vlassis N, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: Proceedings of the twenty-third international conference on machine learning

Pratihar D (2003) Evolutionary robotics: a review. Sadhana 28(6):999–1009

Priesterjahn S, Weimer A, Eberling M (2008) Real-time imitation-based adaptation of gaming behaviour in modern computer games. In: Proceedings of the genetic and evolutionary computation conference, pp 1431–1432

Purwin O, D’Andrea R (2009) Performing aggressive maneuvers using iterative learning control. In: ICRA-09: IEEE international conference on robotics and automation, 2009, pp 1731–1736

Regis R, Shoemaker C (2004) Local function approximation in evolutionary algorithms for the optimization of costly functions. IEEE Trans Evolut Comput 8(5):490–505

Sastry K, Lima CF, Goldberg DE (2006) Evaluation relaxation using substructural information and linear estimation. In: Proceedings of the 8th annual conference on genetic and evolutionary computation, GECCO ’06, pp 419–426

Schmidt M, Lipson H (2006) Actively probing and modeling users in interactive coevolution. In: Proceedings of the 8th conference on genetic and evolutionary computation, pp 385–386

Schmidt M, Lipson H (2008) Coevolution of fitness predictors. IEEE Trans Evolut Comput 12(6):736–749

Schroder P, Green B, Grum N, Fleming P (2001) On-line evolution of robust control systems: an industrial active magnetic bearing application. Control Eng Pract 9(1):37–49

Siebel NT, Sommer G (2007) Evolutionary reinforcement learning of artificial neural networks. Int J Hybrid Intell Syst 4(3):171–183

Sigaud O, Wilson S (2007) Learning classifier systems: a survey. Soft Comput Fusion Found Methodol Appl 11(11):1065–1078

Stagge P (1998) Averaging efficiently in the presence of noise. Parallel Probl Solving Nat 5:188–197

Stanley KO, D’Ambrosio DB, Gauci J (2009) A hypercube-based indirect encoding for evolving large-scale neural networks. Artif Life 15(2):185–212

Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evolut Comput 10(2):99–127

Steels L (1994) Emergent functionality in robotic agents through on-line evolution. In: Artificial life IV: Proceedings of the fourth international workshop on the synthesis and simulation of living systems, pp 8–16

Stone P, Sutton RS, Kuhlmann G (2005) Reinforcement learning for RoboCup-soccer keepaway. Adapt Behav 13(3):165–188

Strehl AL, Li L, Wiewiora E, Langford J, Littman ML (2006) PAC model-free reinforcement learning. In: ICML-06: Proceedings of the 23rd international conference on machine learning, pp 881–888

Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44

Sutton RS (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the seventh international conference on machine learning, pp 216–224

Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

Tan C, Ang J, Tan K, Tay A (2008) Online adaptive controller for simulated car racing. In: Congress on evolutionary computation (CEC), pp 2239–2245

Tang J, Singh A, Goehausen N, Abbeel P (2010) Parameterized maneuver learning for autonomous helicopter flight. In: International conference on robotics and automation (ICRA)

Tanner B, White A (2009) RL-Glue: language-independent software for reinforcement-learning experiments. J Mach Learn Res 10:2133–2136

Tesauro G (1995) Temporal difference learning and TD-gammon. Commun ACM 38(3):58–68. doi: 10.1145/203330.203343

Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University

Whiteson S, Kohl N, Miikkulainen R, Stone P (2005) Evolving keepaway soccer players through task decomposition. Mach Learn 59(1):5–30

Whiteson S, Stone P (2006) Evolutionary function approximation for reinforcement learning. J Mach Learn Res 7:877–917

Whiteson S, Stone P (2006) On-line evolutionary computation for reinforcement learning in stochastic domains. In: GECCO 2006: Proceedings of the genetic and evolutionary computation conference, pp 1577–1584

Whiteson S, Tanner B, Taylor ME, Stone P (2009) Generalized domains for empirical evaluations in reinforcement learning. In: ICML 2009: Proceedings of the twenty-sixth international conference on machine learning: workshop on evaluation methods for machine learning

Whiteson S, Tanner B, Taylor ME, Stone P (2011) Protecting against evaluation overfitting in empirical reinforcement learning. In: ADPRL 2011: Proceedings of the IEEE symposium on adaptive dynamic programming and reinforcement learning (to appear)

Whiteson S, Tanner B, White A (2010) The reinforcement learning competitions. AI Mag 31(2):81–94

Whiteson S, Taylor ME, Stone P (2010) Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning. Auton Agents Multi-Agent Syst 21(1):1–27

Wilson A, Fern A, Ray S, Tadepalli P (2007) Multi-task reinforcement learning: a hierarchical Bayesian approach. In: Proceedings of the 24th international conference on machine learning, pp 1015–1022

Wilson S (1995) Classifier fitness based on accuracy. Evolut Comput 3(2):149–175

Wilson S (2001) Function approximation with a classifier system. In: GECCO-2001: Proceedings of the genetic and evolutionary computation conference, p 974

Yang D, Flockton S (1995) Evolutionary algorithms with a coarse-to-fine function smoothing. In: Proceedings of the IEEE international conference on evolutionary computation, vol 2, pp 657–662

Yao X (1999) Evolving artificial neural networks. Proc IEEE 87(9):1423–1447

Zufferey J-C, Floreano D, Van Leeuwen M, Merenda T (2002) Evolving vision-based flying robots. In: Lee B, Wallraven P (eds) Proceedings of the 2nd international workshop on biologically motivated computer vision (BMCV). Springer, Berlin, pp 592–600