Locally geometric semantic crossover: a study on the roles of semantics and homology in recombination operators
Tóm tắt
This study presents an extensive account of Locally Geometric Semantic Crossover (LGX), a semantically-aware recombination operator for genetic programming (GP). LGX is designed to exploit the semantic properties of programs and subprograms, in particular the geometry of semantic space that results from distance-based fitness functions used predominantly in GP. When applied to a pair of parents, LGX picks in them at random a structurally common (homologous) locus, calculates the semantics of subprograms located at that locus, finds a procedure that is semantically medial with respect to these subprograms, and replaces them with that procedure. The library of procedures is prepared prior to the evolutionary run and indexed by a multidimensional structure (kd-tree) allowing for efficient search. The paper presents the rationale for LGX design and an extensive computational experiment concerning performance, computational cost, impact on program size, and capability of generalization. LGX is compared with six other operators, including conventional tree-swapping crossover, semantic-aware operators proposed in previous studies, and control methods designed to verify the importance of homology and geometry of the semantic space. The overall conclusion is that LGX, thanks to combination of the semantically medial operation with homology, improves the efficiency of evolutionary search, lowers the variance of performance, and tends to be more resistant to overfitting.
Tài liệu tham khảo
A. Bajurnow, V. Ciesielski, in Proceedings of the 2004 IEEE Congress on Evolutionary Computation. Layered learning for evolving goal scoring behavior in soccer players (IEEE Press, Portland, 2004), pp. 1828–1835. URL http://goanna.cs.rmit.edu.au/vc/papers/cec2004-bajurnow.pdf
L. Beadle, C. Johnson, in Proceedings of the IEEE World Congress on Computational Intelligence, IEEE Computational Intelligence Society, ed. by J. Wang. Semantically driven crossover in genetic programming (IEEE Press, Hong Kong, 2008), pp. 111–116 doi:10.1109/CEC.2008.4630784
L. Beadle, C.G. Johnson. Semantic analysis of program initialisation in genetic programming. Genet. Program. Evolvable Mach. 10(3), 307–337 (2009) doi:10.1007/s10710-009-9082-5. URL http://www.springerlink.com/content/yn5p45723l6tr487
L. Beadle, C.G. Johnson, in IEEE Congress on Evolutionary Computation, ed. by A. Tyrrell. Semantically driven mutation in genetic programming (IEEE Computational Intelligence Society, IEEE Press, Trondheim, 2009), pp. 1336–1342. doi:10.1109/CEC.2009.4983099
J.L. Bentley, Multidimensional binary search trees used for associative searching. Commun. ACM. 18 (1975)
M. Blum, R.W. Floyd, V. Pratt, R.L. Rivest, R.E. Tarjan, Time bounds for selection. J. Comput. Syst. Sci. (1973)
J.H. Friedman, J.L. Bentley, R.A. Finkel, An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3(3), 209–226 (1977)
E. Galvan Lopez, R. Poli,C.A. Coello Coello, in Genetic Programming 7th European Conference, EuroGP 2004, Proceedings, LNCS, eds. by M. Keijzer, U.M. O’Reilly, S.M. Lucas, E. Costa, T. Soule. Reusing code in genetic programming, vol. 3003, (Springer, Coimbra, 2004), pp. 359–368. URL http://delta.cs.cinvestav.mx/ccoello/conferences/eurogp04.pdf.gz
D.E. Goldberg, The Design of Innovation: Lessons from and for Competent Genetic Algorithms. (Kluwer, Norwell, 2002)
T. Haynes, in Genetic Programming 1997: Proceedings of the Second Annual Conference, eds. by J.R. Koza, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon, H. Iba, R.L. Riolo. On-line adaptation of search via knowledge reuse, (Morgan Kaufmann, Stanford University, CA, USA 1997) p. 156–161. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.3381
G.S. Hornby, J.B. Pollack, Creating high-level components with a generative representation for body-brain evolution. Artif. Life 8(3):223–246 (2002) doi: 10.1162/106454602320991837. URL http://www.demo.cs.brandeis.edu/papers/hornby_alife02.pdf
D. Howard, in Genetic Programming Theory and Practice, chap. 10, eds. by R.L. Riolo, B. Worzel. Modularization by multi-run frequency driven subtree encapsulation (Kluwer, The Netherland, 2003) pp. 155–172.
W.H. Hsu, S.J. Harmon, E. Rodriguez, C. Zhong, in Late Breaking Papers at the 2004 Genetic and Evolutionary Computation Conference, ed. by M. Keijzer. Empirical comparison of incremental reuse strategies in genetic programming for keep-away soccer (Seattle, Washington, 2004). URL http://www.cs.bham.ac.uk/wbl/biblio/gecco2004/LBP010.pdf
D. Jackson, in Proceedings of the 13th European Conference on Genetic Programming, EuroGP 2010, LNCS, eds. by A.I. Esparcia-Alcazar, A. Ekart, S. Silva, S. Dignum, A.S. Uyar. Phenotypic diversity in initial genetic programming populations, vol. 6021, (Springer, Istanbul, 2010), pp. 98–109. doi:10.1007/978-3-642-12148-7_9
W. Jaskowski, K. Krawiec, B. Wieloch, in GECCO ’07: Proceedings of the 9th Annual Conference on Genetic and evolutionary computation, eds. by D. Thierens, H.G. Beyer, J. Bongard, J. Branke, J.A. Clark, D. Cliff, C.B. Congdon, K. Deb, B. Doerr, T. Kovacs, S. Kumar, J.F. Miller, J. Moore, F. Neumann, M. Pelikan, R. Poli, K. Sastry, K.O. Stanley, T. Stutzle, R.A. Watson, I. Wegener. Genetic programming for cross-task knowledge sharing, vol. 2, (ACM Press, London, 2007) pp. 1620–1627. doi: 10.1145/1276958.1277281. URL http://www.cs.bham.ac.uk/wbl/biblio/gecco2007/docs/p1620.pdf
K.E. Kinnear Jr., in Proceedings of the 1994 IEEE World Conference on Computational Intelligence. Fitness landscapes and difficulty in genetic programming, vol. 1, (IEEE Press, Orlando, 1994) pp. 142–147. doi:10.1109/ICEC.1994.350026. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/ftp.io.com/papers/kinnear.wcci.ps.Z
J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. (MIT Press, Cambridge, 1992)
K. Krawiec, in Proceedings of the 15th European Conference on Genetic Programming, EuroGP 2012, LNCS vol. 7244, eds. by A. Moraglio, S. Silva, K. Krawiec, P. Machado, C. Cotta. Medial crossovers for genetic programming (Springer, Malaga, Spain 2012), pp. 61–72 doi:10.1007/978-3-642-29139-5_6
K. K.Krawiec, P. Lichocki, in GECCO ’09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, eds. by G. Raidl, F. Rothlauf, G. Squillero, R. Drechsler, T. Stuetzle, M. Birattari, C.B. Congdon, M. Middendorf, C. Blum, C. Cotta, P. Bosman, J. Grahl, J. Knowles, D. Corne, H.G. Beyer, K. Stanley, J.F. Miller, J. van Hemert, T. Lenaerts, M. Ebner, J. Bacardit, M. O’Neill, M. Di Penta, B. Doerr, T. Jansen, R. Poli, E. Alba. Approximating geometric crossover in semantic space (ACM, Montreal 2009), pp. 987–994. URL http://doi.acm.org/10.1145/1569901.1570036
K. Krawiec, T. Pawlak, in GECCO ’12: Proceedings of the 2012 GECCO Conference Companion on Genetic and Evolutionary Computation, Locally geometric semantic crossover, (ACM, Philadelphia, 2012), (to appear)
K. Krawiec, B. Wieloch, Analysis of semantic modularity for genetic programming. Found. Comput. Decisi. Sci. 34(4), 265–285 (2009). URL http://fcds.cs.put.poznan.pl/FCDS2/ArticleDetails.aspx?articleId=219
K. Krawiec, Wieloch, B, in GECCO 09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, eds. by G. Raidl, F. Rothlauf, G. Squillero, R. Drechsler, T. Stuetzle, M. Birattari, C.B. Congdon, M. Middendorf, C. Blum, C. Cotta, P. Bosman, J. Grahl, J. Knowles, D. Corne, H.G. Beyer, K. Stanley, J.F. Miller, J. van Hemert, T. Lenaerts, M. Ebner, J. Bacardit, M. O’Neill, M. Di Penta, B. Doerr, T. Jansen, R. Poli, E. Alba. Functional modularity for genetic programming (ACM, Montreal 2009) pp. 995–1002. URL http://doi.acm.org/10.1145/1569901.1570037
K. Krawiec, B. Wieloch, in IEEE Congress on Evolutionary Computation (CEC 2010). Automatic generation and exploitation of related problems in genetic programming, (IEEE Press, Barcelona, 2010). doi:10.1109/CEC.2010.5586120
W. Langdon, R. Poli, Foundations of Genetic Programming. (Springer, Berlin, 2002)
D.T. Lee, C.K. Wong, Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta. Inform. 9, 23–29, (1977)
S. Luke, The ECJ Owner’s Manual—A User Manual for the ECJ Evolutionary Computation Library, zeroth edition, online version 0.2 edn. (2010). URL http://www.cs.gmu.edu/eclab/projects/ecj/docs/manual/manual.pdf
J. McDermott, D.R. White, S. Luke, L. Manzoni, M. Castelli, L. Vanneschi, W. Jaśkowski, K. Krawiec, R. Harper, K.D. Jong, U.M. O’Reilly, in GECCO ’12: Proceedings of the 2012 GECCO Conference on Genetic and Evolutionary Computation. Genetic programming needs better benchmarks, (ACM, Philadelphia, 2012). (to appear)
N.F. McPhee, B. Ohs, T. Hutchison, in Proceedings of the 11th European Conference on Genetic Programming, EuroGP 2008, Lecture Notes in Computer Science, vol. 4971, eds. by M. O’Neill, L. Vanneschi, S. Gustafson, A.I. Esparcia Alcazar, I. De Falco, A. Della Cioppa, E. Tarantino. Semantic building blocks in genetic programming (Springer, Naples, 2008), pp. 134–145. doi:10.1007/978-3-540-78671-9_12
T. Mitchell, Machine Learning. (McGraw-Hill, New York, 1997)
A. Moraglio, Towards a geometric unification of evolutionary algorithms. Ph.D. thesis, Department of Computer Science, University of Essex, UK (2007). URL http://eden.dei.uc.pt/moraglio/Thesis_final.pdf
A. Moraglio, in Foundations of Genetic Algorithms, eds. by H.G. Beyer, W. Langdon. Abstract convex evolutionary search (ACM, Schwarzenberg, Austria 2011), pp. 151–162. doi:10.1145/1967654.1967668
A. Moraglio, K. Krawiec, C. Johnson, in The 5th Workshop on Theory of Randomized Search Heuristics, ThRaSH’2011, eds. by C. Igel, P.K. Lehre, C. Witt. Geometric semantic genetic programming (Copenhagen, Denmark 2011). URL http://www.thrash-workshop.org/slides/moraglio.pdf
A. Moraglio, R. Poli, in Genetic and Evolutionary Computation—GECCO-2004, Part I, Lecture Notes in Computer Science, vol. 3102, K. Deb, R. Poli, W. Banzhaf, H.G. Beyer, E. Burke, P. Darwen, D. Dasgupta, D. Floreano, J. Foster, M. Harman, O. Holland, P.L. Lanzi, L. Spector, A. Tettamanzi, D. Thierens, A. Tyrrell. Topological interpretation of crossover (Springer, Seattle, 2004), pp. 1377–1388. URL http://privatewww.essex.ac.uk/amoragn/gecco2004fin.PDF
A. Moraglio, S. Silva, in GECCO ’11: Proceedings of the 13th Annual Conference on Genetic and evolutionary computation, eds. by N. Krasnogor, P.L. Lanzi, A. Engelbrecht, D. Pelta, C. Gershenson, G. Squillero, A. Freitas, M. Ritchie, M. Preuss, C. Gagne, Y.S. Ong, G. Raidl, M. Gallager, J. Lozano, C. Coello-Coello, D.L. Silva, N. Hansen, S. Meyer-Nieberg, J. Smith, G. Eiben, E. Bernado-Mansilla, W. Browne, L. Spector, T. Yu, J. Clune, G. Hornby, M.L. Wong, P. Collet, S. Gustafson, J.P. Watson, M. Sipper, S. Poulding, G. Ochoa, M. Schoenauer, C. Witt, A. Auger. Geometric nelder-mead algorithm on the space of genetic programs (ACM, Dublin, Ireland 2011), pp. 1307–1314. doi:10.1145/2001576.2001753
Q.U. Nguyen, X.H. Nguyen, M. O’Neill, in Proceedings of the 12th European Conference on Genetic Programming, EuroGP 2009, LNCS, vol. 5481, eds. by L. Vanneschi, S. Gustafson, A. Moraglio, I. De Falco, M. Ebner. Semantic aware crossover for genetic programming: The case for real-valued function regression (Springer, Tuebingen 2009), p. 292–302. doi:10.1007/978-3-642-01181-8_25
M. Pelikan, Hierarchical Bayesian Optimization Algorithms. (Springer, Berlin, 2005)
R. Poli, W.B. Langdon, Genetic programming with one-point crossover and point mutation. Tech. Rep. CSRP-97-13, (University of Birmingham, School of Computer Science, Birmingham, B15 2TT, UK, 1997). URL ftp://ftp.cs.bham.ac.uk/pub/tech-reports/1997/CSRP-97-13.ps.gz
R. Poli, W.B. Langdon, Schema theory for genetic programming with one-point crossover and point mutation. Evol. Comput. 6(3), 231–252 (1998). doi: 10.1162/evco.1998.6.3.253. URL http://cswww.essex.ac.uk/staff/poli/papers/Poli-ECJ1998.pdf
R. Poli, W.B. Langdon, N.F. McPhee, A field guide to genetic programming. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk (2008). URL http://www.gp-field-guide.org.uk. (With contributions by J. R. Koza)
J.P. Rosca, D.H. Ballard, in Advances in Genetic Programming 2, chap. 9, P.J. Angeline, K.E. Kinnear, Jr. Discovery of subroutines in genetic programming, (MIT Press, Cambridge, 1996) pp. 177–202. URL ftp://ftp.cs.rochester.edu/pub/u/rosca/gp/96.aigp2.dsgp.ps.gz
C. Ryan, M. Keijzer, M. Cattolico, in Genetic Programming Theory and Practice II, chap. 7, eds. by U.M. O’Reilly, T. Yu, R.L. Riolo, B. Worzel. Favorable biasing of function sets using run transferable libraries. (Springer, Ann Arbor 2004), pp. 103–120. doi:10.1007/0-387-23254-0_7
S. Silva, in GECCO ’11: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, eds. by N. Krasnogor, P.L. Lanzi, A. Engelbrecht, D. Pelta, C. Gershenson, G. Squillero, A. Freitas, M. Ritchie, M. Preuss, C. Gagne, Y.S. Ong, G. Raidl, M. Gallager, J. Lozano, C. Coello-Coello, D.L. Silva, N. Hansen, S. Meyer-Nieberg, J. Smith, G. Eiben, E. Bernado-Mansilla, W. Browne, L. Spector, T. Yu, J. Clune, G. Hornby, M.L. Wong, P. Collet, S. Gustafson, J.P. Watson, M. Sipper, S. Poulding, G. Ochoa, M. Schoenauer, C. Witt, A. Auger. Reassembling operator equalisation: a secret revealed (ACM, Dublin, Ireland 2011), pp. 1395–1402. doi:10.1145/2001576.2001764
H. Simon, The Sciences of the Artificial. (MIT Press, Cambridge, 1969)
N.Q. Uy, N.X. Hoai, M. O’Neill, R.I. McKay, E. Galvan-Lopez, Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolvable Mach. 12(2), 91–119 (2011) doi:10.1007/s10710-010-9121-2
N.Q. Uy, M. O’Neill, X.H. Nguyen, B. McKay, E.G. Lopez, in 9th International Conference, Evolution Artificielle, EA 2009, Lecture Notes in Computer Science, vol. 5975, P. Collet, N. Monmarche, P. Legrand, M. Schoenauer, E. Lutton. Semantic similarity based crossover in GP: The case for real-valued function regression (Springer, Strasbourg, France 2009), pp. 170–181. doi:10.1007/978-3-642-14156-0