Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances
Abstract
This work presents a statistically principled method for estimating the number of problem instances required in the experimental comparison of multiple algorithms on a given problem class of interest. The approach generalises earlier results by allowing researchers to design experiments based on the desired best-, worst-, mean- or median-case statistical power to detect differences between algorithms larger than a given threshold. Holm’s step-down procedure is used to keep the overall significance level controlled at the desired value without resulting in overly conservative experiments. The paper also presents an approach for sampling each algorithm on each instance, based on optimal sample size ratios that minimise the total number of runs required, subject to a desired accuracy in the estimation of paired differences. A case study investigating 21 variants of a custom-tailored Simulated Annealing algorithm for a class of scheduling problems illustrates the application of the proposed methods for sample size calculations in the experimental comparison of algorithms.
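To make the multiple-comparison step concrete, the sketch below implements Holm’s step-down procedure (Holm 1979) as named in the abstract. This is an illustrative implementation, not the authors’ code (their experiments use the CAISEr R package); the function name and interface are assumptions chosen for this example. P-values are sorted in ascending order, the i-th smallest is compared against α/(K − i), and testing stops at the first non-rejection.

```python
# Illustrative sketch of Holm's step-down procedure for controlling the
# family-wise error rate across K hypothesis tests. Not the paper's
# implementation; function name and signature are assumptions.

def holm_stepdown(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    K = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(K), key=lambda i: p_values[i])
    rejected = [False] * K
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is tested at level alpha / (K - rank)
        if p_values[idx] <= alpha / (K - rank):
            rejected[idx] = True
        else:
            break  # step down: once one test fails, all larger p-values fail
    return rejected

# Example: three pairwise comparisons at a family-wise alpha of 0.05.
# The smallest p-value is tested at 0.05/3, the next at 0.05/2, the last at 0.05.
print(holm_stepdown([0.001, 0.02, 0.04]))   # [True, True, True]
print(holm_stepdown([0.001, 0.03, 0.04]))   # [True, False, False]
```

Because each rejection threshold is at least as large as the Bonferroni threshold α/K, Holm’s procedure is uniformly more powerful than Bonferroni while still controlling the family-wise error rate, which is why the abstract notes that it avoids overly conservative experiments.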
References
Barr, R.S., Golden, B.L., Kelly, J.P., Resende, M.G.C., Stewart, W.R.: Designing and reporting on computational experiments with heuristic methods. J. Heuristics 1(1), 9–32 (1995)
Bartroff, J., Lai, T., Shih, M.C.: Sequential Experimentation in Clinical Trials: Design and Analysis. Springer, New York (2013)
Bartz-Beielstein, T.: New Experimentalism Applied to Evolutionary Computation. Ph.D. thesis, Universität Dortmund, Germany (2005)
Bartz-Beielstein, T.: Experimental Research in Evolutionary Computation. Springer, New York (2006)
Bartz-Beielstein, T.: How to create generalizable results. In: Kacprzyk, J., Pedrycz, W. (eds.) Handbook of Computational Intelligence. Springer, New York (2015)
Bartz-Beielstein, T., Chiarandini, M., Paquete, L., Preuss, M.: Experimental Methods for the Analysis of Optimization Algorithms. Springer, New York (2010)
Benavoli, A., Corani, G., Mangili, F., Zaffalon, M., Ruggeri, F.: A Bayesian Wilcoxon signed-rank test based on the Dirichlet process. In: Proceedings of the 31st International Conference on Machine Learning, pp. 1026–1034 (2014)
Benavoli, A., Corani, G., Demšar, J., Zaffalon, M.: Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. J. Mach. Learn. Res. 18(1), 2653–2688 (2017)
Birattari, M.: On the estimation of the expected performance of a metaheuristic on a class of instances: how many instances, how many runs? Technical report IRIDIA/2004-001, Université Libre de Bruxelles, Belgium (2004)
Birattari, M.: Tuning Metaheuristics—A Machine Learning Perspective. Springer, Berlin (2009)
Birattari, M., Dorigo, M.: How to assess and report the performance of a stochastic algorithm on a benchmark problem: mean or best result on a number of runs? Optim. Lett. 1, 309–311 (2007)
Botella, J., Ximénez, C., Revuelta, J., Suero, M.: Optimization of sample size in controlled experiments: the CLAST rule. Behav. Res. Methods 38(1), 65–76 (2006)
Calvo, B., Shir, O.M., Ceberio, J., Doerr, C., Wang, H., Bäck, T., Lozano, J.A.: Bayesian performance analysis for black-box optimization benchmarking. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’19, pp. 1789–1797. ACM (2019)
Campelo, F.: CAISEr: Comparison of Algorithms with Iterative Sample Size Estimation (2019). https://CRAN.R-project.org/package=CAISEr. Package version 1.0.13
Campelo, F., Takahashi, F.: Sample size estimation for power and accuracy in the experimental comparison of algorithms. J. Heuristics 25(2), 305–338 (2019)
Carrano, E.G., Wanner, E.F., Takahashi, R.H.C.: A multicriteria statistical based comparison methodology for evaluating evolutionary algorithms. IEEE Trans. Evol. Comput. 15(6), 848–870 (2011)
Chimani, M., Klein, K.: Algorithm engineering: concepts and practice. In: Bartz-Beielstein, T., Chiarandini, M., Paquete, L., Preuss, M. (eds.) Experimental Methods for the Analysis of Optimization Algorithms, pp. 131–158. Springer, Berlin (2010)
Coffin, M., Saltzman, M.J.: Statistical analysis of computational tests of algorithms and heuristics. INFORMS J. Comput. 12(1), 24–44 (2000)
Czarn, A., MacNish, C., Vijayan, K., Turlach, B.: Statistical exploratory analysis of genetic algorithms: the importance of interaction. In: Proceedings of the 2004 IEEE Congress on Evolutionary Computation. IEEE (2004)
del Amo, I.G., Pelta, D.A., González, J.R., Masegosa, A.D.: An algorithm comparison for dynamic optimization problems. Appl. Soft Comput. 12(10), 3176–3192 (2012)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Derrac, J., García, S., Molina, D., Herrera, F.: A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 1(1), 3–18 (2011)
Derrac, J., García, S., Hui, S., Suganthan, P.N., Herrera, F.: Analyzing convergence performance of evolutionary algorithms: a statistical approach. Inf. Sci. 289, 41–58 (2014)
Dunn, O.J.: Multiple comparisons among means. J. Am. Stat. Assoc. 56(293), 52–64 (1961)
Eiben, A., Jelasity, M.: A critical note on experimental research methodology in EC. In: Proceedings of the 2002 IEEE Congress on Evolutionary Computation. IEEE (2002)
Ellis, P.D.: The Essential Guide to Effect Sizes, 1st edn. Cambridge University Press, Cambridge (2010)
Fieller, E.C.: Some problems in interval estimation. J. R. Stat. Soc. Ser. B (Methodol.) 16(2), 175–185 (1954)
Franz, V.: Ratios: a short guide to confidence limits and proper use (2007). arXiv:0710.2024v1
García, S., Molina, D., Lozano, M., Herrera, F.: A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: a case study on the CEC’2005 Special session on real parameter optimization. J. Heuristics 15(6), 617–644 (2008)
García, S., Fernández, A., Luengo, J., Herrera, F.: A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput. 13(10), 959–977 (2009)
García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 180(10), 2044–2064 (2010)
Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge (2006)
Graham, R.L., Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Optimization and approximation in deterministic sequencing and scheduling: a survey. Ann. Discrete Math. 5, 287–326 (1979)
Hansen, N., Tušar, T., Mersmann, O., Auger, A., Brockhoff, D.: COCO: the experimental procedure (2016). arXiv:1603.08776
Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)
Hooker, J.N.: Needed: an empirical science of algorithms. Oper. Res. 42(2), 201–212 (1994)
Hooker, J.N.: Testing heuristics: we have it all wrong. J. Heuristics 1(1), 33–42 (1996)
Hurlbert, S.H.: Pseudoreplication and the design of ecological field experiments. Ecol. Monogr. 54(2), 187–211 (1984)
Jain, R.K.: The Art of Computer Systems Performance Analysis. Wiley, New York (1991)
Johnson, D.: A theoretician’s guide to the experimental analysis of algorithms. In: Goldwasser, M., Johnson, D., McGeoch, C. (eds.) Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 59, pp. 215–250. American Mathematical Society, Providence (2002)
Krohling, R.A., Lourenzutti, R., Campos, M.: Ranking and comparing evolutionary algorithms with Hellinger-TOPSIS. Appl. Soft Comput. 37, 217–226 (2015)
Kruschke, J.K.: Doing Bayesian Data Analysis: A Tutorial with R and BUGS, 1st edn. Academic Press, Cambridge (2010)
Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G., Shmoys, D.B.: Sequencing and scheduling: algorithms and complexity. In: Handbooks in Operations Research and Management Science, chapter 9, vol. 4, pp. 445–522. Elsevier (1993)
Lazic, S.E.: The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neurosci. 11(5), 397–407 (2010)
Lenth, R.V.: Some practical guidelines for effective sample size determination. Am. Stat. 55(3), 187–193 (2001)
Maravilha, A.L., Pereira, L.M., Campelo, F.: Statistical characterization of neighborhood structures for the unrelated parallel machine problem with sequence-dependent setup times (in preparation)
Mathews, P.: Sample Size Calculations: Practical Methods for Engineers and Scientists, 1st edn. Matthews Malnar & Bailey Inc., Painesville (2010)
McGeoch, C.C.: Feature article: toward an experimental method for algorithm simulation. INFORMS J. Comput. 8(1), 1–15 (1996)
Millar, R., Anderson, M.: Remedies for pseudoreplication. Fish. Res. 70, 397–407 (2004)
Montgomery, D.C.: Design and Analysis of Experiments, 8th edn. Wiley, New York (2013)
Montgomery, D.C., Runger, G.C.: Applied Statistics and Probability for Engineers, 6th edn. Wiley, New York (2013)
Pereira, L.M.: Análise de Estruturas de Vizinhança para o Problema de Sequenciamento de Máquinas Paralelas Não Relacionadas com Tempos de Preparação. Master’s thesis, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil (2019). https://ppgee.ufmg.br/defesas/1615M.PDF (in Portuguese)
Ridge, E.: Design of Experiments for the Tuning of Optimisation Algorithms. Ph.D. thesis, The University of York, UK (2007)
Santos, H.G., Toffolo, T.A., Silva, C.L., Berghe, G.V.: Analysis of stochastic local search methods for the unrelated parallel machine scheduling problem. Int. Trans. Oper. Res. (2016). https://doi.org/10.1111/itor.12316
Shaffer, J.P.: Multiple hypothesis testing. Annu. Rev. Psychol. 46(1), 561–584 (1995)
Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures. Taylor & Francis, Milton Park (2011)
Sörensen, K., Sevaux, M., Glover, F.: A history of metaheuristics. In: Martí, R., Pardalos, P.M., Resende, M.G. (eds.) Handbook of Heuristics, pp. 1–18. Springer, New York (2018)
Vallada, E., Ruiz, R.: A genetic algorithm for the unrelated parallel machine scheduling problem with sequence dependent setup times. Eur. J. Oper. Res. 211(3), 612–622 (2011)
Yuan, B., Gallagher, M.: An improved small-sample statistical test for comparing the success rates of evolutionary algorithms. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO ’09). ACM (2009)
Yuan, B., Gallagher, M.: Statistical racing techniques for improved empirical evaluation of evolutionary algorithms. In: Parallel Problem Solving from Nature (PPSN VIII), Lecture Notes in Computer Science, vol. 3242, pp. 172–181. Springer (2004)