A multi-colony ant algorithm for optimizing join queries in distributed database systems

Knowledge and Information Systems - Tập 39 - Trang 175-206 - 2013
Ladan Golshanara1, Seyed Mohammad Taghi Rouhani Rankoohi1, Hamed Shah-Hosseini1
1Faculty of Electrical and Computer Engineering, Shahid Beheshti University, G.C., Tehran, Iran

Tóm tắt

Distributed database systems provide a new data processing and storage technology for decentralized organizations of today. Query optimization, the process to generate an optimal execution plan for the posed query, is more challenging in such systems due to the huge search space of alternative plans incurred by distribution. As finding an optimal execution plan is computationally intractable, using stochastic-based algorithms has drawn the attention of most researchers. In this paper, for the first time, a multi-colony ant algorithm is proposed for optimizing join queries in a distributed environment where relations can be replicated but not fragmented. In the proposed algorithm, four types of ants collaborate to create an execution plan. Hence, there are four ant colonies in each iteration. Each type of ant makes an important decision to find the optimal plan. In order to evaluate the quality of the generated plan, two cost models are used—one based on the total time and the other on the response time. The proposed algorithm is compared with two previous genetic-based algorithms on chain, tree and cyclic queries. The experimental results show that the proposed algorithm saves up to about 80 % of optimization time with no significant difference in the quality of generated plans compared with the best existing genetic-based algorithm.

Tài liệu tham khảo

Alamery M, Faraahi A, Javadi H et al (2010) Multi-join query optimization using the bees algorithm. In: De Leon F, De Carvalho A, Rodríguez-González S, De Paz Santana J, Rodríguez J (eds) Distributed computing and artificial intelligence. Springer, Berlin, pp 449–457. doi:10.1007/978-3-642-14883-5_58 Aljanaby A, Abuelrub E, Odeh M (2005) A survey of distributed query processing. Int Arab J Inf Technol 2:48–57 Babb E (1979) Implementing a relational database by means of specialized hardware. ACM Trans Database Syst 4:1–29. doi:10.1145/320064.320065 Bernstein PA, Chiu D-MW (1981a) Using semi-joins to solve relational queries. J. ACM 28:25–40. doi:10.1145/322234.322238 Bernstein PA, Goodman N, Wong E et al (1981b) Query processing in a system for distributed databases (SDD-1). ACM Trans Database Syst 6:602–625. doi:10.1145/319628.319650 Bonabeau E, Dorigo M, Theraulaz G (1999) Swarm intelligence: from natural to artificial systems. Oxford University Press Inc., Oxford Chen MS, Yu PS (1993) Combining join and semi-join operations for distributed query processing. IEEE Trans Knowl Data Eng 5:534–542 Connolly TM, Begg CE (2005) Database systems: a practical approach to design, implementation and management, 4th edn. Addison-Wesley, USA Date CJ (2004) An introduction to database systems, 8th edn. Addison-Wesley, USA Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern Part B Cybern 26:29–41 Dorigo M, Stuzle T (2004) Ant colony optimization. MIT Press, Cambridge Garcia-Molina H, Ullman JD, Jennifer W (2002) Database systems, the complete book. Prentice Hall, USA Goli M, Rouhani Rankoohi SMT (2011) A new vertical fragmentation algorithm based on ant collective behavior in distributed database systems. Knowl Inf Syst 30:435–455. doi:10.1007/s10115-011-0384-6 Gorla N, Song S-K (2010) Subquery allocations in distributed databases using genetic algorithms. J Comput Sci Technol 10:31–37 Hameurlain A (2009) Evolution of query optimization methods: from centralized database systems to data grid systems. Proceedings of the 20th international conference on database and expert systems applications. Springer-Verlag, Linz, Austria Ibaraki T, Kameda T (1984) On the optimal nesting order for computing \(N\)-relational joins. ACM Trans Database Syst 9:482–502. doi: 10.1145/1270.1498 Ioannidis YE (1996) Query optimization. ACM Comput Surv 28:121–123 Karimi Adl R, Rouhani Rankoohi SMT (2009) A new ant colony optimization based algorithm for data allocation problem in distributed databases. Knowl Inf Syst 20:349–373. doi:10.1007/s10115-008-0182-y Kossmann D (2000) The state of the art in distributed query processing. ACM Comput Surv 32:422–469. doi:10.1145/371578.371598 Lanzelotte RSG, Valduriez P, ZaïT M et al (1994) Industrial-strength parallel query optimization: Issues and lessons. Inf Syst 19:311–330. doi:10.1016/0306-4379(94)90017-5 Li N, Liu Y, Dong Y et al (2008) Application of ant colony optimization algorithm to multi-join query optimization. Proceedings of the 3rd international symposium on advances in computation and intelligence. Springer-Verlag, Wuhan, China Li Z, Ross KA (1995) PERF join: an alternative to two-way semijoin and bloomjoin. Proceedings of the fourth international conference on information and knowledge management, Baltimore, Maryland, United States Narendra KS, Thathachar MAL (1974) Learning automata-a survey. IEEE Trans Syst Man Cybern SMC–4:323–334 Ozsu MT, Valduriez P (2011) Principles of distributed database systems, 3rd edn. Springer, USA Rho S, March ST (1997) Optimizing distributed join queries: A genetic algorithm approach. Ann Oper Res 71:199–228. doi:10.1023/a:1018967414664 Ribeiro CC, Ribeiro CD, Lanzelotte RSG (1997) Query optimization in distributed relational databases. J Heuristics 3:5–23. doi:10.1023/a:1009670031749 Roussopoulos N, Kang H (1991) A pipeline \(N\)-way join algorithm based on the 2-way semijoin program. IEEE Trans Knowl Data Eng 3:486–495 Sevinç E, CosşAr A (2011) An evolutionary genetic algorithm for optimization of distributed database queries. Comput J 54:717–725 Shah-Hosseini H (2009) The intelligent water drops algorithm: a nature inspired swarm-based optimization algorithm. Int J Bio-Inspired Comput 1:71–79 Shahabi C, Khan L, Mcleod D (2000) A probe-based technique to optimize join queries in distributed internet databases. Knowl Inf Syst 2:373–385 St T, Tzle Hoos HH (2000) MAX-MIN ant system. Future Gener Comput Syst 16:889–914 Unnava V (1992) Query processing in distributed database. PhD Thesis, The Ohio state university Yu CT, Chang CC (1984) Distributed query processing. ACM Comput Surv 16:399–433 Yu CT, Guh KC, Brill D et al (1989) Partition strategy for distributed query processing in fast local networks. IEEE Trans Softw Eng 15:780–793 Yu CT, Keh-Chang G, Weining Z et al (1987) Algorithms to process distributed queries in fast local networks. IEEE Trans Comput C–36:1153–1164 Yu MJ, Sheu PCY (1997) Adaptive join algorithms in dynamic distributed databases. Distrib Parallel Databases 5:5–30