Top-k queries over web applications

The VLDB Journal - Tập 22 - Trang 519-542 - 2012
Daniel Deutch1, Tova Milo2, Neoklis Polyzotis3
1Ben Gurion University Beer, Sheva, Israel
2Tel Aviv University Tel Aviv, Israel
3UC Santa Cruz, Santa Cruz, USA

Tóm tắt

The core logic of web applications that suggest some particular service, such as online shopping, e-commerce etc., is typically captured by Business Processes (BPs). Among all the (maybe infinitely many) possible execution flows of a BP, analysts are often interested in identifying flows that are “most important”, according to some weight metric. The goal of the present paper is to provide efficient algorithms for top-k query evaluation over the possible executions of Business Processes, under some given weight function. Unique difficulties in top-k analysis in this settings stem from (1) the fact that the number of possible execution flows of a given BP is typically very large, or even infinite in presence of recursion and (2) that the weights (e.g., likelihood, monetary cost, etc.) induced by actions performed during the execution (e.g., product purchase) may be inter-dependent (due to probabilistic dependencies, combined discount deals etc.). We exemplify these difficulties, and overcome them to provide efficient algorithms for query evaluation where possible. We also describe in details an application prototype that we have developed for recommending optimal navigation in an online shopping web site that is based on our model and algorithms.

Tài liệu tham khảo

Abiteboul, S., Senellart, P.: Querying and updating probabilistic information in xml. In: Proceedings of EDBT (2006) Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of VLDB (2007) Bast, H., Majumdar, D., Schenkel, R., Theobald, M., Weikum, G.: Io-top-k: index-access optimized top-k query processing. In: Proceedings of VLDB (2006) Beeri, C., Eyal, A., Kamenkovich, S., Milo, T.: Querying business processes. In: Proceedings of VLDB (2006) Benedikt, M., Kharlamov, E., Olteanu, D., Senellart, P.: Probabilistic xml via markov chains. PVLDB 3(1), 770–781 (2010) Business Process Execution Language for Web Services. http://www.ibm.com/developerworks/library/ws-bpel/ Cohen, S., Kimelfeld, B.: Querying parse trees of stochastic context-free grammars. In: ICDT, pp. 62–75 (2010) Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: Proceedings of VLDB (2004) Dechter, R., Pearl, J.: Generalized best-first search strategies and the optimality of a*. J, ACM 32(3) (1985) Deutch, D., Milo, T.: Type inference and type checking for queries on execution traces. In: Proceedings of VLDB (2008) Deutch, D., Milo, T.: Evaluating top-k queries over business processes. In: ICDE, pp. 1195–1198 (2009) Deutch, D., Milo, T.: Top-k projection queries for probabilistic business processes. In: Proceedings of ICDT (2009) Deutch, D., Milo, T., Polyzotis, N., Yam, T.: Optimal top-k query evaluation for weighted business processes. PVLDB 3(1), 940–951 (2010) Deutch, D., Milo, T., Yam, T.: Goal-oriented web-site navigation for on-line shoppers (demonstration). PVLDB 2(2) (2009) Ding, B., Xu Yu, J., Wang, S., Qin, L., Zhang, X, Lin, X.: Finding top-k min-cost connected trees in databases. In: ICDE, pp. 836–845 (2007) Ebay. http://www.ebay.com/ Eppstein, D.: Finding the \(k\) shortest paths. In: Proceedings of 35th Symposium Foundations of Computer Science. IEEE, pp. 154–165 (1994) Etessami, K., Yannakakis, M.: Algorithmic verification of recursive probabilistic state machines. In: Proceedings of TACAS (2005) Etessami, K., Yannakakis, M.: Recursive markov chains, stochastic grammars, and monotone systems of nonlinear equations. JACM 56(1) (2009) Fagin, R., Kumar, R., Sivakumar, D.: Comparing top-k lists. In: Proceedings of SODA (2003) Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4) (2003) Friedman, N., Getoor, L., Koller, D., Pfeffer, A.: Learning probabilistic relational models. In: Proceedings of IJCAI (1999) Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4) (2008) Jones, T.: Estimating Software Costs. McGraw-Hill, New York (2007) Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Springer, Berlin (1976) Kimelfeld, B., Sagiv, Y.: Finding and approximating top-k answers in keyword proximity search. In: Proceedings of PODS, pp. 173–182 (2006) Kimelfeld, B., Sagiv, Y.: Matching twigs in probabilistic xml. In: Proceedings of VLDB (2007) Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: ACL (2003) Koudas, N., Srivastava, D.: Data stream query processing: a tutorial. In: Proceedings of VLDB (2003) Lary, K., Young, S.J.: The estimation of stochastic context-free grammars using the inside-outside algrithm. Comput. Speech Lang. 4, 35–56 (1990) Nevsetvril, J., de Mendez, P.O.: Tree-depth, subgraph coloring and homomorphism. Eur. J. Comb. 27(6) (2006) Oates, T., Doshi, S., Huang, F.: Estimating maximum likelihood parameters for stochastic context-free graph grammars. In: Proceedings of ILP (2003) Pirolli, P.L.T., Pitkow, J.E.: Distributions of surfers’ paths through the world wide web: empirical characterizations. World Wide Web 2(1–2) (1999) Re, C., Dalvi, N.N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: Proceedings of ICDE (2007) Read, R.C., Tarjan, R.E.: Bounds on backtrack algorithms for listing cycles, paths, and spanning trees. Networks 5 (1975) Sanghai, S., Domingos, P., Weld, D.: Dynamic probabilistic relational models. In: Proceedings of IJCAI (2003) Yahoo! shopping. http://shopping.yahoo.com/