Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In: Proc. SODA, pp. 633–634 (2002)
Brown P., Haas P., Myllymaki J., Pirahesh H., Reinwald B. and Sismanis Y. (2005). Toward automated large-scale information integration and discovery. In: Härder, T. and Lehner, W. (eds) Data Management in a Connected World, pp 161–180. Springer, Heidelberg
Brown, P., Haas, P.J.: BHUNT: automatic discovery of fuzzy algebraic constraints in relational data. In: Proc. VLDB, pp. 668–679 (2003)
Brown, P.G., Haas, P.J.: Techniques for warehousing of sample data. In: Proc. ICDE (2006)
Chaudhuri, S., Motwani, R., Narasayya, V.R.: On random sampling over joins. In: Proc. ACM SIGMOD, pp. 263–274 (1999)
Colt Library: Open source libraries for high performance scientific and technical computing in Java. http://dsd.lbl.gov/ hoschek/colt/
Cormode, G., Muthukrishnan, S., Rozenbaum, I.: Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. In: Proc. VLDB, pp. 25–36 (2005)
Fan C., Muller M. and Rezucha I. (1962). Development of sampling plans by using sequential (item by item) techniques and digital computers. J. Am. Statist. Assoc. 57: 387–402
Frahling, G., Indyk, P., Sohler, C.: Sampling in dynamic data streams and applications. In: Proc. 21st Symp. Computat. Geom., pp. 142–149 (2005)
Gemulla, R., Lehner, W.: Deferred maintenance of disk-based random samples. In: Proc. EDBT, pp. 423–441 (2006)
Gemulla, R., Lehner, W., Haas, P.J.: A dip in the reservoir: Maintaining sample synopses of evolving datasets. In: Proc. VLDB, pp. 595–606 (2006)
Gemulla, R., Lehner, W., Haas, P.J.: Maintaining Bernoulli samples over evolving multisets. In: Proc. ACM PODS, pp. 93–102 (2007)
Gibbons P., Matias Y. and Poosala V. (1997). AQUA project white paper. Tech. rep., Bell Laboratories, Murray Hill
Gibbons, P.B., Matias, Y.: New sampling-based summary statistics for improving approximate query answers. In: Proc. ACM SIGMOD, pp. 331–342 (1998)
Gibbons P.B., Matias Y. and Poosala V. (2002). Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst. 27: 182–184
GSL: GNU Scientific Library. http://www.gnu.org/software/gsl/
Haas, P., König, C.: A bi-level Bernoulli scheme for database sampling. In: Proc. ACM SIGMOD, pp. 275–286 (2004)
Haas, P.J.: Data stream sampling: Basic techniques and results. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds.) Data Stream Management: Processing High Speed Data Streams, Springer, Heidelberg (2007)
Halevy, A.Y., Etzioni, O., Doan, A., Ives, Z.G., Madhavan, J., McDowell, L., Tatarinov, I.: Join synopses for approximate query answering. In: Proc. CIDR (2003)
Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. In: Proc. ACM SIGMOD, pp. 171–182 (1997)
IBM Corporation: WebSphere Profile Stage User’s Manual (2005)
Ilyas, I.F., Markl, V., Haas, P.J., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: Proc. ACM SIGMOD, pp. 647–658 (2004)
Jermaine, C., Pol, A., Arumugam, S.: Online maintenance of very large random samples. In: Proc. ACM SIGMOD, pp. 299–310 (2004)
John, G.H., Langley, P.: Static versus dynamic sampling for data mining. In: Proc. KDD, pp. 367–370 (2005)
Johnson N.L., Kotz S. and Kemp A.W. (1992). Discrete Univariate Distributions, 2nd edn. Wiley, New York
Kachitvichyanukul V. and Schmeiser B. (1985). Computer generation of hypergeometric random variables. J. Stat. Comput. Simul 22: 127–145
Kivinen, J., Mannila, H.: The power of sampling in knowledge discovery. In: Proc. ACM PODS, pp. 77–85 (1994)
Knuth, D.E.: The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 1st edn. Addison-Wesley, Reading (1969)
Law A.M. (2007). Simulation Modeling and Analysis, 4th edn. McGraw-Hill, New York
L’Ecuyer P. (2006). Uniform random number generation. In: Henderson, S.G. and Nelson, B.L. (eds) Simulation, pp 55–81. Elsevier, Amsterdam
Leser, U., Naumann, F.: (Almost) hands-off information integration for the life sciences. In: Proc. CIDR, pp. 131–143 (2005)
Matsumoto M. and Nishimura T. (1998). Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul. 8(1): 3–30
McLeod A.I. and Bellhouse D.R. (1983). A convenient algorithm for drawing a simple random sample. Appl. Statist. 32: 182–184
Norris J.R. (1997). Markov Chains. Cambridge University Press, Cambridge
Olken, F.: Random sampling from databases. Thesis LBL-32883, Information and Computing Sciences Division, Lawrence Berkeley National Laboratory (1993)
Olken, F., Rotem, D.: Maintenance of materialized views of sampling queries. In: Proc. ICDE (1992)
Poosala, V., Haas, P.J., Ioannidis, Y.E., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: Proc. ACM SIGMOD, pp. 294–305 (1996)
Press W.H., Teukolsky S.A., Vetterling W.T. and Flannery B.P. (1992). Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge
Robbins H. and Monro S. (1951). A stochastic approximation method. Ann. Math. Statist. 22: 400–407
Ross S.M. (1983). Stochastic Processes. Wiley, New York
Särndal C.E., Swensson B. and Wretman J. (1992). Model Assisted Survey Sampling. Springer, Heidelberg
Spall J.C. (2003). Introduction to Stochastic Search and Optimization. Wiley, New York
Tatbul, N., Çetintemel, U., Zdonik, S.B., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: Proc. VLDB, pp. 309–320 (2003)
Vitter J.S. (1984). Faster methods for random sampling. Commun. ACM 27(7): 703–718
Vitter J.S. (1985). Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1): 37–57
Zechner, H.: Efficient sampling from continuous and discrete distributions. Ph.D. thesis, Technical University Graz (1997)