Order statistics and estimating cardinalities of massive data sets

Discrete Applied Mathematics - Tập 157 Số 2 - Trang 406-427 - 2009
Frédéric Giroire1,2
1ALGO project, INRIA Rocquencourt, B.P. 105, 78153 Le Chesnay Cedex, France
2MASCOTTE, joint project CNRS-INRIA-UNSA, 2004 Routes des Lucioles, BP 93, F-06902, France

Tóm tắt

Từ khóa


Tài liệu tham khảo

Bar-Yossef, 2002, Counting distinct elements in a data stream, 1

Broder, 1997, On the resemblance and containment of documents, 21

Broder, 2000, Identifying and filtering near-duplicate documents, 1

P. Chassaing, L. Gerin, Efficient estimation of the cardinality of large data sets, in: math.ST/0701347, 2007. Extended abstract in the proceedings of the 4th Colloquium on Mathematics and Computer Science, 2007, pp. 419–422

Considine, 2004, Approximate aggregation techniques for sensor databases, 449

Durand, 2003, Loglog counting of large cardinalities, vol.~2832, 605

Estan, 2006, Bitmap algorithms for counting active flows on high-speed links, IEEE/ACM Trans. Netw., 14, 925, 10.1109/TNET.2006.882836

Flajolet, 1997, Adaptive sampling, vol.~Suppl.~I, 28

P. Flajolet, E. Fusy, O. Gandouet, F. Meunier, Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm, in: Philippe Jacquet (Ed.) Analysis of Algorithm 2007, AofA07, Discrete Mathematics and Theoretical Computer Science Proceedings, 2007 (in press)

Flajolet, 1983, Probabilistic counting, 76

C. Fraleigh, C. Diot, B. Lyles, S. Moon, P. Owezarski, K. Papagiannaki, F. Tobagi, Design and deployment of a passive monitoring infrastructure, in: Passive and Active Measurement Workshop, Amsterdam, April 2001

Fusy, 2007, Estimating the number of active flows in a data stream over a sliding window, 223

L. Getoor, B. Taskar, D. Koller, Selectivity estimation using probabilistic models, in: SIGMOD Conference, 2001

P.B. Gibbons, Distinct sampling for highly-accurate answers to distinct values queries and event reports, VLDB J. (2001) 541–550

Giroire, 2005, Order statistics and estimating cardinalities of massive data sets, 157

F. Giroire, Directions to use probabilistic algorithms for cardinality for DNA analysis, in: Journées Ouvertes Biologie Informatique Mathématiques, JOBIM 2006, pp. 3–5, July 2006

F. Giroire, Réseaux, algorithmique et analyse combinatoire de grands ensembles, Ph.D. Thesis, Université Paris VI, November 2006

G. Iannaccone, C. Diot, I. Graham, N. McKeown, Monitoring very high speed links, in: ACM SIGCOMM Internet Measurement Workshop, San Francisco, CA, November 2001

Knuth, 1973

Whang, 1990, A linear-time probabilistic counting algorithm for database applications, TODS, 15, 208, 10.1145/78922.78925