DOAJ Open Access 2007

HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

Philippe Flajolet Éric Fusy Olivier Gandouet Frédéric Meunier

Lihat Sumber DOI

Abstrak

This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of \emphdistinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes''), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about $1.04/\sqrt{m}$. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond $10^9$ with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.

Topik & Kata Kunci

Mathematics

Penulis (4)

Philippe Flajolet

Éric Fusy

Olivier Gandouet

Frédéric Meunier

Format Sitasi

APA MLA BibTeX

Flajolet, P., Fusy, É., Gandouet, O., Meunier, F. (2007). HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. https://doi.org/10.46298/dmtcs.3545

Akses Cepat

Lihat di Sumber doi.org/10.46298/dmtcs.3545

Informasi Jurnal

Tahun Terbit: 2007
Sumber Database: DOAJ
DOI: 10.46298/dmtcs.3545
Akses: Open Access ✓