DOAJ Open Access 2006

Efficient estimation of the cardinality of large data sets

Philippe Chassaing, Lucas Gerin

Abstract

Giroire has recently proposed an algorithm which returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large data bases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
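To make the idea of estimating a cardinality from uniform random variables in $[0,1]$ concrete, here is a minimal sketch. It is not Giroire's algorithm and not the optimal estimator proposed in the note; it swaps in a plain k-th minimum value estimator. The names `hash_to_unit` and `estimate_cardinality`, the choice of SHA-256 as the hash, and the default `k=256` are all assumptions made for illustration.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1] (assumed hash: SHA-256)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 256) -> float:
    """Estimate the number of distinct words from the k smallest hash values.

    Illustrative k-th minimum value estimator: if n distinct words hash to
    uniform values, the k-th smallest value v_k is close to k / n, so
    (k - 1) / v_k estimates n.
    """
    heap = []        # negated values, so heap[0] is minus the largest kept hash
    members = set()  # hash values currently kept, to skip repeated words
    for word in words:
        u = hash_to_unit(word)
        if u in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -u)
            members.add(u)
        elif u < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -u)
            members.discard(evicted)
            members.add(u)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen: the count is exact.
        return float(len(heap))
    return (k - 1) / -heap[0]
```

Memory use depends only on `k`, not on the length of the sequence, which is the kind of constraint the abstract alludes to. The relative error of this particular estimator shrinks roughly like $1/\sqrt{k}$.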

Topics & Keywords

Mathematics

Authors (2)

Philippe Chassaing

Lucas Gerin

Citation Format

Chassaing, P., & Gerin, L. (2006). Efficient estimation of the cardinality of large data sets. https://doi.org/10.46298/dmtcs.3492

Journal Information

Publication Year: 2006
Source Database: DOAJ
DOI: 10.46298/dmtcs.3492
Access: Open Access ✓