DOAJ
Open Access
2006
Efficient estimation of the cardinality of large data sets
Philippe Chassaing
Lucas Gerin
Abstract
Giroire has recently proposed an algorithm that returns the approximate number of distinct elements in a large sequence of words, under strong constraints arising from the analysis of large databases. His estimate is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimator, using Kullback information and estimation theory.
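To illustrate the general idea of estimating cardinality from uniform random variables in $[0,1]$, here is a minimal sketch of a generic "k minimum values" estimator. It is not Giroire's algorithm and not the optimal estimator proposed in this note; the function names `hash_to_unit` and `estimate_cardinality`, the use of SHA-256 as the hash, and the parameter `k` are all assumptions made for illustration. Each word is hashed to a pseudo-uniform value, only the `k` smallest distinct values are kept, and the `k`-th smallest value is used to estimate the number of distinct words.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1] (assumed hash: SHA-256)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 256) -> float:
    """Estimate the number of distinct words from the k smallest hash values.

    Illustrative sketch only: memory use is O(k), independent of the
    length of the input sequence.
    """
    heap = []      # max-heap (stored as negatives) of the k smallest hashes
    kept = set()   # the same hashes, for fast duplicate detection
    for word in words:
        h = hash_to_unit(word)
        if h in kept:
            continue  # repeated word (same hash): nothing to update
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th minimum: swap it in
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen: the count is exact
        return float(len(heap))
    kth_minimum = -heap[0]
    # For n uniform values, the k-th minimum is about k / (n + 1),
    # so (k - 1) / kth_minimum is a standard estimate of n.
    return (k - 1) / kth_minimum
```

For example, `estimate_cardinality(stream_of_words, k=256)` would return an approximation whose relative error is typically on the order of $1/\sqrt{k}$; repeated words do not affect the result because equal words hash to the same value.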
Journal Information
- Publication Year: 2006
- Source Database: DOAJ
- DOI: 10.46298/dmtcs.3492
- Access: Open Access