arXiv Open Access 2023

Adding Domain Knowledge to Query-Driven Learned Databases

Peizhi Wu Ryan Marcus Zachary G. Ives

Abstract

In recent years, learned cardinality estimation has emerged as an alternative to traditional query optimization methods: by training machine learning models over observed query performance, learned cardinality estimation techniques can accurately predict query cardinalities and costs, accounting for skew, correlated predicates, and many other factors that traditional methods struggle to capture. However, query-driven learned cardinality estimators depend on sample workloads and require vast numbers of labeled queries. Further, we show that state-of-the-art query-driven techniques can make significant and unpredictable errors on queries that fall outside the distribution of their training set. We show that these out-of-distribution errors can be mitigated by incorporating the domain knowledge used in traditional query optimizers: constraints on values and cardinalities (e.g., based on key-foreign-key relationships, range predicates, and more generally on inclusion and functional dependencies). We develop methods for semi-supervised query-driven learned query optimization, based on constraints, and we experimentally demonstrate that such techniques can increase a learned query optimizer's accuracy in cardinality estimation, reduce the reliance on large numbers of labeled queries, and improve the robustness of end-to-end query performance.

Authors (3)

Peizhi Wu
Ryan Marcus
Zachary G. Ives

Citation Format

Wu, P., Marcus, R., Ives, Z.G. (2023). Adding Domain Knowledge to Query-Driven Learned Databases. https://arxiv.org/abs/2312.01025

Journal Information
Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access ✓