arXiv Open Access 2023

Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data

Rajat De Dominik Kempa
Lihat Sumber

Abstrak

Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of algorithms and data structures operating on strings in grammar-compressed form. Previous work focused on proving lower bounds for grammars constructed using algorithms that achieve the approximation ratio $ρ=\mathcal{O}(\text{polylog }N)$. Unfortunately, for the majority of grammar compressors, $ρ$ is either unknown or satisfies $ρ=ω(\text{polylog }N)$. In their seminal paper, Charikar et al. [IEEE Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms: RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and $α$-Balanced. Only one of them ($α$-Balanced) is known to achieve $ρ=\mathcal{O}(\text{polylog }N)$. We develop the first technique for proving lower bounds for data structures and algorithms on grammars that is fully general and does not depend on the approximation ratio $ρ$ of the used grammar compressor. Using this technique, we first prove that $Ω(\log N/\log \log N)$ time is required for random access on RePair, Greedy, LongestMatch, Sequential, and Bisection, while $Ω(\log\log N)$ time is required for random access to LZ78. All these lower bounds hold within space $\mathcal{O}(n\text{ polylog }N)$ and match the existing upper bounds. We also generalize this technique to prove several conditional lower bounds for compressed computation. For example, we prove that unless the Combinatorial $k$-Clique Conjecture fails, there is no combinatorial algorithm for CFG parsing on Bisection (for which it holds $ρ=\tildeΘ(N^{1/2})$) that runs in $\mathcal{O}(n^c\cdot N^{3-ε})$ time for all constants $c>0$ and $ε>0$. Previously, this was known only for $c<2ε$.

Topik & Kata Kunci

Penulis (2)

R

Rajat De

D

Dominik Kempa

Format Sitasi

De, R., Kempa, D. (2023). Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data. https://arxiv.org/abs/2307.08833

Akses Cepat

Lihat di Sumber
Informasi Jurnal
Tahun Terbit
2023
Bahasa
en
Sumber Database
arXiv
Akses
Open Access ✓