arXiv Open Access 2020

Grammar compression with probabilistic context-free grammar

Hiroaki Naganuma Diptarama Hendrian Ryo Yoshinaka Ayumi Shinohara Naoki Kobayashi
Lihat Sumber

Abstrak

We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.

Topik & Kata Kunci

Penulis (5)

H

Hiroaki Naganuma

D

Diptarama Hendrian

R

Ryo Yoshinaka

A

Ayumi Shinohara

N

Naoki Kobayashi

Format Sitasi

Naganuma, H., Hendrian, D., Yoshinaka, R., Shinohara, A., Kobayashi, N. (2020). Grammar compression with probabilistic context-free grammar. https://arxiv.org/abs/2003.08097

Akses Cepat

Lihat di Sumber
Informasi Jurnal
Tahun Terbit
2020
Bahasa
en
Sumber Database
arXiv
Akses
Open Access ✓