arXiv Open Access 2025

Zerrow: True Zero-Copy Arrow Pipelines in Bauplan

Yifan Dai, Jacopo Tagliabue, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tyler R. Caraza-Harter

Abstract

Bauplan is a FaaS-based lakehouse built specifically for data pipelines; its execution engine uses Apache Arrow to pass data between the nodes of the DAG. Although Arrow is known as the "zero-copy format", in practice limited Linux kernel support for shared memory makes it difficult to avoid copying entirely. In this work, we introduce several new techniques to eliminate nearly all copying from pipelines; in particular, we implement a new kernel module that performs de-anonymization, thus eliminating a copy to intermediate data. We conclude by sharing our preliminary evaluation on different workload types and by discussing our plans for future improvements.
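The general idea of passing data between pipeline stages through shared memory, rather than copying it, can be illustrated with a minimal sketch that uses only the Python standard library. It does not use Arrow, and it is neither Bauplan's implementation nor the kernel module described in the abstract. The function names `produce_column`, `consume_column`, and `demo` are hypothetical, and both "stages" run in one process for brevity.

```python
from multiprocessing import shared_memory
import struct

INT64_SIZE = 8  # bytes per value in this illustrative fixed-width column


def produce_column(values):
    """Write int64 values into a new shared-memory segment and return it."""
    shm = shared_memory.SharedMemory(create=True, size=INT64_SIZE * len(values))
    struct.pack_into(f"{len(values)}q", shm.buf, 0, *values)
    return shm


def consume_column(name, length):
    """Attach to an existing segment and view it as int64 without copying."""
    shm = shared_memory.SharedMemory(name=name)
    # Slice first: some platforms round the segment size up to a page.
    view = shm.buf[: INT64_SIZE * length].cast("q")
    return shm, view


def demo():
    producer = produce_column([1, 2, 3, 4])
    consumer, view = consume_column(producer.name, 4)

    # An explicit copy, for contrast with the shared view.
    copied = bytes(consumer.buf[: INT64_SIZE * 4])

    # Overwrite the first value through the producer's mapping.
    struct.pack_into("q", producer.buf, 0, 99)

    first_shared = view[0]  # reads the same memory the producer wrote
    first_copied = struct.unpack_from("q", copied, 0)[0]  # stale snapshot
    total = sum(view)

    # Release the view before closing, then free the segment.
    view.release()
    consumer.close()
    producer.close()
    producer.unlink()
    return first_shared, first_copied, total
```

Calling `demo()` returns `(99, 1, 108)`: the consumer's view reflects the producer's later write because both refer to the same memory, while the explicit copy still holds the old value.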

Authors (5)

Yifan Dai
Jacopo Tagliabue
Andrea Arpaci-Dusseau
Remzi Arpaci-Dusseau
Tyler R. Caraza-Harter

Citation Format

Dai, Y., Tagliabue, J., Arpaci-Dusseau, A., Arpaci-Dusseau, R., Caraza-Harter, T.R. (2025). Zerrow: True Zero-Copy Arrow Pipelines in Bauplan. https://arxiv.org/abs/2504.06151

Publication Information
Year of publication: 2025
Language: en
Source database: arXiv
Access: Open Access