arXiv Open Access 2025

Automated Planning for Optimal Data Pipeline Instantiation

Leonardo Rosa Amado, Adriano Vogel, Dalvan Griebler, Gabriel Paludo Licks, Eric Simon, Felipe Meneguzzi

Abstract

Data pipeline frameworks provide abstractions for implementing sequences of data-intensive transformation operators, automating the deployment and execution of such transformations in a cluster. Deploying a data pipeline, however, requires computing resources to be allocated in a data center, ideally minimizing the overhead of communicating data and executing operators in the pipeline while considering each operator's execution requirements. In this paper, we model the problem of optimal data pipeline deployment as planning with action costs and propose heuristics that aim to minimize total execution time. Experimental results indicate that the heuristics can outperform the baseline deployment and that a heuristic based on connections outperforms other strategies.
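To make the idea of deployment as cost-based planning concrete, the following is a minimal sketch and not the authors' formulation. It assumes a linear pipeline, a fixed per-node execution cost for each operator, and a flat transfer cost whenever consecutive operators run on different nodes. Uniform-cost search stands in for a planner with action costs; the function name `plan_deployment` and all operator names, node names, and cost values are hypothetical.

```python
import heapq


def plan_deployment(operators, nodes, exec_cost, comm_cost):
    """Uniform-cost search over partial placements (illustrative only).

    A state is the tuple of nodes chosen for the first k operators.
    An action places operator k on a node; its cost is the operator's
    execution cost on that node plus a transfer cost if the previous
    operator runs on a different node (an assumed cost model).
    """
    frontier = [(0.0, ())]  # (cost so far, placement so far)
    while frontier:
        cost, placement = heapq.heappop(frontier)
        k = len(placement)
        if k == len(operators):
            # Costs are non-negative, so the first complete placement
            # popped from the queue has minimal total cost.
            return cost, dict(zip(operators, placement))
        op = operators[k]
        for node in nodes:
            step = exec_cost[op][node]
            if placement and placement[-1] != node:
                step += comm_cost
            heapq.heappush(frontier, (cost + step, placement + (node,)))
    return None


# Hypothetical three-operator pipeline on two nodes.
operators = ["read", "transform", "write"]
nodes = ["n1", "n2"]
exec_cost = {
    "read": {"n1": 2.0, "n2": 4.0},
    "transform": {"n1": 5.0, "n2": 1.0},
    "write": {"n1": 2.0, "n2": 3.0},
}
total, placement = plan_deployment(operators, nodes, exec_cost, comm_cost=1.5)
print(total, placement)
```

The search enumerates placements exhaustively in the worst case, which is exactly where heuristics of the kind the abstract mentions would matter: a good estimate of remaining cost lets a planner avoid expanding most partial placements.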

Authors (6)

Leonardo Rosa Amado

Adriano Vogel

Dalvan Griebler

Gabriel Paludo Licks

Eric Simon

Felipe Meneguzzi

Citation Format

Amado, L.R., Vogel, A., Griebler, D., Licks, G.P., Simon, E., Meneguzzi, F. (2025). Automated Planning for Optimal Data Pipeline Instantiation. https://arxiv.org/abs/2503.12626

Journal Information
Year of Publication: 2025
Language: en
Source Database: arXiv
Access: Open Access