arXiv Open Access 2020

Enabling Collaborative Data Science Development with the Ballet Framework

Micah J. Smith Jürgen Cito Kelvin Lu Kalyan Veeramachaneni

Abstract

While the open-source software development model has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small teams. We describe challenges to scaling data science collaborations and present a conceptual framework and ML programming model to address them. We instantiate these ideas in Ballet, a lightweight framework for collaborative, open-source data science that focuses on feature engineering, together with an accompanying cloud-based development environment. Using our framework, collaborators incrementally propose feature definitions to a repository; each proposed definition is subjected to an ML performance evaluation and can be automatically merged into an executable feature engineering pipeline. We leverage Ballet to conduct a case study analysis of an income prediction problem with 27 collaborators, and discuss implications for future designers of collaborative projects.
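The contribution loop described in the abstract (propose a feature definition, evaluate it, merge it automatically if it passes) can be illustrated with a minimal Python sketch. This is not Ballet's API or its actual acceptance procedure. The `Feature` and `Pipeline` classes, the `propose` function, the `min_gain` threshold, and the use of leave-one-out 1-nearest-neighbour accuracy as the "ML performance evaluation" are all hypothetical stand-ins chosen for illustration, and the toy data is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


@dataclass
class Feature:
    """Hypothetical feature definition: a name plus a function of one raw row."""
    name: str
    transform: Callable[[Row], float]


@dataclass
class Pipeline:
    """Hypothetical executable pipeline: the accepted features, in merge order."""
    features: List[Feature] = field(default_factory=list)

    def transform(self, rows: Sequence[Row]) -> List[List[float]]:
        # Apply every accepted feature to every row to build a feature matrix.
        return [[f.transform(r) for f in self.features] for r in rows]


def loo_1nn_accuracy(X: List[List[float]], y: Sequence[int]) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy (assumed stand-in evaluation)."""
    if not X or not X[0]:
        # No features yet: use the majority-class rate as the baseline score.
        return max(y.count(c) for c in set(y)) / len(y)
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other row by squared Euclidean distance; ties go to the lower index.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def propose(pipeline: Pipeline, candidate: Feature, rows: Sequence[Row],
            y: Sequence[int], min_gain: float = 0.0) -> bool:
    """Hypothetical merge rule: accept only if the score rises by more than min_gain."""
    before = loo_1nn_accuracy(pipeline.transform(rows), y)
    after = loo_1nn_accuracy(Pipeline(pipeline.features + [candidate]).transform(rows), y)
    if after - before > min_gain:
        pipeline.features.append(candidate)  # the "automatic merge" step
        return True
    return False


# Toy data (made up for illustration): predict a binary income label.
rows = [
    {"age": 22, "hours": 15}, {"age": 25, "hours": 20}, {"age": 45, "hours": 50},
    {"age": 50, "hours": 45}, {"age": 23, "hours": 18}, {"age": 48, "hours": 55},
]
y = [0, 0, 1, 1, 0, 1]

pipe = Pipeline()
print(propose(pipe, Feature("hours", lambda r: r["hours"]), rows, y))  # True
print(propose(pipe, Feature("constant", lambda r: 1.0), rows, y))      # False
print([f.name for f in pipe.features])                                  # ['hours']
```

In this sketch the constant feature is rejected because it leaves every pairwise distance, and therefore the score, unchanged; raising `min_gain` makes the merge rule more conservative.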

Citation

Smith, M.J., Cito, J., Lu, K., Veeramachaneni, K. (2020). Enabling Collaborative Data Science Development with the Ballet Framework. https://arxiv.org/abs/2012.07816

Journal Information
Publication year: 2020
Language: en
Database source: arXiv
Access: Open Access ✓