arXiv Open Access 2025

AI-Driven Generation of Data Contracts in Modern Data Engineering Systems

Harshraj Bhoite
Lihat Sumber

Abstrak

Data contracts formalize agreements between data producers and consumers regarding schema, semantics, and quality expectations. As data pipelines grow in complexity, manual authoring and maintenance of contracts becomes error-prone and labor-intensive. We present an AI-driven framework for automatic data contract generation using large language models (LLMs). Our system leverages parameter-efficient fine-tuning methods, including LoRA and PEFT, to adapt LLMs to structured data domains. The models take sample data or schema descriptions and output validated contract definitions in formats such as JSON Schema and Avro. We integrate this framework into modern data platforms (e.g., Databricks, Snowflake) to automate contract enforcement at scale. Experimental results on synthetic and real-world datasets demonstrate that the fine-tuned LLMs achieve high accuracy in generating valid contracts and reduce manual workload by over 70%. We also discuss key challenges such as hallucination, version control, and the need for continuous learning. This work demonstrates that generative AI can enable scalable, agile data governance by bridging the gap between intent and implementation in enterprise data management.

Topik & Kata Kunci

Penulis (1)

H

Harshraj Bhoite

Format Sitasi

Bhoite, H. (2025). AI-Driven Generation of Data Contracts in Modern Data Engineering Systems. https://arxiv.org/abs/2507.21056

Akses Cepat

Lihat di Sumber
Informasi Jurnal
Tahun Terbit
2025
Bahasa
en
Sumber Database
arXiv
Akses
Open Access ✓