Semantic Scholar · Open Access · 2025 · 2 citations

BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law

Juvenal Domingos Júnior, Augusto Faria, Eduardo Seiti de Oliveira, Erick de Brito, Matheus Teotonio, and 4 others

Abstract

This paper presents BR-TaxQA-R, a novel dataset designed to support question answering with references in the context of Brazilian personal income tax law. The dataset contains 715 questions from the 2024 official Q&A document published by Brazil's Internal Revenue Service, enriched with statutory norms and administrative rulings from the Conselho Administrativo de Recursos Fiscais (CARF). We implement a Retrieval-Augmented Generation (RAG) pipeline using OpenAI embeddings for searching and GPT-4o-mini for answer generation. We compare different text segmentation strategies and benchmark our system against commercial tools such as ChatGPT and Perplexity.ai using RAGAS-based metrics. Results show that our custom RAG pipeline outperforms commercial systems in Response Relevancy, indicating stronger alignment with user queries, while commercial models achieve higher scores in Factual Correctness and fluency. These findings highlight a trade-off between legally grounded generation and linguistic fluency. Crucially, we argue that human expert evaluation remains essential to ensure the legal validity of AI-generated answers in high-stakes domains such as taxation. BR-TaxQA-R is publicly available at https://huggingface.co/datasets/unicamp-dl/BR-TaxQA-R.
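
The abstract describes the pipeline only at a high level. What follows is a minimal sketch of the retrieval side of a RAG pipeline of this kind. It is not the authors' implementation. It assumes fixed-size word windows with overlap as one possible segmentation strategy, top-k search by cosine similarity, and a prompt that numbers the retrieved passages so the generated answer can cite them. `embed` is a hypothetical term-count stand-in for the OpenAI embedding model. The function names, window sizes, and prompt wording are likewise illustrative assumptions. The generation call to GPT-4o-mini is omitted.

```python
import math


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end of the text
            break
    return chunks


def embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: term counts over a fixed vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[int]:
    """Indices of the k chunks most similar to the query, best first."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Number the retrieved passages so the generated answer can cite them as [n]."""
    context = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, start=1))
    return ("Answer the question using only the numbered passages below "
            "and cite them as [n].\n\n"
            f"{context}\n\nQuestion: {question}")


# Toy usage with made-up passages (illustrative only, not from the dataset).
vocab = ["dependent", "deduction", "rental", "income"]
chunks = ["dependent deduction rules", "rental income reporting"]
query = "income from rental property"
best = top_k(embed(query, vocab), [embed(c, vocab) for c in chunks], k=1)
prompt = build_prompt(query, [chunks[i] for i in best])
```

Window size and overlap are the knobs a segmentation comparison would vary. Larger windows keep more legal context per passage, and smaller ones make each cited reference more precise.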


Authors (9)

Juvenal Domingos Júnior
Augusto Faria
Eduardo Seiti de Oliveira
Erick de Brito
Matheus Teotonio
Andre Assumpcao
D. Carmo
Roberto A. Lotufo
Jayr Pereira

Citation Format

Júnior, J.D., Faria, A., Oliveira, E.S.d., Brito, E.d., Teotonio, M., Assumpcao, A., et al. (2025). BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law. https://doi.org/10.48550/arXiv.2505.15916

Journal Information
Publication year: 2025
Language: English (en)
Total citations: 2
Database source: Semantic Scholar
DOI: 10.48550/arXiv.2505.15916
Access: Open Access ✓