arXiv Open Access 2025

Large Language Models in Thematic Analysis: Prompt Engineering, Evaluation, and Guidelines for Qualitative Software Engineering Research

Cristina Martinez Montes, Robert Feldt, Cristina Miguel Martos, Sofia Ouhbi, Shweta Premanandan, +1 more

Abstract

As artificial intelligence advances, large language models (LLMs) are entering qualitative research workflows, yet no reproducible methods exist for integrating them into established approaches like thematic analysis (TA), one of the most common qualitative methods in software engineering research. Moreover, existing studies lack systematic evaluation of LLM-generated qualitative outputs against established quality criteria. We designed and iteratively refined prompts for Phases 2-5 of Braun and Clarke's reflexive TA, then tested outputs from multiple LLMs against codes and themes produced by experienced researchers. Using 15 interviews on software engineers' well-being, we conducted blind evaluations with four expert evaluators who applied rubrics derived directly from Braun and Clarke's quality criteria. Evaluators preferred LLM-generated codes 61% of the time, finding them analytically useful for answering the research question. However, evaluators also identified limitations: LLMs fragmented data unnecessarily, missed latent interpretations, and sometimes produced themes with unclear boundaries. Our contributions are threefold. First, a reproducible approach integrating refined, documented prompts with an evaluation framework to operationalize Braun and Clarke's reflexive TA. Second, an empirical comparison of LLM- and human-generated codes and themes in software engineering data. Third, guidelines for integrating LLMs into qualitative analysis while preserving methodological rigour, clarifying when and how LLMs can assist effectively and when human interpretation remains essential.

Topics & Keywords

Authors (6)

Cristina Martinez Montes

Robert Feldt

Cristina Miguel Martos

Sofia Ouhbi

Shweta Premanandan

Daniel Graziotin

Citation Format

Montes, C.M., Feldt, R., Martos, C.M., Ouhbi, S., Premanandan, S., Graziotin, D. (2025). Large Language Models in Thematic Analysis: Prompt Engineering, Evaluation, and Guidelines for Qualitative Software Engineering Research. https://arxiv.org/abs/2510.18456

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access ✓