arXiv Open Access 2024

Cheap Ways of Extracting Clinical Markers from Texts

Anastasia Sandu, Teodor Mihailescu, Sergiu Nisioi

Abstract

This paper describes the work of the UniBuc Archaeology team for the CLPsych 2024 Shared Task, which involved finding evidence within a text that supports its assigned suicide risk level. Two types of evidence were required: highlights (relevant spans extracted from the text) and summaries (evidence aggregated into a synthesis). Our work evaluates Large Language Models (LLMs) against an alternative method that is much more memory- and resource-efficient. The first approach employs a good old-fashioned machine learning (GOML) pipeline consisting of a tf-idf vectorizer and a logistic regression classifier, whose representative features are used to extract relevant highlights. The second, more resource-intensive approach uses an LLM to generate the summaries and is guided by chain-of-thought prompting to produce sequences of text indicating clinical markers.
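To make the first (GOML) approach concrete, the following is a minimal, dependency-free sketch of the general idea: fit tf-idf weights, train a logistic regression classifier, and reuse its per-token coefficients to pick highlight sentences. It is not the authors' implementation. The toy posts, the function names, the hyperparameters, and the rule of scoring each sentence by the sum of its token coefficients are all assumptions made for illustration; the abstract only states that the classifier's representative features drive the highlight extraction.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def fit_idf(token_docs):
    """Learn smoothed inverse document frequencies from tokenized documents."""
    n = len(token_docs)
    df = Counter(tok for doc in token_docs for tok in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    """Build an L2-normalized sparse tf-idf vector (dict token -> value)."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_logreg(vectors, labels, epochs=300, lr=0.5):
    """Train binary logistic regression with plain stochastic gradient descent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights[t] * v for t, v in vec.items())
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log loss with respect to z
            for t, v in vec.items():
                weights[t] -= lr * grad * v
            bias -= lr * grad
    return weights, bias

def extract_highlights(text, weights, top_k=2):
    """Return up to top_k sentences whose summed token coefficients are positive.

    Assumption: a sentence's relevance is the sum of the classifier
    coefficients of its distinct tokens; the source does not specify this rule.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [(sum(weights.get(t, 0.0) for t in set(tokenize(s))), s)
              for s in sentences]
    scored.sort(key=lambda pair: -pair[0])
    return [s for score, s in scored[:top_k] if score > 0]

# Hypothetical toy data, invented for this sketch (1 = higher risk, 0 = lower risk).
posts = [
    ("i feel hopeless and alone every night", 1),
    ("everything seems hopeless and i cannot sleep", 1),
    ("i am alone and nothing helps anymore", 1),
    ("had a great walk with friends today", 0),
    ("the weather was great and i cooked dinner", 0),
    ("friends came over and we watched a movie", 0),
]
token_docs = [tokenize(text) for text, _ in posts]
labels = [label for _, label in posts]

idf = fit_idf(token_docs)
vectors = [tfidf_vector(doc, idf) for doc in token_docs]
weights, bias = train_logreg(vectors, labels)

new_post = "We watched a movie with friends. Everything seems hopeless lately."
highlights = extract_highlights(new_post, weights)
print(highlights)
```

The appeal of this kind of pipeline is that the whole model is a vocabulary-sized weight table, so training and inference run on a CPU in very little memory. In practice, a library implementation of the vectorizer and classifier would likely replace the hand-written functions above.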

Topics & Keywords

cs.CL cs.LG

Citation Format

Sandu, A., Mihailescu, T., & Nisioi, S. (2024). Cheap Ways of Extracting Clinical Markers from Texts. https://arxiv.org/abs/2403.11227

Journal Information

Publication Year: 2024
Language: English
Source Database: arXiv
Access: Open Access