arXiv Open Access 2023

LitSumm: Large language models for literature summarisation of non-coding RNAs

Andrew Green, Carlos Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I. Petrov, +2 others

Abstract

Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have the resources to scale to the whole relevant literature, and all have to prioritise their efforts. In this work, we take a first step towards alleviating the lack of curator time in RNA science by generating summaries of literature for non-coding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority rated as extremely high quality. We apply our tool to a selection of over 4,600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarisation is feasible with the current generation of LLMs, provided careful prompting and automated checking are applied.


Authors (7)

Andrew Green

Carlos Ribas

Nancy Ontiveros-Palacios

Sam Griffiths-Jones

Anton I. Petrov

Alex Bateman

Blake Sweeney

Citation Format

Green, A., Ribas, C., Ontiveros-Palacios, N., Griffiths-Jones, S., Petrov, A.I., Bateman, A. et al. (2023). LitSumm: Large language models for literature summarisation of non-coding RNAs. https://arxiv.org/abs/2311.03056

Publication Information
Year of publication: 2023
Language: English
Database source: arXiv
Access: Open Access