arXiv Open Access 2020

Empirical Analysis of Zipf's Law, Power Law, and Lognormal Distributions in Medical Discharge Reports

Juan C Quiroz, Liliana Laranjo, Catalin Tufanaru, Ahmet Baki Kocaballi, Dana Rezazadegan, +2 others

Abstract

Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. This paper empirically analyses whether text in medical discharge reports follows Zipf's law, a commonly assumed statistical property of language in which word frequency follows a discrete power law distribution. We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power law) provided superior fits to the data. Results show that discharge reports are best fit by the truncated power law and lognormal distributions. Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power law and lognormal probability priors.
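The first three steps named in the abstract (tokenising, counting token frequency, and fitting a power law) can be illustrated with a minimal sketch. This is not the authors' code: the regex tokeniser, the helper names `token_frequencies` and `approx_power_law_alpha`, the toy sentences, and the choice of `xmin` are all assumptions made for illustration. The exponent estimate uses the standard closed-form approximation to the discrete power-law maximum likelihood estimator, which is only reliable for larger `xmin` values (roughly 6 and above), so the value computed on this toy data is not meaningful in itself. Comparing against lognormal or truncated power law fits would need a dedicated fitting library and is not shown.

```python
import math
import re
from collections import Counter


def token_frequencies(reports):
    """Lowercase each report, split it into alphabetic tokens, and count them."""
    counts = Counter()
    for text in reports:
        # Assumed tokeniser: runs of ASCII letters only.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def approx_power_law_alpha(frequencies, xmin=1):
    """Approximate MLE of a discrete power-law exponent for values >= xmin.

    Uses alpha ~= 1 + n / sum(ln(x_i / (xmin - 0.5))).
    """
    tail = [f for f in frequencies if f >= xmin]
    log_sum = sum(math.log(f / (xmin - 0.5)) for f in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical toy text, not drawn from MIMIC-III.
reports = [
    "patient admitted with chest pain",
    "patient discharged home in stable condition",
    "chest pain resolved and patient stable",
]

counts = token_frequencies(reports)
alpha = approx_power_law_alpha(counts.values(), xmin=1)

# Rank-frequency view: the most common tokens first, as in a Zipf plot.
for rank, (token, freq) in enumerate(counts.most_common(3), start=1):
    print(rank, token, freq)
print("approximate alpha:", round(alpha, 3))
```

On a real corpus, the rank-frequency pairs would typically be plotted on log-log axes, and `xmin` would be chosen by a goodness-of-fit criterion rather than fixed by hand.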

Authors (7)

Juan C Quiroz
Liliana Laranjo
Catalin Tufanaru
Ahmet Baki Kocaballi
Dana Rezazadegan
Shlomo Berkovsky
Enrico Coiera

Citation Format

Quiroz, J.C., Laranjo, L., Tufanaru, C., Kocaballi, A.B., Rezazadegan, D., Berkovsky, S. et al. (2020). Empirical Analysis of Zipf's Law, Power Law, and Lognormal Distributions in Medical Discharge Reports. https://arxiv.org/abs/2003.13352

Journal Information
Publication year: 2020
Language: en
Source database: arXiv
Access: Open Access