DOAJ Open Access 2025

Integrating NLP to Enhance Algorithmic Identification of Metastatic and Castration‐Resistant Prostate Cancer in Large Claims‐Based Studies

Shannon R. Stock, Joshua A. Parrish, Michael T. Burns, Jessica L. Janes, Justin Waller, +5 others

Abstract

Purpose: Accurate classification of prostate cancer (PC) disease states, defined by the presence or absence of metastasis and castration resistance (CRPC), is critical but challenging in population-based research. As chart review is not feasible on a large scale, accurate automated methods are needed.

Methods: We conducted a retrospective study using data from the Veterans Affairs Health Care System to evaluate algorithms for identifying CRPC and metastatic PC, with manual chart review as the gold standard. Our analysis included 8336 patients for CRPC classification and 721 for metastatic disease classification. For CRPC classification, we assessed one novel algorithm using criteria including rising prostate-specific antigen levels or progression to metastatic disease while receiving androgen deprivation therapy or initiating CRPC-specific treatments. For metastatic disease detection, we assessed four algorithms based on: ICD codes alone, natural language processing (NLP) alone, a novel algorithm combining ICD codes and treatment patterns, and an enhanced version of the novel algorithm integrating NLP, evaluating the sensitivity and specificity of each. Positive and negative predictive values were reported across a range of assumed disease prevalence.

Results: Out of 8336 patients with PC, 1190 (14.3%) were identified as having CRPC through chart review, with the CRPC algorithm achieving 85.1% sensitivity and 96.1% specificity. Among 721 patients evaluated for metastatic disease, 179 (24.8%) were identified as having metastatic disease through chart review. The algorithm combining ICD codes, treatment patterns, and NLP demonstrated the highest sensitivity (94.4%) and high specificity (93.0%), while other methods had lower sensitivity with varied specificity.

Conclusions: Our findings suggest that our CRPC algorithm and the combined ICD codes, treatment patterns, and NLP algorithm for metastasis are effective automated approaches for identifying advanced states of PC. In particular, integrating NLP boosted sensitivity for metastatic classification with minimal specificity trade-off, highlighting the value of a multifaceted approach to large-scale PC research.
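
The abstract states that positive and negative predictive values were reported across a range of assumed disease prevalence. The sketch below illustrates how such values can be derived from sensitivity and specificity with Bayes' theorem. It is not the authors' code: the function name `predictive_values` and the prevalence grid are assumptions made for illustration, and only the sensitivity and specificity figures come from the abstract.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a classifier at an assumed disease prevalence.

    Hypothetical helper for illustration; all inputs are proportions.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")

    # Expected fraction of the population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported in the abstract.
ALGORITHMS = {
    "CRPC algorithm": (0.851, 0.961),
    "Metastasis: ICD + treatment + NLP": (0.944, 0.930),
}

# Assumed prevalence grid. 0.143 and 0.248 are the chart-review proportions
# from the abstract; the remaining values are arbitrary illustration points.
PREVALENCES = (0.05, 0.10, 0.143, 0.248, 0.40)

if __name__ == "__main__":
    for name, (sens, spec) in ALGORITHMS.items():
        print(name)
        for prev in PREVALENCES:
            ppv, npv = predictive_values(sens, spec, prev)
            print(f"  prevalence={prev:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Because sensitivity and specificity are held fixed, PPV rises and NPV falls as the assumed prevalence increases, which is why predictive values are typically reported over a range of prevalence rather than at a single point.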

Authors (10)

Shannon R. Stock
Joshua A. Parrish
Michael T. Burns
Jessica L. Janes
Justin Waller
Amanda M. De Hoedt
Sameer Ghate
Jeri Kim
Irene M. Shui
Stephen J. Freedland

Citation Format

Stock, S.R., Parrish, J.A., Burns, M.T., Janes, J.L., Waller, J., De Hoedt, A.M. et al. (2025). Integrating NLP to Enhance Algorithmic Identification of Metastatic and Castration-Resistant Prostate Cancer in Large Claims-Based Studies. https://doi.org/10.1002/cam4.71406

Journal Information
Year of Publication: 2025
Source Database: DOAJ
DOI: 10.1002/cam4.71406
Access: Open Access