DOAJ Open Access 2026

Benchmarking web-based in-silico toxicity prediction tools using gold-standard datasets across five key endpoints

Anirudh R. Urs, Varshini Ganesan Selvi, Ananya Sreekumar, Ananya Sudarsan, Prutha V. Murthy, +2 others

Abstract

In silico toxicity prediction tools have become indispensable in early drug development for assessing safety risks. However, their reported predictive performance is rarely evaluated against independent experimental datasets. In this study, we systematically benchmark four widely used free, web-based toxicity predictors (ProTox, pkCSM, ADMETLab, and vNN-ADMET) across five experimentally validated endpoints: hepatotoxicity, cardiotoxicity (hERG inhibition), nephrotoxicity, blood–brain barrier (BBB) permeability, and mutagenicity (Ames). The benchmark uses gold-standard datasets, including DILIrank, hERG Central, DIRIL, B3DB, and the ISSTOX Chemical Toxicity database. Tool-reported endpoint-specific performance metrics were first analyzed and then compared against externally benchmarked predictions generated on independent compound sets. Model performance was evaluated using accuracy, precision, recall, F1 score, specificity, and the Matthews correlation coefficient (MCC). Our results reveal pronounced discrepancies between tool-reported and benchmarked performance across multiple endpoints, indicating that several models generalize poorly beyond their original training and validation domains. While ProTox and vNN-ADMET demonstrated strong reported performance across endpoints, only mutagenicity predictions remained consistently robust under benchmarking conditions (F1 > 0.89; MCC > 0.80). ProTox achieved the highest benchmarked performance for hepatotoxicity (F1 = 0.92; MCC = 0.84), whereas ADMETLab showed balanced, recall-driven performance for nephrotoxicity and cardiotoxicity but exhibited reduced specificity for hepatotoxicity, suggesting overprediction. In contrast, BBB permeability and nephrotoxicity emerged as the most challenging endpoints, with substantial performance degradation relative to reported metrics and consistently low MCC values across tools.
Overall, no single tool demonstrated uniform reliability across all toxicity endpoints when evaluated on independent datasets. These findings underscore the limitations of relying solely on tool-reported performance and highlight the necessity of endpoint-aware benchmarking for in silico toxicity screening. This study provides actionable guidance for tool selection in early drug discovery and supports future development of ensemble and applicability-domain-aware models to improve predictive robustness and translational relevance.
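
For readers unfamiliar with the six evaluation metrics named in the abstract, the following is a minimal sketch of how they can be computed from binary predictions. It is not the authors' code: the function name `binary_metrics`, the label convention (1 = toxic/positive, 0 = negative), the zero-division handling, and the toy labels are all assumptions made for illustration, and the numbers are not data from the study.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, specificity, F1 and MCC.

    Assumes binary labels where 1 is the positive (e.g. toxic) class.
    Undefined ratios (zero denominator) are reported as 0.0 by convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(num, den):
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # sensitivity
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, len(pairs))
    f1 = safe_div(2 * precision * recall, precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Toy labels for illustration only; not taken from any benchmark dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

MCC uses all four cells of the confusion matrix, so a tool that overpredicts toxicity can keep a high recall and still score a low MCC. This is one reason a benchmark might report it alongside F1.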

Authors (7)

Anirudh R. Urs
Varshini Ganesan Selvi
Ananya Sreekumar
Ananya Sudarsan
Prutha V. Murthy
Manjunatha Reddy A H
Sumathra Manokaran

Citation Format

Urs, A.R., Selvi, V.G., Sreekumar, A., Sudarsan, A., Murthy, P.V., H, M.R.A. et al. (2026). Benchmarking web-based in-silico toxicity prediction tools using gold-standard datasets across five key endpoints. https://doi.org/10.1007/s42452-026-08308-7

Journal Information

Publication year: 2026
Source database: DOAJ
DOI: 10.1007/s42452-026-08308-7
Access: Open Access