arXiv Open Access 2025

Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery

Malikussaid Hilal Hudan Nuha Isman Kurniawan
Lihat Sumber

Abstrak

Large Language Models frequently generate outputs that appear scientifically reasonable yet violate fundamental principles--a phenomenon we characterize as the "plausibility-validity gap." This challenge proves especially acute in chemistry, where superficial correctness masks deeper errors in molecular structure, reaction mechanisms, and synthetic pathways. We present a systematic approach combining a reasoning-centric model architecture (Magistral Small) with Low-Rank Adaptation fine-tuning on a dual-domain dataset covering molecular properties and chemical transformations. Evaluation reveals substantial improvements: the fine-tuned system achieves 96.3% format adherence, 97.4% chemical validity, and 74.4% synthesis feasibility. Comparative analysis shows our approach outperforms specialized translation models like MolT5 (97.4% vs 77.2% validity) while achieving performance comparable to complex tool-augmented systems like ChemCrow (9.0/10 vs 9.24/10 expert rating) through a more transparent, efficient methodology. Results demonstrate a learning hierarchy where syntactic correctness develops before chemical understanding, which precedes synthetic planning capability. This work establishes a reproducible framework for transforming generalist language models into dependable scientific tools while identifying critical areas including stereochemical precision, knowledge currency, and computational accessibility as key challenges for future advancement.

Penulis (3)

Malikussaid

H

Hilal Hudan Nuha

I

Isman Kurniawan

Format Sitasi

Malikussaid, Nuha, H.H., Kurniawan, I. (2025). Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery. https://arxiv.org/abs/2507.07328

Akses Cepat

Lihat di Sumber
Informasi Jurnal
Tahun Terbit
2025
Bahasa
en
Sumber Database
arXiv
Akses
Open Access ✓