arXiv Open Access 2024

MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction

Jun-Hyung Park, Yeachan Kim, Mingyu Lee, Hyuntae Park, SangKeun Lee

Abstract

Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences -- textual descriptors of molecules. Despite its success in molecular property prediction, current practice often leads to overfitting and limited scalability due to early convergence. In this paper, we introduce a novel chemical language representation learning framework, called MolTRES, to address these issues. MolTRES incorporates generator-discriminator training, allowing the model to learn from more challenging examples that require structural understanding. In addition, we enrich molecular representations by transferring knowledge from the scientific literature through the integration of external materials embeddings. Experimental results show that our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
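The generator-discriminator scheme described above follows the general replaced-token-detection pattern: a generator corrupts some tokens of a SMILES sequence, and a discriminator must label each position as original or replaced. The sketch below is a toy illustration of how such training pairs can be produced, not the authors' implementation; the character-level tokenizer, vocabulary, and random "generator" are hypothetical simplifications (a real generator is a learned masked language model, and real SMILES tokenizers merge multi-character atoms such as "Cl" and "Br").

```python
import random

# Hypothetical simplified SMILES vocabulary (illustration only).
VOCAB = list("CNOclns()=12345[]@H+-")

def tokenize(smiles):
    # Character-level tokenization; real SMILES tokenizers merge
    # multi-character atoms like "Cl" and "Br" into single tokens.
    return list(smiles)

def corrupt(tokens, mask_rate=0.15, rng=random):
    """Replace a fraction of tokens with generator samples, and label
    each position: 1 if the token was replaced, 0 if it is original."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            # Stand-in for a learned generator: sample a random token.
            new_tok = rng.choice(VOCAB)
            corrupted.append(new_tok)
            # If the sample happens to equal the original, it still
            # counts as "original" for the discriminator.
            labels.append(0 if new_tok == tok else 1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
corrupted, labels = corrupt(tokens, rng=random.Random(0))
# The discriminator is then trained to recover `labels` from `corrupted`,
# which requires modeling the structure of valid SMILES sequences.
```

Because every position carries a label (not just the masked ones), the discriminator receives a denser training signal than standard masked language modeling, which is the usual motivation for this style of pre-training.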

Authors (5)

Jun-Hyung Park

Yeachan Kim

Mingyu Lee

Hyuntae Park

SangKeun Lee

Citation Format

Park, J.-H., Kim, Y., Lee, M., Park, H., & Lee, S. (2024). MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction. arXiv. https://arxiv.org/abs/2408.01426

Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓