Unsupervised evolution of protein and antibody complexes with a structure-informed language model
Abstract
Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.

Editor’s summary
Despite tremendous advances in protein structure prediction, connecting sequence to function is key for the in silico engineering of proteins for various tasks. Focusing on the problem of antibody engineering, Shanker et al. used a structure-informed protein language model to predict high-fitness sequences constrained by the known structure of the antibody or antibody-antigen complex. In experimental screens of virus-neutralizing antibodies, the authors observed substantial improvement in binding affinity and neutralization for their predicted sequences. These results demonstrate the potential for machine learning and protein language models trained on protein sequence information to contribute to protein engineering tasks even in the absence of task-specific training data.
Michael A. Funk
Authors (4)
Varun R. Shanker
Theodora U. J. Bruun
Brian L. Hie
Peter S. Kim
Quick Access
- Publication Year: 2024
- Language: en
- Total Citations: 100
- Source Database: Semantic Scholar
- DOI: 10.1126/science.adk8946
- Access: Open Access ✓