Semantic Scholar Open Access 2023 252 citations

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Erfan Shayegani, Md. Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, +1 more

Abstract

Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs, a subfield of trustworthy ML, combining the perspectives of Natural Language Processing and Security. Prior work has shown that even safety-aligned LLMs (via instruction tuning and reinforcement learning through human feedback) can be susceptible to adversarial attacks, which exploit weaknesses and mislead AI systems, as evidenced by the prevalence of 'jailbreak' attacks on models like ChatGPT and Bard. In this survey, we first provide an overview of large language models, describe their safety alignment, and categorize existing research based on various learning structures: textual-only attacks, multi-modal attacks, and additional attack methods specifically targeting complex systems, such as federated learning or multi-agent systems. We also offer comprehensive remarks on works that focus on the fundamental sources of vulnerabilities and potential defenses. To make this field more accessible to newcomers, we present a systematic review of existing works, a structured typology of adversarial attack concepts, and additional resources, including slides for presentations on related topics at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL'24).

Topics & Keywords

Computer Science

Authors (6)

Erfan Shayegani

Md. Abdullah Al Mamun

Yu Fu

Pedram Zaree

Yue Dong

Nael B. Abu-Ghazaleh

Citation Format

Shayegani, E., Mamun, M.A.A., Fu, Y., Zaree, P., Dong, Y., Abu-Ghazaleh, N.B. (2023). Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks. https://doi.org/10.48550/arXiv.2310.10844

Journal Information

Publication Year: 2023
Language: en
Total Citations: 252
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2310.10844
Access: Open Access ✓