Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md. Abdullah Al Mamun, Yu Fu
et al.
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs, a subfield of trustworthy ML, combining the perspectives of Natural Language Processing and Security. Prior work has shown that even safety-aligned LLMs (via instruction tuning and reinforcement learning from human feedback) can be susceptible to adversarial attacks, which exploit weaknesses and mislead AI systems, as evidenced by the prevalence of "jailbreak" attacks on models like ChatGPT and Bard. In this survey, we first provide an overview of large language models, describe their safety alignment, and categorize existing research based on various learning structures: textual-only attacks, multi-modal attacks, and additional attack methods specifically targeting complex systems, such as federated learning or multi-agent systems. We also offer comprehensive remarks on works that focus on the fundamental sources of vulnerabilities and potential defenses. To make this field more accessible to newcomers, we present a systematic review of existing works, a structured typology of adversarial attack concepts, and additional resources, including slides for presentations on related topics at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL'24).
252 citations
Computer Science
Mental health early warning for college students by integrating multi-source data and TSEN algorithm
Xingxing Ge
Abstract To address the strong lag of traditional questionnaires and the limited representativeness of single-source data in monitoring the mental health of college students, a mental health warning method is proposed that integrates multi-source campus behavior data with the Temporal-Behavioral Stream Encoding Network (TSEN). Four types of data (academic performance, daily life behavior, online behavior, and psychological profiles, covering 15,682 students from 10 colleges and 4 grades) are integrated, and a Random Walk-based Kalman model is used to fill in missing temporal values. A high-quality dataset is constructed by combining Bidirectional Encoder Representations from Transformers (BERT) text completion with Transformer-based isolation forest anomaly detection. The TSEN dual-stream encoding framework is then designed to provide early warning of college students' mental health problems. Experiments on the public StudentLife dataset and the real campus CAMP dataset showed that TSEN reached F1 values of 92.58% and 83.29%, which are 2.94% and 2.15% higher than the best baseline. The parameter size was only 6.7 M and the inference latency was ≤ 38 ms. In actual deployment, the TSEN-based mental health warning model identified high-risk psychological problems 6.2 weeks earlier, with a diagnosis rate of 89.76%. This work provides a high-precision, low-invasiveness early-warning solution for college students' mental health problems.
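The random-walk Kalman imputation step mentioned in the abstract can be sketched in a few lines. This is a generic textbook filter under assumed process and observation noise variances `q` and `r`, not the paper's exact model; the function name is hypothetical.

```python
import numpy as np

def kalman_impute(series, q=1.0, r=1.0):
    """Fill missing values (np.nan) in a 1-D series with a random-walk
    Kalman filter: state x_t = x_{t-1} + w_t, observation y_t = x_t + v_t."""
    x, p = 0.0, 1e6          # diffuse prior on the initial state
    out = np.empty(len(series))
    for t, y in enumerate(series):
        p = p + q            # predict: variance grows under the random walk
        if not np.isnan(y):  # update only when an observation exists
            k = p / (p + r)  # Kalman gain
            x = x + k * (y - x)
            p = (1 - k) * p
        out[t] = x           # missing steps carry the predicted state forward
    return out
```

On a gap, the filter holds the last filtered estimate while its uncertainty grows, so the next observation is weighted more heavily; this is the behavior a random-walk state model buys over naive forward-fill.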
Computational linguistics. Natural language processing, Electronic computers. Computer science
Lexical Neologisms in English: Formation, Trends, and Cultural Impact
Oksana Melnyk, Iryna Kyselova
Lexical neologisms, or neologisms, play an important role in developing the English language, reflecting dynamic changes in society, culture, technology, and politics. This article analyzes the main reasons for the emergence of new lexical units in the English language, their classification, and their impact on modern culture. Among the main reasons for the emergence of neologisms, technological progress, socio-cultural changes, globalization and innovations in communication stand out. The article examines the main methods of neologism formation, particularly composition, affixation, contraction, acronym, and rethinking of meanings. Special attention is paid to technological neologisms that have become integral to professional and everyday speech. Words such as "blockchain", "cryptocurrency" and "Metaverse" illustrate how innovations affect the development of vocabulary. In addition, household neologisms that arise in response to modern social trends and cultural phenomena, such as "selfie", "binge-watch" and "dad joke", are studied. The article also highlights the main challenges associated with studying and integrating neologisms into educational materials and dictionaries. Analyzing the prospects for the development of lexical neologisms, the authors predict a further increase in their number due to the development of artificial intelligence, the impact of intercultural communication, and the spread of virtual realities. The conclusions emphasize the importance of studying neologisms for understanding the linguistic and sociocultural processes that shape modern English.
Discourse analysis, Computational linguistics. Natural language processing
InvFace: inversion-based synthetic face recognition
Zhifang Sun, Sukumar Letchmunan, Wulfran Fendzi Mbasso
et al.
Abstract Facial recognition technology has achieved remarkable accuracy across various applications, but its reliance on large-scale real face datasets raises significant privacy and ethical concerns. To address these challenges, we propose InvFace, a novel synthetic face recognition (SFR) method leveraging diffusion denoising implicit models (DDIMs) to generate high-quality synthetic face datasets. InvFace ensures identity consistency and intra-class diversity by disentangling style and background semantics from real images through a conditional reverse sampling strategy. Our method effectively synthesizes diverse facial images while preserving identity fidelity, outperforming state-of-the-art synthetic approaches on benchmark datasets. Experiments demonstrate that face recognition models trained on InvFace datasets achieve competitive accuracy comparable to those trained on real data, offering a robust solution to privacy issues in real-world face recognition. Additionally, privacy analysis confirms that InvFace generates novel virtual identities distinct from training data.
Computational linguistics. Natural language processing, Electronic computers. Computer science
Emerging trends, challenges, and research opportunities in artificial intelligence applications in marketing
Hosein Borimnejad, Vali Borimnejad
Abstract This study presents a systematic review of Artificial Intelligence (AI) applications in digital marketing by integrating two complementary text-mining techniques, Latent Dirichlet Allocation (LDA) topic modeling and sentiment analysis, to identify emerging trends, challenges, and research opportunities. Drawing on 381 peer-reviewed journal articles published between 2021 and 2025 from the ScienceDirect and Taylor & Francis databases, the analysis uncovers five dominant research themes: AI-driven personalization and customer engagement, big data analytics and predictive modeling, generative AI applications in digital content creation, ethics and privacy concerns, and AI deployment in B2B and SME marketing contexts. A temporal analysis shows a sharp increase in publications after 2020, peaking in 2024, reflecting the rapid mainstreaming of AI in marketing scholarship. Sentiment analysis reveals predominantly positive academic attitudes toward AI's role in enhancing efficiency and personalization, yet a growing concern with ethical risks such as algorithmic bias, privacy, and trust erosion. By demonstrating how the field has evolved from technical optimization toward socio-ethical reflection, this review contributes both methodologically and theoretically. It extends the Technology Acceptance Model (TAM) by incorporating transparency and accountability as moderating factors in consumer and organizational adoption. Furthermore, it identifies critical gaps, particularly the lack of cross-cultural comparative studies and the limited exploration of AI's intersection with emotional branding, and offers practical guidance for researchers, marketing practitioners, and policymakers seeking to design trust-oriented, responsible AI strategies.
Computational linguistics. Natural language processing, Electronic computers. Computer science
Formation of Human Resources: A Sociological Reading of Its Importance and Objectives
Mohammed ABID & Sultan BELGHIT
Abstract: This study offers a sociological reading of the human resource formation process, given its great importance in institutional environments characterized by competition and diverse differences. Institutions should therefore seek to highlight their products according to the available skills and competencies, treating people as a vast human resource, through training and continuous formation processes, and by remaining open to all new knowledge formation in the field of specialization.
Keywords: Formation; Human Resources; Organization; Algeria; Objectives
Arts in general, Computational linguistics. Natural language processing
On the Language Choices of Ukrainian Refugees in Poland
Yuliia Vaseiko
The article analyses the content, structure and functional purpose of an academic paper which examines the multilingualism of refugees from Ukraine who moved to Poland after the full-scale invasion by Russia. The methodological basis of the study is characterized, and the effectiveness of using the method of language biographies and questionnaires is emphasized.
Computational linguistics. Natural language processing, Semantics
OBSERVATION OF THE 2010 PRESIDENTIAL ELECTION IN THE CITY OF KORHOGO: WHAT LESSONS FOR IMPROVED ELECTORAL GOVERNANCE?
Marc Stéphane GBÉDIA
Abstract: The controversy that arose during the 2010 presidential election in Côte d'Ivoire concerned the credibility of the vote in certain cities of the country, notably Korhogo, the third most populous city in Côte d'Ivoire and the largest city in the north of the country. Indeed, this city had escaped government control. Following the presidential election, differences of opinion, both national and international, emerged among observers, and the debates, which sparked lively discussion, focused mainly on the quality of the ballot. This exceptional situation deeply disrupted social cohesion within the city and cast major discredit on the organization of the 2010 presidential elections. This article therefore analyzes the conduct and impact of election observation in the city of Korhogo during these elections. It draws on oral sources, written and printed sources, as well as books and other scholarly publications consulted in various documentation centers in Côte d'Ivoire.
Keywords: Côte d'Ivoire, Korhogo, Presidential election, Observers, Ballot.
Arts in general, Computational linguistics. Natural language processing
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer, Isaac Caswell, Lisa Wang
et al.
Computational linguistics. Natural language processing
Questions Are All You Need to Train a Dense Passage Retriever
Devendra Singh Sachan, Mike Lewis, Dani Yogatama
et al.
Abstract We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g., questions and potential answer passages). It uses a new passage-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence passages, and (2) the passages are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both passage and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses. Our code and model checkpoints are available at: https://github.com/DevSinghSachan/art.
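The retrieve-then-reconstruct objective described in the abstract can be illustrated with a minimal toy: the retriever's softmax distribution over candidate passages is pushed toward a teacher posterior formed by normalizing each passage's question-reconstruction log-likelihood. This is an illustrative sketch of that training signal, not the authors' implementation; the function names and the KL formulation details are assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def art_style_loss(retriever_scores, recon_logprobs):
    """KL(teacher || student): align the retriever's distribution over
    candidate passages with the normalized reconstruction posterior
    P(question | passage), which is held fixed as the teacher."""
    student = softmax(retriever_scores)   # retriever's belief
    teacher = softmax(recon_logprobs)     # reconstruction posterior
    return float(np.sum(teacher * (np.log(teacher) - np.log(student))))
```

The loss is zero exactly when the retriever already ranks passages in proportion to how well they let the model reconstruct the question, which is the unsupervised signal that replaces labeled (question, passage) pairs.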
Computational linguistics. Natural language processing
A MORPHO-SEMANTIC APPROACH TO TRADITIONAL MOOSE FIRST NAMES RELATING TO GOD
Mamadou KABRÉ
Abstract: Traditional Moose first names are names borne by the Moose peoples of Burkina Faso. The morpho-semantic approach analyzes the meaning of these traditional names by examining their morphological construction and their semantic content. Among traditional Moose names relating to God, certain common patterns can be identified that reflect the religiosity and spirituality of Moaga culture. These traditional names are built on the root "Wendé", which means "creator" or "God" in Moore. In relation to God, "Wendé" refers to "He who is the creator of all things", reflecting the belief in a divine entity responsible for the origin of the universe and of life. Note that the interpretation of each name varies across families and local traditions.
Keywords: First name, traditional, morphology, semantics, God
Arts in general, Computational linguistics. Natural language processing
A model for learning strings is not a model of language
Elliot Murphy, Evelina Leivada
Yang and Piantadosi (1) attempt to show that language acquisition is possible without recourse to “innate knowledge of the structures that occur in natural language.” The authors claim that a domain-general rule-learning algorithm can “acquire key pieces of natural language.” Yang and Piantadosi provide a number of technical innovations and elegant arguments for why acquisition researchers should expand their conception of what a possible domain-general learner can achieve. Yet, we also believe that their findings do not directly pertain to human language. The authors (1) provide a model that can take strings of discrete elements and execute a number of primitive operations. The “assumed primitive functions” make regular reference to linearity: “list,” “first character,” “middle of Y,” and “set of strings.” The postulated “pair” and “first” operations are claimed to be “similar in spirit to ‘merge’ in minimalist linguistics [...], except they come with none of the associated machinery that is required in those theories; here, they only concatenate.” Merge is typically not assumed to be a concatenation process. It simply forms sets and does not impose order. Natural language syntax additionally needs a set categorization or a “labeling” operation. Yang and Piantadosi (1) assume some measure of progress in that their model is free from any “associated machinery” of generative models of Merge—but their model captures only relations between strings, not structures. As such, it falls short of explaining “key pieces of natural language.” The Yang and Piantadosi (1) model successfully learns many types of simple formal languages, and its technical sophistication will likely inspire new research into learnability. However, the model exhibits strikingly poor performance with the English auxiliary system, which the authors say may be due to the “complexity” of this system. 
Likewise, the model has difficulty learning the simple finite grammar from Braine that mimics phrase structure rules. It has only moderate success with a fragment of English involving center embedding. Drawing comparisons with natural language learning (NLL), string inference seems to differ in two critical dimensions: 1) noise—the Yang and Piantadosi (1) model received grammatically correct tokens, while input in NLL is rife with disfluencies (i.e., repetitions, false starts, incorrect syntax); 2) ambiguity of source—the Yang and Piantadosi model was presented with unambiguous data from each source, while human brains are innately predisposed to deal with multiple languages, acquiring them in parallel (2). It is unclear whether the Yang and Piantadosi model can generate strings respecting the syntax of different languages if it is not told which tokens come from which language. Simply put, any learning model that does not link meaning with structure is not a model of human language (3–8). In the generative framework, language is understood to be about form/meaning associations. The intricate regulation of form/meaning pairs constitutes the stuff of syntactic theory, not the organization of strings into an arrangement that overlaps with the linearized output of a Merge-based computational system. The innate predisposition for language goes well beyond the process of inferring strings. We therefore submit that models of learnability will benefit from focusing on the same objects postulated in theoretical linguistics: structures, not strings.
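The letter's central contrast, that Merge forms unordered sets while string-learning models only concatenate, can be made concrete in a few lines. This is an illustrative sketch of the distinction, not a formalization taken from the letter:

```python
def merge(x, y):
    """Minimalist Merge modeled as bare set formation: the result is an
    unordered set {x, y}; no linear order is imposed on the members."""
    return frozenset({x, y})

def concat(x, y):
    """String concatenation, the only combinatory operation available to
    a learner over strings: order is part of the result."""
    return x + y
```

Because `merge` is symmetric and `concat` is not, a system whose primitives only concatenate captures facts about linear order, not the unordered hierarchical structure (and its labeling) that the letter argues syntactic theory is about.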
Martha Palmer and Barbara Di Eugenio Interview Martha Evens
Martha Evens
Computational linguistics. Natural language processing
A comparative study of perceptions and experiences of online Chinese language learners in China and the United States during the COVID-19 pandemic
Wang Yanlin, Zhan Hong, Liu Shijuan
This study compared the perceptions and experiences of 173 students studying Chinese as a foreign language online at universities in China and the United States during the COVID-19 pandemic. Controlling for students' prior experience with Chinese course delivery modes across countries and for Chinese language levels, three two-way analyses of covariance (ANCOVA) were conducted to compare differences among three dependent variables: 1) satisfaction with online classes; 2) self-perceived learning effectiveness online versus onsite; and 3) willingness to take a virtual Chinese course in the future. The results found no statistically significant differences in students' satisfaction and willingness across countries and language levels. However, students in the United States (US) viewed online classes as significantly less effective than learning in-person, which differed from the views of students in China. Pearson correlation analysis indicated positive correlations among these three variables. Pearson chi-squared tests found that students in the US significantly preferred to take Chinese courses in-person. Pearson chi-squared tests on categories formed from the three open-ended questions highlighted four factors influencing the success of students' online classes: technology, emotion and motivation, learning productivity, and teaching presence. Pedagogical recommendations are discussed.
Computational linguistics. Natural language processing
A Bibliographic Collection of the Texts of Alberto Gianquinto
Stefania N'Kombo
Computational linguistics. Natural language processing, Epistemology. Theory of knowledge
Anees Shah Jilani as a Sketch Writer
Munir Ahmad, Dr. Muhammad Asif
Anees Shah Jilani was a brilliant prose writer. His letters to many of his contemporaries are a great read, primarily on account of his mastery of Urdu prose. He wrote more than twenty books in Urdu and Saraiki, and his interest in classical Urdu and Saraiki literature earned him a huge following. Anees Shah Jilani was also a prominent sketch writer among literary giants like Rasheed Ahmad Siddiquei, Molvi Abdul Haq, Chiragh Hassan Hasrat, Abdul Majeed Salik, Raees Akhtar Jafery, Shahid Ahmad Dehilvi and Sadat Hassan Manto. Three of his books on sketch writing have been introduced to the literary world, each marked by a style of his own.
This article analyzes Anees Shah Jilani's work on sketch writing, his individuality as a sketch writer, and how he has enriched the tradition of sketch writing by introducing a style of his own.
Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
Walter Benjamin and the Storyteller
Silvia Cammertoni
The essence of the essay The Storyteller: Reflections on the Work of Nikolaj Leskov derives from the blend of dazzling genius, precision of gaze, and prophetic awareness with which Walter Benjamin distills his thought drop by drop into the short form of the review. Like a refined chemist, the philosopher makes concentration and purity the qualities of his extract. Through brevity, the writing is reduced to the highest degree of complexity, and its substance consists of a reflection on the storyteller as an ideal figure. The occasion for this piece, first published in 1936, is the work of Nikolaj Leskov, but this is merely a pretext for a critique of the art of narration.
Computational linguistics. Natural language processing, Epistemology. Theory of knowledge
One model for the learning of language
Yuan Yang, S. Piantadosi
Significance: It has long been hypothesized that language acquisition may be impossible without innate knowledge of the structures that occur in natural language. Here, we show that a domain general learning setup, originally developed in cognitive psychology to model rule learning, is able to acquire key pieces of natural language from relatively few examples of sentences. This develops a new approach to formalizing linguistic learning and highlights some features of language and language acquisition that may arise from general cognitive processes.
A major goal of linguistics and cognitive science is to understand what class of learning systems can acquire natural language. Until recently, the computational requirements of language have been used to argue that learning is impossible without a highly constrained hypothesis space. Here, we describe a learning system that is maximally unconstrained, operating over the space of all computations, and is able to acquire many of the key structures present in natural language from positive evidence alone. We demonstrate this by providing the same learning model with data from 74 distinct formal languages which have been argued to capture key features of language, have been studied in experimental work, or come from an interesting complexity class. The model is able to successfully induce the latent system generating the observed strings from small amounts of evidence in almost all cases, including for regular (e.g., an, (ab)n, and {a,b}+), context-free (e.g., anbn, anbn+m, and xxR), and context-sensitive (e.g., anbncn, anbmcndm, and xx) languages, as well as for many languages studied in learning experiments. These results show that relatively small amounts of positive evidence can support learning of rich classes of generative computations over structures. The model provides an idealized learning setup upon which additional cognitive constraints and biases can be formalized.
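The three language classes the abstract names can be illustrated with simple membership checkers, one per class. This is illustrative only: the paper's model induces such grammars from example strings, rather than being handed a checker; the function names are hypothetical.

```python
import re

def is_abn(s):
    """Regular (ab)^n, n >= 1: recognizable by a finite automaton."""
    return bool(re.fullmatch(r"(ab)+", s))

def is_anbn(s):
    """Context-free a^n b^n, n >= 1: requires counting, beyond regular."""
    m = re.fullmatch(r"(a+)(b+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def is_anbncn(s):
    """Context-sensitive a^n b^n c^n, n >= 1: beyond context-free."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))
```

The increasing machinery needed to check membership (a loop, one counter comparison, two counter comparisons) mirrors the complexity hierarchy across which the model is evaluated.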
58 citations
Computer Science, Medicine
Understanding User Stories: Computational Linguistics in Agile Requirements Engineering
G. Lucassen