Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning
Shenao Yan, Shimaa Ahmed, Shan Jin
et al.
Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce the generation of insecure code, yet effective defenses remain limited. Existing scanning approaches rely on token-level generation consistency to invert attack targets, which is ineffective for source code where identical semantics can appear in diverse syntactic forms. We present CodeScan, which, to the best of our knowledge, is the first poisoning-scanning framework tailored to code generation models. CodeScan identifies attack targets by analyzing structural similarities across multiple generations conditioned on different clean prompts. It combines iterative divergence analysis with abstract syntax tree (AST)-based normalization to abstract away surface-level variation and unify semantically equivalent code, isolating structures that recur consistently across generations. CodeScan then applies LLM-based vulnerability analysis to determine whether the extracted structures contain security vulnerabilities and flags the model as compromised when such a structure is found. We evaluate CodeScan against four representative attacks under both backdoor and poisoning settings across three real-world vulnerability classes. Experiments on 108 models spanning three architectures and multiple model sizes demonstrate 97%+ detection accuracy with substantially lower false positives than prior methods.
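The AST-based normalization step described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal Python illustration, assuming that renaming identifiers to canonical placeholders suffices to abstract away surface-level variation and expose shared structure:

```python
import ast

class _Normalizer(ast.NodeTransformer):
    """Rename identifiers to canonical placeholders so that syntactically
    different but structurally identical snippets normalize identically."""
    def __init__(self):
        self.names = {}

    def _canon(self, name):
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_Name(self, node):
        return ast.copy_location(
            ast.Name(id=self._canon(node.id), ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

def normalize(src):
    """Parse source code and return a canonicalized AST dump."""
    return ast.dump(_Normalizer().visit(ast.parse(src)))

def recurring_structure(generations):
    """True if all generations collapse to one normalized AST,
    i.e. a structure recurs consistently across generations."""
    return len({normalize(g) for g in generations}) == 1
```

On two generations such as `x = a + b` and `y = c + d`, the normalized dumps coincide, so the shared structure would be handed to the subsequent vulnerability analysis.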
Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning
Xue Zhou, Dapeng Man, Chen Xu
et al.
Offline reinforcement learning (RL) heavily relies on the coverage of pre-collected data over the target policy's distribution. Existing studies aim to improve data-policy coverage to mitigate distributional shifts, but overlook security risks from insufficient coverage, and the single-step analysis is not consistent with the multi-step decision-making nature of offline RL. To address this, we introduce the sequence-level concentrability coefficient to quantify coverage, and reveal its exponential amplification on the upper bound of estimation errors through theoretical analysis. Building on this, we propose the Collapsing Sequence-Level Data-Policy Coverage (CSDPC) poisoning attack. Considering the continuous nature of offline RL data, we convert state-action pairs into decision units, and extract representative decision patterns that capture multi-step behavior. We identify rare patterns likely to cause insufficient coverage, and poison them to reduce coverage and exacerbate distributional shifts. Experiments show that poisoning just 1% of the dataset can degrade agent performance by 90%. This finding provides new perspectives for analyzing and safeguarding the security of offline RL.
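The conversion of continuous state-action pairs into decision units and the search for rare multi-step patterns can be sketched as follows. This is a simplified illustration with scalar states and actions and uniform binning; the bin count, window length, and rarity quantile are illustrative assumptions, not the paper's parameters:

```python
from collections import Counter

def decision_units(trajectory, n_bins=4, lo=-1.0, hi=1.0):
    """Discretize continuous (state, action) pairs into decision units."""
    def bucket(x):
        x = min(max(x, lo), hi - 1e-9)
        return int((x - lo) / (hi - lo) * n_bins)
    return [(bucket(s), bucket(a)) for s, a in trajectory]

def rare_patterns(trajectories, window=2, quantile=0.1):
    """Extract multi-step decision patterns (sliding windows of decision
    units) and return the rarest ones -- the candidates whose corruption
    most reduces sequence-level data-policy coverage."""
    counts = Counter()
    for traj in trajectories:
        units = decision_units(traj)
        for i in range(len(units) - window + 1):
            counts[tuple(units[i:i + window])] += 1
    if not counts:
        return []
    cutoff = sorted(counts.values())[max(0, int(len(counts) * quantile) - 1)]
    return [p for p, c in counts.items() if c <= cutoff]
```

Poisoning would then target only transitions matching these rare patterns, which is how a 1% corruption budget can disproportionately shrink coverage.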
A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning
Daniel Commey, Rebecca A. Sarpong, Griffith S. Klogo
et al.
Federated learning (FL) enables collaborative model training across decentralized clients while preserving data privacy. However, its open-participation nature exposes it to data-poisoning attacks, in which malicious actors submit corrupted model updates to degrade the global model. Existing defenses are often reactive, relying on statistical aggregation rules that can be computationally expensive and that typically assume an honest majority. This paper introduces a proactive, economic defense: a lightweight Bayesian incentive mechanism that makes malicious behavior economically irrational. Each training round is modeled as a Bayesian game of incomplete information in which the server, acting as the principal, uses a small, private validation dataset to verify update quality before issuing payments. The design satisfies Individual Rationality (IR) for benevolent clients, ensuring their participation is profitable, and Incentive Compatibility (IC), making poisoning an economically dominated strategy. Extensive experiments on non-IID partitions of MNIST and FashionMNIST demonstrate robustness: with 50% label-flipping adversaries on MNIST, the mechanism maintains 96.7% accuracy, only 0.3 percentage points lower than in a scenario with 30% label-flipping adversaries. This outcome is 51.7 percentage points better than standard FedAvg, which collapses under the same 50% attack. The mechanism is computationally light, budget-bounded, and readily integrates into existing FL frameworks, offering a practical route to economically robust and sustainable FL ecosystems.
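The verify-then-pay principle behind the mechanism can be sketched in a few lines. This is a simplified illustration, not the paper's mechanism design; the function names and the fixed payment/cost values are assumptions:

```python
def server_round(updates, validate, payment=1.0, cost=0.4):
    """One training round: the server checks each client's update against a
    small private validation set and pays only clients whose updates pass.
    With payment > cost, honest participation has positive expected utility
    (individual rationality), while a rejected poisoned update earns nothing
    and forfeits the training cost (incentive compatibility)."""
    accepted, utilities = [], {}
    for cid, update in updates.items():
        ok = validate(update)  # True if the update passes the quality check
        utilities[cid] = (payment if ok else 0.0) - cost
        if ok:
            accepted.append(update)
    return accepted, utilities
```

For example, with a validator that rejects updates that worsen validation loss, an honest client nets +0.6 per round while a poisoner nets -0.4, making poisoning economically dominated.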
Retrieval-Augmented Review Generation for Poisoning Recommender Systems
Shiyi Yang, Xinshu Li, Guanglin Zhou
et al.
Recent studies have shown that recommender systems (RSs) are highly vulnerable to data poisoning attacks, where malicious actors inject fake user profiles, each containing a group of well-designed fake ratings, to manipulate recommendations. Due to security and privacy constraints in practice, attackers typically possess limited knowledge of the victim system and thus need to craft profiles that transfer across black-box RSs. To maximize the attack impact, the profiles must also remain imperceptible. However, generating such high-quality profiles with restricted resources is challenging. Some works suggest incorporating fake textual reviews to strengthen the profiles; yet the poor quality of these reviews largely undermines attack effectiveness and imperceptibility in practical settings. To tackle these challenges, in this paper, we propose to enhance the quality of the review text by harnessing the in-context learning (ICL) capabilities of multimodal foundation models. To this end, we introduce a demonstration retrieval algorithm and a text style transfer strategy to augment naive ICL. Specifically, we propose a novel practical attack framework named RAGAN to generate high-quality fake user profiles, which can offer insights into the robustness of RSs. The profiles are generated by a jailbreaker and collaboratively optimized by an instructional agent and a guardian to improve attack transferability and imperceptibility. Comprehensive experiments on various real-world datasets demonstrate that RAGAN achieves state-of-the-art poisoning attack performance.
Poisoning Attacks to Local Differential Privacy for Ranking Estimation
Pei Zhan, Peng Tang, Yangzhuo Li
et al.
Local differential privacy (LDP) involves users perturbing their inputs to provide plausible deniability of their data. However, this also makes LDP vulnerable to poisoning attacks. In this paper, we first introduce novel poisoning attacks for ranking estimation. These attacks are intricate, as attackers do not merely adjust the frequency of target items. Instead, they leverage a limited number of fake users to precisely modify frequencies, effectively altering item rankings to maximize gains. To tackle this challenge, we introduce the concepts of attack cost and optimal attack item (set), and propose corresponding strategies for the kRR, OUE, and OLH protocols. For kRR, we iteratively select optimal attack items and allocate suitable fake users. For OUE, we iteratively determine optimal attack item sets and account for the incremental changes in item frequencies across different sets. For OLH, we develop a harmonic cost function based on the pre-image of a hash to select the hash value supporting a larger number of effective attack items. Lastly, we present an attack strategy based on confidence levels to quantify the probability of a successful attack and the number of attack iterations more precisely. We demonstrate the effectiveness of our attacks through theoretical and empirical evidence, highlighting the necessity of defenses against these attacks. The source code and data have been made available at https://github.com/LDP-user/LDP-Ranking.git.
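The leverage that a small number of fake users has on kRR frequency estimates can be seen in a minimal sketch. This illustrates the standard debiased kRR estimator and the effect of fake reports, not the paper's optimal attack-item selection:

```python
import math

def krr_params(eps, d):
    """kRR probabilities for privacy budget eps over a domain of size d:
    p = probability of reporting the true item, q = any other item."""
    p = math.exp(eps) / (math.exp(eps) + d - 1)
    q = 1.0 / (math.exp(eps) + d - 1)
    return p, q

def krr_estimate(raw_count, n, eps, d):
    """Debiased frequency estimate of one item from its raw report count
    among n users."""
    p, q = krr_params(eps, d)
    return (raw_count - n * q) / (n * (p - q))
```

With eps = 1 and d = 10, an item reported 100 times among 1000 users has an estimated frequency near 0.10; 50 fake users who all report the target sharply inflate the estimate, and this per-item lever is what the ranking attacks allocate across items.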
Byzantine Failures Harm the Generalization of Robust Distributed Learning Algorithms More Than Data Poisoning
Thomas Boudou, Batiste Le Bars, Nirupam Gupta
et al.
Robust distributed learning algorithms aim to maintain reliable performance despite the presence of misbehaving workers. Such misbehaviors are commonly modeled as Byzantine failures, allowing arbitrarily corrupted communication, or as data poisoning, a weaker form of corruption restricted to local training data. While prior work shows similar optimization guarantees for both models, an important question remains: How do these threat models impact generalization? Empirical evidence suggests a gap, yet it remains unclear whether it is unavoidable or merely an artifact of suboptimal attacks. We show, for the first time, a fundamental gap in generalization guarantees between the two threat models: Byzantine failures yield strictly worse rates than those achievable under data poisoning. Our findings leverage a tight algorithmic stability analysis of robust distributed learning. Specifically, we prove that: (i) under data poisoning, the uniform algorithmic stability of an algorithm with optimal optimization guarantees degrades by an additive factor of $\Theta\big( \frac{f}{n-f} \big)$, with $f$ out of $n$ workers misbehaving; whereas (ii) under Byzantine failures, the degradation is in $\Omega\big( \sqrt{ \frac{f}{n-2f} } \big)$.
Maximizing Uncertainty for Federated learning via Bayesian Optimisation-based Model Poisoning
Marios Aristodemou, Xiaolan Liu, Yuan Wang
et al.
As we transition from Narrow Artificial Intelligence towards Artificial Super Intelligence, users are increasingly concerned about their privacy and the trustworthiness of machine learning (ML) technology. A common denominator for metrics of trustworthiness is the quantification of uncertainty inherent in deep learning (DL) algorithms, specifically in the model parameters, input data, and model predictions. A common approach to addressing privacy-related issues in DL is to adopt distributed learning such as federated learning (FL), where private raw data is not shared among users. Despite its privacy-preserving mechanisms, FL still faces challenges in trustworthiness. Specifically, malicious users can, during training, systematically craft malicious model parameters to compromise the model's predictive and generative capabilities, resulting in high uncertainty about its reliability. To demonstrate such malicious behaviour, we propose a novel model poisoning attack method named Delphi, which aims to maximise the uncertainty of the global model's output. We achieve this by exploiting the relationship between the uncertainty and the model parameters of the first hidden layer of the local model. Delphi employs two types of optimisation, Bayesian Optimisation and Least Squares Trust Region, to search for the optimal poisoned model parameters, yielding the variants Delphi-BO and Delphi-LSTR. We quantify the uncertainty using the KL divergence, minimising the distance of the predictive probability distribution to a maximally uncertain distribution of the model output. Furthermore, we establish a mathematical proof of the attack's effectiveness in FL. Numerical results demonstrate that Delphi-BO induces a higher amount of uncertainty than Delphi-LSTR, highlighting the vulnerability of FL systems to model poisoning attacks.
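The uncertainty objective can be written compactly: the attack drives the predictive distribution toward the uniform (maximally uncertain) distribution by minimizing their KL divergence. A minimal sketch, where the uniform target and natural-log convention are assumptions consistent with the abstract's description:

```python
import math

def kl_to_uniform(probs):
    """KL divergence from a predictive distribution to the uniform
    distribution over the same classes. Zero means maximal uncertainty,
    which is what the poisoned parameters are optimized toward."""
    k = len(probs)
    # KL(p || u) = sum_i p_i * log(p_i / (1/k)) = sum_i p_i * log(p_i * k)
    return sum(p * math.log(p * k) for p in probs if p > 0)
```

A confident prediction such as `[1, 0, 0, 0]` scores log(4), while the uniform `[0.25, 0.25, 0.25, 0.25]` scores 0; the attack's search over first-layer parameters pushes this score down.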
Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning
Yanbo Dai, Zhenlan Ji, Zongjie Li
et al.
Retrieval-Augmented Generation (RAG) has become a standard approach for improving the reliability of large language models (LLMs). Prior work demonstrates the vulnerability of RAG systems by misleading them into generating attacker-chosen outputs through poisoning the knowledge base. However, this paper uncovers that such attacks can be mitigated by the strong \textit{self-correction ability (SCA)} of modern LLMs, which can reject false context once properly configured. This SCA poses a significant challenge for attackers aiming to manipulate RAG systems. In contrast to previous poisoning methods, which primarily target the knowledge base, we introduce \textsc{DisarmRAG}, a new poisoning paradigm that compromises the retriever itself to suppress the SCA and enforce attacker-chosen outputs. Compromising the retriever enables the attacker to straightforwardly embed anti-SCA instructions into the context provided to the generator, thereby bypassing the SCA. To this end, we present a contrastive-learning-based model editing technique that performs localized and stealthy edits, ensuring the retriever returns a malicious instruction only for specific victim queries while preserving benign retrieval behavior. To further strengthen the attack, we design an iterative co-optimization framework that automatically discovers robust instructions capable of bypassing prompt-based defenses. We extensively evaluate DisarmRAG across six LLMs and three QA benchmarks. Our results show near-perfect retrieval of malicious instructions, which successfully suppress the SCA and achieve attack success rates exceeding 90\% under diverse defensive prompts. Moreover, the edited retriever remains stealthy under several detection methods, highlighting the urgent need for retriever-centric defenses.
Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents
Hanlin Cai, Houtianfu Wang, Haofan Dong
et al.
Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale. Within this paradigm, federated fine-tuning (FFT) serves as a key enabler that allows distributed LLM agents to co-train an intelligent global LLM without centralizing local datasets. However, the FFT-enabled IoA systems remain vulnerable to model poisoning attacks, where adversaries can upload malicious updates to the server to degrade the performance of the aggregated global LLM. This paper proposes a graph representation-based model poisoning (GRMP) attack, which exploits overheard benign updates to construct a feature correlation graph and employs a variational graph autoencoder to capture structural dependencies and generate malicious updates. A novel attack algorithm is developed based on augmented Lagrangian and subgradient descent methods to optimize malicious updates that preserve benign-like statistics while embedding adversarial objectives. Experimental results show that the proposed GRMP attack can substantially decrease accuracy across different LLM models while remaining statistically consistent with benign updates, thereby evading detection by existing defense mechanisms and underscoring a severe threat to the ambitious IoA paradigm.
MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers
Zhiqiang Wang, Yichao Gao, Yanting Wang
et al.
By providing a standardized interface for LLM agents to interact with external tools, the Model Context Protocol (MCP) is quickly becoming a cornerstone of the modern autonomous agent ecosystem. However, it creates novel attack surfaces due to untrusted external tools. While prior work has focused on attacks injected through external tool outputs, we investigate a more fundamental vulnerability: Tool Poisoning, where malicious instructions are embedded within a tool's metadata without execution. To date, this threat has been demonstrated primarily through isolated cases, lacking systematic, large-scale evaluation. We introduce MCPTox, the first benchmark to systematically evaluate agent robustness against Tool Poisoning in realistic MCP settings. MCPTox is constructed upon 45 live, real-world MCP servers and 353 authentic tools. We design three distinct attack templates to generate a comprehensive suite of 1312 malicious test cases via few-shot learning, covering 10 categories of potential risks. Our evaluation of 20 prominent LLM agents reveals a widespread vulnerability to Tool Poisoning, with o1-mini exhibiting an attack success rate of 72.8\%. We find that more capable models are often more susceptible, as the attack exploits their superior instruction-following abilities. Finally, failure case analysis reveals that agents rarely refuse these attacks: the highest refusal rate (Claude-3.7-Sonnet) is below 3\%, demonstrating that existing safety alignment is ineffective against malicious actions that use legitimate tools for unauthorized operations. Our findings establish a crucial empirical baseline for understanding and mitigating this widespread threat, and we release MCPTox to support the development of verifiably safer AI agents. Our dataset is available at an anonymized repository: \textit{https://anonymous.4open.science/r/AAAI26-7C02}.
The Caenorhabditis elegans worm Development and Activity Test (wDAT) can be used to differentiate between reversible and irreversible developmental effects
Piper Reid Hunt, Nicholas Olejnik, Jeffrey Yourick
et al.
Developmental delay and spontaneous locomotor activity changes, as well as the reversibility of these adverse effects, are apical endpoints used in chemical safety evaluations. These endpoints were assessed at sublethal concentrations in C. elegans using 5-fluorouracil (5FU), hydroxyurea (HU), or ribavirin (RV), teratogens that are associated with reduced fetal growth in mammals. C. elegans develop from egg to egg-laying adult in about three days. Synchronized cohorts were exposed either continuously or for 24 h (early-only) from first feeding after hatching. Developmental delays were dose-responsive for all three chemicals in both exposure schemes. For 5FU and HU, developmental delays and hypoactivity levels were similar in continuous and early-only exposure groups, consistent with irreversible developmental effects. The observed hypoactivity in developing C. elegans may be related to reported 5FU-induced muscle impairment and HU-induced post-exposure effects on locomotion parameters in mammals. In contrast to 5FU- and HU-induced hypoactivity, RV was associated with a non-significant trend toward slight hyperactivity in both exposure schemes. Continuous RV exposures induced delays to sequential developmental milestones that increased with exposure duration. RV-induced delays were significantly reduced but not eliminated in early-only exposure cohorts, consistent with cumulative RV effects on developmental progress. These findings suggest that C. elegans may be a useful model for detecting chemicals with irreversible, reversible, and/or cumulative effects on organismal development.
Neuronal effect of 0.3 % DMSO and the synergism between 0.3 % DMSO and loss function of UCH-L1 on Drosophila melanogaster model
Mai Thi Thu Trinh, Truong Huynh Kim Thoa, Dang Thi Phuong Thao
Dimethyl sulfoxide (DMSO) is a polar aprotic solvent widely used in biological and medical studies and as a vehicle for pharmacological therapy. DMSO from 0.1 % to 0.5 %, particularly 0.3 %, is commonly used as a solvent to dissolve compounds when testing their effects on living cells and tissues, including nerve cells. However, scientific data on the effects of DMSO on the nervous system are limited. Here, we present a case study investigating the effects of DMSO at a 0.3 % concentration on nerve cells in the Drosophila melanogaster model. We found that 0.3 % DMSO affected the active zone and the glutamate receptor. Notably, this study also revealed a synergistic effect between 0.3 % DMSO and loss of function of dUCH (the D. melanogaster homolog of Ubiquitin Carboxyl-terminal Hydrolase L1, UCH-L1). This combination caused more serious abnormalities in synapse structure, particularly in the number of boutons at the neuromuscular junction (NMJ). Furthermore, 0.3 % DMSO reduced the amount of ubiquitinylated protein aggregates in the indirect flight muscle of both normal and genetically defective fly models. Taken together, the data in this study indicate that 0.3 % DMSO caused aberrant morphology of the synaptic structure and decreased the number of ubiquitinylated proteins in the indirect flight muscle of Drosophila. These data contribute new evidence of the effects of DMSO on the nervous system. Significantly, this study revealed that DMSO affects neurons even at a low concentration that is widely used as a pharmacological solvent.
Potential therapeutic role of sex steroids in treating sarcopenia: a network pharmacology and molecular dynamics study
Xiangyu Cui, Xiaodong Li, Xin Qi
et al.
Background Sarcopenia, characterized by progressive muscle loss and functional decline in aging, poses significant health challenges. Sex steroids, such as estradiol and testosterone, have potential therapeutic roles in mitigating muscle degeneration. This study explores the molecular mechanisms and targets of sex steroids in the treatment of sarcopenia using network pharmacology, enrichment analysis, machine learning, molecular docking, and molecular dynamics simulations. Methods We identified potential anti-sarcopenia targets by analyzing the interaction network between sex steroids and their targets, intersecting these with differentially expressed genes (DEGs) from the GSE1428 dataset. Enrichment analysis was conducted to determine the functional relevance of these targets. Gene set variation analysis (GSVA) was employed to explore pathway-level differences between age groups. Machine learning algorithms (random forest (RF), support vector machine (SVM), and XGBoost) identified crucial biomarker genes. A nomogram for predicting sarcopenia was constructed and validated. Molecular docking and molecular dynamics (MD) simulations evaluated the binding interactions and stability of steroid-target complexes. Results Intersection analysis revealed 69 potential anti-sarcopenia targets. Enrichment analysis highlighted pathways related to muscle function, such as calcium signaling and synaptic transmission. GSVA indicated significant upregulation of DNA damage response and immune response pathways in the older group. Machine learning algorithms pinpointed CFTR, FYN, and PRKCA as top biomarkers. The nomogram demonstrated high predictive accuracy with an AUC of 0.925. Molecular docking showed significant binding affinities of sex steroids with target proteins, further supported by stable RMSD values in MD simulations. Conclusion Sex steroids, specifically estradiol and testosterone, demonstrate promising interactions with key targets implicated in sarcopenia in silico.
These computational findings offer preliminary mechanistic insights into the potential therapeutic role of sex steroids in modulating muscle-related pathways. Further experimental and clinical validation is warranted to assess their translational applicability for sarcopenia treatment. Clinical trial number Not applicable.
Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning
Yinglun Xu, Rohan Gumaste, Gagandeep Singh
We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks. We consider a black-box threat model in which the attacker is entirely oblivious to the learning algorithm, and whose budget is limited by constraining the amount of corruption at each data point and the total perturbation. We require the attack to be universally efficient against any efficient algorithm the agent might use. We propose an attack strategy called the `policy contrast attack.' The idea is to find low- and high-performing policies covered by the dataset and make them appear to be high- and low-performing to the agent, respectively. To the best of our knowledge, this is the first universal black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and empirically show that our attack is efficient against current state-of-the-art offline RL algorithms on different learning datasets.
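The core idea, making a high-performing covered policy look poor and a low-performing one look good under per-point and total budgets, can be sketched as follows. This is a simplified, deterministic-policy illustration; the real attack must first identify these policies from the dataset and allocates its budget more carefully:

```python
def policy_contrast_poison(dataset, high_policy, low_policy, eps, budget):
    """Perturb rewards in a list of (state, action, reward) tuples:
    decrease rewards where the high-performing policy agrees with the
    logged action, increase them where the low-performing policy agrees,
    by at most eps per point and at most `budget` in total (L1)."""
    poisoned, spent = [], 0.0
    for (s, a, r) in dataset:
        delta = 0.0
        if spent < budget:
            if high_policy(s) == a:
                delta = -min(eps, budget - spent)
            elif low_policy(s) == a:
                delta = min(eps, budget - spent)
        spent += abs(delta)
        poisoned.append((s, a, r + delta))
    return poisoned
```

Any pessimistic offline learner trained on the poisoned rewards then prefers the low-performing policy, regardless of which algorithm it runs, which is what makes the attack universal.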
Spondias mombin Ameliorates Copper-Induced Memory Impairment in Mice by Mitigating Oxidative Stress and Modulating Cholinergic Activity
James Busayo Agboola, Amos Rotimi Joshua, Sanmi Tunde Ogunsanya
et al.
The study unveils the importance of Spondias mombin leaf extract (S. mombin; 100 mg/kg b.wt.) in mitigating the toxic insult mediated by copper (16 mg/kg) on the cholinergic system, both administered via oral gavage. Male mice were weighed and distributed into five groups, six mice per group. Group A represents the basal control group given only distilled water; Groups B and C were given 16 mg/kg of copper sulphate and 100 mg/kg of S. mombin methanol leaf extract, respectively, for 31 days; Group D was given 16 mg/kg of copper sulphate for 31 days, followed by treatment with 100 mg/kg of S. mombin methanol leaf extract for 15 days; and Group E was assigned only S. mombin. Copper concentration was evaluated using atomic absorption spectroscopy, while cognitive abilities were assessed using the Morris water maze and elevated plus-maze. The results showed that S. mombin reversed the disparities in cholinesterase and choline acetyltransferase activities in vivo due to copper neurotoxicity, with a relative (p<0.05) decrease in brain and blood copper concentrations in the co- and post-treatment groups. Furthermore, S. mombin reduced the imbalance between pro-oxidant and antioxidant levels caused by copper toxicity by lowering brain lipid peroxidation, raising thiol levels, and increasing the activities of antioxidant enzymes such as glutathione transferase, catalase, and superoxide dismutase. The extract also had a protective influence on memory deficits (p<0.05), significantly reduced anxiety levels, and increased the overall number of neurons in the hippocampus subfields. In conclusion, S. mombin ameliorates Cu-induced neurotoxicity and neurocognitive dysfunction by enhancing cholinergic signalling and antioxidant activities in the brain.
DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection
Hossein Aboutalebi, Dayou Mao, Rongqi Fan
et al.
The tremendous recent advances in generative artificial intelligence techniques have led to significant successes and promise in a wide range of applications, from conversational agents and textual content generation to voice and visual synthesis. Amid the rise of generative AI and its increasingly widespread adoption, there has been growing concern over its use for malicious purposes. In the realm of visual content synthesis using generative AI, key areas of significant concern have been image forgery (e.g., generation of images containing or derived from copyrighted content) and data poisoning (i.e., generation of adversarially contaminated images). Motivated to address these key concerns and encourage responsible generative AI, we introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in building machine learning algorithms for generative AI art forgery and data poisoning detection. Comprising over 32,000 records across a variety of generative forgery and data poisoning techniques, each entry consists of a pair of images that are either forgeries / adversarially contaminated or not. Each of the generated images in the DeepfakeArt Challenge benchmark dataset \footnote{The link to the dataset: http://anon\_for\_review.com} has been comprehensively quality-checked.
Data and Model Poisoning Backdoor Attacks on Wireless Federated Learning, and the Defense Mechanisms: A Comprehensive Survey
Yichen Wan, Youyang Qu, Wei Ni
et al.
Due to the greatly improved capabilities of devices, massive data, and increasing concern about data privacy, Federated Learning (FL) has been increasingly considered for applications to wireless communication networks (WCNs). Wireless FL (WFL) is a distributed method of training a global deep learning model in which a large number of participants each train a local model on their training datasets and then upload the local model updates to a central server. However, in general, non-independent and identically distributed (non-IID) data of WCNs raises concerns about robustness, as a malicious participant could potentially inject a "backdoor" into the global model by uploading poisoned data or models over WCN. This could cause the model to misclassify malicious inputs as a specific target class while behaving normally with benign inputs. This survey provides a comprehensive review of the latest backdoor attacks and defense mechanisms. It classifies them according to their targets (data poisoning or model poisoning), the attack phase (local data collection, training, or aggregation), and defense stage (local training, before aggregation, during aggregation, or after aggregation). The strengths and limitations of existing attack strategies and defense mechanisms are analyzed in detail. Comparisons of existing attack methods and defense designs are carried out, pointing to noteworthy findings, open challenges, and potential future research directions related to security and privacy of WFL.
Run-Off Election: Improved Provable Defense against Data Poisoning Attacks
Keivan Rezaei, Kiarash Banihashem, Atoosa Chegini
et al.
In data poisoning attacks, an adversary tries to change a model's prediction by adding, modifying, or removing samples in the training data. Recently, ensemble-based approaches for obtaining provable defenses against data poisoning have been proposed where predictions are made by taking a majority vote across multiple base models. In this work, we show that merely considering the majority vote in ensemble defenses is wasteful, as it does not effectively utilize the available information in the logits layers of the base models. Instead, we propose Run-Off Election (ROE), a novel aggregation method based on a two-round election across the base models: In the first round, models vote for their preferred class; then a second, run-off election is held between the top two classes from the first round. Based on this approach, we propose the DPA+ROE and FA+ROE defense methods, built on the Deep Partition Aggregation (DPA) and Finite Aggregation (FA) approaches from prior work. We evaluate our methods on MNIST, CIFAR-10, and GTSRB and obtain improvements in certified accuracy of up to 3%-4%. Also, by applying ROE to a boosted version of DPA, we gain improvements of around 12%-27% compared to the current state-of-the-art, establishing a new state-of-the-art in (pointwise) certified robustness against data poisoning. In many cases, our approach outperforms the state-of-the-art even when using 32 times less computational power.
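The two-round election can be sketched directly from the description above. This is a minimal illustration over per-model logit vectors; the tie-breaking rule is an assumption:

```python
from collections import Counter

def run_off_election(logits_per_model):
    """Round 1: plurality vote over each base model's argmax class.
    Round 2: pairwise run-off between the top two classes, where every
    model votes for whichever finalist has the higher logit."""
    votes = Counter(max(range(len(l)), key=l.__getitem__)
                    for l in logits_per_model)
    if len(votes) == 1:           # unanimous round 1, no run-off needed
        return next(iter(votes))
    (c1, _), (c2, _) = votes.most_common(2)
    votes_c1 = sum(1 for l in logits_per_model if l[c1] >= l[c2])
    return c1 if votes_c1 >= len(logits_per_model) - votes_c1 else c2
```

Note that in round 2 even models whose round-1 vote was for neither finalist still contribute through their logits; this is exactly the information a plain majority vote discards.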
Voyager: MTD-Based Aggregation Protocol for Mitigating Poisoning Attacks on DFL
Chao Feng, Alberto Huertas Celdran, Michael Vuong
et al.
The growing concern over malicious attacks targeting the robustness of both Centralized and Decentralized Federated Learning (FL) necessitates novel defensive strategies. In contrast to the centralized approach, Decentralized FL (DFL) has the advantage of utilizing network topology and local dataset information, enabling the exploration of Moving Target Defense (MTD) based approaches. This work presents a theoretical analysis of the influence of network topology on the robustness of DFL models. Drawing inspiration from these findings, a three-stage MTD-based aggregation protocol, called Voyager, is proposed to improve the robustness of DFL models against poisoning attacks by manipulating network topology connectivity. Voyager has three main components: an anomaly detector, a network topology explorer, and a connection deployer. When an abnormal model is detected in the network, the topology explorer responds strategically by forming connections with more trustworthy participants to secure the model. Experimental evaluations show that Voyager effectively mitigates various poisoning attacks without imposing significant resource and computational burdens on participants. These findings highlight the proposed reactive MTD as a potent defense mechanism in the context of DFL.
Poisoning Attacks in Federated Edge Learning for Digital Twin 6G-enabled IoTs: An Anticipatory Study
Mohamed Amine Ferrag, Burak Kantarci, Lucas C. Cordeiro
et al.
Federated edge learning can be essential in supporting privacy-preserving, artificial intelligence (AI)-enabled activities in digital twin 6G-enabled Internet of Things (IoT) environments. However, we also need to consider the potential of attacks targeting the underlying AI systems (e.g., adversaries seeking to corrupt data on the IoT devices during local updates or to corrupt the model updates); hence, in this article, we propose an anticipatory study of poisoning attacks in federated edge learning for digital twin 6G-enabled IoT environments. Specifically, we study the influence of adversaries on the training and development of federated learning models in digital twin 6G-enabled IoT environments. We demonstrate that attackers can carry out poisoning attacks in two different learning settings, namely centralized learning and federated learning, and that successful attacks can severely reduce the model's accuracy. We comprehensively evaluate the attacks on a new cyber security dataset designed for IoT applications with three deep neural networks under both non-independent and identically distributed (Non-IID) data and independent and identically distributed (IID) data. The poisoning attacks, on an attack classification problem, can lead to a decrease in accuracy from 94.93% to 85.98% with IID data and from 94.18% to 30.04% with Non-IID data.