Learned indexes are a class of index data structures that enable fast search by approximating the cumulative distribution function (CDF) using machine learning models (Kraska et al., SIGMOD'18). However, recent studies have shown that learned indexes are vulnerable to poisoning attacks, where injecting a small number of poison keys into the training data can significantly degrade model accuracy and reduce index performance (Kornaropoulos et al., SIGMOD'22). In this work, we provide a rigorous theoretical analysis of poisoning attacks targeting linear regression models over CDFs, one of the most basic regression models and a core component in many learned indexes. Our main contributions are as follows: (i) We present a theoretical proof characterizing the optimal single-point poisoning attack and show that the existing method yields the optimal attack. (ii) We show that in multi-point attacks, the existing greedy approach is not always optimal, and we rigorously derive the key properties that an optimal attack should satisfy. (iii) We propose a method to compute an upper bound of the multi-point poisoning attack's impact and empirically demonstrate that the loss under the greedy approach is often close to this bound. Our study deepens the theoretical understanding of attack strategies against linear regression models on CDFs and provides a foundation for the theoretical evaluation of attacks and defenses on learned indexes.
Maletić Jelena Stojčević, Barjaktarović Iva, Andrijević Ljiljana
et al.
Handling clinical samples from patients suspected of SARS-CoV-2 infection puts healthcare workers at risk of exposure to infectious particles. To reduce this risk, samples are often heat-inactivated before nucleic acid isolation, but this procedure may affect the analytical sensitivity of the test. The aim of this study was therefore to evaluate the effects of heat inactivation (56 °C for 30 min) on RT-qPCR results of samples taken from nasopharyngeal and oropharyngeal (NP/OP) swabs collected from 200 symptomatic patients. Each sample was split into two aliquots – one subjected to heat inactivation and the other stored at 4 °C – followed by nucleic acid isolation and RT-qPCR analysis using the GeneFinder COVID-19 nucleic acid test. Heat inactivation did not significantly affect the overall SARS-CoV-2 detection rate (55.5 % vs. 55.0 % in untreated and heat-treated groups; χ2=0.01; p=0.91). However, discrepancies occurred in 15.3 % of samples, all with quantification cycle (Cq) values >31, including target loss, gain, or complete signal disappearance after heat treatment. Heat inactivation also slightly decreased Cq values for the RNA-dependent RNA polymerase (RdRp) and envelope (E) genes and increased those for the nucleocapsid (N) gene, with significant changes in strongly positive samples (Cq≤33). In positive samples (Cq≤40), the human ribonuclease (RNase) P gene also exhibited significantly higher Cq values after heat treatment. In the strongly positive subgroup, correlation analysis showed moderate correlation for RdRp and very strong correlation for the N and E genes, and a weaker correlation for weakly positive samples. In conclusion, heat inactivation at 56 °C for 30 min does not significantly affect viral gene detection but may diminish it in samples with low viral load.
We study the corruption-robustness of in-context reinforcement learning (ICRL), focusing on the Decision-Pretrained Transformer (DPT, Lee et al., 2023). To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially Trained Decision-Pretrained Transformer (AT-DPT). Our method simultaneously trains an attacker to minimize the true reward of the DPT by poisoning environment rewards, and a DPT model to infer optimal actions from the poisoned data. We evaluate the effectiveness of our approach against standard bandit algorithms, including robust baselines designed to handle reward contamination. Our results show that the proposed method significantly outperforms these baselines in bandit settings, under a learned attacker. We additionally evaluate AT-DPT on an adaptive attacker, and observe similar results. Furthermore, we extend our evaluation to the MDP setting, confirming that the robustness observed in bandit scenarios generalizes to more complex environments.
Federated learning (FL) allows multiple clients to collaboratively train a global machine learning model with coordination from a central server, without needing to share their raw data. This approach is particularly appealing in the era of privacy regulations like the GDPR, leading many prominent companies to adopt it. However, FL's distributed nature makes it susceptible to poisoning attacks, where malicious clients, controlled by an attacker, send harmful data to compromise the model. Most existing poisoning attacks in FL aim to degrade the model's integrity, such as reducing its accuracy, with limited attention to privacy concerns from these attacks. In this study, we introduce FedPoisonMIA, a novel poisoning membership inference attack targeting FL. FedPoisonMIA involves malicious clients crafting local model updates to infer membership information. Additionally, we propose a robust defense mechanism to mitigate the impact of FedPoisonMIA attacks. Extensive experiments across various datasets demonstrate the attack's effectiveness, while our defense approach reduces its impact to a degree.
Georgios Syros, Anshuman Suri, Farinaz Koushanfar
et al.
Federated Learning is vulnerable to adversarial manipulation, where malicious clients can inject poisoned updates to influence the global model's behavior. While existing defense mechanisms have made notable progress, they fail to protect against adversaries that aim to induce targeted backdoors under different learning and attack configurations. To address this limitation, we introduce DROP (Distillation-based Reduction Of Poisoning), a novel defense mechanism that combines clustering and activity-tracking techniques with extraction of benign behavior from clients via knowledge distillation to tackle stealthy adversaries that manipulate low data poisoning rates and diverse malicious client ratios within the federation. Through extensive experimentation, our approach demonstrates superior robustness compared to existing defenses across a wide range of learning configurations. Finally, we evaluate existing defenses and our method under the challenging setting of non-IID client data distribution and highlight the challenges of designing a resilient FL defense in this setting.
The increased integration of clean yet stochastic energy resources and the growing number of extreme weather events are narrowing the decision-making window of power grid operators. This time constraint is fueling a plethora of research on Machine Learning-, or ML-, based optimization proxies. While finding a fast solution is appealing, the inherent vulnerabilities of the learning-based methods are hindering their adoption. One of these vulnerabilities is data poisoning attacks, which adds perturbations to ML training data, leading to incorrect decisions. The impact of poisoning attacks on learning-based power system optimizers have not been thoroughly studied, which creates a critical vulnerability. In this paper, we examine the impact of data poisoning attacks on ML-based optimization proxies that are used to solve the DC Optimal Power Flow problem. Specifically, we compare the resilience of three different methods-a penalty-based method, a post-repair approach, and a direct mapping approach-against the adverse effects of poisoning attacks. We will use the optimality and feasibility of these proxies as performance metrics. The insights of this work will establish a foundation for enhancing the resilience of neural power system optimizers.
Current parameter-efficient fine-tuning methods for adapting pre-trained language models to downstream tasks are susceptible to interference from noisy data. Conventional noise-handling approaches either rely on laborious data pre-processing or employ model architecture modifications prone to error accumulation. In contrast to existing noise-process paradigms, we propose a noise-robust adaptation method via asymmetric LoRA poisoning experts (LoPE), a novel framework that enhances model robustness to noise only with generated noisy data. Drawing inspiration from the mixture-of-experts architecture, LoPE strategically integrates a dedicated poisoning expert in an asymmetric LoRA configuration. Through a two-stage paradigm, LoPE performs noise injection on the poisoning expert during fine-tuning to enhance its noise discrimination and processing ability. During inference, we selectively mask the dedicated poisoning expert to leverage purified knowledge acquired by normal experts for noise-robust output. Extensive experiments demonstrate that LoPE achieves strong performance and robustness purely through the low-cost noise injection, which completely eliminates the requirement of data cleaning.
Joel Ramanan da Cruz, Philippe Bulet, Cléria Mendonça de Moraes, PhD
The Amazon biome is home to many scorpion species, with around two hundred identified in the region. Of these, forty-eight species have been reported in Brazil so far and six of them are of medical importance: Tityus apiacas, T. metuendus, T. obscurus, T. raquelae, T. silvestris, and T. strandi. Three non-medically important species have also been studied: Opisthanthus cayaporum, Brotheas amazonicus and Rhopalurus laticauda. The venom of the scorpion T. obscurus is the most studied, followed by O. cayaporum. We aim to update the study of these Amazonian scorpion species. We will explore the harmful and beneficial properties of scorpion venom toxins and how they could be applied in drug development. This systematic review will focus on collecting and analyzing venoms from scorpions in Brazil. Only papers on Amazonian scorpion venom studies published between 2001 and 2021 (scientific articles, theses, and dissertations) were selected, based on the lists of scorpions available in the literature. Species found in the Amazon but not confirmed to be Brazilian were omitted from the review. Theses and dissertations were chosen over their derived articles. We found 42 eligible studies (13 theses, 27 articles and 2 patents) out of 17,950 studies and a basic statistical analysis was performed. The literature showed that T. obscurus was the most studied venom with 28 publications, followed by O. cayaporum with seven articles, B. amazonicus with four articles, T. metuendus with two article and R. laticauda with one article. No publication on the characterization of T. silvestris and T. apiacas venoms were found during the reviewed period, only the clinical aspects were covered. There is still much to be explored despite the increasing number of studies conducted in recent years. Amazonian scorpions have promising potential for pharmaceutical and clinical applications.
Federated Distillation (FD) is a novel and promising distributed machine learning paradigm, where knowledge distillation is leveraged to facilitate a more efficient and flexible cross-device knowledge transfer in federated learning. By optimizing local models with knowledge distillation, FD circumvents the necessity of uploading large-scale model parameters to the central server, simultaneously preserving the raw data on local clients. Despite the growing popularity of FD, there is a noticeable gap in previous works concerning the exploration of poisoning attacks within this framework. This can lead to a scant understanding of the vulnerabilities to potential adversarial actions. To this end, we introduce FDLA, a poisoning attack method tailored for FD. FDLA manipulates logit communications in FD, aiming to significantly degrade model performance on clients through misleading the discrimination of private samples. Through extensive simulation experiments across a variety of datasets, attack scenarios, and FD configurations, we demonstrate that LPA effectively compromises client model accuracy, outperforming established baseline algorithms in this regard. Our findings underscore the critical need for robust defense mechanisms in FD settings to mitigate such adversarial threats.
To make room for privacy and efficiency, the deployment of many recommender systems is experiencing a shift from central servers to personal devices, where the federated recommender systems (FedRecs) and decentralized collaborative recommender systems (DecRecs) are arguably the two most representative paradigms. While both leverage knowledge (e.g., gradients) sharing to facilitate learning local models, FedRecs rely on a central server to coordinate the optimization process, yet in DecRecs, the knowledge sharing directly happens between clients. Knowledge sharing also opens a backdoor for model poisoning attacks, where adversaries disguise themselves as benign clients and disseminate polluted knowledge to achieve malicious goals like promoting an item's exposure rate. Although research on such poisoning attacks provides valuable insights into finding security loopholes and corresponding countermeasures, existing attacks mostly focus on FedRecs, and are either inapplicable or ineffective for DecRecs. Compared with FedRecs where the tampered information can be universally distributed to all clients once uploaded to the cloud, each adversary in DecRecs can only communicate with neighbor clients of a small size, confining its impact to a limited range. To fill the gap, we present a novel attack method named Poisoning with Adaptive Malicious Neighbors (PAMN). With item promotion in top-K recommendation as the attack objective, PAMN effectively boosts target items' ranks with several adversaries that emulate benign clients and transfers adaptively crafted gradients conditioned on each adversary's neighbors. Moreover, with the vulnerabilities of DecRecs uncovered, a dedicated defensive mechanism based on user-level gradient clipping with sparsified updating is proposed. Extensive experiments demonstrate the effectiveness of the poisoning attack and the robustness of our defensive mechanism.
Graph analysis has become increasingly popular with the prevalence of big data and machine learning. Traditional graph data analysis methods often assume the existence of a trusted third party to collect and store the graph data, which does not align with real-world situations. To address this, some research has proposed utilizing Local Differential Privacy (LDP) to collect graph data or graph metrics (e.g., clustering coefficient). This line of research focuses on collecting two atomic graph metrics (the adjacency bit vectors and node degrees) from each node locally under LDP to synthesize an entire graph or generate graph metrics. However, they have not considered the security issues of LDP for graphs. In this paper, we bridge the gap by demonstrating that an attacker can inject fake users into LDP protocols for graphs and design data poisoning attacks to degrade the quality of graph metrics. In particular, we present three data poisoning attacks to LDP protocols for graphs. As a proof of concept, we focus on data poisoning attacks on two classical graph metrics: degree centrality and clustering coefficient. We further design two countermeasures for these data poisoning attacks. Experimental study on real-world datasets demonstrates that our attacks can largely degrade the quality of collected graph metrics, and the proposed countermeasures cannot effectively offset the effect, which calls for the development of new defenses.
Recent studies have shown that LLMs are vulnerable to denial-of-service (DoS) attacks, where adversarial inputs like spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. These attacks can potentially cause high latency and make LLM services inaccessible to other users or tasks. However, when there are speech-to-text interfaces (e.g., voice commands to a robot), executing such DoS attacks becomes challenging, as it is difficult to introduce spelling errors or non-semantic prompts through speech. A simple DoS attack in these scenarios would be to instruct the model to "Keep repeating Hello", but we observe that relying solely on natural instructions limits output length, which is bounded by the maximum length of the LLM's supervised finetuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit. For example, a poisoned sample can successfully attack GPT-4o and GPT-4o mini (via OpenAI's finetuning API) using less than $1, causing repeated outputs up to the maximum inference length (16K tokens, compared to 0.5K before poisoning). Additionally, we perform comprehensive ablation studies on open-source LLMs and extend our method to LLM agents, where attackers can control both the finetuning dataset and algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks to secure LLMs. Our code is available at https://github.com/sail-sg/P-DoS.
As one kind of distributed machine learning technique, federated learning enables multiple clients to build a model across decentralized data collaboratively without explicitly aggregating the data. Due to its ability to break data silos, federated learning has received increasing attention in many fields, including finance, healthcare, and education. However, the invisibility of clients' training data and the local training process result in some security issues. Recently, many works have been proposed to research the security attacks and defenses in federated learning, but there has been no special survey on poisoning attacks on federated learning and the corresponding defenses. In this paper, we investigate the most advanced schemes of federated learning poisoning attacks and defenses and point out the future directions in these areas.
BackgroundKunming is a plateau city with sufficient sunshine, high ultraviolet intensity, and strong radiation. In recent years, ozone (O3) pollution has gradually become the primary problem of air pollution in the city.ObjectiveTo evaluate the health effects of atmospheric O3 exposure on non-accidental deaths in Kunming.MethodsThe data of meteorological variables (average temperature, average relative humidity, average air pressure, and average wind speed), air pollutants (PM2.5, PM10, SO2, CO, and O3) and non-accidental deaths (NAD) of residents were collected in Kunming from 2017 to 2019. A generalized additive model was adopted to conduct time-series analyses on the current-day (lag0), single-day (lag1-lag3), and cumulative lag (lag01-lag03) effects of O3 on NAD; furthermore, hierarchical analyses by gender, age, and season (warm and cold) were conducted.ResultsThe average concentration of O3-8h from 2017 to 2019 was (84.3±32.3) μg·m−3. For every 10 μg·m−3 increase in O3-8h concentration, the NAD risks of lag0, lag01, and lag02 of total population increased by 0.70% (95%CI: 0.11%-1.29%) 0.79% (95%CI: 0.14%-1.44%), and 0.75% (95%CI: 0.08%-1.43%), respectively; for women, the NAD risks of lag2 and lag02 increased by 0.80% (95%CI: 0.08%-1.53%) and 1.05% (95%CI: 0.09%-2.03%) respectively; for the residents over the age of 65, the associated NAD risks of lag0, lag01, and lag02 increased by 0.82% (95%CI: 0.16%-1.48%), 0.93% (95%CI: 0.20%-1.67%), and 0.96% (95%CI: 0.20%-1.73%), respectively; in the warm season, the NAD risks of lag0, lag01, and lag02 increased by 0.91% (95%CI: 0.12%-1.70%), 0.98% (95%CI: 0.12%-1.86%), and 1.00% (95%CI: 0.07%-1.93%), respectively; After introducing PM2.5, PM10, SO2, NO2, and CO to the model, the effects of O3 exposure level on resident’s NAD was not statistically significant.ConclusionAn increase of O3 exposure level associates with an increase of NAD risk in residents, and there is a lag effect. Residents over the age of 65, women, and all residents in warm season may be more sensitive to O3 exposure.
Anita Malhotra, Wolfgang Wüster, John Benjamin Owens
et al.
Snakebite incidence at least partly depends on the biology of the snakes involved. However, studies of snake biology have been largely neglected in favour of anthropic factors, with the exception of taxonomy, which has been recognised for some decades to affect the design of antivenoms. Despite this, within-species venom variation and the unpredictability of the correlation with antivenom cross-reactivity has continued to be problematic. Meanwhile, other aspects of snake biology, including behaviour, spatial ecology and activity patterns, distribution, and population demography, which can contribute to snakebite mitigation and prevention, remain underfunded and understudied. Here, we review the literature relevant to these aspects of snakebite and illustrate how demographic, spatial, and behavioural studies can improve our understanding of why snakebites occur and provide evidence for prevention strategies. We identify the large gaps that remain to be filled and urge that, in the future, data and relevant metadata be shared openly via public data repositories so that studies can be properly replicated and data used in future meta-analyses.
Contextual bandit algorithms have many applicants in a variety of scenarios. In order to develop trustworthy contextual bandit systems, understanding the impacts of various adversarial attacks on contextual bandit algorithms is essential. In this paper, we propose a new class of attacks: action poisoning attacks, where an adversary can change the action signal selected by the agent. We design action poisoning attack schemes against linear contextual bandit algorithms in both white-box and black-box settings. We further analyze the cost of the proposed attack strategies for a very popular and widely used bandit algorithm: LinUCB. We show that, in both white-box and black-box settings, the proposed attack schemes can force the LinUCB agent to pull a target arm very frequently by spending only logarithm cost.
Peru Bhardwaj, John Kelleher, Luca Costabello
et al.
We study the problem of generating data poisoning attacks against Knowledge Graph Embedding (KGE) models for the task of link prediction in knowledge graphs. To poison KGE models, we propose to exploit their inductive abilities which are captured through the relationship patterns like symmetry, inversion and composition in the knowledge graph. Specifically, to degrade the model's prediction confidence on target facts, we propose to improve the model's prediction confidence on a set of decoy facts. Thus, we craft adversarial additions that can improve the model's prediction confidence on decoy facts through different inference patterns. Our experiments demonstrate that the proposed poisoning attacks outperform state-of-art baselines on four KGE models for two publicly available datasets. We also find that the symmetry pattern based attacks generalize across all model-dataset combinations which indicates the sensitivity of KGE models to this pattern.
Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen
et al.
Unsupervised domain adaptation (UDA) enables cross-domain learning without target domain labels by transferring knowledge from a labeled source domain whose distribution differs from that of the target. However, UDA is not always successful and several accounts of `negative transfer' have been reported in the literature. In this work, we prove a simple lower bound on the target domain error that complements the existing upper bound. Our bound shows the insufficiency of minimizing source domain error and marginal distribution mismatch for a guaranteed reduction in the target domain error, due to the possible increase of induced labeling function mismatch. This insufficiency is further illustrated through simple distributions for which the same UDA approach succeeds, fails, and may succeed or fail with an equal chance. Motivated from this, we propose novel data poisoning attacks to fool UDA methods into learning representations that produce large target domain errors. We evaluate the effect of these attacks on popular UDA methods using benchmark datasets where they have been previously shown to be successful. Our results show that poisoning can significantly decrease the target domain accuracy, dropping it to almost 0% in some cases, with the addition of only 10% poisoned data in the source domain. The failure of these UDA methods demonstrates their limitations at guaranteeing cross-domain generalization consistent with our lower bound. Thus, evaluating UDA methods in adversarial settings such as data poisoning provides a better sense of their robustness to data distributions unfavorable for UDA.
Sanjay Seetharaman, Shubham Malaviya, Rosni KV
et al.
Data poisoning is a type of adversarial attack on training data where an attacker manipulates a fraction of data to degrade the performance of machine learning model. Therefore, applications that rely on external data-sources for training data are at a significantly higher risk. There are several known defensive mechanisms that can help in mitigating the threat from such attacks. For example, data sanitization is a popular defensive mechanism wherein the learner rejects those data points that are sufficiently far from the set of training instances. Prior work on data poisoning defense primarily focused on offline setting, wherein all the data is assumed to be available for analysis. Defensive measures for online learning, where data points arrive sequentially, have not garnered similar interest. In this work, we propose a defense mechanism to minimize the degradation caused by the poisoned training data on a learner's model in an online setup. Our proposed method utilizes an influence function which is a classic technique in robust statistics. Further, we supplement it with the existing data sanitization methods for filtering out some of the poisoned data points. We study the effectiveness of our defense mechanism on multiple datasets and across multiple attack strategies against an online learner.