Abstract This review of current US municipal solid waste (MSW) waste-to-energy (WTE) trends highlights regional contrasts in technology adoption, the unique challenges of each technology, commonly used decision support tools, and major operators. In the US, only 13% of MSW is used for energy recovery and 53% is landfilled. There are 86 WTE facilities, which mostly use Mass-Burn and Refuse-Derived Fuel technologies and are concentrated in the densely populated Northeast (predominantly in New York) and the State of Florida. In the rest of the country, most MSW ends up in landfills equipped with gas recovery; the recovered gas is supplied to homes or used for electricity generation. However, there are many pilot and experimental systems based on advanced gasification and pyrolysis processes, which are viewed as potential technologies for responding to the problem of landfills nearing full capacity in various US states. These systems are viewed as "cleaner" (65% less toxic residue) than established mass burn technologies, but they have not matured to commercialization due to technical and cost hurdles. Operation and maintenance costs of $40-$100 per ton of MSW were reported for gasification systems. The heterogeneous nature of MSW, gas cleaning, and air pollution controls are the main disadvantages. Key design and decision support tools used by the scientific community and major operators in the US include techno-economic analysis, life cycle sustainability assessment, and reverse logistics modeling. A conclusion drawn from the reviewed studies is that adoption of thermal WTE technologies in the US could continue to increase, albeit slowly, in coastal and urban areas lacking suitable land for new landfills.
Modern deployments require LLMs to enforce safety policies at scale, yet many controls rely on inference-time interventions that add recurring compute cost and serving complexity. Activation steering is widely used, but it requires runtime hooks and scales cost with the number of generations; conditional variants improve selectivity by gating when steering is applied but still retain an inference-time control path. We ask whether selective refusal can be moved entirely offline: can a mechanistic understanding of category-specific refusal be distilled into a circuit-restricted weight update that deploys as a standard checkpoint? We propose C-Δθ: Circuit Restricted Weight Arithmetic, which (i) localizes refusal-causal computation as a sparse circuit using EAP-IG and (ii) computes a constrained weight update ΔθC supported only on that circuit (typically <5% of parameters). Applying ΔθC yields a drop-in edited checkpoint with no inference-time hooks, shifting cost from per-request intervention to a one-time offline update. We evaluate category-targeted selectivity and capability retention on refusal and utility benchmarks.
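The idea of a weight update supported only on a circuit can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how ΔθC is computed, so the sketch assumes, purely for illustration, that the update is the difference between a hypothetical `target` set of weights and the `base` weights, zeroed everywhere outside a boolean circuit `mask`. All names (`circuit_restricted_update`, `support_fraction`, `target`, `mask`, `scale`) are assumptions.

```python
from typing import Dict, List

Params = Dict[str, List[float]]
Mask = Dict[str, List[bool]]


def circuit_restricted_update(base: Params, target: Params, mask: Mask,
                              scale: float = 1.0) -> Params:
    """Return base + scale * mask * (target - base), elementwise.

    Parameters outside the mask (or in tensors missing from the mask)
    are copied from `base` unchanged.
    """
    edited: Params = {}
    for name, weights in base.items():
        keep = mask.get(name, [False] * len(weights))
        edited[name] = [
            w + scale * (t - w) if m else w
            for w, t, m in zip(weights, target[name], keep)
        ]
    return edited


def support_fraction(mask: Mask, base: Params) -> float:
    """Fraction of all parameters that the mask allows to change."""
    total = sum(len(w) for w in base.values())
    active = sum(sum(1 for m in mask.get(name, []) if m) for name in base)
    return active / total


# Toy example: only the first entry of tensor "a" is inside the circuit.
base = {"a": [1.0, 2.0], "b": [3.0]}
target = {"a": [2.0, 4.0], "b": [5.0]}
mask = {"a": [True, False]}
edited = circuit_restricted_update(base, target, mask)
```

Because the result is an ordinary set of weights, it could be saved and served like any other checkpoint, which is the deployment property the abstract emphasizes.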
Inasmuch as the removal of refusal behavior from instruction-tuned language models by directional abliteration requires the extraction of refusal-mediating directions from the residual stream activation space, and inasmuch as the construction of the contrast baseline against which harmful prompt activations are compared has been treated in the existing literature as an implementation detail rather than a methodological concern, the present work investigates whether a topically matched contrast baseline yields superior refusal directions. The investigation is carried out on the Qwen 3.5 2B model using per-category matched prompt pairs, per-class Self-Organizing Map extraction, and Singular Value Decomposition orthogonalization. It was found that topic-matched contrast produces no functional refusal directions at any tested weight level on any tested layer, while unmatched contrast on the same model, same extraction code, and same evaluation protocol achieves complete refusal elimination on six layers. The geometric analysis of the failure establishes that topic-matched subtraction cancels the dominant activation component shared between harmful and harmless prompts of the same subject, reducing the extracted direction magnitude below the threshold at which weight-matrix projection perturbs the residual stream. The implications for the design of contrast baselines in abliteration research are discussed.
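The role of the contrast baseline can be made concrete with a small sketch. It deliberately substitutes a plain difference-of-means direction for the per-class Self-Organizing Map extraction and SVD orthogonalization named in the abstract, so it should be read as an illustration of the general abliteration recipe rather than the paper's pipeline. The function names and the `alpha` weight are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Matrix) -> Vector:
    """Column-wise mean of a list of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful: Matrix, harmless: Matrix) -> Tuple[Vector, float]:
    """Unit difference-of-means direction and its pre-normalization magnitude."""
    diff = [h - b for h, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(x * x for x in diff))
    unit = [x / norm for x in diff] if norm > 0 else diff
    return unit, norm


def project_out(weight: Matrix, direction: Vector, alpha: float = 1.0) -> Matrix:
    """Compute W' = W - alpha * r (r^T W).

    Rows of `weight` index the residual-stream (output) dimension, so the
    edited matrix writes less (or nothing, for alpha = 1) along `direction`.
    """
    n_out, n_in = len(weight), len(weight[0])
    coeffs = [sum(direction[i] * weight[i][j] for i in range(n_out))
              for j in range(n_in)]
    return [[weight[i][j] - alpha * direction[i] * coeffs[j]
             for j in range(n_in)] for i in range(n_out)]


# Toy activations: harmful prompts differ from the baseline along axis 0.
harmful = [[2.0, 0.0], [4.0, 0.0]]
harmless = [[0.0, 0.0], [0.0, 0.0]]
direction, magnitude = refusal_direction(harmful, harmless)
edited = project_out([[1.0, 2.0], [3.0, 4.0]], direction)
```

In this framing, a topic-matched baseline whose mean lies very close to the harmful mean would return a small `magnitude`, which is the quantity the abstract's geometric analysis identifies as collapsing.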
Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon actions where a single misstep, such as accessing files or entering credentials, can cause irreversible harm. Existing alignment methods, largely optimized for static generation and task completion, break down in these settings due to sequential decision-making, adversarial tool feedback, and overconfident intermediate reasoning. We introduce MOSAIC, a post-training framework that aligns agents for safe multi-step tool use by making safety decisions explicit and learnable. MOSAIC structures inference as a plan, check, then act or refuse loop, with explicit safety reasoning and refusal as first-class actions. To train without trajectory-level labels, we use preference-based reinforcement learning with pairwise trajectory comparisons, which captures safety distinctions often missed by scalar rewards. We evaluate MOSAIC zero-shot across three model families, Qwen2.5-7B, Qwen3-4B-Thinking, and Phi-4, and across out-of-distribution benchmarks spanning harmful tasks, prompt injection, benign tool use, and cross-domain privacy leakage. MOSAIC reduces harmful behavior by up to 50%, increases harmful-task refusal by over 20% on injection attacks, cuts privacy leakage, and preserves or improves benign task performance, demonstrating robust generalization across models, domains, and agentic settings.
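The plan, check, then act or refuse loop can be sketched in a few lines. In MOSAIC the safety decision is learned through preference-based reinforcement learning; the hand-written `is_safe` predicate below is only a stand-in, and the names `Step`, `run_agent`, `is_safe`, and `execute` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    tool: str
    argument: str


def run_agent(plan: List[Step],
              is_safe: Callable[[Step], bool],
              execute: Callable[[Step], str]) -> List[str]:
    """Check each planned step before acting; refuse and stop at the first unsafe one."""
    log: List[str] = []
    for step in plan:
        if not is_safe(step):
            # Refusal is an explicit action in the trajectory, not a silent failure.
            log.append(f"REFUSE {step.tool}")
            break
        log.append(execute(step))
    return log


plan = [
    Step("search", "weather"),
    Step("enter_credentials", "password"),
    Step("search", "news"),
]
log = run_agent(
    plan,
    is_safe=lambda s: s.tool != "enter_credentials",
    execute=lambda s: f"ran {s.tool}",
)
```

Stopping at the first refused step reflects the abstract's concern that a single misstep in a long-horizon trajectory can be irreversible; whether a real agent should stop or re-plan after a refusal is a design choice this sketch does not settle.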
Co-landfilling of incineration slag and municipal solid waste (MSW) is a main method of slag disposal, and it has the potential to promote methane (CH4) production and accelerate landfill stabilization. Four simulated MSW landfill columns loaded with different amounts of slag (A, 0%; B, 5%; C, 10%; D, 20%) were established, and the CH4 production characteristics and methanogenic mechanisms were investigated. The maximum CH4 concentration in columns A, B, C and D was 10.8%, 23.3%, 36.3% and 34.3%, respectively. Leachate pH and refuse pH were positively correlated with CH4 concentration. Methanosarcina was the dominant genus, with an abundance of 35.1% to 75.2%, and it was positively correlated with CH4 concentration. CO2-reducing and acetoclastic methanogenesis were the main methanogenesis pathways, and the methanogenesis functional abundance increased with slag proportion during the stable methanogenesis process. This research can help in understanding the impact of slag on CH4 production characteristics and microbiological mechanisms in landfills.
Khadija Sarquah, Satyanarayana Narra, Gesa Beck
et al.
Refuse derived fuel (RDF) production enables the utilisation of municipal solid waste (MSW) as a substitute fuel for industrial applications. This contributes to reducing the challenges of MSW management and associated GHG emissions by substituting conventional fuel. However, RDF quality characteristics depend on the production process and composition, and they contribute to the market value of RDF. In this study, RDF production from MSW and its utilisation potential were investigated through a case study at a waste-to-energy system in Kumasi, Ghana. The study consisted of field and laboratory experimentation, a survey, and statistical analysis to assess RDF physicochemical properties and usability options for thermal energy application. The results classify the RDF produced under NCV: II-IV, Cl: II and Hg: I, according to the EN 15359:2011 classification. The recorded lower heating values, averaging 14-22 MJ/kg, were within the limits for RDF thermal application. The potential RDF users surveyed showed positive interest in RDF utilisation as a substitute fuel. However, the outcomes suggest that RDF adoption is highly sensitive to cost concerns, perceived operational barriers, and environmental considerations. Awareness, regulations, and stakeholder support are important in improving perspectives on RDF adoption as an alternative fuel. The results establish opportunities for RDF as an industrial alternative fuel. They also contribute to knowledge of the demand-side factors affecting RDF utilisation, especially in Ghana and other emerging economies.
Refuse-derived fuel (RDF) produced from the processing of municipal solid waste (MSW) has a high content of biomass and plastics. Pyrolysis of RDF produces a bio-oil which is highly oxygenated, viscous, acidic with a high moisture content and unsuitable for direct use in conventional combustion systems and consequently requires upgrading. A novel process of pyrolysis with non-thermal plasma/catalysis has been developed to produce de-oxygenated bio-oils and gases from RDF. The volatiles from the pyrolysis stage are passed directly to a non-thermal plasma/catalytic reactor where upgrading of the pyrolysis volatiles takes place. Detailed analysis of the product oils and gases is presented in relation to process conditions and in the presence of different catalysts (TiO₂, MCM-41, ZSM-5, and Al₂O₃). Even in the absence of a catalyst, the presence of the non-thermal plasma resulted in high yields of CO and CO₂ gases and reduced bio-oil oxygen content, confirming deoxygenation of the RDF pyrolysis volatiles. The addition of catalysts MCM-41 and ZSM-5 generated the highest yields of CO, CO₂, and H₂ due to the synergy between catalyst and plasma. The catalysts ranked in terms of total oxygenated oil yield are as follows: MCM-41 < ZSM-5 < TiO₂ < Al₂O₃. Pyrolysis of RDF produces an oil containing oxygenated species from biomass and hydrocarbon species from plastics. The non-thermal plasma generates high energy electrons which generate radicals and intermediates from the pyrolysis volatiles which synergistically interact with the catalysts to enable deoxygenation of the oxygenated hydrocarbons through decarboxylation and decarbonylation reactions.
Text-to-Image (T2I) models have achieved remarkable success in generating visual content from text inputs. Although multiple safety alignment strategies have been proposed to prevent harmful outputs, they often lead to overly cautious behavior, rejecting even benign prompts, a phenomenon known as over-refusal that reduces the practical utility of T2I models. Despite over-refusal having been observed in practice, there is no large-scale benchmark that systematically evaluates this phenomenon for T2I models. In this paper, we present an automatic workflow to construct synthetic evaluation data, resulting in OVERT (OVEr-Refusal evaluation on Text-to-image models), the first large-scale benchmark for assessing over-refusal behaviors in T2I models. OVERT includes 4,600 seemingly harmful but benign prompts across nine safety-related categories, along with 1,785 genuinely harmful prompts (OVERT-unsafe) to evaluate the safety-utility trade-off. Using OVERT, we evaluate several leading T2I models and find that over-refusal is a widespread issue across various categories (Figure 1), underscoring the need for further research to enhance the safety alignment of T2I models without compromising their functionality. As a preliminary attempt to reduce over-refusal, we explore prompt rewriting; however, we find it often compromises faithfulness to the meaning of the original prompts. Finally, we demonstrate the flexibility of our generation framework in accommodating diverse safety requirements by generating customized evaluation data adapted to user-defined policies.
Large Language Models (LLMs) increasingly exhibit over-refusal - erroneously rejecting benign queries due to overly conservative safety measures - a critical functional flaw that undermines their reliability and usability. Current methods for testing this behavior are demonstrably inadequate, suffering from flawed benchmarks and limited test generation capabilities, as highlighted by our empirical user study. To the best of our knowledge, this paper introduces the first evolutionary testing framework, ORFuzz, for the systematic detection and analysis of LLM over-refusals. ORFuzz uniquely integrates three core components: (1) safety category-aware seed selection for comprehensive test coverage, (2) adaptive mutator optimization using reasoning LLMs to generate effective test cases, and (3) OR-Judge, a human-aligned judge model validated to accurately reflect user perception of toxicity and refusal. Our extensive evaluations demonstrate that ORFuzz generates diverse, validated over-refusal instances at a rate (6.98% average) more than double that of leading baselines, effectively uncovering vulnerabilities. Furthermore, ORFuzz's outputs form the basis of ORFuzzSet, a new benchmark of 1,855 highly transferable test cases that achieves a superior 63.56% average over-refusal rate across 10 diverse LLMs, significantly outperforming existing datasets. ORFuzz and ORFuzzSet provide a robust automated testing framework and a valuable community resource, paving the way for developing more reliable and trustworthy LLM-based software systems.
Large language models demonstrate powerful capabilities across various natural language processing tasks, yet they also harbor safety vulnerabilities. To enhance LLM safety, various jailbreak defense methods have been proposed to guard against harmful outputs. However, improvements in model safety often come at the cost of severe over-refusal, failing to strike a good balance between safety and usability. In this paper, we first analyze the causes of over-refusal from a representation perspective, revealing that over-refusal samples reside at the boundary between benign and malicious samples. Based on this, we propose MOSR, designed to mitigate over-refusal by intervening on the safety representations of LLMs. MOSR incorporates two novel components: (1) Overlap-Aware Loss Weighting, which determines the erasure weight for malicious samples by quantifying their similarity to pseudo-malicious samples in the representation space, and (2) Context-Aware Augmentation, which supplements the necessary context for rejection decisions by adding harmful prefixes before rejection responses. Experiments demonstrate that our method outperforms existing approaches in mitigating over-refusal while largely maintaining safety. Overall, we advocate that future defense methods should strike a better balance between safety and over-refusal.
As vision-language models (VLMs) become increasingly capable, maintaining a balance between safety and usefulness remains a central challenge. Safety mechanisms, while essential, can backfire, causing over-refusal, where models decline benign requests out of excessive caution. Yet there is currently a significant lack of benchmarks that systematically address over-refusal in the visual modality. This setting introduces unique challenges, such as dual-use cases where an instruction is harmless but the accompanying image contains harmful content. Models frequently fail in such scenarios, either refusing too conservatively or completing tasks unsafely, which highlights the need for more fine-grained alignment. The ideal behaviour is safe completion, i.e., fulfilling the benign parts of a request while explicitly warning about any potentially harmful elements. To address this, we present DUAL-Bench, a large-scale multimodal benchmark focused on over-refusal and safe completion in VLMs. We evaluated 18 VLMs across 12 hazard categories under semantics-preserving visual perturbations. In dual-use scenarios, models exhibit extremely fragile safety boundaries. They fall into a binary trap: either overly sensitive direct refusal or defenseless generation of dangerous content. Consequently, even the best-performing model, GPT-5-Nano, achieves just 12.9% safe completion, with the GPT-5 and Qwen families averaging 7.9% and 3.9%, respectively. We hope DUAL-Bench fosters nuanced alignment strategies balancing multimodal safety and utility. Content Warning: This paper contains examples of sensitive and potentially hazardous content.
Ajesh Thangaraj Nadar, Gabriel Nixon Raj, Soham Chandane
et al.
The increasing proliferation of electronic devices in the modern era has led to a significant surge in electronic waste (e-waste). Improper disposal and insufficient recycling of e-waste pose serious environmental and health risks. This paper proposes an IoT-enabled system combined with a lightweight CNN-based classification pipeline to enhance the identification, categorization, and routing of e-waste materials. By integrating a camera system and a digital weighing scale, the framework automates the classification of electronic items based on visual and weight-based attributes. The system demonstrates how real-time detection of e-waste components such as circuit boards, sensors, and wires can facilitate smart recycling workflows and improve overall waste processing efficiency.
Large Audio-Language Models (LALMs) are becoming essential as a powerful multimodal backbone for real-world applications. However, recent studies show that audio inputs can more easily elicit harmful responses than text, exposing new risks toward deployment. While safety alignment has made initial advances in LLMs and Large Vision-Language Models (LVLMs), we find that vanilla adaptation of these approaches to LALMs faces two key limitations: 1) LLM-based steering fails under audio input due to the large distributional gap between activations, and 2) prompt-based defenses induce over-refusals on benign-speech queries. To address these challenges, we propose Safe-Ablated Refusal Steering (SARSteer), the first inference-time defense framework for LALMs. Specifically, SARSteer leverages text-derived refusal steering to enforce rejection without manipulating audio inputs and introduces decomposed safe-space ablation to mitigate over-refusal. Extensive experiments demonstrate that SARSteer significantly improves harmful-query refusal while preserving benign responses, establishing a principled step toward safety alignment in LALMs.
The global waste crisis is escalating, with solid waste generation expected to increase tremendously in the coming years. Traditional waste collection methods, particularly in remote or harsh environments like deserts, are labor-intensive, inefficient, and often hazardous. Recent advances in computer vision and deep learning have opened the door to automated waste detection systems, yet most research focuses on urban environments and recyclable materials, overlooking organic and hazardous waste and underexplored terrains such as deserts. In this work, we propose YOLO-SAT, an enhanced real-time object detection framework based on a pruned, lightweight version of YOLOv12 integrated with Self-Adversarial Training (SAT) and specialized data augmentation strategies. Using the DroneTrashNet dataset, we demonstrate significant improvements in precision, recall, and mean average precision (mAP), while achieving low latency and compact model size suitable for deployment on resource-constrained aerial drones. Benchmarking YOLO-SAT against state-of-the-art lightweight YOLO variants further highlights its optimal balance of accuracy and efficiency. Our results validate the effectiveness of combining data-centric and model-centric enhancements for robust, real-time waste detection in desert environments.
Role-Playing Agents (RPAs) have shown remarkable performance in various applications, yet they often struggle to recognize and appropriately respond to hard queries that conflict with their role-play knowledge. To investigate RPAs' performance when faced with different types of conflicting requests, we develop an evaluation benchmark that includes requests conflicting with contextual knowledge, requests conflicting with parametric knowledge, and non-conflicting requests, in order to assess RPAs' ability to identify conflicts and refuse to answer appropriately without over-refusing. Through extensive evaluation, we find that most RPAs exhibit significant performance gaps across the different types of conflicting requests. To elucidate the reasons, we conduct an in-depth representation-level analysis of RPAs under various conflict scenarios. Our findings reveal the existence of rejection regions and direct response regions within the model's forward-pass representations, which influence the RPA's final response behavior. Therefore, we introduce a lightweight representation editing approach that conveniently shifts conflicting requests to the rejection region, thereby enhancing the model's refusal accuracy. The experimental results validate the effectiveness of our editing method, which improves RPAs' ability to refuse conflicting requests while maintaining their general role-playing capabilities.
Plastic waste entering the environment through landfilling or improper disposal poses substantial risks to ecosystems and human health. Photoreforming is emerging as a clean photocatalytic technology that degrades plastic waste to organic compounds while simultaneously producing hydrogen fuel. This study introduces high-pressure torsion (HPT), a severe plastic deformation (SPD) method, as an innovative technique to enhance the photoreforming of polypropylene (PP) plastic mixed with a brookite TiO2 photocatalyst. Hydrogen production systematically increases with the number of HPT turns, accompanied by the formation of valuable small organic molecules. The enhancement in photocatalytic activity is attributed to strain-induced defect formation in both catalysts and plastics, as well as the creation of catalyst/plastic interphases that enhance charge carrier transport between inorganic and organic phases. These findings reveal a new functional application for SPD in energy conversion and sustainability.
Julio Castaño-Amorós, Ignacio de Loyola Páez-Ubieta, Pablo Gil
et al.
This work presents a perception system applied to robotic manipulation that is able to assist in navigation and in household waste classification and collection in outdoor environments. The system is made up of optical tactile sensors, RGBD cameras and a LiDAR. These sensors are integrated on a mobile platform with a robot manipulator and a robotic gripper. Our system is divided into three software modules: two are vision-based and the third is tactile-based. The vision-based modules use CNNs to localize and recognize solid household waste and to estimate grasping points. The tactile-based module, which also uses CNNs and image processing, adjusts the gripper opening to control the grasp from touch data. Our proposal achieves localization errors of around 6% and a recognition accuracy of 98%, and it ensures grasping stability in 91% of the attempts. The combined runtime of the three modules is less than 750 ms.
The plastisphere plays a crucial role in global carbon and nitrogen cycles and in microplastics formation. Global Municipal Solid Waste (MSW) landfills contain 42% plastic waste and therefore represent one of the most significant plastispheres. MSW landfills are also the third largest anthropogenic methane source and an important anthropogenic N2O source. Surprisingly, knowledge of the microbiota and the associated microbial carbon and nitrogen cycles of landfill plastispheres is very limited. In this study, we characterized and compared the organic chemical profiles, bacterial community structure and metabolic pathways of the plastisphere and the surrounding refuse in a large-scale landfill using GC/MS and high-throughput sequencing of 16S rRNA genes, respectively. The landfill plastisphere and the surrounding refuse differed in organic chemical composition. However, abundant phthalate-like chemicals were detected in both environments, implying leaching of plastic additives. Bacteria colonizing the plastic surface had significantly higher richness than those in the surrounding refuse. The plastic surface and the surrounding refuse had distinct bacterial community compositions. The genera Sporosarcina, Oceanobacillus and Pelagibacterium were detected on the plastic surface with high abundance, while Ignatzschineria, Paenalcaligenes and Oblitimonas were rich in the surrounding refuse. The typical plastic-biodegrading genera Bacillus, Pseudomonas and Paenibacillus were detected in both environments. However, Pseudomonas was dominant on the plastic surface (up to 88.73%), whereas Bacillus was rich in the surrounding refuse (up to 45.19%). For the carbon and nitrogen cycles, the plastisphere was predicted to have a significantly (P < 0.05) higher abundance of functional genes involved in carbon metabolism and nitrification, indicating more active microbial carbon and nitrogen cycling on the plastic surface. Additionally, pH was the main driver shaping the bacterial community composition on the plastic surface. These results indicate that landfill plastispheres serve as unique niches for microbial community habitation and for microbial carbon and nitrogen cycling. These observations invite further study of the ecological effects of landfill plastispheres.