Dennis G. Grubb, Dusty R. V. Berggren, Jyoti K. Chetri
This paper presents the results of a laboratory-based treatability study conducted for a confidential former wood treating site heavily impacted by a creosote non-aqueous-phase liquid (NAPL) containing pentachlorophenol (PCP). PCP impacts in the silty sands extended to approximately 33 ft (10 m) below the ground surface (bgs), with discrete soil samples containing PCP concentrations up to 14,500 mg/kg, and groundwater PCP concentrations forming a main plume exceeding 1 mg/L over 2.16 acres (0.87 ha). Treatability testing was performed on unspiked and NAPL-spiked site soils with total PCP concentrations of approximately 10 and 100 mg/kg, respectively, and leachable PCP concentrations of approximately 3 to 8 mg/L. Stabilization/solidification (S/S) mix designs using 5 to 10 weight percent (wt%, dry-reagent-to-wet-soil mass basis) of a Portland cement (PC) blend and 1 wt% powdered bentonite met the minimum unconfined compressive strength (UCS) and maximum hydraulic conductivity (K) performance criteria of 50 lb/in<sup>2</sup> (345 kPa) and 1 × 10<sup>−6</sup> cm/s, respectively, within the specified 28-day cure time. Long-term semi-dynamic leach testing was performed on S/S-treated soils using a modified United States Environmental Protection Agency (EPA) Method 1315 test incorporating a polydimethylsiloxane (PDMS) liner to improve data reliability for hydrocarbons. Results showed that adding 1 wt% organoclay (OC) to the S/S mix designs did not substantially reduce leaching of common semi-volatile organic compounds (SVOCs) such as naphthalene, acenaphthene, phenanthrene, and benzo(a)anthracene compared to mixes using only the PC blend with bentonite, consistent with previous studies. However, the inclusion of OC had a decisive effect on PCP immobilization, providing an order-of-magnitude (10×) reduction in the cumulative mass release of PCP over the test duration. This benefit diminished with decreasing degree of chlorination for other phenolic compounds.
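The order-of-magnitude reduction reported above refers to cumulative mass release, i.e., the running sum of contaminant mass leached over successive eluate renewal intervals, normalized by exposed surface area. As a minimal illustrative sketch (not the authors' code; function and parameter names are assumptions), a Method 1315-style cumulative release series can be computed as:

```python
def cumulative_release(interval_concs_mg_L, eluate_volume_L, exposed_area_m2):
    """Cumulative mass release (mg/m^2) after each leaching interval, in the
    spirit of EPA Method 1315: each interval contributes concentration times
    eluate volume, normalized by the monolith's exposed surface area."""
    total = 0.0
    series = []
    for conc in interval_concs_mg_L:
        total += conc * eluate_volume_L / exposed_area_m2
        series.append(total)
    return series
```

Comparing two mix designs then amounts to comparing the final entries of their respective series.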
Jack FitzGerald, Dylan Bates, Aristotelis Lazaridis
et al.
Military Large Language Models (LLMs) must provide accurate information to the warfighter in time-critical and dangerous situations. However, today's LLMs are imbued with safety behaviors that cause the LLM to refuse many legitimate queries in the military domain, particularly those related to violence, terrorism, or military technology. Our gold benchmark for assessing refusal rates, which was developed by veterans of the US Army and special forces, is to our knowledge the first dataset of its kind. We present results for refusal and deflection rates on 31 public models and 3 military models. We observe hard rejection rates as high as 98.2% and soft deflection rates ranging from 0% to 21.3%. We also present results on two additional synthetic datasets and show their correlations with the gold dataset. Finally, we perform abliteration using the Heretic library on a military-tuned gpt-oss-20b model, showing an absolute increase in answer rate of 66.5 points but an average relative decrease of 2% on other military tasks. In our concluding remarks, we argue for deeper specialization, including with mid-training and end-to-end post-training, to achieve zero refusals and maximum military task accuracy for closed military models.
Nohemi López-Ramírez, Ernesto Favela-Torres, Tania Volke-Sepúlveda
et al.
Water hyacinth is an invasive weed that can be valorized through the production of hydrolytic enzymes via solid-state culture. This study explores the application of <i>Trichoderma harzianum</i> in producing xylanases and endoglucanases on water hyacinth beds. Laboratory-scale packed-bed column bioreactors (PBCBs) with a capacity of 8 grams of dry mass (g<sub>dm</sub>) were used to evaluate the effects of temperature (28–36 °C) and initial moisture content (65–80%) on microbial growth and enzyme production. High yields of biomass and enzymes were produced at 30 °C. Moreover, xylanase activity was enhanced in cultures with a moisture content of 65% (~71.24 U/g<sub>dm</sub>), and endoglucanase activity at 75–80% moisture (~20.13 U/g<sub>dm</sub>). The operational conditions identified for xylanase production were applied to 6 L bench-scale cross-flow internally stirred bioreactors, packed to 40% capacity with 450 g<sub>dm</sub>. Two stirring regimes were tested: intermittent and continuous. The results showed that continuous stirring promotes both microbial growth and xylanase activity. In fact, xylanase activity under continuous stirring was comparable to that achieved in PBCBs. Consequently, continuous stirring enables a 56-fold increase in bioreactor capacity without compromising xylanase production. The approaches developed in this study can support the design of large-scale bioprocesses for the valorization of water hyacinth.
Mona-Maria Narra, Essossinam Beguedou, Satyanarayana Narra
et al.
The cement industry faces increasing energy costs and environmental pressures, driving the adoption of alternative fuels derived from waste materials. In Togo, approximately 350,000 t of end-of-life tires (ELT) are generated annually, creating significant environmental and health hazards through uncontrolled disposal and burning practices. This study investigated the technical feasibility and economic viability of incorporating waste tires as an alternative fuel in cement manufacturing. Tire-derived fuel (TDF) performance was evaluated by comparing pre-processed industrial tires with unprocessed ones, focusing on clinker production loss, elemental composition, heating values, and bulk density. The results demonstrate that TDF exhibits superior performance characteristics, with the highest heating values, and meets all the required specifications for cement production. In contrast, whole tire incineration fails to satisfy the recommended criteria, necessitating blending with conventional fuels to maintain clinker quality and combustion efficiency. The investigation revealed no significant adverse effects on production processes or clinker quality while achieving substantial reductions in nitrogen and sulfur oxide emissions. The experimental results were compared with the theoretical burnout times to optimize the shredding operations and injection methods. However, several challenges remain unaddressed, including the absence of streamlined handling processes, limited understanding of long-term ecological and health impacts, and insufficient techno-economic assessments. Future research should prioritize identifying critical aging points, investigating self-rejuvenating behaviors, and quantifying long-term environmental implications. 
These findings provide a foundation for developing computational models to optimize the mixing ratios of alternative and fossil fuels in cement manufacturing, offering significant environmental, economic, and societal benefits for the cement industry.
This short video explores the potential story of slow hope by looking into how Cainiao, a global delivery company, runs its packaging recycling programme at a pick-up station in Shenzhen, China. Online shopping and efficient delivery logistics have redefined consumption habits and urban landscapes in contemporary China. In 2024, the number of packages moving around the country reached 100 billion, according to statistics from the National Post Office. During the Double Eleven Shopping Festival in November 2024 alone, the number of packages spiked to 700 million. Seeing the mountains of packaging waste produced on a daily basis, the Chinese delivery company Cainiao started a recycling programme called Green Box in 2016. To date, the award-winning Green Box programme has covered over 100,000 pick-up stations in 315 cities in China. While many celebrate the company’s environmental and social governance as a potential story of slow hope for positive environmental change (Mauch 2019), the specifics of the programme’s operation remain obscure. In this video, Duan Jiaqi, Qin Xiaoyi and Lu Ziyu present perspectives from one of the pick-up stations in Shenzhen. They found that the corporate responsibility programme has continued to rely on the informal network of urban waste collection by rural migrants, and that its implementation is limited by various factors.
Maximilian Julius Enengel, Tatjana Lasch, Lisa Kandlbauer
et al.
In processing mixed commercial waste (MCW), particle size distribution is as critical as material composition. Detailed knowledge of particle size distribution unlocks the recycling potential of specific material groups and facilitates the efficient conversion of these materials into secondary fuels. Additionally, understanding particle size-dependent element distribution in waste is crucial, particularly given potential legal limits on several heavy metals. While two studies carried out in 2019 have addressed these issues, the inherent variability in MCW composition necessitates further investigation to validate and expand upon these findings. In this study, ten representative samples of MCW were collected and screened with eight screen cuts (200 mm, 100 mm, 80 mm, 60 mm, 40 mm, 20 mm, 10 mm, 5 mm). Six of these fractions (>20 mm) were sorted into 37 material classes, recombined by particle size, and subjected to chemical analyses. These analyses included essential fuel parameters, such as lower heating value and biogenic carbon content, and the concentration of 35 elements across all particle size fractions. A Mann–Whitney U test was conducted to test for significant differences in element concentrations between the present study and the study carried out in 2019. Although the results confirm considerable variability in MCW composition, they also reveal trends in element concentrations related to calorific value.
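For readers unfamiliar with the test used above: the Mann–Whitney U statistic compares two independent samples by jointly ranking all observations. A minimal pure-Python sketch (illustrative only, not the study's analysis code) of the statistic itself:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for two independent samples.
    Jointly ranks all observations (average ranks for ties) and
    returns min(U1, U2), the form usually compared against tables."""
    combined = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend over a run of tied values
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    n1 = len(x)
    r1 = sum(ranks[:n1])            # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * len(y) - u1
    return min(u1, u2)
```

In practice one would use a library routine (e.g., `scipy.stats.mannwhitneyu`) that also supplies the p-value; the sketch only shows where the statistic comes from.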
LLM safety and ethical alignment are widely discussed, but the impact of content moderation on user satisfaction remains underexplored. In particular, little is known about how users respond when models refuse to answer a prompt, one of the primary mechanisms used to enforce ethical boundaries in LLMs. We address this gap by analyzing nearly 50,000 model comparisons from Chatbot Arena, a platform where users indicate their preferred LLM response in pairwise matchups, providing a large-scale setting for studying real-world user preferences. Using a novel RoBERTa-based refusal classifier fine-tuned on a hand-labeled dataset, we distinguish between refusals due to ethical concerns and technical limitations. Our results reveal a substantial refusal penalty: ethical refusals yield significantly lower win rates than both technical refusals and standard responses, indicating that users are especially dissatisfied when models decline a task for ethical reasons. However, this penalty is not uniform. Refusals receive more favorable evaluations when the underlying prompt is highly sensitive (e.g., involving illegal content), and when the refusal is phrased in a detailed and contextually aligned manner. These findings underscore a core tension in LLM design: safety-aligned behaviors may conflict with user expectations, calling for more adaptive moderation strategies that account for context and presentation.
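The refusal penalty described above reduces to comparing win rates across response categories in pairwise matchups. A toy sketch of that bookkeeping (hypothetical data layout, not the paper's pipeline):

```python
def win_rates(matchups):
    """matchups: list of (category_a, category_b, winner) tuples, where
    winner is "a" or "b" and each category labels a response, e.g.
    "ethical_refusal", "technical_refusal", or "standard".
    Returns each category's win rate over all of its appearances."""
    wins, games = {}, {}
    for cat_a, cat_b, winner in matchups:
        for cat in (cat_a, cat_b):
            games[cat] = games.get(cat, 0) + 1
        won = cat_a if winner == "a" else cat_b
        wins[won] = wins.get(won, 0) + 1
    return {cat: wins.get(cat, 0) / games[cat] for cat in games}
```

A refusal penalty would then show up as the `"ethical_refusal"` rate sitting below the other two.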
Aashiq Muhamed, Leonardo F. R. Ribeiro, Markus Dreyer
et al.
The ability of language models in RAG systems to selectively refuse to answer based on flawed context is critical for safety, yet remains a significant failure point. Our large-scale study reveals that even frontier models struggle in this setting, with refusal accuracy dropping below 50% on multi-document tasks, while exhibiting either dangerous overconfidence or overcaution. Static benchmarks fail to reliably evaluate this capability, as models exploit dataset-specific artifacts and memorize test instances. We introduce RefusalBench, a generative methodology that programmatically creates diagnostic test cases through controlled linguistic perturbation. Our framework employs 176 distinct perturbation strategies across six categories of informational uncertainty and three intensity levels. Evaluation of over 30 models uncovers systematic failure patterns: refusal comprises separable detection and categorization skills, and neither scale nor extended reasoning improves performance. We find that selective refusal is a trainable, alignment-sensitive capability, offering a clear path for improvement. We release two benchmarks, RefusalBench-NQ (single document) and RefusalBench-GaRAGe (multi-document), and our complete generation framework to enable continued, dynamic evaluation of this critical capability.
Newfoundland and Labrador's municipalities face severe soft budget pressures due to narrow tax bases, high fixed service costs, and volatile resource revenues. We develop a Stackelberg-style mechanism design model in which the province commits at t = 0 to an ex ante grant schedule and an ex post bailout rule. Municipalities privately observe their fiscal need type, choose effort, investment, and debt, and may receive bailouts when deficits exceed a statutory threshold. Under convexity and single crossing, the problem reduces to one-dimensional screening and admits a tractable transfer mechanism with quadratic bailout costs and a statutory cap. The optimal ex ante rule is threshold-cap; under discretionary rescue at t = 2, it becomes threshold-linear-cap. A knife-edge inequality yields a self-consistent no-bailout regime, and an explicit discount factor threshold renders hard budgets dynamically credible. We emphasize a class of monotone threshold signal rules; under this class, grant crowd-out is null almost everywhere, which justifies the constant grant weight used in closed-form expressions. The closed-form characterization provides a policy template that maps to Newfoundland's institutions and clarifies the micro-data required for future calibration.
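The threshold-cap and threshold-linear-cap rules described above can both be written as a single truncated-linear transfer function. A hedged sketch, under one plausible reading of the rule (the parameterization and names are assumptions, not the paper's notation):

```python
def bailout_transfer(deficit, threshold, cap, slope=1.0):
    """Ex post bailout under a threshold-(linear-)cap rule: zero below the
    statutory threshold, linear with the stated slope in the excess deficit
    above it, and truncated at the statutory cap. slope=1.0 corresponds to
    covering the excess deficit one-for-one (the pure threshold-cap case)."""
    return min(cap, max(0.0, slope * (deficit - threshold)))
```

The knife-edge no-bailout regime mentioned above corresponds to parameter values at which committing to `cap = 0` remains credible.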
Instruction Fine-Tuning (IFT) has been widely adopted as an effective post-training strategy to enhance various abilities of Large Language Models (LLMs). However, prior studies have shown that IFT can significantly compromise LLMs' safety, particularly their ability to refuse malicious instructions, raising serious concerns. Recent research into the internal mechanisms of LLMs has identified the refusal direction (r-direction) in the hidden states, which plays a pivotal role in governing refusal behavior. Building on this insight, our study reveals that the r-direction tends to drift during training, which we identify as one of the causes of the associated safety risks. To mitigate such drift, our proposed ProCon method introduces a projection-constrained loss term that regularizes the projection magnitude of each training sample's hidden state onto the r-direction. Our initial analysis shows that applying an appropriate constraint can effectively mitigate the refusal direction drift and associated safety risks, but remains limited by an overall performance barrier. To overcome this barrier, informed by our observation of sharp early-stage drift and a data-driven perspective, we introduce a warm-up strategy that emphasizes strong early-stage constraints and broadens the data distribution to strengthen constraint signals, leading to an enhanced ProCon method. Experimental results across various datasets, scenarios, and LLMs demonstrate that our method can significantly mitigate safety risks posed by IFT while preserving task performance gains. Even compared with strong baselines, our method consistently delivers superior overall performance. Crucially, our analysis indicates that ProCon can contribute to stabilizing the r-direction during training, and this interpretability-driven exploration of LLMs' internal mechanisms lays a solid foundation for future safety research.
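The projection-constrained loss term can be pictured as a penalty on the squared scalar projection of a sample's hidden state onto the refusal direction. A minimal sketch of that idea (the actual ProCon formulation may differ; the function and its names are assumptions):

```python
def projection_penalty(hidden_state, r_direction, lam=0.1):
    """Penalize the squared projection magnitude of a hidden state onto the
    (unit-normalized) refusal direction, discouraging training updates that
    move representations along it."""
    norm = sum(c * c for c in r_direction) ** 0.5
    unit = [c / norm for c in r_direction]
    proj = sum(h * u for h, u in zip(hidden_state, unit))
    return lam * proj * proj
```

In training, such a term would be added to the task loss so that samples whose hidden states drift along the r-direction incur an extra cost.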
Leheng Sheng, Changshuo Shen, Weixiang Zhao
et al.
As LLMs are increasingly deployed in real-world applications, ensuring their ability to refuse malicious prompts, especially jailbreak attacks, is essential for safe and reliable use. Recently, activation steering has emerged as an effective approach for enhancing LLM safety by adding a refusal direction vector to the internal activations of LLMs during inference, thereby inducing refusal behavior. However, indiscriminately applying activation steering fundamentally suffers from a trade-off between safety and utility, since the same steering vector can also lead to over-refusal and degraded performance on benign prompts. Although prior efforts, such as vector calibration and conditional steering, have attempted to mitigate this trade-off, their lack of theoretical grounding limits their robustness and effectiveness. To better address the trade-off between safety and utility, we present a theoretically grounded and empirically effective activation steering method called AlphaSteer. Specifically, it treats activation steering as a learnable process with two principled learning objectives: utility preservation and safety enhancement. For utility preservation, it learns to construct a nearly zero vector for steering benign data under null-space constraints. For safety enhancement, it learns to construct a refusal direction vector for steering malicious data, with the help of linear regression. Experiments across multiple jailbreak attacks and utility benchmarks demonstrate the effectiveness of AlphaSteer, which significantly improves the safety of LLMs without compromising general capabilities. Our code is available at https://github.com/AlphaLab-USTC/AlphaSteer.
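The null-space idea above, a steering vector that moves benign activations by (nearly) zero, can be illustrated with Gram–Schmidt deflation: remove from the refusal direction any component lying in the span of the benign activation subspace. This is an illustrative sketch, not AlphaSteer's actual construction, and it assumes the benign basis has already been orthogonalized:

```python
def deflate(vector, ortho_basis):
    """Remove from `vector` its components along an orthogonal basis of the
    benign activation subspace. The result is orthogonal to every basis
    direction, so steering benign activations along it has no first-order
    effect on their benign components."""
    v = list(vector)
    for b in ortho_basis:
        coef = sum(x * y for x, y in zip(v, b)) / sum(x * x for x in b)
        v = [x - coef * y for x, y in zip(v, b)]
    return v

def dot(u, v):
    """Inner product, used to check orthogonality against benign directions."""
    return sum(x * y for x, y in zip(u, v))
```

After deflation, the steering vector's inner product with any vector in the benign span is zero, which is the "nearly zero vector for benign data" property stated in the abstract.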
As large language models (LLMs) are increasingly deployed in high-stakes settings, their ability to refuse ethically sensitive prompts, such as those involving hate speech or illegal activities, has become central to content moderation and responsible AI practices. While refusal responses can be viewed as evidence of ethical alignment and safety-conscious behavior, recent research suggests that users may perceive them negatively. At the same time, automated assessments of model outputs are playing a growing role in both evaluation and training. In particular, LLM-as-a-Judge frameworks, in which one model is used to evaluate the output of another, are now widely adopted to guide benchmarking and fine-tuning. This paper examines whether such model-based evaluators assess refusal responses differently than human users. Drawing on data from Chatbot Arena and judgments from two AI judges (GPT-4o and Llama 3 70B), we compare how different types of refusals are rated. We distinguish ethical refusals, which explicitly cite safety or normative concerns (e.g., "I can't help with that because it may be harmful"), and technical refusals, which reflect system limitations (e.g., "I can't answer because I lack real-time data"). We find that LLM-as-a-Judge systems evaluate ethical refusals significantly more favorably than human users, a divergence not observed for technical refusals. We refer to this divergence as a moderation bias: a systematic tendency for model-based evaluators to reward refusal behaviors more than human users do. This raises broader questions about transparency, value alignment, and the normative assumptions embedded in automated evaluation systems.
Non-emergency municipal services such as city 311 systems have been widely implemented across cities in Canada and the United States to enhance residents' quality of life. These systems enable residents to report issues, e.g., noise complaints, missed garbage collection, and potholes, via phone calls, mobile applications, or webpages. However, residents are often given limited information about when their service requests will be addressed, which can reduce transparency, lower resident satisfaction, and increase the number of follow-up inquiries. Predicting the service time for municipal service requests is challenging due to several complex factors: dynamic spatial-temporal correlations, underlying interactions among heterogeneous service request types, and high variation in service duration even within the same request category. In this work, we propose MuST2-Learn: a Multi-view Spatial-Temporal-Type Learning framework designed to address these challenges by jointly modeling the spatial, temporal, and service type dimensions. Specifically, it incorporates an inter-type encoder to capture relationships among heterogeneous service request types and an intra-type variation encoder to model service time variation within homogeneous types. In addition, a spatiotemporal encoder is integrated to capture spatial and temporal correlations in each request type. The proposed framework is evaluated with extensive experiments on two real-world datasets. The results show that MuST2-Learn reduces mean absolute error by at least 32.5%, outperforming state-of-the-art methods.
Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman
et al.
Large Language Models used in ChatGPT have traditionally been trained to learn a refusal boundary: depending on the user's intent, the model is taught to either fully comply or outright refuse. While this is a strong mitigation for explicitly malicious prompts, focusing safety training on refusals can lead to brittleness for prompts with obscured user intent. Binary refusal boundaries are especially ill-suited for dual-use cases (such as biology or cybersecurity), where a user request can be answered safely at a high level, but in some cases can lead to malicious uplift if sufficiently detailed or actionable. As an alternative, we propose safe-completions: a safety-training approach that centers on the safety of the assistant's output, rather than a binary classification of the user's intent. Safe-completions seek to maximize helpfulness within the safety policy's constraints. We incorporated this approach into GPT-5 and find that across both production comparisons and internally controlled experiments, safe-completion training improves safety (especially on dual-use prompts), reduces the severity of residual safety failures, and substantially increases model helpfulness.
Indigenous knowledge systems related to solid waste management in economically marginalized communities have been largely overlooked in the scientific literature, even though the indigenous communities of developing nations struggling to manage solid waste rely on these practices. It is startling that indigenous solid waste management practices are scarcely documented despite their potential as alternative disposal methods. This gap persists amid limited municipal budgets, inadequate waste collection services, and poor infrastructure in economically marginalized indigenous rural communities in developing nations. Within the discipline of solid waste management, this gap impedes the recognition and inclusion of indigenous waste management practices in integrated waste management plans, delays their elevation to the same level of credibility as mainstream scientific knowledge, and relegates the waste management practices of indigenous communities to the background. Against this background, the current study investigated the indigenous solid waste management practices of rural communities in Bushbuckridge Local Municipality (BLM). Ten cases that captured the spatial cultural diversity of indigenous communities’ practices across BLM were selected for sampling. Data were collected using ethnographic research methods and analyzed using the thematic analysis approach, with inductive logic used in the interpretation of the results. The results indicate that, in the absence of formal waste management services from the local authority, the indigenous communities of BLM resort to an indigenous knowledge system to manage solid waste.
Waste burning (100%), open-air dumping (100%), and backyard pits (90%) are among the indigenous waste management practices espoused by the rural communities of BLM. The similarity in practices was corroborated by statistical inference, which revealed no significant differences in the prevalence of indigenous waste management practices between BLM communities (<i>p</i> > 0.05). However, despite its sustainability benefits, recycling remains uncommon (<25%) in the rural communities of BLM, a setback for an indigenous knowledge system that is supposed to advance environmental sustainability practices.
The development of efficient recovery methods for waste printed circuit boards (WPCBs) not only tackles the environmental risks of disposal but also promotes the conservation of resources within the electronics industry. This study proposes a two-step leaching approach for recovering metals from WPCBs. Initially, transition metals are leached using nitric acid, followed by the recovery of precious metals with thiourea in the second stage. In the first stage, dissolution rates exceeding 90 wt% were achieved for transition metals, including Cu, Fe, Ni, Pb, and Sn, while the dissolution of precious metals (i.e., Au and Pd) was insignificant. In the second stage, the effects of four parameters on the recovery of Au and Pd were investigated: temperature and the concentrations of ferric ions, sulfuric media, and thiourea. Precise control over sulfate concentration played a vital role in achieving maximum Au recovery; the optimal acid concentration was 0.2 M, resulting in a recovery rate of ~50 wt%. Ferric ion concentration positively affected Au recovery, whereas Pd extraction was optimal in the absence of ferric ions. Thiourea concentration positively impacted Au and Pd recovery rates, peaking at 49 wt% for Au at 1 M and 44 wt% for Pd at 1.5 M. Prolonged leaching resulted in declining Au recovery rates, indicating depletion of the reagent. Temperature variation yielded similar outcomes, with 50 °C resulting in peak recovery rates of 53 wt% for Au and 54 wt% for Pd. Metal dissolution kinetics during leaching were analyzed using pseudo-first-order and pseudo-second-order models. The second-order model proved suitable for transition metals in the first stage, and for Au and Pd in the second stage (with R<sup>2</sup> = 0.99).
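For context on the kinetic models mentioned above: the pseudo-first-order dissolution model linearizes to -ln(1 - X) = k t for fractional dissolution X, so the rate constant can be obtained by a zero-intercept least-squares fit. The sketch below is illustrative only; the study's actual regression details (and its second-order form) are not given in the abstract:

```python
import math

def fit_pseudo_first_order(t, X):
    """Fit the linearized pseudo-first-order model -ln(1 - X) = k t by
    zero-intercept least squares, and report R^2 of the back-transformed
    fit X_hat = 1 - exp(-k t) against the observed fractions."""
    y = [-math.log(1.0 - x) for x in X]
    k = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    pred = [1.0 - math.exp(-k * ti) for ti in t]
    ss_res = sum((xi - pi) ** 2 for xi, pi in zip(X, pred))
    mean = sum(X) / len(X)
    ss_tot = sum((xi - mean) ** 2 for xi in X)
    return k, 1.0 - ss_res / ss_tot
```

Comparing the R² values from first- and second-order fits is the usual basis for statements like "the second-order model proved suitable."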
Food waste has emerged as a pressing concern, making advanced techniques to valorize food waste into nutrition-rich materials and renewable energy highly important. The exceptional biodegradability of food waste renders it a highly suitable substrate for anaerobic treatment, which yields energy and reduces the carbon footprint. Nevertheless, in cold regions such as Canada, conventional mesophilic anaerobic digestion at 30–40 °C can require substantial amounts of energy. Consequently, this study introduces a new approach to treat food waste at psychrophilic temperatures (1–20 °C). Lower temperatures can negatively impact cellular processes during anaerobic treatment, rendering substrates less accessible to microorganisms. To address this challenge, the study introduces an innovative biogas recirculation strategy. The primary objectives of this study are to assess the viability of anaerobic treatment of food waste at psychrophilic temperatures and to investigate the effectiveness of reintroducing the produced biogas into the anaerobic system in enhancing biomethane generation and system stability. Batch experiments were conducted on food waste in various assessments, both with and without biogas recirculation. The outcomes revealed a methane concentration ranging from 68% to 93% when biogas recirculation was employed, whereas without this technique, methane concentration varied between 10% and 45%. Moreover, with biogas recirculation, the reduction in volatile solids reached a maximum of 92%, and there was an 82% decrease in chemical oxygen demand. In conclusion, recirculation of biogas in the psychrophilic temperature range enhanced biomethane production and the reduction of volatile solids and chemical oxygen demand.
This study underscores the potential of employing anaerobic treatment with reintroduction of produced biogas into the system in cold regions as an economically viable and sustainable choice for treating food waste with nominal energy consumption.
Safety-aligned large language models (LLMs) sometimes falsely refuse pseudo-harmful prompts, like "how to kill a mosquito," which are actually harmless. Frequent false refusals not only frustrate users but also provoke a public backlash against the very values alignment seeks to protect. In this paper, we propose the first method to auto-generate diverse, content-controlled, and model-dependent pseudo-harmful prompts. Using this method, we construct an evaluation dataset called PHTest, which is ten times larger than existing datasets, covers more false refusal patterns, and separately labels controversial prompts. We evaluate 20 LLMs on PHTest, uncovering new insights due to its scale and labeling. Our findings reveal a trade-off between minimizing false refusals and improving safety against jailbreak attacks. Moreover, we show that many jailbreak defenses significantly increase the false refusal rates, thereby undermining usability. Our method and dataset can help developers evaluate and fine-tune safer and more usable LLMs. Our code and dataset are available at https://github.com/umd-huang-lab/FalseRefusal
Background: Open Source Software (OSS) is often seen as an option to mitigate risks of lock-ins. Yet, single-vendor OSS can still result in soft lock-ins due to knowledge asymmetries and technical barriers. Aim: This study explores the factors that give rise to such soft lock-ins. Research design: We conduct a qualitative case study of an E-service Platform (ESP) used by over 190 municipalities. Results: User-driven lock-in factors emerged as a significant category, including limited and non-transparent communication, restrictive qualification requirements in procurement, confusion about maintainership, and comfort in the status quo. Technical lock-in factors include inadequate documentation, dependency management issues, and limited test coverage. Conclusions: Strong leadership and continuous training are needed to address the comfortable, conservative culture among municipalities. Open Source Stewards, i.e., neutral hosts for OSS projects, can support municipalities in these tasks while also helping to foster an open, competitive collaboration that can enable a broader supplier ecosystem.