PRISM: Parametrically Refactoring Inference for Speculative Sampling Draft Models
Xuliang Wang, Yuetao Chen, Maochan Zhen
et al.
Large Language Models (LLMs), constrained by their auto-regressive nature, suffer from slow decoding. Speculative decoding methods have emerged as a promising solution to accelerate LLM decoding, attracting attention from both systems and AI research communities. Recently, the pursuit of better draft quality has driven a trend toward parametrically larger draft models, which inevitably introduces substantial computational overhead. While existing work attempts to balance the trade-off between prediction accuracy and compute latency, we address this fundamental dilemma through architectural innovation. We propose PRISM, which disaggregates the computation of each predictive step across different parameter sets, refactoring the computational pathways of draft models to successfully decouple model capacity from inference cost. Through extensive experiments, we demonstrate that PRISM outperforms all existing draft architectures, achieving exceptional acceptance lengths while maintaining minimal draft latency for superior end-to-end speedup. We also re-examine scaling laws with PRISM, revealing that PRISM scales more effectively with expanding data volumes than other draft architectures. Through rigorous and fair comparison, we show that PRISM boosts the decoding throughput of an already highly optimized inference engine by more than 2.6x.
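The accept/reject rule at the heart of speculative sampling, which PRISM's draft models feed into, can be sketched in a few lines. This is a toy illustration with dictionary "models" standing in for the target and draft distributions, not PRISM's architecture: each drafted token t is accepted with probability min(1, p(t)/q(t)), and drafting stops at the first rejection.

```python
import random

def speculative_accept(p_target, q_draft, draft_tokens, rng):
    """Standard speculative-sampling verification: accept each drafted
    token t with probability min(1, p(t)/q(t)); stop at the first
    rejection. p_target/q_draft map token -> probability (toy stand-ins
    for the target and draft models)."""
    accepted = []
    for t in draft_tokens:
        if rng.random() < min(1.0, p_target[t] / q_draft[t]):
            accepted.append(t)
        else:
            break
    return accepted

# Toy example: the target agrees strongly with the first two drafted
# tokens, so they are accepted deterministically; "c" is very unlikely
# under the target and is almost always rejected.
p = {"a": 0.9, "b": 0.8, "c": 0.05}
q = {"a": 0.5, "b": 0.5, "c": 0.9}
rng = random.Random(0)
out = speculative_accept(p, q, ["a", "b", "c"], rng)
```

A longer accepted prefix per verification call is exactly what "acceptance length" measures; a draft architecture like PRISM aims to raise it without raising drafting latency.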
Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA
Zihua Wang, Zhitao Lin, Ruibo Li
et al.
Vision-Language-Action (VLA) models, as large foundation models for embodied control, have shown strong performance in manipulation tasks. However, their performance comes at high inference cost. To improve efficiency, recent methods adopt action chunking, which predicts a sequence of future actions for open-loop execution. Although effective for reducing computation, open-loop execution is sensitive to environmental changes and prone to error accumulation due to the lack of closed-loop feedback. To address this limitation, we propose Speculative Verification for VLA Control (SV-VLA), a framework that combines efficient open-loop long-horizon planning with lightweight closed-loop online verification. Specifically, SV-VLA uses a heavy VLA as a low-frequency macro-planner to generate an action chunk together with a planning context, while a lightweight verifier continuously monitors execution based on the latest observations. Conditioned on both the current observation and the planning context, the verifier compares the planned action against a closed-loop reference action and triggers replanning only when necessary. Experiments demonstrate that SV-VLA combines the efficiency of chunked prediction with the robustness of closed-loop control, enabling efficient and reliable VLA-based control in dynamic environments. Code is available: https://github.com/edsad122/SV-VLA.
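The verify-then-replan loop can be sketched as follows. This is a hypothetical rule, not the paper's method: `needs_replan`, the L2 deviation test, and the threshold value are all illustrative stand-ins for the learned verifier.

```python
def needs_replan(planned_action, reference_action, threshold=0.2):
    """Hypothetical verifier rule: trigger replanning when the planned
    action drifts from a closed-loop reference by more than `threshold`
    (L2 distance over the action vector)."""
    dist = sum((p - r) ** 2 for p, r in zip(planned_action, reference_action)) ** 0.5
    return dist > threshold

def execute_chunk(chunk, reference_policy, obs):
    """Run an open-loop action chunk, but stop the moment the verifier
    flags a deviation; the caller would then invoke the heavy VLA again
    to replan from the latest observation."""
    executed = []
    for act in chunk:
        ref = reference_policy(obs)  # cheap closed-loop reference
        if needs_replan(act, ref):
            return executed, True   # replanning requested
        executed.append(act)
    return executed, False

# Toy run: the reference stays at the origin while the planned chunk
# drifts away, so the third action triggers replanning.
chunk = [(0.0, 0.05), (0.0, 0.1), (0.0, 0.5)]
executed, replan = execute_chunk(chunk, lambda obs: (0.0, 0.0), obs=None)
```

The heavy model thus runs only at chunk boundaries or on verifier alarms, which is where the efficiency gain over per-step closed-loop inference comes from.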
The Capture
Henri Maldiney
Translation of «La prise» by Henri Maldiney. In Qu’est-ce que l’homme? Hommage à Alphonse De Waelhens, Brussels, Presses de l’université Saint-Louis, 1982, pp. 135-157.
Speculative philosophy, Philosophy (General)
$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts
Mert Cemri, Nived Rajaraman, Rishabh Tiwari
et al.
Scaling test-time compute has driven the recent advances in the reasoning capabilities of large language models (LLMs), typically by allocating additional computation for more thorough exploration. However, increased compute often comes at the expense of higher user-facing latency, directly impacting user experience. Current test-time scaling methods primarily optimize for accuracy based on total compute resources (FLOPS), often overlooking latency constraints. To address this gap, we propose $\texttt{SPECS}$, a latency-aware test-time scaling method inspired by speculative decoding. $\texttt{SPECS}$~uses a smaller, faster model to generate candidate sequences efficiently, and evaluates these candidates using signals from both a larger target model and a dedicated reward model. We introduce new integration strategies, including reward-guided soft verification and a reward-based deferral mechanism. Empirical results on MATH500, AMC23 and OlympiadBench datasets show that $\texttt{SPECS}$~matches or surpasses beam search accuracy while reducing latency by up to $\sim$19.1\%. Our theoretical analysis shows that our algorithm converges to the solution of a KL-regularized reinforcement learning objective with increasing beam width.
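The deferral mechanism can be sketched as follows. This is an illustrative sketch, not SPECS's actual API: the function name, the threshold, and the dictionary reward model are all assumptions; the real method scores candidates with a learned reward model and a larger target model.

```python
def specs_step(candidates, reward_fn, defer_threshold, big_model_fn):
    """Illustrative reward-based deferral: score the small model's
    candidate drafts with a reward model and keep the best one, but
    defer to the large target model when even the best score falls
    below `defer_threshold`."""
    best = max(candidates, key=reward_fn)
    if reward_fn(best) < defer_threshold:
        return big_model_fn(), "deferred"
    return best, "drafted"

# Toy reward model: a lookup table over two candidate reasoning steps.
reward = {"step A": 0.9, "step B": 0.4}.get
choice, route = specs_step(
    ["step A", "step B"], reward, defer_threshold=0.5,
    big_model_fn=lambda: "big-model step")
```

Latency is saved whenever the cheap path is taken; the threshold trades off how often the slow target model must be consulted against answer quality.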
SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri
et al.
The growing gap between the increasing complexity of large language models (LLMs) and the limited computational budgets of edge devices poses a key challenge for efficient on-device inference, despite gradual improvements in hardware capabilities. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new framework that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose SLED, a framework that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server verifies the tokens utilizing a more precise target model. To further increase the efficiency of verification, the edge server batches the diverse verification requests from devices. This approach supports device heterogeneity and reduces server-side memory footprint by sharing the same upstream target model across multiple devices. Our initial experiments with Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 Nvidia A100 GPUs indicate substantial benefits: 2.2× more system throughput, 2.8× more system capacity, and better cost efficiency, all without sacrificing model accuracy.
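Server-side batched verification can be sketched as follows. This is an illustrative toy, not the SLED API: the deterministic accept rule stands in for the probabilistic speculative-sampling test, and all names are assumptions.

```python
def batch_verify(requests, target_prob):
    """Sketch of server-side batched verification: each request carries
    a device's drafted tokens with the draft model's probabilities; the
    shared target model scores every request in one pass and returns,
    per device, how many leading tokens to accept."""
    results = {}
    for device_id, drafts in requests.items():  # one batched pass
        accepted = 0
        for token, q in drafts:
            p = target_prob(token)
            if p >= q:          # deterministic toy stand-in for the
                accepted += 1   # probabilistic accept/reject rule
            else:
                break
        results[device_id] = accepted
    return results

# Toy target model as a lookup table; two heterogeneous devices submit
# drafts of different lengths in the same verification batch.
target = {"the": 0.9, "cat": 0.7, "sat": 0.1}.get
reqs = {"pi5": [("the", 0.5), ("cat", 0.6)], "nano": [("sat", 0.8)]}
res = batch_verify(reqs, target)
```

Because every device shares the one target model resident on the server, memory cost stays constant as devices are added; only batch size grows.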
Exploring post-neoliberal futures for managing commercial heating and cooling through speculative praxis
Oliver Bates, Christian Remy, Kieran Cutting
et al.
What could designing for carbon reduction of heating and cooling in commercial settings look like in the near future? How can we challenge dominant mindsets and paradigms of efficiency and behaviour change? How can we help build worlds through our practice that can become future realities? This paper introduces the fictional consultancy ANCSTRL.LAB to explore opportunities for making space in research projects that can encourage more systems-oriented interventions. We present a design fiction that asks `what if energy management and reduction practice embraced systems thinking?'. Our design fiction explores how future energy consultancies could utilise systems thinking, and (more than) human centred design to re-imagine energy management practice and change systems in ways that are currently unfathomable. We finish by discussing how LIMITS research can utilise design fiction and speculative praxis to help build new material realities where more holistic perspectives, the leveraging of systems change, and the imagining of post-neoliberal futures is the norm.
HotStuff-1: Linear Consensus with One-Phase Speculation
Dakai Kang, Suyash Gupta, Dahlia Malkhi
et al.
This paper introduces HotStuff-1, a BFT consensus protocol that improves the latency of HotStuff-2 by two network hops while maintaining linear communication complexity against faults. Furthermore, HotStuff-1 incorporates an incentive-compatible leader rotation design that motivates leaders to propose transactions promptly. HotStuff-1 achieves a reduction of two network hops by speculatively sending clients early confirmations, after one phase of the protocol. Introducing speculation into streamlined protocols is challenging because, unlike stable-leader protocols, these protocols cannot stop the consensus and recover from failures. Thus, we identify the prefix speculation dilemma in the context of streamlined protocols; HotStuff-1 is the first streamlined protocol to resolve it. HotStuff-1 embodies an additional mechanism, slotting, that thwarts delays caused by (1) rationally-incentivized leaders and (2) malicious leaders inclined to sabotage others' progress. The slotting mechanism allows leaders to dynamically drive as many decisions as allowed by network transmission delays before view timers expire, thus mitigating both threats.
Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models
Zhengmian Hu, Heng Huang
Large language models are probabilistic models, and the process of generating content is essentially sampling from the output distribution of the language model. Existing watermarking techniques inject watermarks into the generated content without altering the output quality. On the other hand, existing acceleration techniques, specifically speculative sampling, leverage a draft model to speed up the sampling process while preserving the output distribution. However, there is no known method to simultaneously accelerate the sampling process and inject watermarks into the generated content. In this paper, we investigate this direction and find that the integration of watermarking and acceleration is non-trivial. We prove a no-go theorem, which states that it is impossible to simultaneously maintain the highest watermark strength and the highest sampling efficiency. Furthermore, we propose two methods that maintain either the sampling efficiency or the watermark strength, but not both. Our work provides a rigorous theoretical foundation for understanding the inherent trade-off between watermark strength and sampling efficiency in accelerating the generation of watermarked tokens for large language models. We also conduct numerical experiments to validate our theoretical findings and demonstrate the effectiveness of the proposed methods.
Metaverse Perspectives from Japan: A Participatory Speculative Design Case Study
Michel Hohendanner, Chiara Ullstein, Dohjin Miyamoto
et al.
Currently, the development of the metaverse lies in the hands of industry. Citizens have little influence on this process. Instead, to do justice to the pluralism of (digital) societies, we should strive for an open discourse including many different perspectives on the metaverse and its core technologies such as AI. We utilize a participatory speculative design (PSD) approach to explore Japanese citizens' perspectives on future metaverse societies, as well as social and ethical implications. Our contributions are twofold. Firstly, we demonstrate the effectiveness of PSD in engaging citizens in critical discourse on emerging technologies like the metaverse, offering our workshop framework as a methodological contribution. Secondly, we identify key themes from participants' perspectives, providing insights for culturally sensitive design and development of virtual environments. Our analysis shows that participants imagine the metaverse to have the potential to solve a variety of societal issues; for example, breaking down barriers of physical environments for communication, social interaction, crisis preparation, and political participation, or tackling identity-related issues. Regarding future metaverse societies, participants' imaginations raise critical questions about human-AI relations, technical solutionism, politics and technology, globalization and local cultures, and immersive technologies. We discuss implications and contribute to expanding conversations on metaverse developments.
LLMs grasp morality in concept
Mark Pock, Andre Ye, Jared Moore
Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken the problem of how LLMs might 'mean' anything at all for granted. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.
Contestable Camera Cars: A Speculative Design Exploration of Public AI That Is Open and Responsive to Dispute
Kars Alfrink, Ianus Keller, Neelke Doorn
et al.
Local governments increasingly use artificial intelligence (AI) for automated decision-making. Contestability, making systems responsive to dispute, is a way to ensure they respect human rights to autonomy and dignity. We investigate the design of public urban AI systems for contestability through the example of camera cars: human-driven vehicles equipped with image sensors. Applying a provisional framework for contestable AI, we use speculative design to create a concept video of a contestable camera car. Using this concept video, we then conduct semi-structured interviews with 17 civil servants who work with AI employed by a large northwestern European city. The resulting data is analyzed using reflexive thematic analysis to identify the main challenges facing the implementation of contestability in public AI. We describe how civic participation faces issues of representation, how public AI systems should integrate with existing democratic practices, and how cities must expand capacities for responsible AI development and operation.
A High-Frequency Load-Store Queue with Speculative Allocations for High-Level Synthesis
Robert Szafarczyk, Syed Waqar Nabi, Wim Vanderbauwhede
Dynamically scheduled high-level synthesis (HLS) enables the use of load-store queues (LSQs) which can disambiguate data hazards at circuit runtime, increasing throughput in codes with unpredictable memory accesses. However, the increased throughput comes at the price of lower clock frequency and higher resource usage compared to statically scheduled circuits without LSQs. The lower frequency often nullifies any throughput improvements over static scheduling, while the resource usage becomes prohibitively expensive with large queue sizes. This paper presents a method for achieving dynamically scheduled memory operations in HLS without significant clock period and resource usage increase. We present a novel LSQ based on shift-registers enabled by the opportunity to specialize queue sizes to a target code in HLS. We show a method to speculatively allocate addresses to our LSQ, significantly increasing pipeline parallelism in codes that could not benefit from an LSQ before. In stark contrast to traditional load value speculation, we do not require pipeline replays and have no overhead on misspeculation. On a set of benchmarks with data hazards, our approach achieves an average speedup of 11$\times$ against static HLS and 5$\times$ against dynamic HLS that uses a state-of-the-art LSQ from previous work. Our LSQ also uses several times fewer resources, scaling to queues with hundreds of entries, and supports both on-chip and off-chip memory.
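The disambiguation problem an LSQ solves, and why early (speculative) address allocation helps, can be illustrated with a software toy. This is the classic conservative rule, not the paper's shift-register design; the function and the `None`-for-unknown convention are assumptions for illustration.

```python
def can_issue_load(load_addr, older_store_addrs):
    """Classic LSQ disambiguation rule: a load may issue only if no
    older in-flight store targets the same address. A store whose
    address is still unknown (None) must conservatively block the
    load, stalling the pipeline."""
    for st in older_store_addrs:
        if st is None or st == load_addr:
            return False
    return True

# Without speculative address allocation, an older store with an
# as-yet-unknown address blocks the load; once addresses are allocated
# early, the non-conflicting load can issue and overlap execution.
blocked_conservative = can_issue_load(0x40, [None, 0x80])
allowed_speculative = can_issue_load(0x40, [0x80, 0x80])
```

Allocating addresses to the queue speculatively shrinks the window in which the `None` case applies, which is where the extra pipeline parallelism comes from.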
Lost in the Logistical Funhouse: Speculative Design as Synthetic Media Enterprise
Zoe Horn, Liam Magee, Anna Munster
From the deployment of chatbots as procurement negotiators by corporations such as Walmart to autonomous agents providing 'differentiated chat' for managing overbooked flights, synthetic media are making the world of logistics their 'natural' habitat. Here the coordination of commodities, parts and labour design the problems and produce the training sets from which 'solutions' can be synthesised. But to what extent might synthetic media, surfacing via proto-platforms such as MidJourney and OpenAI and apps such as Eleven Labs and D:ID, be understood as logistical media? This paper details synthetic media experiments with 'ChatFOS', a GPT-based bot tasked with developing a logistics design business. Using its prompt-generated media outputs, we assemble a simulation and parody of AI's emerging functionalities within logistical worlds. In the process, and with clunky 'human-in-the-loop' stitching, we illustrate how large language models become media routers or switches, governing production of image prompts, website code, promotional copy, and investor pitch scenarios. Together these elements become links chained together in media ensembles such as the corporate website or the promotional video, fuelling the fictive logistics visualisation company we have 'founded'. The processes and methods of producing speculative scenarios via ChatFOS lead us to consider how synthetic media might be re-positioned as logistical media. Our experiments probe the ways in which the media of logistics and the logistics of media are increasingly enfolded. We ask: what can a (practice-based) articulation of this double-becoming of logistics and synthetic mediality tell us about the politics and aesthetics of contemporary computation and capital?
Deoptless: Speculation with Dispatched On-Stack Replacement and Specialized Continuations
Olivier Flückiger, Jan Ječmen, Sebastián Krynski
et al.
Just-in-time compilation provides significant performance improvements for programs written in dynamic languages. These benefits come from the ability of the compiler to speculate about likely cases and generate optimized code for these. Unavoidably, speculations sometimes fail and the optimizations must be reverted. In some pathological cases, this can leave the program stuck with suboptimal code. In this paper we propose deoptless, a technique that replaces deoptimization points with dispatched specialized continuations. The goal of deoptless is to take a step towards providing users with a more transparent performance model in which mysterious slowdowns are less frequent and grave.
From Philosophy to Interfaces: an Explanatory Method and a Tool Inspired by Achinstein's Theory of Explanation
Francesco Sovrano, Fabio Vitali
We propose a new method for explanations in Artificial Intelligence (AI) and a tool to test its expressive power within a user interface. In order to bridge the gap between philosophy and human-computer interfaces, we show a new approach for the generation of interactive explanations based on a sophisticated pipeline of AI algorithms for structuring natural language documents into knowledge graphs, answering questions effectively and satisfactorily. Among the mainstream philosophical theories of explanation we identified one that in our view is more easily applicable as a practical model for user-centric tools: Achinstein's Theory of Explanation. With this work we aim to prove that the theory proposed by Achinstein can actually be adapted and implemented in a concrete software application, as an interactive process answering questions. To this end we found a way to handle the generic (archetypal) questions that implicitly characterise an explanatory process as preliminary overviews rather than as answers to explicit questions, as commonly understood. To show the expressive power of this approach we designed and implemented a pipeline of AI algorithms for the generation of interactive explanations in the form of overviews, focusing on this aspect of explanations rather than on existing interfaces and presentation logic layers for question answering. We tested our hypothesis on a well-known XAI-powered credit approval system by IBM, comparing CEM, a static explanatory tool for post-hoc explanations, with an extension we developed adding interactive explanations based on our model. The results of the user study, involving more than 100 participants, showed that our proposed solution produced a statistically significant improvement in effectiveness (U=931.0, p=0.036) over the baseline, thus giving evidence in favour of our theory.
ROUSSEAU AND THE COURSIER INDOMPTÉ
Israel Alexandria Costa
This article presents a reading of Jean-Jacques Rousseau that may contribute to contemporary research on his work. The methodology applied is historical-philosophical and hermeneutic analysis, aimed at uncovering the ethical significance of a passage of the Discours sur l’origine et les fondemens de l’inégalité parmi les hommes concerning the "coursier indompté", a little-explored notion that, in the original text, is articulated with the notions of "homme barbare" and "orageuse liberté" to form, respectively, a bloc of opposition to the "cheval dressé", the "homme civilisé" and "assujettissement tranquille". The working hypothesis driving this investigation is that an examination of the peculiarities of Rousseau's critique of the malaise of civilization and of the social morality of his century opens a path to fundamental aspects of Rousseau's philosophy as a whole, as well as to the ethical questions the author formulated in such a way that they never lose their force or their timeliness.
Speculative philosophy, Philosophy (General)
Presentation
Marta Nogueroles Jové, Cristina Hermida Del Llano
BAJO PALABRA. Revista de Filosofía
Speculative philosophy, Philosophy (General)
The Animal as the Absolute Other: Opening Im/possibilities
Patryk Szaj
The starting point is to question Emmanuel Levinas's philosophy as to whether the status of "absolute Otherness" may also belong to an Other other than man. On the basis of Levinas's thought the question receives a negative answer, owing to his involvement in the so-called anthropological machine (which he shares with Martin Heidegger and some other critics of metaphysics). It is, however, possible to open (broadly defined) phenomenological ethical thought that draws on Levinas's achievements to the question of the animal. Such an attempt might be centered on the proposals of Jacques Derrida, the author of the essay The Animal That Therefore I Am (More To Follow), who spoke of the singularity of each animal, the problematic status of the border between man and animal, and being-with animals as a full-fledged modality of being. This provocative thought asks us about our attitude toward such issues as "responsibility" and "responsiveness", "carno-phallogocentrism", and the status of non-human animals. Derrida's thought here comes very close to a kind of phenomenological language, though it is a phenomenology of otherness rather than a phenomenology of the intentional subject; it is the same phenomenology that we find in the writings of Bernhard Waldenfels and John D. Caputo.
Speculative philosophy, Philosophy (General)
Gottfried W. LEIBNIZ, Ensayos de teodicea
Alba García Guijarro
Review of Ensayos de teodicea by Gottfried W. Leibniz, p. 347.
Speculative philosophy, Philosophy (General)
Could it be that there is an improper use of reason? Sensibility, understanding and reason (Schelling, Baader, Jacobi)
Ana Carrasco-Conde
This proposal confronts three authors (Jacobi, Baader and Schelling) at a particular moment in the history of philosophy, analyzing the dialogue established between them through their respective texts: Ueber gelehrte Gesellschaften, ihren Geist und Zweck (Jacobi, 1807); Ueber die Behauptung: dass kein uebler Gebrauch der Vernunft sein könne (Baader, 1807); and Philosophische Untersuchungen über das Wesen der menschlichen Freiheit und die damit zusammenhängenden Gegenstände (Schelling, 1809), in order to answer the question of whether reason is the cause of evil, whether there can be an evil use of reason, or whether evil, following the philosophical tradition, has its origin in sensibility or in the understanding.
Metaphysics, Philosophy (General)