Results for "Speculative philosophy"

arXiv Open Access 2026

TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees

Tianyu Liu, Qitan Lv, Yuhao Shen et al.

Speculative decoding (SD) has become a standard technique for accelerating LLM inference without sacrificing output quality. Recent advances in speculative decoding have shifted from sequential chain-based drafting to tree-structured generation, where the draft model constructs a tree of candidate tokens to explore multiple possible drafts in parallel. However, existing tree-based SD methods typically build a fixed-width, fixed-depth draft tree, which fails to adapt to the varying difficulty of tokens and contexts. As a result, the draft model cannot dynamically adjust the tree structure to stop early on difficult tokens and extend generation for simple ones. To address these challenges, we introduce TALON, a training-free, budget-driven adaptive tree expansion framework that can be plugged into existing tree-based methods. Unlike static methods, TALON constructs the draft tree iteratively until a fixed token budget is met, using a hybrid expansion strategy that adaptively allocates the node budget to each layer of the draft tree. This framework naturally shapes the draft tree into a "deep-and-narrow" form for deterministic contexts and a "shallow-and-wide" form for uncertain branches, effectively optimizing the trade-off between exploration width and generation depth under a given budget. Extensive experiments across 5 models and 6 datasets demonstrate that TALON consistently outperforms state-of-the-art EAGLE-3, achieving up to 5.16x end-to-end speedup over auto-regressive decoding.
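
The budget-driven idea in this abstract can be illustrated with a minimal sketch. This is not TALON's actual expansion strategy, which the abstract does not specify in detail. The sketch assumes a hypothetical `draft_probs(prefix)` callable that returns candidate next tokens with draft-model probabilities, and it greedily adds the path with the highest cumulative probability until a node budget is used up. The two toy draft models are also assumptions, chosen only to show the resulting tree shapes.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Token = str
Path = Tuple[Token, ...]

def build_draft_tree(
    draft_probs: Callable[[Path], Dict[Token, float]],
    budget: int,
) -> List[Path]:
    """Best-first expansion: add the most probable path until the node budget is met."""
    tree: List[Path] = []
    # heapq is a min-heap, so cumulative probabilities are stored negated.
    frontier = [(-p, (tok,)) for tok, p in draft_probs(()).items()]
    heapq.heapify(frontier)
    while frontier and len(tree) < budget:
        neg_p, path = heapq.heappop(frontier)
        tree.append(path)
        for tok, p in draft_probs(path).items():
            # neg_p * p stays negative and equals -(cumulative probability of the child).
            heapq.heappush(frontier, (neg_p * p, path + (tok,)))
    return tree

# Hypothetical toy draft models: one confident context, one uncertain context.
def confident(prefix: Path) -> Dict[Token, float]:
    return {"a": 0.95, "b": 0.05}

def uncertain(prefix: Path) -> Dict[Token, float]:
    return {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

deep = build_draft_tree(confident, budget=4)
wide = build_draft_tree(uncertain, budget=4)
```

With the same budget of four nodes, the confident model yields a single chain of depth four, while the uncertain model spends the whole budget on the first layer.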

en cs.CL

arXiv Open Access 2026

Speculative Policy Orchestration: A Latency-Resilient Framework for Cloud-Robotic Manipulation

Chanh Nguyen, Shutong Jin, Florian T. Pokorny et al.

Cloud robotics enables robots to offload high-dimensional motion planning and reasoning to remote servers. However, for continuous manipulation tasks requiring high-frequency control, network latency and jitter can severely destabilize the system, causing command starvation and unsafe physical execution. To address this, we propose Speculative Policy Orchestration (SPO), a latency-resilient cloud-edge framework. SPO utilizes a cloud-hosted world model to pre-compute and stream future kinematic waypoints to a local edge buffer, decoupling execution frequency from network round-trip time. To mitigate unsafe execution caused by predictive drift, the edge node employs an $ε$-tube verifier that strictly bounds kinematic execution errors. The framework is coupled with an Adaptive Horizon Scaling mechanism that dynamically expands or shrinks the speculative pre-fetch depth based on real-time tracking error. We evaluate SPO on continuous RLBench manipulation tasks under emulated network delays. Results show that even when deployed with learned models of modest accuracy, SPO reduces network-induced idle time by over 60% compared to blocking remote inference. Furthermore, SPO discards approximately 60% fewer cloud predictions than static caching baselines. Ultimately, SPO enables fluid, real-time cloud-robotic control while maintaining bounded physical safety.
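
The two edge-side mechanisms described here can be sketched as follows. This is a hedged illustration only: the function names, the per-joint absolute-error check, and the additive-increase, halve-on-violation rule are all assumptions, since the abstract does not give SPO's actual update rules.

```python
from typing import List

def within_tube(predicted: List[float], observed: List[float], eps: float) -> bool:
    """Accept a speculative waypoint only if every joint is within eps of the observed state."""
    return all(abs(p - o) <= eps for p, o in zip(predicted, observed))

def scale_horizon(horizon: int, tracking_error: float, eps: float,
                  min_h: int = 1, max_h: int = 32) -> int:
    """Assumed rule: grow the prefetch depth by one when tracking is tight,
    halve it when the error leaves the tube, otherwise keep it unchanged."""
    if tracking_error < 0.5 * eps:
        return min(max_h, horizon + 1)
    if tracking_error > eps:
        return max(min_h, horizon // 2)
    return horizon
```

For example, with `eps = 0.05`, a horizon of 8 would grow to 9 at a tracking error of 0.01 and shrink to 4 at an error of 0.2.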

en cs.RO, cs.DC

arXiv Open Access 2026

LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Alexander Samarin, Sergei Krutikov, Anton Shevtsov et al.

Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is significantly determined by the acceptance rate, yet standard training minimizes Kullback-Leibler (KL) divergence as a proxy objective. While KL divergence and acceptance rate share the same global optimum, small draft models, having limited capacity, typically converge to suboptimal solutions where minimizing KL does not guarantee maximizing acceptance rate. To address this issue, we propose LK losses, special training objectives that directly target acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to the standard KL-based training. We evaluate our approach on general, coding and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead and can be directly integrated into any existing speculator training framework, making them a compelling alternative to the existing draft training objectives.
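
A small numeric sketch shows why KL divergence is only a proxy. It is not the paper's LK loss. It relies on the standard speculative-sampling result that a drafted token is accepted with probability equal to the sum over tokens of min(p(x), q(x)), where p is the target distribution and q the draft distribution. The forward direction KL(p‖q) and the three toy distributions are assumptions made for illustration.

```python
import math
from typing import List

def acceptance_rate(p: List[float], q: List[float]) -> float:
    """Expected acceptance probability under standard speculative sampling."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def kl_divergence(p: List[float], q: List[float]) -> float:
    """Forward KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical target distribution and two candidate draft distributions.
target = [0.6, 0.3, 0.1]
draft_a = [0.6, 0.399, 0.001]   # nearly ignores the rare token
draft_b = [0.4, 0.3, 0.3]       # spreads mass more evenly

for name, q in (("draft_a", draft_a), ("draft_b", draft_b)):
    print(name, round(kl_divergence(target, q), 3), round(acceptance_rate(target, q), 3))
```

In this toy case `draft_b` has the lower KL divergence but also the lower acceptance rate, which is the kind of mismatch the abstract describes for capacity-limited draft models.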

en cs.LG, cs.CL

arXiv Open Access 2025

SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation

Jiayi Pan, Jiaming Xu, Yongkang Zhou et al.

Feature caching has recently emerged as a promising method for diffusion model acceleration. It effectively alleviates the inefficiency problem caused by high computational requirements by caching similar features in the inference process of the diffusion model. In this paper, we analyze existing feature caching methods from the perspective of information utilization, and point out that relying solely on historical information will lead to constrained accuracy and speed performance. We propose a novel paradigm that introduces future information via self-speculation based on the information similarity at the same time step across different iteration times. Based on this paradigm, we present SpecDiff, a training-free multi-level feature caching strategy including a cached feature selection algorithm and a multi-level feature classification algorithm. (1) Feature selection algorithm based on self-speculative information. SpecDiff determines a dynamic importance score for each token based on self-speculative information and historical information, and performs cached feature selection through the importance score. (2) Multi-level feature classification algorithm based on feature importance scores. SpecDiff classifies tokens by leveraging the differences in feature importance scores and introduces a multi-level feature calculation strategy. Extensive experiments show that SpecDiff achieves average speedups of 2.80×, 2.74×, and 3.17× with negligible quality loss in Stable Diffusion 3, 3.5, and FLUX compared to RFlow on an NVIDIA A800-80GB GPU. By merging speculative and historical information, SpecDiff overcomes the speedup-accuracy trade-off bottleneck, pushing the Pareto frontier of speedup and accuracy in efficient diffusion model inference.

en cs.CV, cs.LG

arXiv Open Access 2025

Cause or Trigger? From Philosophy to Causal Modeling

Kateřina Hlaváčková-Schindler, Rainer Wöß, Vera Pecorino et al.

Not much has been written about the role of triggers in the literature on causal reasoning, causal modeling, or philosophy. In this paper, we focus on describing triggers and causes in the metaphysical sense and on characterizations that differentiate them from each other. We carry out a philosophical analysis of these differences. From this, we formulate a definition that clearly differentiates triggers from causes and can be used for causal reasoning in natural sciences. We propose a mathematical model and the Cause-Trigger algorithm, which, given data on observable processes, is able to determine whether a process is a cause or a trigger of an effect. The possibility to distinguish triggers from causes directly from data makes the algorithm a useful tool in natural sciences using observational data, but also for real-world scenarios. For example, knowing the processes that trigger causes of a tropical storm could give politicians time to develop actions such as evacuating the population. Similarly, knowing the triggers of processes that cause global warming could help politicians focus on effective actions. We demonstrate our algorithm on the climatological data of two recent cyclones, Freddy and Zazu. The Cause-Trigger algorithm detects processes that trigger high wind speed in both storms during their cyclogenesis. The findings obtained agree with expert knowledge.

en cs.LG, stat.ME

arXiv Open Access 2025

Look into your Heart -- Prototypes for a Speculative Design Exploration of Personal Heart Rate Visualization

Swaroop Panda

Personal heart rate data from wearable devices contains rich information, yet current visualizations primarily focus on simple metrics, leaving complex temporal patterns largely unexplored. We present a speculative exploration of personal heart rate visualization possibilities through five prototype approaches derived from established visualization literature: pattern/variability heatmaps, recurrence plots, spectrograms, T-SNE, and Poincaré plots. Using physiologically-informed synthetic datasets generated through large language models, we systematically explore how different visualization strategies might reveal distinct aspects of heart rate patterns across temporal scales and analytical complexity. We evaluate these prototypes using established visualization assessment scales from multiple literacy perspectives, then conduct reflective analysis on both the evaluation and the design of the prototypes. Our iterative process reveals recurring design tensions in visualizing complex physiological data. This work offers a speculative map of the personal heart rate visualization design space, providing insights into making heart rate data more visually accessible and meaningful.

en cs.HC

arXiv Open Access 2025

HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models

Zhinan Xie, Peisong Wang, Shuang Qiu et al.

Speculative decoding has proven effective for accelerating inference in Large Language Models (LLMs), yet its extension to Vision-Language Models (VLMs) remains limited by the computational burden and semantic inconsistency introduced by visual tokens. Recent studies reveal that visual tokens in large VLMs are highly redundant, and most of them can be removed without compromising generation quality. Motivated by this observation, we propose HiViS (Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models), a framework that utilizes the target VLM as a semantic fusion model, allowing the drafter to obtain visual information without explicitly processing visual tokens, ensuring that the drafter's prefill sequence length matches that of the textual tokens. Furthermore, HiViS employs a time-step-aware aligned training scheme that allows the drafter to autonomously propagate and refine instructive visual-textual semantics during independent drafting, guided by step-dependent bias-correction residuals. Extensive experiments across representative VLMs and benchmarks demonstrate that HiViS achieves significant improvements in average acceptance length and speedup ratio.

en cs.LG, cs.AI

arXiv Open Access 2025

From Quarter to All: Accelerating Speculative LLM Decoding via Floating-Point Exponent Remapping and Parameter Sharing

Yushu Zhao, Yubin Qin, Yang Wang et al.

Large language models achieve impressive performance across diverse tasks but exhibit high inference latency due to their large parameter sizes. While quantization reduces model size, it often leads to performance degradation compared to the full model. Speculative decoding remains lossless but typically incurs extra overheads. We propose SPEQ, an algorithm-hardware co-designed speculative decoding method that uses part of the full-model weight bits to form a quantized draft model, thereby eliminating additional training or storage overhead. A reconfigurable processing element array enables efficient execution of both the draft and verification passes. Experimental results across 15 LLMs and tasks demonstrate that SPEQ achieves speedups of 2.07x, 1.53x, and 1.45x over FP16, Olive, and Tender, respectively.

en cs.AR

arXiv Open Access 2025

Accelerated Test-Time Scaling with Model-Free Speculative Sampling

Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi et al.

Language models have demonstrated remarkable capabilities in reasoning tasks through test-time scaling techniques like best-of-N sampling and tree search. However, these approaches often demand substantial computational resources, creating a critical trade-off between performance and efficiency. We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach that exploits the inherent redundancy in reasoning trajectories to achieve significant acceleration without compromising accuracy. Our analysis shows that reasoning paths frequently reuse similar reasoning patterns, enabling efficient model-free token prediction without requiring separate draft models. By introducing stochastic drafting and preserving probabilistic information through a memory-efficient logit-based N-gram module, combined with optimized Gumbel-Top-K sampling and data-driven tree construction, STAND significantly improves token acceptance rates. Extensive evaluations across multiple models and reasoning tasks (AIME-2024, GPQA-Diamond, and LiveCodeBench) demonstrate that STAND reduces inference latency by 60-65% compared to standard autoregressive decoding while maintaining accuracy. Furthermore, STAND consistently outperforms state-of-the-art speculative decoding methods across diverse inference patterns, including single-trajectory decoding, batch decoding, and test-time tree search. As a model-free approach, STAND can be applied to any existing language model without additional training, making it a powerful plug-and-play solution for accelerating language model reasoning.
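
The model-free drafting idea can be sketched with a plain count-based N-gram table. This is a deliberately simplified, deterministic stand-in, not STAND itself: the abstract describes a logit-based module with stochastic Gumbel-Top-K sampling and tree construction, none of which is reproduced here. The function names and the toy token history are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def build_ngram_table(tokens: List[str], n: int = 2) -> Dict[Tuple[str, ...], Counter]:
    """Map each n-token context seen so far to counts of the token that followed it."""
    table: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(len(tokens) - n):
        table[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    return table

def draft(table: Dict[Tuple[str, ...], Counter], context: List[str],
          n: int = 2, max_tokens: int = 4) -> List[str]:
    """Greedily propose tokens by repeated lookup; stop at the first unseen context."""
    out: List[str] = []
    ctx = list(context)
    for _ in range(max_tokens):
        counts = table.get(tuple(ctx[-n:]))
        if not counts:
            break
        nxt = counts.most_common(1)[0][0]
        out.append(nxt)
        ctx.append(nxt)
    return out

# Hypothetical reasoning trace that repeats a phrase.
history = "so x = 2 , so x = 2 + 1".split()
table = build_ngram_table(history)
proposal = draft(table, ["so", "x"], max_tokens=2)
```

Because the trace repeats "so x = 2", the lookup proposes "=" and then "2" without calling any draft model. The target model would still have to verify these tokens.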

en cs.CL

arXiv Open Access 2025

The mathematics of periodic anthyphairesis as a basis for the full understanding of Plato's philosophy

Stelios Negrepontis, Athanase Papadopoulos

Even though Plato's philosophy in ancient times was always closely associated with mathematics, modern Platonic scholarship, during the last five centuries, has moved steadily toward de-mathematization. The present work aims to outline a radical re-interpretation of Plato's philosophy, according to which the Platonic Idea, that is, the intelligible Being, has the structure of the philosophical analogue of a geometric dyad in a philosophic anthyphairesis -- the precursor of modern continued fractions -- which was studied by the Pythagoreans, Theodorus and Theaetetus in relation with the discoveries of quadratic incommensurabilities. This mathematical structure is clearly visible in the Platonic method of Division and Collection, equivalently Name and Logos, equivalently True Opinion plus Logos, in the dialogues Theaetetus, Sophist, Statesman, Meno, and Parmenides. Equipped with this structure of an intelligible Being, we provide definitive answers to fundamental questions that were not resolved by Platonists, concerning the following topics: the dialectic numbers, which are based on the anthyphairetic periodicity and the plus one rule, stating that the dialectic number of terms of a sequence is the (number of) ratios of successive terms plus one (stated in the Parmenides 148d-149d); the description of the intelligible being as an Indivisible Line, a statement bordering on the contradictory; the also seemingly contradictory statement of the Sophist that "the not-Being is a Being", based on the equalization of the two elements of the dyad defining an intelligible Being; the more general self-similar Oneness of an intelligible Being, based on the equalization of all parts generated by the anthyphairetic division of an intelligible Being; and finally the Third Man Argument in the Introduction to the Parmenides, appearing as a threat for Plato's theory, but essentially innocuous because of the self-similar Oneness.
The third part of our study aims to prove that, contrary to the presently dominant interpretation of Zeno's arguments and paradoxes as being devoid of mathematical content, the analysis of Zeno's presence in the Parmenides, Sophist (via the Eleatic Stranger), and Zeno's verbatim Fragments preserved by Simplicius shows that Plato's intelligible Beings essentially coincide with Zeno's true Beings, and hence that Zeno's philosophical thought was already anthyphairetic, and hence heavily influenced by the Pythagoreans' mathematics. These findings run against Burkert's claim that "ontology is prior to mathematics". Modern Platonists have never obtained a clear description of the structure of an intelligible Idea in terms of the mathematics of periodic anthyphairesis, and thus were not able to answer fundamental questions, nor to realize the close connection of Plato's intelligible Beings with Zeno's true Beings.
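
For readers unfamiliar with the mathematical term, anthyphairesis (reciprocal subtraction) can be sketched in code. This illustrates only the arithmetic, not the authors' philosophical reading of it. The first function is Euclid's algorithm on two commensurable integer magnitudes. The second uses the standard integer recurrence for the continued fraction of a square root to return the repeating period that arises for a quadratic incommensurable. The function names are assumptions.

```python
from math import isqrt
from typing import List, Tuple

def anthyphairesis(a: int, b: int) -> List[int]:
    """Quotients of reciprocal subtraction for two commensurable magnitudes (Euclid's algorithm)."""
    quotients: List[int] = []
    while b:
        quotients.append(a // b)
        a, b = b, a % b
    return quotients

def sqrt_anthyphairesis(n: int) -> Tuple[int, List[int]]:
    """Initial quotient and repeating period for sqrt(n) measured against 1 (n not a square)."""
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0
    period: List[int] = []
    # The period of sqrt(n) always ends with the quotient 2 * a0.
    while a != 2 * a0:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        period.append(a)
    return a0, period
```

For commensurable magnitudes the process terminates, for example `anthyphairesis(17, 5)` gives `[3, 2, 2]`. For the diagonal and side of a square it never terminates but repeats: `sqrt_anthyphairesis(2)` gives `(1, [2])`.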

en math.HO

arXiv Open Access 2025

DSSD: Efficient Edge-Device LLM Deployment and Collaborative Inference via Distributed Split Speculative Decoding

Jiahong Ning, Ce Zheng, Tingting Yang

Large language models (LLMs) have transformed natural language processing but face critical deployment challenges in device-edge systems due to resource limitations and communication overhead. To address these issues, collaborative frameworks have emerged that combine small language models (SLMs) on devices with LLMs at the edge, using speculative decoding (SD) to improve efficiency. However, existing solutions often trade inference accuracy for latency or suffer from high uplink transmission costs when verifying candidate tokens. In this paper, we propose Distributed Split Speculative Decoding (DSSD), a novel architecture that not only preserves the SLM-LLM split but also partitions the verification phase between the device and edge. In this way, DSSD replaces the uplink transmission of multiple vocabulary distributions with a single downlink transmission, significantly reducing communication latency while maintaining inference quality. Experiments show that our solution outperforms current methods, and codes are at: https://github.com/JasonNing96/DSSD-Efficient-Edge-Computing

en eess.SP

arXiv Open Access 2024

KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning

Kaiqi Zhang, Jing Zhao, Rui Chen

Large Language Models (LLMs) exhibit high inference latency due to their autoregressive decoding nature. While the draft head in speculative decoding mitigates this issue, its full potential remains unexplored. In this paper, we introduce KOALA (K-layer Optimized Adversarial Learning Architecture), an orthogonal approach to the draft head. By transforming the conventional single-layer draft head into a multi-layer architecture and incorporating adversarial learning into the traditional supervised training, KOALA significantly improves the accuracy of the draft head in predicting subsequent tokens, thus more closely mirroring the functionality of LLMs. Although this improvement comes at the cost of slightly increased drafting overhead, KOALA substantially unlocks the draft head's potential, greatly enhancing speculative decoding. We conducted comprehensive evaluations of KOALA, including both autoregressive and non-autoregressive draft heads across various tasks, demonstrating a latency speedup ratio improvement of 0.24x-0.41x, which is 10.57%-14.09% faster than the original draft heads.

en cs.CL

DOAJ Open Access 2024

MARTIN HEIDEGGER’S METAPHYSICAL QUESTION 1935-1937: GENESIS AND CONSEQUENCES. PART ONE

Юрій МАРИНЧУК

The article deals with the subject of poetic language, its qualities, conditions and purpose. It is the language of creativity and sets out the metaphysics in art. The metaphysics considered is the intersection of being and time in thing and thing as a special place where thoughts appear in the artistic act. The first plan of the stated metaphysics is considered in the fourfold geometric intersection of the essential-artistic space – the square (Geviert) of the world, earth, mortals and immortals. The result of the semantic field of landscape from our memory of native lands is a history of the meanings of thinking, and things are pre-thought states. A thing is the way the world is given - a narrative of the structure of thinking, life and human activity. The second plan of the stated metaphysics is considered, on the one hand, from the position of poets and oracles, as those who express themselves historically, and, on the other hand, from the inner basis of thinking. This foundation is intuitive, affective-intellectual thought. It emerges in the circle of unknown, unexplored, impossible things – in the topos, where the world of life is formed by co-existence with what we enter into, creating meaning for ourselves. Co-existence means examining the thing for the interruption of being and time and what information about the world can be extracted. But it turns out that there is no being at all and no time at all, for there are no two things of one thing: that which belongs to being and that which belongs to time. It turns out that the intersection is thinking. It builds meaningful forms, lives them and carries them in language. To a certain extent, the thinking we possess is seen as a device of time. Martin Heidegger, through the existential Care, arranges thinking by the relation of being and time in a thing. He draws our attention to the fact that not only man owes his truth to the co-existence of life, but also the thing with its nature.
The purpose of this article is to investigate the foundations of poetic language of creation as a way of presentation of metaphysics in art.

Epistemology. Theory of knowledge

arXiv Open Access 2023

Speculative Exploration on the Concept of Artificial Agents Conducting Autonomous Research

Shiro Takagi

This paper engages in a speculative exploration of the concept of an artificial agent capable of conducting research. Initially, it examines how the act of research can be conceptually characterized, aiming to provide a starting point for discussions about what it means to create such agents. The focus then shifts to the core components of research: question formulation, hypothesis generation, and hypothesis verification. This discussion includes a consideration of the potential and challenges associated with enabling machines to autonomously perform these tasks. Subsequently, this paper briefly considers the overlapping themes and interconnections that underlie them. Finally, the paper presents preliminary thoughts on prototyping as an initial step towards uncovering the challenges involved in developing these research-capable agents.

en cs.AI, cs.LG

arXiv Open Access 2023

PaSS: Parallel Speculative Sampling

Giovanni Monea, Armand Joulin, Edouard Grave

Scaling the size of language models to tens of billions of parameters has led to impressive performance on a wide range of tasks. At generation, these models are used auto-regressively, requiring a forward pass for each generated token, and thus reading the full set of parameters from memory. This memory access forms the primary bottleneck for generation and it worsens as the model size increases. Moreover, executing a forward pass for multiple tokens in parallel often takes nearly the same time as it does for just one token. These two observations lead to the development of speculative sampling, where a second smaller model is used to draft a few tokens, that are then validated or rejected using a single forward pass of the large model. Unfortunately, this method requires two models that share the same tokenizer and thus limits its adoption. As an alternative, we propose to use parallel decoding as a way to draft multiple tokens from a single model with no computational cost, nor the need for a second model. Our approach only requires an additional input token that marks the words that will be generated simultaneously. We show promising performance (up to $30\%$ speed-up) while requiring only as few as $O(d_{emb})$ additional parameters.
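
The validate-or-reject step of two-model speculative sampling mentioned in this abstract can be sketched as below. This shows the standard acceptance rule from the speculative sampling literature, not PaSS's parallel-decoding variant. A drafted token x is kept with probability min(1, p(x)/q(x)), and on rejection a replacement is drawn from the normalized residual max(0, p − q). The function name and the dictionary representation of the distributions are assumptions.

```python
import random
from typing import Dict

def verify_token(token: str, p: Dict[str, float], q: Dict[str, float],
                 rng: random.Random) -> str:
    """Return the drafted token if accepted, otherwise a replacement from the residual.

    p: target-model distribution, q: draft-model distribution (q[token] > 0,
    since the token was sampled from q).
    """
    if rng.random() < min(1.0, p.get(token, 0.0) / q[token]):
        return token
    # Rejection implies p(token) < q(token), so the residual has positive mass.
    residual = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    tokens, weights = zip(*residual.items())
    return rng.choices(tokens, weights=weights)[0]
```

This rule leaves the output distribution equal to the target model's, which is why speculative sampling is described as lossless.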

en cs.CL

DOAJ Open Access 2023

A existência no fio da navalha: propriedade e violência em Grande sertão: veredas [Existence on the Razor's Edge: Property and Violence in Grande sertão: veredas]

Vinícius Victor A. Barros

This article discusses the complex relationship between property and violence in the novel Grande sertão: veredas (1956), by João Guimarães Rosa. It holds that, beyond the rich social, historical, and political material of the sertão, one of the main dimensions of the conflicts narrated is intrinsically linked to the concentration of land in the hands of the great local potentates, the coronéis, and to the almost institutionalized practice of armed banditry, jaguncismo. Far from being mere fabulation, the dialectical analysis of literature and society aims to underline that Guimarães Rosa's novel offers a possibility for researching and interpreting a not-so-distant past that still reflects, to a greater or lesser degree, a certain logic of coercion, theft, and murder in characteristic areas of Brazilian territory.

Epistemology. Theory of knowledge, History (General)

DOAJ Open Access 2022

Comparison of the First Three Waves of the COVID-19 Pandemic in Russia in 2020–21

L. S. Karpova, K. A. Stolyarov, N. M. Popovtseva et al.

Relevance. The ongoing COVID-19 pandemic in the world, which is characterized by a long undulating course, requires an in-depth study of the features of the epidemic process, including the influence of natural, climatic and social factors on it. Aim. To compare the intensity of three waves of the COVID-19 pandemic in Russia and to identify the features of the parameters of the COVID-19 pandemic in Russia in the age groups of the population and in the federal districts. Materials and methods. Data from the computer database of the Influenza Research Institute and the Stop-coronavirus website were used. Results. The construction of the weekly dynamics of COVID-19 made it possible to clarify the start, peak and end dates of each wave in megacities, federal districts and among the population of Russia. Conclusion. In the dynamics of the incidence of COVID-19 in the population of the Russian Federation from March 2020 to September 2021, three waves were detected: the I spring-summer wave, the II autumn-winter wave, and the III spring-summer wave. All three waves started in megacities, first in Moscow, and spread across federal districts. The rise of morbidity in Russia as a whole began and peaked in the autumn-winter wave later than in the spring-summer waves (immediately after the megacities). The total duration of the epidemic and the period of its development in the autumn-winter wave were longer than in the spring-summer waves. Morbidity, hospitalization and mortality depended on age, and in all three waves were higher among people over 65 years of age. The intensity of COVID-19 in the first spring-summer wave was the lowest. The II autumn-winter wave was the most intense in terms of morbidity, hospitalization rate and mortality in all age groups. The III spring-summer wave was less intense than the II autumn-winter wave in terms of morbidity and hospitalization, but no significant differences were revealed between the mortality rates in the II and III waves.

Epistemology. Theory of knowledge

CrossRef Open Access 2020

The Transformation of Scientific Political Philosophy into a Speculative Philosophy of History

Linas Jokubaitis

The paper presents an analysis of the three stages of the development of political philosophy since the 17th century. The rise of modern political theory was marked by attempts to develop a philosophy along the lines of natural sciences. These attempts led to the development of highly speculative and abstract doctrines; political philosophy ceased being a practical discipline. The paper argues that an important aspect of the traditionalist political thought of the 18th century was an attempt to reestablish the link between theory and practice. In the 19th century, the interest in history was supplemented with new premises about the historical process. Political philosophy, which strived to become scientific, became highly dependent on the premises of various philosophies of history.

en

DOAJ Open Access 2020

Escuchar voces. Fantasmas auditivos y escenografías sonoras [Hearing Voices: Auditory Phantoms and Sound Scenographies]

Domingo Hernández Sánchez

This article investigates what kind of listening is that of someone who "hears voices". It attempts to show the effects that the analysis of certain auditory hallucinations and sonic spectres would have if it were carried out from the point of view of listening rather than of the content of what is heard. To this end, the article connects the study of certain "auditory phantoms" with the analysis (in the territory of fiction, specifically a novel by Robert Silverberg) of a telepath's loss of power. The aim is thus to show one possible way of understanding how the listener hears himself as if he were another or, conversely, how he hears another as if he were not surrounded by exteriority but remained in the innermost part of the self.

Speculative philosophy, Philosophy (General)

DOAJ Open Access 2020

Claves ecofeministas para rebeldes que aman a la Tierra y a los animales; Alicia H. Puleo

Alba Rodríguea, Camilla de Mas, Joel Juvany

Claves ecofeministas para rebeldes que aman a la Tierra y a los animales, by Alicia H. Puleo (illustrated by Verónica Perales)

Speculative philosophy, Philosophy (General)
