This article analyzes Umbanda, an African-derived religion, through its social and spatial relations in Brazil. The aim is to discuss practices of re-existence, that is, strategies of resistance and adaptation, which Umbanda develops through its religious syncretism and its territorialities. These territorialities include spaces such as crossroads, beaches, rivers, quarries, and roads, in addition to the terreiros themselves, used for offerings and worship. The article debates whether syncretism represents a whitening of African culture or a form of adaptation and survival in the face of structural racism and historical repression. The analysis draws on Henri Lefebvre's concept of conceived space, understanding Umbanda as a manifestation that resists the established order and Brazilian capitalism.
Neurocognitive disorders (NCDs), such as Alzheimer's disease, are globally prevalent and require scalable screening methods for proactive management. Prior research has explored the potential of technologies like conversational AI (CAI) to administer NCD screening tests. However, challenges remain in designing CAI-based solutions that make routine NCD screening socially acceptable, engaging, and capable of encouraging early medical consultation. In this study, we conducted interviews with 36 participants, including clinicians, individuals at risk of NCDs, and their caregivers, to explore the speculative future of adopting CAI for NCD screening. Our findings reveal shared expectations, such as deploying CAI in home or community settings to reduce social stress. Nonetheless, conflicts emerged among stakeholders; for example, users' need for emotional support may conflict with clinicians' preference for CAI's professional and standardized administration. We then map the user journey of NCD screening, contrasting the current practice of manual screening with the expected CAI-supported screening. Finally, leveraging the human-centered approach, we provide actionable implications for future CAI design in NCD screening.
Kevin Garner, Polykarpos Thomadakis, Nikos Chrisochoides
This paper presents a distributed memory method for anisotropic mesh adaptation that is designed to avoid the use of collective communication and global synchronization techniques. In the presented method, meshing functionality is separated from performance aspects by utilizing a separate entity for each: a multicore cc-NUMA-based (shared memory) mesh generation software and a parallel runtime system that is designed to help applications leverage the concurrency offered by emerging high-performance computing (HPC) architectures. First, an initial mesh is decomposed and its interface elements (subdomain boundaries) are adapted on a single multicore node (shared memory). Subdomains are then distributed among the nodes of an HPC cluster so that their interior elements are adapted while interface elements (already adapted) remain frozen to maintain mesh conformity. Lessons are presented regarding some re-designs of the shared memory software and how its speculative execution model is utilized by the distributed memory method to achieve good performance. The presented method is shown to generate meshes (of up to approximately 1 billion elements) with comparable quality and performance to existing state-of-the-art HPC meshing software.
Although current Video-LLMs achieve impressive performance in video understanding tasks, their autoregressive decoding efficiency remains constrained by the massive number of video tokens. Visual token pruning can partially ease this bottleneck, yet existing approaches still suffer from information loss and yield only modest acceleration in decoding. In this paper, we propose ParallelVLM, a training-free draft-then-verify speculative decoding framework that overcomes both the mutual-waiting and limited-speedup problems between draft and target models in long-video settings. ParallelVLM features two parallelized stages that maximize hardware utilization and incorporates an Unbiased Verifier-Guided Pruning strategy to better align the draft and target models by eliminating the positional bias in attention-guided pruning. Extensive experiments demonstrate that ParallelVLM effectively expands the draft window by $1.6\sim1.8\times$ with high accepted lengths, and accelerates various video understanding benchmarks by 3.36$\times$ on LLaVA-Onevision-72B and 2.42$\times$ on Qwen2.5-VL-32B compared with vanilla autoregressive decoding.
This article draws out a potential encounter between Hegel and film studies. Following a line of thought instantiated by Theodor Adorno, it constructs a method of reading Hegel through cinematic formal analysis. In particular, the article argues that the speculative proposition should be thought through the structure of the dissolve. The speculative proposition is a sentence whose subject and predicate rest in uneasy relation to one another, and which is not a proposition of simple identity. Making use of a famous example from The Phenomenology of Spirit, the article elaborates the confused position of the speculative proposition and demonstrates the necessity of explanatory tools that approach the matter obliquely. In the process of making this argument, other attempts to put dialectics and montage together (notably, Eisenstein’s) are situated in relation to the instructional potential of the dissolve. Close reading of a particular dissolve taken from Moby Dick (John Huston, 1956) demonstrates the isomorphism between the mechanism of Hegelian dialectic and this particular unit of film form. The article concludes by returning to a particular speculative proposition in light of the insights gleaned from formal analysis.
Zhengmian Hu, Tong Zheng, Vignesh Viswanathan
et al.
Large Language Models (LLMs) have become an indispensable part of natural language processing tasks. However, autoregressive sampling remains an efficiency bottleneck. Multi-Draft Speculative Decoding (MDSD) is a recent approach where, when generating each token, a small draft model generates multiple drafts, and the target LLM verifies them in parallel, ensuring that the final output conforms to the target model distribution. The two main design choices in MDSD are the draft sampling method and the verification algorithm. For a fixed draft sampling method, the optimal acceptance rate is a solution to an optimal transport problem, but the complexity of this problem makes it difficult to solve for the optimal acceptance rate and measure the gap between existing verification algorithms and the theoretical upper bound. This paper discusses the dual of the optimal transport problem, providing a way to efficiently compute the optimal acceptance rate. For the first time, we measure the theoretical upper bound of MDSD efficiency for vocabulary sizes in the thousands and quantify the gap between existing verification algorithms and this bound. We also compare different draft sampling methods based on their optimal acceptance rates. Our results show that the draft sampling method strongly influences the optimal acceptance rate, with sampling without replacement outperforming sampling with replacement. Additionally, existing verification algorithms do not reach the theoretical upper bound for either without-replacement or with-replacement sampling. Our findings suggest that carefully designed draft sampling methods can potentially improve the optimal acceptance rate and enable the development of verification algorithms that closely match the theoretical upper bound.
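For intuition, the single-draft special case that MDSD generalizes can be sketched directly. The following is an illustrative implementation of the standard accept/resample verification rule, not the paper's multi-draft algorithm: the draft token is accepted with probability min(1, p/q), and on rejection a token is drawn from the normalized residual max(p − q, 0), which makes the output exactly distributed according to the target distribution p.

```python
import numpy as np

def speculative_verify(p, q, draft_token, rng):
    """Verify one draft token so the output is distributed exactly as p.

    p: target-model distribution over the vocabulary
    q: draft-model distribution the draft token was sampled from
    Accept with probability min(1, p/q); on rejection, resample from
    the normalized residual max(p - q, 0).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return int(draft_token)  # accepted: keep the draft token
    residual = np.maximum(p - q, 0.0)  # where the draft under-covers p
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual))
```

The expected acceptance probability of this single-draft scheme is 1 − D_TV(p, q); MDSD raises it by letting the target model check several drafts in parallel, which is what makes the optimal-transport analysis in the abstract necessary.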
Robert Szafarczyk, Syed Waqar Nabi, Wim Vanderbauwhede
Irregular codes are bottlenecked by memory and communication latency. Decoupled access/execute (DAE) is a common technique to tackle this problem. It relies on the compiler to separate memory address generation from the rest of the program; however, such a separation is not always possible due to control and data dependencies between the access and execute slices, resulting in a loss of decoupling. In this paper, we present compiler support for speculation in DAE architectures that preserves decoupling in the face of control dependencies. We speculate memory requests in the access slice and poison mis-speculations in the execute slice without the need for replays or synchronization. Our transformation works on arbitrary, reducible control flow and is proven to preserve sequential consistency. We show that our approach applies to a wide range of architectural work on CPU/GPU prefetchers, CGRAs, and accelerators, enabling DAE on a wider range of codes than before.
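The speculate-then-poison idea can be caricatured in plain software. This is a loud simplification: the paper's transformation operates at the compiler level for DAE hardware, and the queue, function names, and predicate below are purely illustrative. The access slice issues every load speculatively, ahead of the control decision; the execute slice later resolves the control dependency and silently drops (poisons) mis-speculated values instead of replaying them.

```python
from collections import deque

def access_slice(data, indices):
    """Speculatively issue all loads before the execute slice knows
    which of them are actually needed."""
    requests = deque()
    for i in indices:
        requests.append((i, data[i]))  # speculative memory request
    return requests

def execute_slice(requests, predicate):
    """Consume speculated loads; values whose guarding condition turns
    out false are poisoned (discarded), with no replay or sync."""
    committed = []
    while requests:
        i, value = requests.popleft()
        if predicate(i):           # control dependency resolved here
            committed.append(value)  # commit the speculated load
        # else: mis-speculation -> poison, simply drop the value
    return committed
```

Because the access slice never waits on the predicate, it keeps memory requests in flight continuously, which is the decoupling the paper is preserving.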
As a new paradigm of visual content generation, autoregressive text-to-image models suffer from slow inference due to their sequential token-by-token decoding process, often requiring thousands of model forward passes to generate a single image. To address this inefficiency, we propose Speculative Jacobi-Denoising Decoding (SJD2), a framework that incorporates the denoising process into Jacobi iterations to enable parallel token generation in autoregressive models. Our method introduces a next-clean-token prediction paradigm that enables the pre-trained autoregressive models to accept noise-perturbed token embeddings and predict the next clean tokens through low-cost fine-tuning. This denoising paradigm guides the model towards more stable Jacobi trajectories. During inference, our method initializes token sequences with Gaussian noise and performs iterative next-clean-token-prediction in the embedding space. We employ a probabilistic criterion to verify and accept multiple tokens in parallel, and refine the unaccepted tokens for the next iteration with the denoising trajectory. Experiments show that our method can accelerate generation by reducing model forward passes while maintaining the visual quality of generated images.
In a previous article, we discussed a paradox in Timaeus' cosmology: that there is no void inside the universe, even though it is entirely filled with polyhedra, a mathematical impossibility (Brisson-Ofman 2025). In the present article, we examine another paradox. While the first paradox is well known and was already highlighted by Aristotle as a fundamental mathematical contradiction undermining Plato's cosmology, this new paradox has gone almost entirely unnoticed by commentators, both ancient and modern. This oversight may surprise scholars, given the extensive body of work on Timaeus' universe, much of which emphasizes discrepancies with astronomical observations or points out supposed internal contradictions. Like the first paradox, this one arises from the premise of a universe entirely filled with polyhedra. However, in this case, the contradiction stems from the absence of void outside it. In the first section, we demonstrate that the shape of the universe cannot be a perfect mathematical sphere: that is, its boundary is not smooth but exhibits bumps and hollows. Next, we present conceptual arguments from Plato's text that support the necessity of such 'defects' in the universe's shape compared to a perfect mathematical sphere. In the third section, we argue that such a universe cannot move at all. Finally, we propose a solution to this mathematical contradiction in Timaeus' construction, drawing on the same ideas used to address the earlier apparent contradiction: the unique feature of Timaeus' universe as a living being, whose parts are continuously moving, changing, decomposing, and reforming. While this problem does not depend on the various schools of interpretation of the Timaeus, it is related to some important issues concerning Plato's philosophy.
These issues include the importance of observations in science (particularly in astronomy), the relationship between intelligible models and their sensible copies, the mythos/logos approach of Plato's cosmology, and the debate over 'metaphorical' vs 'literal' interpretation. Of course, all these questions fall outside the scope of this article and will not be addressed here.
Large Language Models (LLMs) often excel in specific domains but fall short in others due to the limitations of their training. Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to improve their performance across domains. To realize this potential, we introduce a novel Collaborative Speculative Decoding (CoSD) algorithm that enables efficient LLM knowledge fusion at test time without requiring additional model training. CoSD employs a draft model to generate initial sequences and an easy-to-learn rule or decision tree to decide when to invoke an assistant model to improve these drafts. CoSD not only enhances knowledge fusion but also improves inference efficiency, is transferable across domains and models, and offers greater explainability. Experimental results demonstrate that CoSD improves accuracy by up to 10\% across benchmarks compared to existing methods, providing a scalable and effective solution for LLM-based applications.
Speculative decoding (SPD) aims to accelerate the auto-regressive token generation process of a target Large Language Model (LLM). Some approaches employ a draft model with multiple heads to predict a sequence of future tokens, where each head handles a token in the sequence. The target LLM verifies the predicted sequence and accepts aligned tokens, enabling efficient multi-token generation. However, existing methods assume that all tokens within a sequence are equally important, employing identical head structures and relying on a single generation paradigm, either serial or parallel. Motivated by this, we theoretically demonstrate that initial tokens in the draft sequence are more important than later ones. Building on this insight, we propose Gumiho, a hybrid model combining serial and parallel heads. Specifically, given the critical importance of early tokens, we employ a sophisticated Transformer architecture for the early draft heads in a serial configuration to improve accuracy. For later tokens, we utilize multiple lightweight MLP heads operating in parallel to enhance efficiency. By allocating more advanced model structures and longer running times to the early heads, Gumiho achieves improved overall performance. The experimental results demonstrate that our method outperforms existing approaches, fully validating its effectiveness.
UX professionals routinely conduct design reviews, yet privacy concerns are often overlooked -- not only due to limited tools, but more critically because of low intrinsic motivation. Limited privacy knowledge, weak empathy for unexpectedly affected users, and low confidence in identifying harms make it difficult to address risks. We present PrivacyMotiv, an LLM-powered system that supports privacy-oriented design diagnosis by generating speculative personas with UX user journeys centered on individuals vulnerable to privacy risks. Drawing on narrative strategies, the system constructs relatable and attention-drawing scenarios that show how ordinary design choices may cause unintended harms, expanding the scope of privacy reflection in UX. In a within-subjects study with professional UX practitioners (N=16), we compared participants' self-proposed methods with PrivacyMotiv across two privacy review tasks. Results show significant improvements in empathy, intrinsic motivation, and perceived usefulness. This work contributes a promising privacy review approach which addresses the motivational barriers in privacy-aware UX.
This study explores how virtual environments and artificial intelligence can enhance university students' learning experiences, with particular attention to the digital preferences of Generation Z. An experiment was conducted at the Faculty of Pedagogy, Humanities, and Social Sciences at the University of Gyor, where Walter's Cube technology and a trained AI mediator were integrated into the instruction of ten philosophical topics. The curriculum was aligned with the official syllabus and enriched with visual content, quotations, and explanatory texts related to iconic figures in philosophy. A total of 77 first-year undergraduate students from full-time humanities and social sciences programs participated in the study. Following their end-of-semester offline written examination, students voluntarily completed a paper-based, anonymous ten-question test and provided feedback on the method's effectiveness. No sensitive personal data were collected, and the research was conducted with formal approval from the Faculty Dean. Descriptive statistics and inferential tests were applied to evaluate the impact of the virtual environment and AI mediation on learning outcomes. Results indicate that 80 percent of participants achieved good or excellent final exam grades, and the majority rated the virtual material as highly effective. Qualitative feedback emphasized increased motivation and deeper engagement, attributed to the immersive 3D presentation and interactive AI support. This research contributes to the advancement of digital pedagogy and suggests new directions for applying virtual and AI-based methods in higher education, particularly in disciplines where abstract reasoning and conceptual understanding are central.
We study a relaxation of the problem of coupling probability distributions -- a list of samples is generated from one distribution and an accept is declared if any one of these samples is identical to the sample generated from the other distribution. We propose a novel method for generating samples, which extends the Gumbel-max sampling suggested in Daliri et al. (arXiv:2408.07978) for coupling probability distributions. We also establish a corresponding lower bound on the acceptance probability, which we call the list matching lemma. We next discuss two applications of our setup. First, we develop a new mechanism for multi-draft speculative sampling that is simple to implement and achieves performance competitive with baselines such as SpecTr and SpecInfer across a range of language tasks. Our method also guarantees a certain degree of drafter invariance with respect to the output tokens which is not supported by existing schemes. We also provide a theoretical lower bound on the token level acceptance probability. As our second application, we consider distributed lossy compression with side information in a setting where a source sample is compressed and available to multiple decoders, each with independent side information. We propose a compression technique that is based on our generalization of Gumbel-max sampling and show that it provides significant gains in experiments involving synthetic Gaussian sources and the MNIST image dataset.
Guanzhou Hu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
We introduce explicit speculation, a variant of the I/O speculation technique in which I/O system calls can be parallelized under the guidance of explicit application code knowledge. We propose a formal abstraction -- the foreaction graph -- which describes the exact pattern of I/O system calls in an application function as well as any computation needed to produce their argument values. I/O system calls can be issued ahead of time if the graph says it is safe and beneficial to do so. With explicit speculation, serial applications can exploit storage I/O parallelism without involving expensive prediction or checkpointing mechanisms. Based on explicit speculation, we implement Foreactor, a library framework that allows application developers to concretize foreaction graphs and enable concurrent I/O with little or no modification to application source code. Experimental results show that Foreactor is able to improve the performance of both synthetic benchmarks and real applications by significant amounts (29%-50%).
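The core idea admits a deliberately simple sketch. This is not Foreactor's actual API; the function name `preissue_reads` and the thread-pool mechanism are assumptions. It shows the degenerate case where the foreaction graph declares a set of reads mutually independent, so all of them are safe to issue ahead of the code that will consume them.

```python
import concurrent.futures

def preissue_reads(paths):
    """Issue all declared reads in parallel, ahead of time.

    The 'graph' here is trivial: the reads are declared independent,
    so every one is immediately safe to issue.  The consuming code
    later blocks only on futures[path].result(), overlapping its own
    computation with storage I/O.
    """
    executor = concurrent.futures.ThreadPoolExecutor()

    def read_file(path):
        with open(path, "rb") as f:
            return f.read()

    return {p: executor.submit(read_file, p) for p in paths}
```

In the general case described in the abstract, edges of the foreaction graph would also encode the computation producing each call's arguments, so a call is pre-issued only once its argument-producing predecessors have run.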
Lawrence Stewart, Matthew Trager, Sujan Kumar Gonugondla
et al.
Speculative decoding aims to speed up autoregressive generation of a language model by verifying in parallel the tokens generated by a smaller draft model. In this work, we explore the effectiveness of learning-free, negligible-cost draft strategies, namely $N$-grams obtained from the model weights and the context. While the predicted next token of the base model is rarely the top prediction of these simple strategies, we observe that it is often within their top-$k$ predictions for small $k$. Based on this, we show that combinations of simple strategies can achieve significant inference speedups over different tasks. The overall performance is comparable to more complex methods, yet does not require expensive preprocessing or modification of the base model, and allows for seamless `plug-and-play' integration into pipelines.
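One such learning-free strategy, a context $N$-gram drafter, can be sketched in a few lines. This is a hedged illustration with an assumed function name; the exact strategies and their combination in the paper may differ. The drafter records, for every $(N-1)$-gram seen in the context, which tokens followed it, and proposes the $k$ most frequent followers of the current suffix as draft candidates.

```python
from collections import Counter, defaultdict

def ngram_drafts(context, n=2, k=3):
    """Learning-free drafter: for each (n-1)-gram in the context,
    record the tokens that followed it, then propose the k most
    frequent followers of the current suffix as draft candidates."""
    followers = defaultdict(Counter)
    for i in range(len(context) - n + 1):
        prefix = tuple(context[i:i + n - 1])
        followers[prefix][context[i + n - 1]] += 1
    suffix = tuple(context[-(n - 1):]) if n > 1 else ()
    return [tok for tok, _ in followers[suffix].most_common(k)]
```

Building the table costs a single pass over the context and no model calls, which is what makes strategies of this kind negligible-cost relative to the base model's forward passes.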
Majid Daliri, Christopher Musco, Ananda Theertha Suresh
Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to draw a sample $a\sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high of probability as possible. It is well-known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $\Pr[a = b] = 1 - D_{TV}(P,Q)$, where $D_{TV}(P,Q)$ is the total variation distance between $P$ and $Q$. What if Alice and Bob must solve this same problem \emph{without communicating at all?} Perhaps surprisingly, with access to public randomness, they can still achieve $\Pr[a = b] \geq \frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)} \geq 1-2D_{TV}(P,Q)$ using a simple protocol based on the Weighted MinHash algorithm. This bound was shown to be optimal in the worst-case by [Bavarian et al., 2020]. In this work, we revisit the communication-free coupling problem. We provide a simpler proof of the optimality result from [Bavarian et al., 2020]. We show that, while the worst-case success probability of Weighted MinHash cannot be improved, an equally simple protocol based on Gumbel sampling offers a Pareto improvement: for every pair of distributions $P, Q$, Gumbel sampling achieves an equal or higher value of $\Pr[a = b]$ than Weighted MinHash. Importantly, this improvement translates to practice. We demonstrate an application of communication-free coupling to \emph{speculative decoding}, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023]. We show that communication-free protocols can be used to construct \emph{drafter-invariant speculative decoding} schemes, which have the desirable property that their output is fixed given a fixed random seed, regardless of what drafter is used for speculation. In experiments on a language generation task, Gumbel sampling outperforms Weighted MinHash. Code is available at https://github.com/majid-daliri/DISD.
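The Gumbel-sampling protocol is simple enough to sketch directly. This is a minimal illustration with an assumed function name, not the released code at the repository above. Each party adds the same public Gumbel noise vector to its own log-probabilities and takes the argmax (the Gumbel-max trick), so each sample is marginally correct, and the two samples coincide unless the noise tips the two argmaxes apart.

```python
import numpy as np

def gumbel_coupled_sample(log_probs, shared_gumbel):
    """Gumbel-max trick: argmax_i (log p_i + g_i) is an exact sample
    from p.  Reusing the SAME public noise g across parties couples
    their samples without any communication: close distributions
    agree with high probability."""
    return int(np.argmax(np.asarray(log_probs) + shared_gumbel))
```

With $P = Q$ the two argmaxes always coincide; in general the agreement probability is at least $(1 - D_{TV})/(1 + D_{TV})$, matching the bound quoted in the abstract.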
As fecund as it is paradoxical, the notion of the classic implies a standard of excellence that is only recognized retrospectively, a historical embeddedness as much as a timeless value, the authority of a tradition as well as the creative freedom of a present-day appropriation, and an elitism of reception combined with popularity of access. After an etymological and lexicographical overview, this article explores the notion of the 'classic' in Paul Ricœur's hermeneutics. Indeed, the whole spectrum of the theory of interpretation, and also of translation and reading, can be mobilized to address this notion, which is ambivalent even in the use made of it by the author of Temps et Récit. Hermeneutics begins as the 'art of understanding the classics' in the broad sense of the term: sacred and profane texts at the crossroads of geographical and historical horizons. Explanation and understanding, critique and belonging, distanciation and de-distanciation are so many methodological dichotomies through which the idea of the plural and active character of the reception of the classics unfolds. As much as his theory, it is also Ricœur's practice that reveals his conception of the 'classics': Paul Ricœur is a reader-thinker who mediates tensions as much as he articulates oppositions among the canonical authors from whom he elaborates his own thought. These are some of the stakes of the unthinkability and indispensability of the 'classics' and of the very notion of the 'classic'.
Neuroscience and artificial intelligence are closely intertwined, but so are the physics of dynamical systems, philosophy, and psychology. Each of these fields tries in its own way to relate observations at the level of molecules, synapses, neurons, or behavior to a function. An influential conceptual approach to this end was popularized by David Marr, who focused on the interaction between three theoretical 'levels of analysis'. With the convergence of simulation-based approaches, algorithm-oriented Neuro-AI, and high-throughput data, we currently see much research organized around four levels of analysis: observations, models, algorithms, and functions. Bidirectional interaction between these levels influences how we undertake interdisciplinary science.