Linearized string representations serve as the foundation of scalable autoregressive molecular generation; however, they introduce a fundamental modality mismatch where a single molecular graph maps to multiple distinct sequences. This ambiguity leads to \textit{trajectory divergence}, where the latent representations of structurally equivalent partial graphs drift apart due to differences in linearization history. To resolve this without abandoning the efficient string formulation, we propose Structure-Invariant Generative Molecular Alignment (SIGMA). Rather than altering the linear representation, SIGMA enables the model to strictly recognize geometric symmetries via a token-level contrastive objective, which explicitly aligns the latent states of prefixes that share identical suffixes. Furthermore, we introduce Isomorphic Beam Search (IsoBeam) to eliminate isomorphic redundancy during inference by dynamically pruning equivalent paths. Empirical evaluations on standard benchmarks demonstrate that SIGMA bridges the gap between sequence scalability and graph fidelity, yielding superior sample efficiency and structural diversity in multi-parameter optimization compared to strong baselines.
In the modern era of digitalization, integration with blockchain and machine learning (ML) technologies is most important for improving applications in healthcare management and secure prediction analysis of health data. This research aims to develop a novel methodology for securely storing patient medical data and analyzing it for PCOS prediction. The main goals are to leverage Hyperledger Fabric for immutable, private data and to integrate Explainable Artificial Intelligence (XAI) techniques to enhance transparency in decision-making. The innovation of this study is the unique integration of blockchain technology with ML and XAI, solving critical issues of data security and model interpretability in healthcare. With the Caliper tool, the Hyperledger Fabric blockchain’s performance is evaluated and enhanced. The suggested Explainable AI-based blockchain system for Polycystic Ovary Syndrome detection (EAIBS-PCOS) system demonstrates outstanding performance and records 98% accuracy, 100% precision, 98.04% recall, and a resultant F1-score of 99.01%. Such quantitative measures ensure the success of the proposed methodology in delivering dependable and intelligible predictions for PCOS diagnosis, therefore making a great addition to the literature while serving as a solid solution for healthcare applications in the near future.
Sign language recognition (SLR) plays a vital role in including people with hearing impairment in the community. It facilitates the recognition of sign gestures and converts them into spoken languages. One of the main challenges for developing SLR systems is the lack of annotated datasets. This issue is more noticeable with low–resourced sign languages. To address this issue, we propose a pretrained large vision model, SignVLM, for SLR. This work explores the capability of the contrastive language–image pre-training (CLIP) model for SLR. This model is used to extract spatial features from the sign video frames while a Transformer decoder is used for temporal learning. The proposed model has been evaluated on four different sign languages using the KArSL, WLASL, LSA64, and AUSTL datasets. Different evaluation settings have been followed in this work including zero-shot and few-shot learning. The proposed model outperformed other models on the KArSL, WLASL, and LSA64 datasets and achieved comparable performance on the AUTSL dataset. The obtained results demonstrate the generalization of the proposed model to new datasets with few samples. The code and data are available at https://github.com/Hamzah-Luqman/signVLM.
This study presents three game-playing agents incorporating Vision Transformers (ViTs) into the AlphaZero framework: AlphaViT, AlphaViD (AlphaViT with a transformer decoder), and AlphaVDA (AlphaViD with learnable action embeddings). These agents can play multiple board games with varying sizes using a single shared-weight neural network, thus overcoming AlphaZero’s limitation of fixed board sizes. AlphaViT employs only a transformer encoder, whereas AlphaViD and AlphaVDA incorporate both a transformer encoder and a decoder. In AlphaViD, the decoder processes the output from the encoder, whereas AlphaVDA uses learnable embeddings as decoder input. The additional decoder in AlphaViD and AlphaVDA provides flexibility to adapt to various action spaces and board sizes. Experimental results show that the proposed agents, trained on either individual games or multiple games simultaneously, consistently outperform traditional algorithms, such as Minimax and Monte Carlo Tree Search. They achieve performance close to that of AlphaZero despite relying on a single deep neural network (DNN) with shared weights. In particular, AlphaViT performs well across all evaluated games. Furthermore, fine-tuning the DNN using weights pre-trained on small board games accelerates convergence and improves performance, particularly in Gomoku. Notably, simultaneous training on multiple games yields performance comparable to, or even surpassing, that of single-game training. These results indicate the potential of transformer-based architectures for developing flexible and robust game-playing AI agents that excel in multiple games and dynamic environments.
This paper explores the integration of provenance tracking systems within the context of Semantic Web technologies to enhance data integrity in diverse operational environments. SURROUND Australia Pty Ltd demonstrates innovative applica-tions of the PROV Data Model (PROV-DM) and its Semantic Web variant, PROV-O, to systematically record and manage provenance information across multiple data processing domains. By employing RDF and Knowledge Graphs, SURROUND ad-dresses the critical challenges of shared entity identification and provenance granularity. The paper highlights the company's architecture for capturing comprehensive provenance data, en-abling robust validation, traceability, and knowledge inference. Through the examination of two projects, we illustrate how provenance mechanisms not only improve data reliability but also facilitate seamless integration across heterogeneous systems. Our findings underscore the importance of sophisticated provenance solutions in maintaining data integrity, serving as a reference for industry peers and academics engaged in provenance research and implementation.
Kieron Kretschmar, Walter Laurito, Sharan Maiya
et al.
Prior work has introduced techniques for detecting when large language models (LLMs) lie, that is, generate statements they believe are false. However, these techniques are typically validated in narrow settings that do not capture the diverse lies LLMs can generate. We introduce LIARS' BENCH, a testbed consisting of 72,863 examples of lies and honest responses generated by four open-weight models across seven datasets. Our settings capture qualitatively different types of lies and vary along two dimensions: the model's reason for lying and the object of belief targeted by the lie. Evaluating three black- and white-box lie detection techniques on LIARS' BENCH, we find that existing techniques systematically fail to identify certain types of lies, especially in settings where it's not possible to determine whether the model lied from the transcript alone. Overall, LIARS' BENCH reveals limitations in prior techniques and provides a practical testbed for guiding progress in lie detection.
This paper introduces a cognitive Retrieval-Augmented Generator (RAG) architecture that transcends transformer context-length limitations through phase-coded memory and morphological-semantic resonance. Instead of token embeddings, the system encodes meaning as complex wave patterns with amplitude-phase structure. A three-tier design is presented: a Morphological Mapper that transforms inputs into semantic waveforms, a Field Memory Layer that stores knowledge as distributed holographic traces and retrieves it via phase interference, and a Non-Contextual Generator that produces coherent output guided by resonance rather than fixed context. This approach eliminates sequential token dependence, greatly reduces memory and computational overhead, and enables unlimited effective context through frequency-based semantic access. The paper outlines theoretical foundations, pseudocode implementation, and experimental evidence from related complex-valued neural models, emphasizing substantial energy, storage, and time savings.
Large language models (LLMs) are being piloted for clinical coding and decision support, yet no open benchmark targets the hospital-funding layer where Diagnosis-Related Groups (DRGs) determine reimbursement. In most OECD systems, DRGs route a substantial share of multi-trillion-dollar health spending through governed grouper software, making transparency and auditability first-order concerns. We release NordDRG-AI-Benchmark, the first public, rule-complete test bed for DRG reasoning. The package includes (i) machine-readable approximately 20-sheet NordDRG definition tables and (ii) expert manuals and change-log templates that capture governance workflows. It exposes two suites: a 13-task Logic benchmark (code lookup, cross-table inference, grouping features, multilingual terminology, and CC/MCC validity checks) and a 13-task Grouper benchmark that requires full DRG grouper emulation with strict exact-match scoring on both the DRG and the triggering drg_logic.id. Lightweight reference agents (LogicAgent, GrouperAgent) enable artefact-only evaluation. Under an artefact-only (no web) setting, on the 13 Logic tasks GPT-5 Thinking and Opus 4.1 score 13/13, o3 scores 12/13; mid-tier models (GPT-5 Thinking Mini, o4-mini, GPT-5 Fast) achieve 6-8/13, and remaining models score 5/13 or below. On full grouper emulation across 13 tasks, GPT-5 Thinking solves 7/13, o3 6/13, o4-mini 3/13; GPT-5 Thinking Mini solves 1/13, and all other tested endpoints score 0/13. To our knowledge, this is the first public report of an LLM partially emulating the complete NordDRG grouper logic with governance-grade traceability. Coupling a rule-complete release with exact-match tasks and open scoring provides a reproducible yardstick for head-to-head and longitudinal evaluation in hospital funding. Benchmark materials available in Github.
We present HADA (Human-AI Agent Decision Alignment), a protocol- and framework agnostic reference architecture that keeps both large language model (LLM) agents and legacy algorithms aligned with organizational targets and values. HADA wraps any algorithm or LLM in role-specific stakeholder agents -- business, data-science, audit, ethics, and customer -- each exposing conversational APIs so that technical and non-technical actors can query, steer, audit, or contest every decision across strategic, tactical, and real-time horizons. Alignment objectives, KPIs, and value constraints are expressed in natural language and are continuously propagated, logged, and versioned while thousands of heterogeneous agents run on different orchestration stacks. A cloud-native proof of concept packages a production credit-scoring model (getLoanDecision) and deploys it on Docker/Kubernetes/Python; five scripted retail-bank scenarios show how target changes, parameter tweaks, explanation requests, and ethics triggers flow end to end through the architecture. Evaluation followed the Design-Science Research Methodology. Walkthrough observation and log inspection demonstrated complete coverage of six predefined objectives: every role could invoke conversational control, trace KPIs and value constraints, detect and mitigate ZIP-code bias, and reproduce full decision lineage, independent of the underlying LLM or agent library. Contributions: (1) an open-source HADA architecture, (2) a mid-range design theory for human-AI alignment in multi-agent systems, and (3) empirical evidence that framework-agnostic, protocol-compliant stakeholder agents improve accuracy, transparency, and ethical compliance in real-world decision pipelines.
Analysts require attribution, as nothing can be reported without knowing the source of the information. In this paper, we will focus on automatic methods for attribution, linking each sentence in the summary to a portion of the source text, which may be in one or more documents. We explore using a hybrid summarization, i.e., an automatic paraphrase of an extractive summary, to ease attribution. We also use a custom topology to identify the proportion of different categories of attribution-related errors.
Timur Galimzyanov, Olga Kolomyttseva, Egor Bogomolov
We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena -- code completion and bug localization -- we systematically compare retrieval configurations across various context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL, sparse BM25 with word-level splitting is the most effective and practical, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL, proprietary dense encoders (Voyager-3 family) consistently beat sparse retrievers, however requiring 100x larger latency. (3) Optimal chunk size scales with available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-based splitting is needlessly slow, and BM25 + word splitting offers the best quality-latency trade-off. Thus, we provide evidence-based recommendations for implementing effective code-oriented RAG systems based on task requirements, model constraints, and computational efficiency.
Fully Test-Time Adaptation (FTTA) addresses domain shifts without access to source data and training protocols of the pre-trained models. Traditional strategies that align source and target feature distributions are infeasible in FTTA due to the absence of training data and unpredictable target domains. In this work, we exploit a dual perspective on FTTA, and propose Agnostic FTTA (AFTTA) as a novel formulation that enables the usage of off-the-shelf domain transformations during test-time to enable direct generalization to unforeseeable target data. To address this, we develop an uncover-and-unlearn approach. First, we uncover potential unwanted shifts between source and target domains by simulating them through predefined mappings and consider them as nuisances. Then, during test-time prediction, the model is enforced to unlearn these nuisances by regularizing the consequent shifts in latent representations and label predictions. Specifically, a mutual information-based criterion is devised and applied to guide nuisances unlearning in the feature space and encourage confident and consistent prediction in label space. Our proposed approach explicitly addresses agnostic domain shifts, enabling superior model generalization under FTTA constraints. Extensive experiments on various tasks, involving corruption and style shifts, demonstrate that our method consistently outperforms existing approaches.
This paper introduces a formal framework for modeling observer-dependent collapse dynamics and temporal identity drift within artificial and mathematical systems, grounded entirely in the symbolic foundations of Alpay Algebra. Building upon the fixed-point emergence structures developed in Alpay Algebra I and II, this third installment formalizes the observer-coupled φ-collapse process through transfinite categorical flows and curvature-driven identity operators. We define a novel temporal drift mechanism as a recursive deformation of identity signatures under entangled observer influence, constructing categorical invariants that evolve across fold iterations. The proposed system surpasses conventional identity modeling in explainable AI (XAI) by encoding internal transformation history into a symbolic fixed-point structure, offering provable traceability and temporal coherence. Applications range from AI self-awareness architectures to formal logic systems where identity is not static but dynamically induced by observation. The theoretical results also offer a mathematically rigorous basis for future AI systems with stable self-referential behavior, positioning Alpay Algebra as a next-generation symbolic framework bridging category theory, identity logic, and observer dynamics.
Jeremy Zhengqi Huang, Caluã de Lacerda Pataca, Liang-Yuan Wu
et al.
Non-speech captions are essential to the video experience of deaf and hard of hearing (DHH) viewers, yet conventional approaches often overlook the diversity of their preferences. We present CapTune, a system that enables customization of non-speech captions based on DHH viewers' needs while preserving creator intent. CapTune allows caption authors to define safe transformation spaces using concrete examples and empowers viewers to personalize captions across four dimensions: level of detail, expressiveness, sound representation method, and genre alignment. Evaluations with seven caption creators and twelve DHH participants showed that CapTune supported creators' creative control while enhancing viewers' emotional engagement with content. Our findings also reveal trade-offs between information richness and cognitive load, tensions between interpretive and descriptive representations of sound, and the context-dependent nature of caption preferences.
In this paper we develop further the formalism of fibrations of configuration spaces as a tool for modelling motion of autonomous systems in variable environments. We analyse the situations when the external conditions may change during the motion of the system and analyse two possibilities: (a) when the behaviour of the external conditions is known in advance; and (b) when the future changes of the external conditions are unknown but we can measure the current state and the current velocity of the external conditions, at every moment of time. We prove that in the case (a) the complexity of the motion algorithm is the same as in the case of constant external conditions; this generalises the result of \cite{FGY}. In case (b) we introduce a new concept of a reaction mechanism which allows to take into account unexpected and unpredictable changes in the environment. A reaction mechanism is mathematically an infinitesimal lifting function on a fibre bundle, a nonlinear generalisation of the classical concept of an Ehresmann connection. We illustrate these notions by examples which show that nonlinear infinitesimal lifting function (reaction mechanisms) appear naturally, are inevitable and ubiquitous.
Incorporating generative artificial intelligence (GAI) in education has become crucial in contemporary educational environments. This research article thoroughly investigates the ramifications of implementing GAI in the higher education context of Saudi Arabia, employing a blend of quantitative and qualitative research approaches. Survey-based quantitative data reveals a noteworthy correlation between educators’ awareness of GAI and the frequency of its application. Notably, around half of the surveyed educators are at stages characterized by understanding and familiarity with GAI integration, indicating a tangible readiness for its adoption. Moreover, the study’s quantitative findings underscore the perceived value and ease associated with integrating GAI, thus reinforcing the assumption that educators are motivated and inclined to integrate GAI tools like ChatGPT into their teaching methodologies. In addition to the quantitative analysis, qualitative insights from in-depth interviews with educators unveil a rich tapestry of perspectives. The qualitative data emphasizes GAI’s role as a catalyst for collaborative learning, contributing to professional development, and fostering innovative teaching practices.
Colorectal cancer is an enormous health concern since it is among the most lethal types of malignancy. The manual examination has its limitations, including subjectivity and data overload. To overcome these challenges, computer-aided diagnostic systems focusing on image segmentation and abnormality classification have been developed. This study presents a two-stage approach for the automatic detection of five types of colorectal abnormalities in addition to a control group: polyp, low-grade intraepithelial neoplasia, high-grade intraepithelial neoplasia, serrated adenoma, adenocarcinoma. In the first stage, UNet3+ was used for image segmentation to locate the anomalies, while in the second stage, the Cross-Attention Multi-Scale Vision Transformer deep learning model was used to predict the type of anomaly after highlighting the anomaly on the raw images. In anomaly segmentation, UNet3+ achieved values of 0.9872, 0.9422, 0.9832, and 0.9560 for Dice Coefficient, Jaccard Index, Sensitivity, Specificity respectively. In anomaly detection, the Cross-Attention Multi-Scale Vision Transformer model attained a classification performance of 0.9340, 0.9037, 0.9446, 0.8723, 0.9102, 0.9849 for accuracy, F1 score, precision, recall, Matthews correlation coefficient, and specificity, respectively. The proposed approach proves its capacity to alleviate the overwhelm of pathologists and enhance the accuracy of colorectal cancer diagnosis by achieving high performance in both the identification of anomalies and the segmentation of regions.