We formalize a constructive subclass of locality-preserving deterministic operators acting on graph-indexed state systems. We define the class of Bounded Local Generator Classes (BLGC), consisting of finite-range generators operating on bounded state spaces under deterministic composition. Within this class, incremental update cost is independent of total system dimension. We prove that, under the BLGC assumptions, per-step operator work satisfies W_t = O(1) as the number of nodes M \to \infty, establishing a structural decoupling between global state size and incremental computational effort. The framework admits a Hilbert-space embedding in \ell^2(V; \mathbb{R}^d) and yields bounded operator norms on admissible subspaces. The result applies specifically to the defined subclass and does not claim universality beyond the stated locality and boundedness constraints.
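The structural decoupling claimed above can be illustrated with a toy sketch (ours, not the paper's formalism): on a bounded-degree graph, a finite-range generator step reads only a node's neighborhood, so per-step work does not grow with the number of nodes M.

```python
# Toy sketch (our illustration, not the paper's formalism): a finite-range
# generator on a ring graph. One step at node v reads only v and its two
# neighbors, so per-step work is bounded by the (constant) degree and is
# independent of the total node count M.

class State:
    def __init__(self, num_nodes):
        self.values = [0.0] * num_nodes
        # Ring graph: every node has exactly two neighbors (bounded degree).
        self.neighbors = {
            v: [(v - 1) % num_nodes, (v + 1) % num_nodes]
            for v in range(num_nodes)
        }

def apply_local(state, v):
    """One deterministic generator step at node v; touches only its neighborhood."""
    touched = [v] + state.neighbors[v]
    state.values[v] = sum(state.values[u] for u in touched) / len(touched) + 1.0
    return len(touched)  # per-step work W_t, bounded by 1 + degree

# Identical per-step work for M = 10 and M = 10_000.
assert apply_local(State(10), 3) == apply_local(State(10_000), 3) == 3
```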
Autoware is an autonomous driving system implemented on Robot Operating System (ROS) 2, where an end-to-end timing guarantee is crucial to ensure safety. However, existing ROS 2 cause-effect chain models for analyzing end-to-end latency struggle to accurately represent the complexities of Autoware, particularly regarding sync callbacks, queue consumption patterns, and feedback loops. To address these problems, we propose a new scheduling model that decomposes the end-to-end timing constraints of Autoware into local relative deadlines for each sub-DAG. This multi-deadline DAG scheduling model avoids the need for complex analysis of data flows through queues and loops, while ensuring that all callbacks receive data within correct intervals. Furthermore, we extend the Global Earliest Deadline First (GEDF) algorithm for the proposed model and evaluate its effectiveness using a synthetic workload derived from Autoware.
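The decomposition idea can be illustrated on a simple chain: split an end-to-end deadline into local relative deadlines proportional to each stage's WCET. This is a standard heuristic sketch; the paper's actual assignment rule for Autoware sub-DAGs may differ.

```python
# Proportional deadline decomposition (a common heuristic, shown here only
# to illustrate the idea of local relative deadlines; not necessarily the
# rule used in the paper): each stage of a chain receives a share of the
# end-to-end deadline proportional to its worst-case execution time.

def local_deadlines(wcets, end_to_end):
    """wcets: per-stage worst-case execution times along a chain."""
    total = sum(wcets)
    return [end_to_end * c / total for c in wcets]

# A 3-stage chain with WCETs 2, 3, 5 and a 100 ms end-to-end deadline.
assert local_deadlines([2, 3, 5], 100) == [20.0, 30.0, 50.0]
```

Each stage can then be checked against its local deadline in isolation, avoiding a global analysis of the whole chain.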
Yifan Dai, Jacopo Tagliabue, Andrea Arpaci-Dusseau
et al.
Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. While Arrow is known as the "zero-copy format", in practice, limited Linux kernel support for shared memory makes it difficult to avoid copying entirely. In this work, we introduce several new techniques to eliminate nearly all copying from pipelines: in particular, we implement a new kernel module that performs de-anonymization, thus eliminating a copy to intermediate data. We conclude by sharing our preliminary evaluation on different workload types, as well as discussing our plan for future improvements.
Write-ahead logs (WALs) are a fundamental fault-tolerance technique found in many areas of computer science. WALs must be reliable while maintaining high performance, because all operations will be written to the WAL to ensure their stability. Without reliability a WAL is useless, because its utility is tied to its ability to recover data after a failure. In this paper we describe our experience creating a prototype user space WAL in Rust. We observed that Rust is easy to use, compact, and has a very rich set of libraries. More importantly, we have found that the overhead is minimal, with the WAL prototype operating essentially at the expected performance of the underlying stable memory device.
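The core WAL discipline described above can be sketched in a few lines. This is a minimal Python illustration, not the Rust prototype: records are length-prefixed and checksummed, fsync'd before being acknowledged, and recovery replays the log until the first torn or corrupt record.

```python
# Minimal user-space WAL sketch (our illustration in Python; the paper's
# prototype is in Rust). Append length-prefixed, checksummed records and
# fsync before acknowledging; on recovery, replay until the tail is torn.

import os, struct, zlib

def wal_append(path, payload: bytes):
    header = struct.pack("<II", len(payload), zlib.crc32(payload))
    with open(path, "ab") as f:
        f.write(header + payload)
        f.flush()
        os.fsync(f.fileno())   # record is on stable storage before we return

def wal_replay(path):
    records = []
    with open(path, "rb") as f:
        data = f.read()
    off = 0
    while off + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, off)
        body = data[off + 8 : off + 8 + length]
        if len(body) < length or zlib.crc32(body) != crc:
            break              # torn or corrupt tail: stop replay here
        records.append(body)
        off += 8 + length
    return records
```

The fsync-before-acknowledge step is what ties the WAL's performance to the stable memory device, as the abstract observes.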
This paper reports our experience of providing lightweight correctness guarantees to an open-source Rust OS, Theseus. First, we report new developments in intralingual design that leverage Rust's type system to enforce additional invariants at compile time, trusting the Rust compiler. Second, we develop a hybrid approach that combines formal verification, type checking, and informal reasoning, showing how the type system can assist in increasing the scope of formally verified invariants. By slightly lessening the strength of correctness guarantees, this hybrid approach substantially reduces the proof effort. We share our experience in applying this approach to the memory subsystem and the 10 Gb Ethernet driver of Theseus, demonstrate its utility, and quantify its reduced proof effort.
We leverage eBPF in order to implement custom policies in the Linux memory subsystem. Inspired by CBMM, we create a mechanism that provides the kernel with hints regarding the benefit of promoting a page to a specific size. We introduce a new hook point in Linux page fault handling path for eBPF programs, providing them the necessary context to determine the page size to be used. We then develop a framework that allows users to define profiles for their applications and load them into the kernel. A profile consists of memory regions of interest and their expected benefit from being backed by 4KB, 64KB and 2MB pages. In our evaluation, we profiled our workloads to identify hot memory regions using DAMON.
In many use cases the execution time of tasks is unknown and can be chosen by the designer to increase or decrease the application features depending on the availability of processing capacity. If the application has real-time constraints, such as deadlines, then the necessary and sufficient schedulability test must allow the execution times to be left unspecified. By doing so, the designer can then perform optimization of the execution times by picking the schedulable values that minimize any given cost. In this paper, we review existing results on the formulation of both the Fixed Priority and Earliest Deadline First exact schedulability constraints. The reviewed formulations are expressed by a combination of linear constraints, which then enables the use of standard optimization routines.
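The "linear in the execution times" structure can be sketched for the Fixed Priority case with constrained deadlines (D ≤ T), using the classical scheduling-point formulation: once a time point t is fixed, the response-time condition is linear in the execution-time vector C. This is only an illustrative sketch of that structure, not the paper's full formulation.

```python
# Sketch (our illustration): exact FP schedulability with D <= T via
# scheduling points. Task i is schedulable iff for SOME point t <= D_i,
#   C_i + sum_{j < i} ceil(t / T_j) * C_j <= t.
# For each fixed t this is a LINEAR constraint on C, so execution times
# can be left as free optimization variables.

import math

def fp_constraints(periods, deadlines):
    """Tasks indexed by decreasing priority. Returns, per task, a list of
    (coefficients, t) pairs: the task is schedulable iff some pair
    satisfies sum(coeff * C) <= t."""
    families = []
    for i, d in enumerate(deadlines):
        points = sorted(
            {k * periods[j] for j in range(i) for k in range(1, d // periods[j] + 1)}
            | {d}
        )
        family = []
        for t in points:
            coeff = [math.ceil(t / periods[j]) if j < i else (1 if j == i else 0)
                     for j in range(len(periods))]
            family.append((coeff, t))
        families.append(family)
    return families

def feasible(C, families):
    """Check a candidate execution-time vector C against the constraints."""
    return all(any(sum(a * c for a, c in zip(coeff, C)) <= t for coeff, t in fam)
               for fam in families)
```

For example, with periods (4, 6) and implicit deadlines, C = (1, 2) is schedulable but C = (3, 4) is not; an optimizer can search over C subject to these disjunctions of linear constraints.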
This article revisits the duality between the "forest Indian" (índio do mato) and the "river Indian" (índio do rio) attributed to the social system of the Upper Rio Negro region, starting from the position of the Yuhupdëh, commonly referred to as Makú and as forest Indians. To that end, it problematizes the distinction between two social figures that emerge from this duality: on one side, the hierarchical, the sedentary, the rigid, the closed, the exogamous, the horticulturist, the "civilized"; on the other, the egalitarian, the nomadic, the fluid, the open, the endogamous, the hunter, the "primitive". The experiment of this article is to address the question of hierarchy and equality underlying the forest Indian/river Indian distinction through two displacements. The first displacement consists in examining the problem of hierarchy and equality from the specific perspective of certain Yuhupdëh groups. The second consists in critically analyzing and revisiting the model of the theory of domination presupposed in the asymmetric relation involving the Yuhupdëh groups.
Persistent memory (PM) technologies have inspired a wide range of PM-based system optimizations. However, building correct PM-based systems is difficult due to the unique characteristics of PM hardware. To better understand the challenges as well as the opportunities to address them, this paper presents a comprehensive study of PM-related issues in the Linux kernel. By analyzing 1,553 PM-related kernel patches in-depth and conducting experiments on reproducibility and tool extension, we derive multiple insights in terms of PM patch categories, PM bug patterns, consequences, fix strategies, triggering conditions, and remedy solutions. We hope our results can contribute to the development of robust PM-based storage systems.
In the realm of computer systems, efficient utilisation of the CPU (Central Processing Unit) has always been a paramount concern. Researchers and engineers have long sought ways to optimise process execution on the CPU, leading to the emergence of CPU scheduling as a field of study. This research proposes a novel algorithm for batch processing that operates on a preemptive model, dynamically assigning priorities based on a robust ratio, employing a dynamic time slice, and utilising a periodic sorting technique to achieve fairness. By engineering this responsive and fair model, the proposed algorithm strikes a delicate balance between efficiency and fairness, providing an optimised solution for batch scheduling while ensuring system responsiveness.
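The ingredients named above (preemption, a priority ratio, a dynamic time slice, periodic sorting) can be combined in a simulator sketch. The concrete formulas here are our placeholders, since the abstract does not specify them: the ratio is a response-ratio-style (wait + remaining) / remaining, the slice is the mean remaining burst, and the queue is re-sorted every slice.

```python
# Hedged sketch of a preemptive, dynamic-priority batch scheduler in the
# spirit described above. The ratio and quantum rules are OUR placeholders,
# not the paper's: priority = (wait + remaining) / remaining, quantum =
# mean remaining burst, re-sorted each round ("periodic sorting").

def schedule(bursts):
    """bursts: CPU bursts of jobs all arriving at t=0. Returns finish times."""
    remaining = list(bursts)
    wait = [0] * len(bursts)
    finish = [None] * len(bursts)
    t = 0
    while any(r > 0 for r in remaining):
        live = [i for i, r in enumerate(remaining) if r > 0]
        quantum = max(1, sum(remaining[i] for i in live) // len(live))  # dynamic slice
        live.sort(key=lambda i: -(wait[i] + remaining[i]) / remaining[i])  # ratio sort
        i = live[0]                      # highest-ratio job runs one slice
        run = min(quantum, remaining[i])
        for j in live:
            if j != i:
                wait[j] += run           # everyone else keeps waiting
        remaining[i] -= run
        t += run
        if remaining[i] == 0:
            finish[i] = t
    return finish

assert schedule([5]) == [5]
```

Because the ratio grows with waiting time, long-delayed jobs eventually outrank short ones, which is one simple way to obtain the fairness the abstract aims for.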
Interprocess communication (IPC) is one of the most fundamental functions of a modern operating system, playing an essential role in the fabric of contemporary applications. This report conducts an investigation in FreeBSD of the real-world performance considerations behind two of the most common IPC mechanisms: pipes and sockets. A simple benchmark provides a fair sense of effective bandwidth for each, and analysis using DTrace, hardware performance counters and the operating system's source code is presented. We note that pipes outperform sockets by 63% on average across all configurations, and further that the size of userspace transmission buffers has a profound effect on performance: larger buffers are beneficial up to a point (~32-64 KiB), after which performance collapses as a result of devastating cache exhaustion. A deep scrutiny of the probe effects at play is also presented, justifying the validity of conclusions drawn from these experiments.
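The shape of such a bandwidth benchmark can be sketched briefly. This is our minimal Python illustration, not the report's FreeBSD benchmark: a writer thread streams a fixed volume through an `os.pipe()` in fixed-size userspace buffers while the reader times the transfer; 64 KiB is an illustrative buffer size.

```python
# Minimal pipe-bandwidth benchmark sketch (our illustration, not the
# report's benchmark): stream a fixed data volume through os.pipe() from
# a writer thread and time the reads on the main thread.

import os, threading, time

def pipe_bandwidth(total=8 << 20, bufsize=64 << 10):
    r, w = os.pipe()
    payload = b"x" * bufsize

    def writer():
        sent = 0
        while sent < total:
            sent += os.write(w, payload[: total - sent])
        os.close(w)            # EOF for the reader

    t = threading.Thread(target=writer)
    start = time.perf_counter()
    t.start()
    received = 0
    while chunk := os.read(r, bufsize):
        received += len(chunk)
    t.join()
    os.close(r)
    elapsed = time.perf_counter() - start
    return received, received / elapsed   # bytes moved, bytes per second

moved, rate = pipe_bandwidth()
```

Varying `bufsize` in such a harness is how one would observe the buffer-size effect the report describes, though a faithful reproduction would use separate processes and the report's DTrace-based analysis.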
Typically, even low-level operating system concepts, such as resource sharing strategies and predictability measures, are evaluated with Linux on PC hardware. This leaves a large gap to real industrial applications. Hence, the direct transfer of the results might be difficult. As a solution, we present toki, a prototyping and evaluation platform based on FreeRTOS and several open-source libraries. toki comes with a unified build and test environment based on Yocto and QEMU, which makes it well suited for rapid prototyping. With its architecture chosen similar to production industrial systems, toki provides the groundwork to implement early prototypes of real-time systems research results, up to technology readiness level 7, with little effort.
This paper examines disaggregated data center architectures from the perspective of the applications that would run on these data centers, and challenges the abstractions that have been proposed to date. In particular, we argue that operating systems for disaggregated data centers should not abstract disaggregated hardware resources, such as memory, compute, and storage away from applications, but should instead give them information about, and control over, these resources. To this end, we propose additional OS abstractions and interfaces for disaggregation and show how they can improve data transfer in data parallel frameworks and speed up failure recovery in replicated, fault-tolerant applications. This paper studies the technical challenges in providing applications with this additional functionality and advances several preliminary proposals to overcome these challenges.
Modern operating systems are typically POSIX-compliant with major system calls specified decades ago. The next generation of non-volatile memory (NVM) technologies raises concerns about the efficiency of the traditional POSIX-based systems. As one step toward building high performance NVM systems, we explore the potential dependencies between system call performance and major hardware components (e.g., CPU, memory, storage) under typical use cases (e.g., software compilation, installation, web browsing, office suites) in this paper. We build histograms for the most frequent and time-consuming system calls with the goal of understanding the nature of their distribution on different platforms. We find that there is a strong dependency between system call performance and the CPU architecture. On the other hand, the type of persistent storage plays a less important role in affecting the performance.
This article presents how, with the arrival of the new millennium, a growing number of entrepreneurs embraced sustainable design, prompting companies to rethink the role they play in the development of the environment, the planet, and society. Sustainable design seeks to generate solutions through services and lifestyles, not exclusively through objects. In order to introduce a well-developed definition of sustainable design, it is necessary to mention sustainable systems, which refer, essentially, to any kind of social network or service that can exist and replicate itself. Beyond sustainable systems there are other principles within sustainable design. Finally, any outcome obtained to satisfy a need must be sustainable in the long term, understood as a process that allows a community to achieve a result through design strategies.
We consider the partitioned scheduling problem of multimode real-time systems upon identical multiprocessor platforms. During the execution of a multimode system, the system can change from one mode to another such that the current task set is replaced with a new one. In this paper, we consider a synchronous transition protocol in order to take into account mode-independent tasks, i.e., tasks whose execution pattern must not be jeopardized by the mode changes. We propose two methods for handling mode changes in partitioned scheduling. The first method is offline/optimal and computes a static allocation of tasks that is schedulable and respects both task and transition deadlines (if any). The second approach relies on a sufficient condition ensuring that an online First Fit based allocation satisfies the timing constraints.
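The online First Fit allocation mentioned above can be sketched as bin packing by utilization. The per-processor admission test here is the EDF utilization bound for implicit deadlines, standing in as a placeholder for the paper's actual sufficient condition.

```python
# First Fit partitioned allocation sketch (our illustration). Each task is
# placed on the first processor where it fits; "fits" here means total
# utilization stays <= bound (the EDF implicit-deadline bound, used as a
# placeholder for the paper's sufficient condition).

def first_fit(utils, num_procs, bound=1.0):
    """utils: per-task utilizations. Returns a per-task processor
    assignment, or None if some task cannot be placed."""
    load = [0.0] * num_procs
    assignment = []
    for u in utils:
        for p in range(num_procs):
            if load[p] + u <= bound + 1e-9:   # small tolerance for float sums
                load[p] += u
                assignment.append(p)
                break
        else:
            return None                        # no processor can admit the task
    return assignment
```

In the multimode setting, the same test would additionally have to account for tasks of the outgoing mode still completing during the transition.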
Operating systems are currently viewed ostensively. As a result they mean different things to different people. The ostensive character makes it hard to understand OSes formally. An intensional view can enable better formal work, and also offer constructive support for some important problems, e.g. OS architecture. This work argues for an intensional view of operating systems. It proposes to overcome the current ostensive view by defining an OS based on formal models of computation, and also introduces some principles. Together these are used to develop a framework of algorithms for single-processor OS structure using an approach similar to function level programming. In this abridged paper we illustrate the essential approach, discuss some advantages and limitations, and point out some future possibilities.
Elsayed Saad, Medhat Awadalla, Mohamed Shalan
et al.
Efficient task partitioning plays a crucial role in achieving high performance on multiprocessor platforms. This paper addresses the problem of energy-aware static partitioning of periodic real-time tasks on heterogeneous multiprocessor platforms. A Particle Swarm Optimization variant based on the Min-min technique for task partitioning is proposed. The proposed approach aims to minimize the overall energy consumption while avoiding deadline violations. An energy-aware cost function is incorporated into the approach. Extensive simulations and comparisons are conducted in order to validate the effectiveness of the proposed technique. The achieved results demonstrate that the proposed partitioning scheme significantly surpasses previous approaches in terms of both number of iterations and energy savings.
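The Min-min component can be sketched on its own: on a heterogeneous platform, repeatedly assign the unallocated task with the smallest minimum completion time to the processor achieving it. This sketch (ours) minimizes completion time only; the paper's variant additionally folds in the energy-aware cost function and uses the result within the swarm search.

```python
# Min-min heuristic sketch (our illustration of the seeding technique; the
# paper combines it with PSO and an energy-aware cost function).
# exec_times[i][p] = execution time of task i on heterogeneous processor p.

def min_min(exec_times):
    """Returns (per-task processor assignment, makespan)."""
    n, m = len(exec_times), len(exec_times[0])
    ready = [0.0] * m                  # current finish time of each processor
    assign = [None] * n
    unassigned = set(range(n))
    while unassigned:
        # Among all (task, processor) pairs, pick the smallest completion time.
        ct, i, p = min((ready[p] + exec_times[i][p], i, p)
                       for i in unassigned for p in range(m))
        ready[p] = ct
        assign[i] = p
        unassigned.discard(i)
    return assign, max(ready)

# Two tasks, two heterogeneous processors: task 1 is fastest on p1, task 0 on p0.
assert min_min([[2, 4], [3, 1]]) == ([0, 1], 2)
```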
Sidney Amani, Peter Chubb, Alastair F. Donaldson
et al.
We develop a practical solution to the problem of automatic verification of the interface between device drivers and the OS. Our solution relies on a combination of improved driver architecture and verification tools. It supports drivers written in C and can be implemented in any existing OS, which sets it apart from previous proposals for verification-friendly drivers. Our Linux-based evaluation shows that this methodology amplifies the power of existing verification tools in detecting driver bugs, making it possible to verify properties beyond the reach of traditional techniques.
Rakesh Mohanty, Manas Das, M. Lakshmi Prasanna
et al.
In this paper, we have proposed a new variant of the Round Robin scheduling algorithm that executes processes according to a newly calculated Fit Factor f and uses the concept of a dynamic time quantum. We have compared the performance of our proposed Fittest Job First Dynamic Round Robin (FJFDRR) algorithm with the Priority Based Static Round Robin (PBSRR) algorithm. Experimental results show that our proposed algorithm performs better than PBSRR in terms of reducing the number of context switches, average waiting time and average turnaround time.
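The FJFDRR idea can be sketched as follows, with our placeholder definitions since the abstract does not give the formulas: here Fit Factor f combines user priority and remaining burst (lower is fitter), and the dynamic quantum is the mean of remaining bursts, recomputed each round.

```python
# FJFDRR-style sketch (our placeholder formulas, not the paper's): each
# round, sort live processes by Fit Factor f = priority + remaining burst
# (lower is fitter), set the quantum to the mean remaining burst, and give
# every process one slice in fit order.

def fjfdrr(tasks):
    """tasks: list of (priority, burst) pairs, all arriving at t=0.
    Returns task indices in completion order."""
    remaining = [burst for _, burst in tasks]
    done = []
    while any(remaining):
        live = [i for i, r in enumerate(remaining) if r > 0]
        live.sort(key=lambda i: tasks[i][0] + remaining[i])        # Fit Factor
        quantum = max(1, sum(remaining[i] for i in live) // len(live))
        for i in live:
            run = min(quantum, remaining[i])
            remaining[i] -= run
            if remaining[i] == 0:
                done.append(i)
    return done

# The short, high-fit job (priority 2, burst 2) completes first.
assert fjfdrr([(1, 5), (2, 2)]) == [1, 0]
```

A mean-sized quantum lets short jobs finish in one slice (fewer context switches) while long jobs still make steady progress, which is the trade-off the comparison with PBSRR measures.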