Specializing systems to the specifics of the workload they serve and the platform they run on often significantly improves performance. However, specializing systems is difficult in practice because of compounding challenges: i) the complexity for developers of determining and implementing the optimal specialization; ii) the inherent loss of generality of the resulting implementation; and iii) the difficulty of identifying and implementing a single optimal specialized configuration for the messy reality of modern systems. To address this, we introduce Iridescent, a framework for automated online system specialization guided by observed overall system performance. Iridescent lets developers specify a space of possible specialization choices, and then at runtime generates and runs different specialization choices through JIT compilation as the system runs. By using overall system performance metrics to guide this search, developers can use Iridescent to find optimal system specializations for the hardware and workload conditions at a given time. We demonstrate its feasibility, effectiveness, and ease of use.
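Iridescent's JIT-based search is not reproduced here, but the core idea of performance-guided selection among candidate specializations can be illustrated with a minimal Python sketch; all function names and the toy specialization space below are hypothetical, not taken from the paper:

```python
import time

def measure(fn, data, repeats=5):
    """Return the best-of-N wall-clock time for one specialization candidate."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical specialization space: two ways of summing a workload.
def generic_sum(data):
    total = 0
    for x in data:
        total += x
    return total

def builtin_sum(data):
    return sum(data)

def pick_specialization(candidates, data):
    """Run every candidate on the live workload and keep the fastest,
    mirroring Iridescent's metric-guided search (minus JIT compilation)."""
    return min(candidates, key=lambda fn: measure(fn, data))

workload = list(range(100_000))
best = pick_specialization([generic_sum, builtin_sum], workload)
```

In the real system the candidates would be JIT-compiled variants and the metric an end-to-end system measurement rather than a micro-benchmark.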
Wonkyo Choe, Rongxiang Wang, Afsara Benazir
et al.
Proto is a new instructional OS that runs on commodity, portable hardware. It showcases modern features, including per-app address spaces, threading, commodity filesystems, USB, DMA, multicore support, self-hosted debugging, and a window manager. It supports rich applications such as 2D/3D games, music and video players, and a blockchain miner. Unlike traditional instructional systems, Proto emphasizes engaging, media-rich apps that go beyond basic terminal programs. Our method breaks down a full-featured OS into a set of incremental, self-contained prototypes. Each prototype introduces a minimal set of OS mechanisms, driven by the needs of specific apps. The construction process then progressively enables these apps by bringing up one mechanism at a time. Proto enables a wider audience to experience building a self-contained software system used in daily life.
Deep learning training relies on periodic checkpoints to recover from failures, but unsafe checkpoint installation can leave corrupted files on disk. This paper presents an experimental study of checkpoint installation protocols and integrity validation for AI training on macOS/APFS. We implement three write modes with increasing durability guarantees: unsafe (baseline, no fsync), atomic_nodirsync (file-level durability via fsync()), and atomic_dirsync (file + directory durability). We design a format-agnostic integrity guard using SHA-256 checksums with automatic rollback. Through controlled experiments including crash injection (430 unsafe-mode trials) and corruption injection (1,600 atomic-mode trials), we demonstrate that the integrity guard detects 99.8-100% of corruptions with zero false positives. Performance overhead is 56.5-108.4% for atomic_nodirsync and 84.2-570.6% for atomic_dirsync relative to the unsafe baseline. Our findings quantify the reliability-performance trade-offs and provide deployment guidance for production AI infrastructure.
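The atomic_dirsync write mode and the integrity guard described above can be sketched in Python. The structure (temp file, fsync, atomic rename, directory fsync, SHA-256 check with rollback) follows the abstract; the function names and the rollback-by-deletion policy are simplifications. Note that on macOS/APFS, fsync() alone may not force data to stable media without fcntl's F_FULLFSYNC:

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Stream a file through SHA-256, 1 MiB at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def atomic_write(path, payload, dirsync=True):
    """Write payload to a temp file, fsync it, rename into place, and
    (optionally) fsync the parent directory -- the 'atomic_dirsync' mode."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())          # file-level durability
        os.replace(tmp, path)             # atomic rename into place
        if dirsync:                       # directory-entry durability
            dfd = os.open(d, os.O_RDONLY)
            try:
                os.fsync(dfd)
            finally:
                os.close(dfd)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

def install_checkpoint(path, payload, expected_digest):
    """Integrity guard: verify the on-disk checksum; roll back on mismatch.
    (Simplified: rollback here deletes the corrupt file rather than
    restoring the previous checkpoint.)"""
    atomic_write(path, payload)
    if sha256_of(path) != expected_digest:
        os.unlink(path)                   # automatic rollback
        raise IOError("checkpoint corrupted; rolled back")
```

The unsafe baseline corresponds to skipping both fsync calls; atomic_nodirsync corresponds to dirsync=False.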
Aditya Dani, Shardul Mangade, Piyush Nimbalkar
et al.
The growing value of data as a strategic asset has given rise to the necessity of implementing reliable backup and recovery solutions in the most efficient and cost-effective manner. The data backup methods available today on Linux are not effective enough because, while running, most of them block I/O to guarantee data integrity. We propose and implement Next4, a file-system-based snapshot feature in Ext4 that creates an instant image of the file system to provide incremental versions of data, enabling reliable backup and data recovery. In our design, the snapshot feature is implemented by efficiently infusing the copy-on-write strategy into the write-in-place, extent-based Ext4 file system, without affecting its basic structure. Each snapshot is an incremental backup of the data within the system. What distinguishes Next4 is the way the data is backed up, improving both space utilization and performance.
Hayley LeBlanc, Nathan Taylor, James Bornholt
et al.
This work introduces a new approach to building crash-safe file systems for persistent memory. We exploit the fact that Rust's typestate pattern allows compile-time enforcement of a specific order of operations. We introduce a novel crash-consistency mechanism, Synchronous Soft Updates, that boils down crash safety to enforcing ordering among updates to file-system metadata. We employ this approach to build SquirrelFS, a new file system with crash-consistency guarantees that are checked at compile time. SquirrelFS avoids the need for separate proofs, instead incorporating correctness guarantees into the typestate itself. Compiling SquirrelFS takes only tens of seconds; successful compilation indicates crash consistency, while an error provides a starting point for fixing the bug. We evaluate SquirrelFS against state-of-the-art file systems such as NOVA and WineFS, and find that SquirrelFS achieves similar or better performance on a wide range of benchmarks and applications.
GPU remoting is a promising technique for supporting AI applications, and networking plays a key role in enabling it. However, the network requirements for efficient remoting, in terms of latency and bandwidth, are unknown. In this paper, we take a GPU-centric approach to derive the minimum latency and bandwidth requirements for GPU remoting while ensuring no (or little) performance degradation for AI applications. Our study, including a theoretical model, demonstrates that, with careful remoting design, unmodified AI applications can run on a remoting setup using commodity networking hardware with no overhead (or even with better performance) and low network demands.
Operational and performance characteristics of flash SSDs have long been associated with a set of Unwritten Contracts due to their hidden, complex internals and lack of control from the host software stack. These unwritten contracts govern how data should be stored, accessed, and garbage collected. The emergence of Zoned Namespace (ZNS) flash devices with their open and standardized interface allows us to write these unwritten contracts for the storage stack. However, even with a standardized storage-host interface, due to the lack of appropriate end-to-end operational data collection tools, quantifying and reasoning about such contracts remains a challenge. In this paper, we propose zns.tools, an open-source framework for end-to-end event and metadata collection, analysis, and visualization for ZNS SSD contract analysis. We showcase how zns.tools can be used to understand how the combination of RocksDB with the F2FS file system interacts with the underlying storage. Our tools are available openly at \url{https://github.com/stonet-research/zns-tools}.
In this work, we propose MUSTACHE, a new page cache replacement algorithm whose logic is learned from observed memory access requests rather than fixed like existing policies. We formulate the page request prediction problem as a categorical time series forecasting task. Then, our method queries the learned page request forecaster to obtain the next $k$ predicted page memory references to better approximate the optimal Bélády's replacement algorithm. We implement several forecasting techniques using advanced deep learning architectures and integrate the best-performing one into an existing open-source cache simulator. Experiments run on benchmark datasets show that MUSTACHE outperforms the best page replacement heuristic (i.e., exact LRU), improving the cache hit ratio by 1.9% and reducing the number of reads/writes required to handle cache misses by 18.4% and 10.3%.
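The way a predicted window of future references approximates Bélády's MIN policy (evict the page whose next reuse is farthest in the future) can be sketched as follows. Here the "forecast" is simulated by peeking at the true trace, whereas MUSTACHE would query its learned forecaster; all names are illustrative:

```python
def evict_victim(cache, future):
    """Pick the cached page whose next predicted use is farthest away
    (or never predicted again), as Belady's MIN would with a full oracle."""
    def next_use(page):
        try:
            return future.index(page)
        except ValueError:
            return float("inf")   # never referenced again: ideal victim
    return max(cache, key=next_use)

def simulate(requests, capacity, lookahead=8):
    """Count hits for a cache driven by a k-step lookahead window."""
    cache, hits = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # In MUSTACHE this window would come from the learned page
            # request forecaster; here we cheat and peek at the true trace.
            window = requests[i + 1 : i + 1 + lookahead]
            cache.discard(evict_victim(cache, window))
        cache.add(page)
    return hits
```

The quality of the approximation then hinges entirely on the forecaster's accuracy over the k-step window.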
Fabien Bouquillon, Clément Ballabriga, Giuseppe Lipari
et al.
The predictability of a system is the condition for giving safe bounds on the worst-case execution time (WCET) of the real-time tasks running on it. Commercial off-the-shelf (COTS) processors are increasingly used in embedded systems and contain shared cache memory. This component is hard to predict because its state depends on the execution history of the system. To increase the predictability of COTS components, we use cache coloring, a technique widely used to partition cache memory. Our main contribution is a WCET-aware heuristic that partitions the cache according to the needs of each task. Our experiments are made with CPLEX, an ILP solver, on randomly generated task sets running on a preemptive system scheduled with earliest deadline first (EDF).
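As background on the cache coloring technique itself, a page's color is determined by the cache set-index bits that lie above the page offset; a minimal sketch with hypothetical cache parameters (not taken from the paper):

```python
# Hypothetical L2 cache: 512 KiB, 8-way set-associative, 4 KiB pages.
CACHE_SIZE, WAYS, PAGE = 512 * 1024, 8, 4096

# Number of colors = cache size per way, divided by the page size.
NUM_COLORS = CACHE_SIZE // (WAYS * PAGE)      # 16 colors with these numbers

def color_of(phys_addr):
    """A page's color comes from the set-index bits above the page offset."""
    return (phys_addr >> 12) % NUM_COLORS     # 12 = log2(PAGE)

# Pages with different colors map to disjoint cache sets and can never
# evict each other's lines, so giving each task its own color range
# partitions the shared cache among tasks.
```

A WCET-aware partitioning heuristic then decides how many colors each task receives based on its cache needs.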
In some important application areas of hard real-time systems, preemptive sporadic tasks with harmonic periods and constrained deadlines running on a uniprocessor platform play an important role. We propose a new algorithm for determining the exact worst-case response time of a task; it has lower computational complexity (linear in the number of tasks) than the known algorithm developed for the same system class. We also allow task executions to start delayed due to release jitter, provided the jitter values lie within certain ranges. To check whether these constraints are met, we define a constraint programming problem that has a special structure and can be solved with heuristic components in time linear in the number of tasks. If the check confirms the admissibility of the jitter values, the linear-time algorithm can also be used to determine the worst-case response time for jitter-aware systems.
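For context, the classic fixed-point response-time analysis that such a linear-time algorithm improves upon can be sketched as follows; this is the standard recurrence for fixed-priority scheduling (tasks indexed by decreasing priority), not the paper's own algorithm:

```python
import math

def wcrt(i, C, T):
    """Iterative worst-case response time of task i: the smallest R with
    R = C[i] + sum_j<i ceil(R / T[j]) * C[j], where tasks 0..i-1 have
    higher priority. Pseudo-polynomial in general; for harmonic periods
    the fixed point is reached after very few iterations."""
    R = C[i]
    while True:
        nxt = C[i] + sum(math.ceil(R / T[j]) * C[j] for j in range(i))
        if nxt == R:
            return R
        if nxt > T[i]:          # response exceeds the period: unschedulable
            return None
        R = nxt
```

The paper's contribution is replacing this iteration with a closed, linear-in-n computation valid for the harmonic-period case.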
Carlos San Vicente Gutiérrez, Lander Usategui San Juan, Irati Zamalloa Ugarte
et al.
As robotics systems become more distributed, the communication between different robot modules plays a key role in the reliability of the overall robot control. In this paper, we present a study of the Linux communication stack for real-time robotic applications. We evaluate the real-time performance of UDP-based communication in Linux on multi-core embedded devices as test platforms. We show that, under an appropriate configuration, the Linux kernel greatly enhances the determinism of communication using the UDP protocol. Furthermore, we demonstrate that concurrent traffic disrupts the bounded latencies and propose a solution: isolating the real-time application and the corresponding interrupt on a dedicated CPU.
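A simple way to collect the kind of latency samples such a study relies on is a UDP round-trip measurement over loopback. This sketch omits the paper's kernel configuration and CPU/IRQ isolation steps, which are what actually tighten the tail of the resulting histogram:

```python
import socket
import time

def udp_rtt_samples(n=100):
    """Measure n UDP round-trip times over loopback, in seconds.
    Client and 'echo server' run in one process for simplicity."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))            # let the kernel pick a free port
    server_addr = srv.getsockname()
    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        cli.sendto(b"ping", server_addr)
        data, peer = srv.recvfrom(64)     # server side: receive and echo
        srv.sendto(data, peer)
        cli.recvfrom(64)                  # client side: round trip complete
        samples.append(time.perf_counter() - t0)
    srv.close()
    cli.close()
    return samples
```

On a PREEMPT_RT-configured kernel, pinning the measuring process and the NIC interrupt to a dedicated CPU (as the paper proposes) is what bounds the worst-case samples under concurrent traffic.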
Energy efficiency is one of the most critical design criteria for modern embedded systems such as multiprocessor system-on-chips (MPSoCs). Dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM) are two major techniques for reducing energy consumption in such embedded systems. Furthermore, MPSoCs are becoming more popular for many real-time applications. One of the challenges of integrating DPM with DVFS and task scheduling of real-time applications on MPSoCs is the modeling of idle intervals on these platforms. In this paper, we present a novel approach for modeling idle intervals in MPSoC platforms which leads to a mixed integer linear programming (MILP) formulation integrating DPM, DVFS, and task scheduling of periodic task graphs subject to a hard deadline. We also present a heuristic approach for solving the MILP and compare its results with those obtained from solving the MILP.
The emerging hybrid DRAM-NVM architecture challenges the existing memory management mechanisms in operating systems. In this paper, we introduce memos, which can schedule memory resources over the entire memory hierarchy, including the cache, channels, and main memory comprising DRAM and NVM, simultaneously. Powered by our newly designed kernel-level monitoring module and page migration engine, memos can dynamically optimize data placement across the memory hierarchy according to online memory access patterns, current resource utilization, and the characteristics of the memory medium. Our experimental results show that memos achieves high memory utilization, improving system throughput by 19.1% and QoS by 23.6% on average. Moreover, memos can reduce NVM-side memory latency by 3-83.3% and energy consumption by 25.1-99%, and significantly extends NVM lifetime (40X improvement on average).
Houssam Eddine Zahaf, Giuseppe Lipari, Luca Abeni
et al.
This paper presents a new strategy for scheduling soft real-time tasks on multiple identical cores. The proposed approach is based on partitioned CPU reservations and uses a reclaiming mechanism to reduce the number of missed deadlines. We introduce the possibility for a task to temporarily migrate to another, less loaded, CPU when it has exhausted the reserved bandwidth on its allocated CPU. In addition, we propose a simple load balancing method to decrease the number of deadlines missed by the tasks. The proposed algorithm has been evaluated through simulations, showing its effectiveness (compared to other multi-core reclaiming approaches) and comparing the performance of different partitioning heuristics (Best Fit, Worst Fit, and First Fit).
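The three partitioning heuristics compared in the evaluation can be sketched as a utilization-based bin-packing step; the task model here is simplified to a bare utilization value per reservation, which is an illustrative assumption rather than the paper's model:

```python
def partition(utils, n_cores, heuristic="first"):
    """Assign reservation utilizations to cores with First Fit, Best Fit
    (most loaded core that still fits), or Worst Fit (least loaded core).
    Returns the per-core loads, or None if some reservation fits nowhere
    (i.e., any placement would push a core's load above 1.0)."""
    loads = [0.0] * n_cores
    for u in sorted(utils, reverse=True):          # decreasing utilization
        fits = [i for i in range(n_cores) if loads[i] + u <= 1.0]
        if not fits:
            return None
        if heuristic == "first":
            core = fits[0]
        elif heuristic == "best":
            core = max(fits, key=lambda i: loads[i])
        else:                                      # "worst"
            core = min(fits, key=lambda i: loads[i])
        loads[core] += u
    return loads
```

Worst Fit tends to leave slack on every core (helpful for reclaiming and temporary migration), while Best/First Fit pack cores tightly and free up whole cores.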
Myopic is a hard real-time process scheduling algorithm that selects a suitable process, based on a heuristic function, from a subset (window) of all ready processes instead of choosing from all available processes as the original heuristic scheduling algorithm does. The performance of the algorithm depends significantly on the chosen heuristic function, which assigns weights to parameters such as deadline, earliest start time, and processing time, and on the size of the window, since it considers only k of the n ready processes (where k <= n). This research evaluates the performance of the Myopic algorithm for different parameters to demonstrate the merits and constraints of the algorithm. A comparative analysis of the impact of window size in implementing the Myopic algorithm is presented and discussed through a set of experiments.
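The window-plus-heuristic selection can be sketched as follows. The heuristic form H = deadline + w * earliest_start_time follows the original Myopic formulation (Ramamritham et al.); the process representation and names are illustrative:

```python
def myopic_pick(ready, k, w=1.0):
    """Myopic selection: restrict attention to the k ready processes with
    the earliest deadlines (the window), then pick the process minimizing
    H = deadline + w * earliest_start_time.
    Each process is a (name, deadline, earliest_start, proc_time) tuple."""
    window = sorted(ready, key=lambda p: p[1])[:k]   # deadline-ordered window
    return min(window, key=lambda p: p[1] + w * p[2])
```

With k equal to the number of ready processes, this degenerates to the original (non-myopic) heuristic algorithm; smaller k trades scheduling quality for lower per-decision cost, which is exactly the trade-off the experiments probe.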
Youcheng Sun, Giuseppe Lipari, Étienne André
et al.
We propose here a framework to model real-time components consisting of concurrent real-time tasks running on a single processor, using parametric timed automata. Our framework is generic and modular, so as to be easily adapted to different schedulers and more complex task models. We first perform a parametric schedulability analysis of the components using the inverse method. We show that the method unfortunately does not provide satisfactory results when the task periods are considered as parameters. After identifying and explaining the problem, we present a solution adapting the model by making use of the worst-case scenario in schedulability analysis. We show that the analysis with the inverse method always converges on the modified model when the system load is strictly less than 100%. Finally, we show how to use our parametric analysis for the generation of timed interfaces in compositional system design.
With the advancement of the automation industry, performing complex remote operations has become a requirement. Advances in networking technology have led to the development of different architectures for implementing control over large distances. In various control applications of modern industry, agents such as sensors, actuators, and controllers are geographically distributed. For a control application to work efficiently, all of the agents have to exchange information through a communication medium. At present, an increasing number of distributed control systems are based on platforms made up of conventional PCs running open-source real-time operating systems. Often, these systems need networked devices that support operations synchronized across nodes. We study a framework that relies on standard software and protocols such as RTAI, EtherCAT, RTnet, and IEEE 1588, and examine RTAI and these protocols in a networked control systems environment.
Traditional monolithic kernels dominated kernel design for a long time, in an era of small kernels, few hardware vendors, and limited kernel functionality. The monolithic structure became impractical as the number of hardware vendors increased and kernel services were consumed by different users for many purposes. One of the biggest disadvantages of monolithic kernels is their inflexibility: all available modules must be included in kernel compilation, which is highly time-consuming. Recently, new kernel structures have been introduced through multicore operating systems. Unfortunately, many multicore operating systems, such as Barrelfish and FOS, are experimental. This paper aims to simulate the performance of multicore hybrid kernels through dynamic, customized kernel-module attachment/detachment on multicore machines. In addition, this paper proposes a new technique for loading dynamic kernel modules based on user needs and machine capabilities.
Changing functional and non-functional software implementations at runtime is useful, and sometimes even critical, in both development and production environments. JooFlux is a JVM agent that allows both the dynamic replacement of method implementations and the application of aspect advices. It works by performing bytecode transformations to take advantage of the new invokedynamic instruction, added in Java SE 7 to help implement dynamic languages on the JVM. JooFlux can be managed using a JMX agent to perform dynamic modifications at runtime, without resorting to a dedicated domain-specific language. We compared JooFlux with existing AOP platforms and dynamic languages. The results demonstrate that JooFlux's performance is close to that of plain Java, with a marginal overhead most of the time and sometimes even a gain, whereas AOP platforms and dynamic languages present significant overheads. This paves the way for interesting future evolutions and applications of JooFlux.