We introduce WritePolicyBench, a benchmark for evaluating memory write policies: decision rules that choose what to store, merge, and evict under a strict byte budget while processing a stream with document/API drift. The benchmark provides (i) task generators with controlled non-stationarity, (ii) an explicit action interface for external memory, (iii) a byte-accurate cost model, and (iv) standardized metrics that measure both task success and budget efficiency.
Ben Asani, Olle Holmberg, Johannes B. Schiefelbein, et al.
Abstract. Objectives: To determine quantitative changes in OCT biomarkers in a large set of treatment-naive patients undergoing anti-VEGF therapy in a real-life setting. For this purpose, we devised a novel deep-learning-based semantic segmentation algorithm providing the first benchmark results for automatic segmentation of 11 OCT features, including biomarkers for neovascular age-related macular degeneration (nAMD). Methods: Training of a deep U-net-based semantic segmentation ensemble algorithm for state-of-the-art semantic segmentation performance, which was used to analyze OCT features prior to, and after 3 and 12 months of, anti-VEGF therapy. Results: High F1 scores of almost 1.0 for neurosensory retina and subretinal fluid on a separate hold-out test set with unseen patients. The algorithm performed worse for subretinal hyperreflective material and fibrovascular PED, on par with drusenoid PED, and better in segmenting fibrosis. In the evaluation of treatment-naive OCT scans, significant changes occurred for intraretinal fluid (mean: 0.03 µm³ to 0.01 µm³, p < 0.001), subretinal fluid (0.08 µm³ to 0.01 µm³, p < 0.001), subretinal hyperreflective material (0.02 µm³ to 0.01 µm³, p < 0.001), fibrovascular PED (0.12 µm³ to 0.09 µm³, p = 0.02) and central retinal thickness C0 (225.78 µm³ to 169.40 µm³). The amounts of intraretinal fluid, fibrovascular PED, and ERM were predictive of poor outcome. Conclusions: The segmentation algorithm allows efficient volumetric analysis of OCT scans. Anti-VEGF therapy provokes the most potent changes in the first 3 months, while a gradual loss of RPE hints at a progressing decline of visual acuity. Additional research is required to understand how these accurate OCT predictions can be leveraged for a personalized therapy regimen.
Prime numbers are fundamental in number theory and play a significant role in various areas, from pure mathematics to practical applications, including cryptography. In this contribution, we introduce a multithreaded implementation of the Segmented Sieve algorithm. Instead of sieving a large range in a single pass, our implementation breaks the sieving process into incremental segments, which in theory avoids the challenges of working with large numbers and can reduce memory usage, giving more efficient overall multi-core utilization over extended computations.
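The segment-by-segment idea can be illustrated with a minimal single-threaded sketch. This is not the authors' implementation: the function names `base_primes` and `segmented_sieve` and the default `segment_size` are assumptions chosen for illustration. Each window only needs the primes up to the square root of the upper limit, so memory stays proportional to the segment size rather than to the whole range.

```python
from math import isqrt


def base_primes(limit):
    """Plain sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # Cross out p*p, p*p + p, ... in one slice assignment.
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, is_prime in enumerate(flags) if is_prime]


def segmented_sieve(n, segment_size=32_768):
    """Yield the primes <= n, sieving one fixed-size window at a time.

    segment_size is a hypothetical tuning parameter (numbers per window).
    """
    if n < 2:
        return
    primes = base_primes(isqrt(n))
    for low in range(2, n + 1, segment_size):
        high = min(low + segment_size - 1, n)
        flags = bytearray([1]) * (high - low + 1)
        for p in primes:
            if p * p > high:
                break
            # First multiple of p inside [low, high], never below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            flags[start - low :: p] = bytes(len(range(start, high + 1, p)))
        for offset, is_prime in enumerate(flags):
            if is_prime:
                yield low + offset
```

Because each window depends only on the shared list of base primes, windows are independent of one another; a multithreaded variant could hand different windows to different workers, although how the work is actually distributed in the paper is not stated here.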
Queueing networks and Markov chains are widely used for conducting performance and reliability studies. In this paper we describe the queueing package, a free software package for queueing networks and Markov chain analysis for GNU Octave. The queueing package provides implementations of numerical algorithms for computing transient and steady-state performance measures of discrete and continuous Markov chains, and for steady-state analysis of single-station queueing systems and queueing networks. We illustrate the design principles of the queueing package, describe its most salient features and provide some usage examples.
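As an illustration of the kind of steady-state measure computed for a single-station queueing system, the sketch below evaluates the textbook M/M/1 formulas in Python. It is not the queueing package's API (the package targets GNU Octave); the function name `mm1_metrics` and the returned dictionary keys are assumptions made for this example.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state measures of an M/M/1 queue (textbook closed forms).

    arrival_rate: Poisson arrival rate (lambda), jobs per time unit.
    service_rate: exponential service rate (mu), jobs per time unit.
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate  # server utilization
    if rho >= 1:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    return {
        "utilization": rho,
        "mean_jobs": rho / (1 - rho),  # mean number of jobs in the system
        "mean_response_time": 1 / (service_rate - arrival_rate),
        "throughput": arrival_rate,  # equals the arrival rate when stable
    }
```

For example, `mm1_metrics(0.5, 1.0)` describes a server that is busy half of the time; the mean number of jobs and the mean response time it returns are consistent with Little's law (mean jobs = throughput × mean response time).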
We consider the cancel-on-completion (c.o.c.) redundancy system with $N$ parallel servers, where incoming jobs are immediately replicated to $d$ servers chosen uniformly at random (without replacement). A job finishes service as soon as its first replica is completed, after which all the remaining replicas are abandoned. We compare the performance of the first-come first-served (FCFS) and processor-sharing (PS) disciplines based on the stability condition, the tail behavior of the latency and the expected latency.
The authors have uploaded their artifact to Zenodo, which ensures long-term retention of the artifact. The code is suitably documented, and some examples are given. A minimal overall description of the engine is provided. The artifact allows the environment to be set up quite quickly, and the dependencies are well documented. The process to regenerate the data for the figures in the paper completes, and all results are reproducible. This paper can thus receive the Artifacts Available and Artifacts Evaluated-Functional badges. Given the high quality of the artifact, the Artifacts Evaluated-Reusable badge can also be assigned.
Retrial phenomena naturally arise in various systems such as call centers, cellular networks and random access protocols in local area networks. This paper gives a comprehensive survey of the theory and applications of retrial queues in these systems. We review the state of the art of theoretical research, including exact solutions, stability, asymptotic analyses and multidimensional models. We present an overview of retrial models arising from real-world applications. Some open problems and promising research directions are also discussed.
Sarath Pattathil, Vivek S. Borkar, Gaurav S. Kasbekar
We propose a dynamic formulation of file-sharing networks in terms of an average cost Markov decision process with constraints. By analyzing a Whittle-like relaxation thereof, we propose an index policy in the spirit of Whittle and compare it by simulations with other natural heuristics.
The main goal of this article is to compare the performance penalties incurred when using KVM virtualization and Docker containers to create isolated environments for HPC applications. The article provides data obtained using both commonly accepted synthetic tests (High Performance Linpack) and real-life applications (OpenFOAM). It highlights how major infrastructure configuration options influence the resulting application performance: the CPU type presented to the VM and the type of network connection used.
We represent a computer cluster as a multi-server queue with some arbitrary bipartite graph of compatibilities between jobs and servers. Each server processes its jobs sequentially in FCFS order. The service rate of a job at any given time is the sum of the service rates of all servers processing this job. We show that the corresponding queue is quasi-reversible and use this property to design a scheduling algorithm achieving balanced fair sharing of the service capacity.
We present an extension of the window flow control analysis by R. Agrawal et al. (Reference [1]), C.-S. Chang (Reference [6]), and C.-S. Chang et al. (Reference [8]) to a system with random service time and fixed feedback delay. We consider two network service models. In the first model, the network service process itself has no time correlations. The second model addresses a two-state Markov-modulated service.
We consider in this paper a non-work-conserving Generalized Processor Sharing (GPS) system composed of two queues with Poisson arrivals and exponential service times. Using general results due to Fayolle et al., we first establish the stability condition for this system. We then determine the functional equation satisfied by the generating function of the numbers of jobs in both queues and the associated Riemann-Hilbert problem. We prove the existence and the uniqueness of the solution. This allows us to completely characterize the system, in particular to compute the empty queue probability. We finally derive the tail asymptotics of the number of jobs in one queue.
Multiclass FIFO is used in communication networks, such as in input-queueing routers/switches and in wireless networks. To provide service guarantees in such networks, it is crucial to have analytical results, e.g. bounds, on the performance of multiclass FIFO. Surprisingly, there are few such results in the literature. This paper is devoted to filling the gap. Specifically, a single-hop deterministic case is studied, for which delay and backlog bounds are derived, in addition to guaranteed rate and service curve characterizations that may be exploited to extend the analysis to network cases.
This report considers a fairly general model of constrained queuing networks that allows us to represent both MMBP (Markov Modulated Bernoulli Processes) arrivals and time-varying service constraints. We derive a set of sufficient conditions for throughput optimality of scheduling policies that encompass and generalize all the previously obtained results in the field. This leads to the definition of new classes of (non diagonal) throughput optimal scheduling policies. We prove the stability of queues by extending the traditional Lyapunov drift criteria methodology.
We present the derivation of post-processing SNR for Minimum-Mean-Squared-Error (MMSE) receivers with imperfect channel estimates, and show that it is an accurate indicator of the error rate performance of MIMO systems in the presence of channel estimation error. Simulation results show the tightness of the analysis.
Intel Array Building Blocks is a high-level data-parallel programming environment designed to produce scalable and portable results on existing and upcoming multi- and many-core platforms. We have chosen several mathematical kernels - a dense matrix-matrix multiplication, a sparse matrix-vector multiplication, a 1-D complex FFT and a conjugate gradients solver - as synthetic benchmarks and representatives of scientific codes and ported them to ArBB. This whitepaper describes the ArBB ports and presents performance and scaling measurements on the Westmere-EX based system SuperMIG at LRZ in comparison with OpenMP and MKL.
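To make one of the listed kernels concrete, the sketch below shows a sparse matrix-vector multiplication in plain Python. It is neither the ArBB port nor the OpenMP/MKL reference: the compressed sparse row (CSR) storage format and the function name `csr_spmv` are assumptions, since the abstract does not say which sparse format was used.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format).

    values:  nonzero entries of A, stored row by row.
    col_idx: column index of each entry in `values`.
    row_ptr: row i occupies values[row_ptr[i]:row_ptr[i + 1]].
    x:       dense input vector.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# The 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
result = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each output row is an independent dot product, which is what makes the kernel a natural candidate for data-parallel programming models; the indirect access `x[col_idx[k]]` is typically what makes it memory-bound.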
We present a performance model for bandwidth-limited loop kernels which is founded on the analysis of modern cache-based microarchitectures. This model allows an accurate performance prediction and evaluation for existing instruction codes. It provides an in-depth understanding of how performance at different memory hierarchy levels is composed. The performance of raw memory load, store and copy operations and a stream vector triad is analyzed and benchmarked on three modern x86-type quad-core architectures in order to demonstrate the capabilities of the model.
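A rough sense of why such kernels are bandwidth limited can be obtained from a simple traffic count. The sketch below assumes that the triad has the STREAM form a[i] = b[i] + s * c[i] on 8-byte words, and it uses a naive single-level bandwidth bound; it is not the model presented in the paper, which analyzes the individual memory hierarchy levels. All function names are hypothetical.

```python
def triad(a, b, c, s):
    """STREAM-style triad (assumed form): a[i] = b[i] + s * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + s * c[i]


def triad_bytes_per_iteration(word_bytes=8, write_allocate=True):
    """Data traffic per loop iteration.

    Two loads (b, c) and one store (a). On a write-allocate cache the
    store also pulls in the target cache line first, which adds a
    fourth stream.
    """
    streams = 3 + (1 if write_allocate else 0)
    return streams * word_bytes


def bandwidth_bound_iterations_per_s(bandwidth_bytes_per_s, word_bytes=8,
                                     write_allocate=True):
    """Upper bound on iterations per second if only data transfer limits speed."""
    return bandwidth_bytes_per_s / triad_bytes_per_iteration(
        word_bytes, write_allocate)
```

Under these assumptions, a memory interface sustaining 32 GB/s could feed at most 10^9 triad iterations per second with write-allocate traffic counted (32 bytes per iteration), regardless of the core's arithmetic throughput.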