Results for "cs.PF"

Showing 20 of ~1 results · from arXiv, DOAJ

arXiv Open Access 2024
Overhead Measurement Noise in Different Runtime Environments

David Georg Reichelt, Reiner Jung, André van Hoorn

To detect performance changes, measurements are performed in the same execution environment. In cloud environments, noise from other processes running on the same cluster nodes may change measurement results and thereby make performance changes hard to detect. The benchmark MooBench determines the overhead of different observability tools and is executed continuously. In this study, we compare the suitability of different execution environments for benchmarking observability overhead using MooBench. To do so, we compare the execution times and standard deviations of MooBench in a cloud execution environment to those in three bare-metal execution environments. We find that bare-metal servers have lower runtimes and standard deviations for multi-threaded MooBench execution. Nevertheless, we see that performance changes up to 4.41% are detectable by GitHub Actions, as long as only sequential workloads are examined.
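Whether a change of a given size is detectable depends on how it compares with the run-to-run noise of the environment. As a minimal sketch (with made-up run times standing in for MooBench measurements, and 2x the coefficient of variation as a crude detectability threshold), one can compare noise against the size of a change:

```python
import statistics

# Hypothetical per-run execution times (seconds) from repeated benchmark
# executions in two environments; the numbers are illustrative only.
cloud_runs = [1.92, 2.10, 1.85, 2.25, 1.98, 2.31, 1.88, 2.05]
metal_runs = [1.60, 1.62, 1.59, 1.63, 1.61, 1.60, 1.62, 1.61]

def coefficient_of_variation(samples):
    """Relative run-to-run noise: sample standard deviation / mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

cloud_cv = coefficient_of_variation(cloud_runs)
metal_cv = coefficient_of_variation(metal_runs)

# A performance change is plausibly detectable only if it exceeds the
# measurement noise; 2x the CV is an arbitrary illustrative threshold.
change = 0.0441  # a 4.41% change, as in the abstract
print(f"cloud CV = {cloud_cv:.3f}, detectable: {change > 2 * cloud_cv}")
print(f"metal CV = {metal_cv:.3f}, detectable: {change > 2 * metal_cv}")
```

With these illustrative numbers, the same 4.41% change is lost in the cloud noise but clearly visible on the quieter bare-metal runs, which mirrors the abstract's finding.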

en cs.PF
arXiv Open Access 2021
Asymptotic analysis of the sojourn time of a batch in an $M^{[X]}/M/1$ Processor Sharing Queue

Fabrice Guillemin, Alain Simonian, Ridha Nasri et al.

In this paper, we exploit results obtained in an earlier study for the Laplace transform of the sojourn time $Ω$ of an entire batch in the $M^{[X]}/M/1$ Processor Sharing (PS) queue in order to derive the asymptotic behavior of the complementary probability distribution function of this random variable, namely the behavior of $P(Ω>x)$ as $x$ tends to infinity. We show precisely that, up to a multiplicative factor, the behavior of $P(Ω>x)$ for large $x$ is of the same order of magnitude as that of $P(ω>x)$, where $ω$ is the sojourn time of an arbitrary job in the system. From a practical point of view, this means that if a system has to be dimensioned to guarantee processing times for jobs, then it can also guarantee processing times for entire batches by introducing a marginal amount of additional processing capacity.

en cs.PF, math.PR
arXiv Open Access 2020
Correlation Coefficient Analysis of the Age of Information in Multi-Source Systems

Yukang Jiang, Kiichi Tokuyama, Yuichiro Wada et al.

This paper studies the age of information (AoI) in an information-updating system in which multiple sources share one server to process packets of updated information. In such systems, packets from different sources compete for the server, and thus they may be interrupted, backlogged, or become stale. Therefore, to grasp the structure of such systems, it is crucially important to study a metric indicating the correlation between different sources. In this paper, we aim to analyze the correlation of AoIs in a single-server queueing system with multiple sources. As our contribution, we provide a closed-form expression for the correlation coefficient of the AoIs. To this end, we first derive the Laplace-Stieltjes transform of the stationary distribution of each source's AoI. Several nontrivial properties of such systems are revealed by our analysis.
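The AoI of a source is the time elapsed since the generation of its most recently delivered update. The paper derives the correlation coefficient in closed form via transforms; the sketch below instead estimates it empirically from hypothetical (generation, delivery) records for two sources, purely to make the metric concrete:

```python
import statistics

def age_of_information(t, updates):
    """AoI at time t: t minus the generation time of the freshest update
    delivered by time t. `updates` is a list of (generation, delivery)
    pairs; before the first delivery, age is measured from time 0."""
    latest_gen = 0.0
    for gen, delivered in updates:
        if delivered <= t:
            latest_gen = max(latest_gen, gen)
    return t - latest_gen

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical update records for two sources sharing one server.
source1 = [(0.2, 0.9), (1.4, 2.0), (3.1, 3.5), (4.8, 5.6)]
source2 = [(0.5, 1.3), (2.2, 3.0), (3.9, 4.4), (5.0, 5.9)]

grid = [0.25 * k for k in range(1, 25)]  # sample times in (0, 6]
aoi1 = [age_of_information(t, source1) for t in grid]
aoi2 = [age_of_information(t, source2) for t in grid]
rho = pearson(aoi1, aoi2)  # empirical AoI correlation on this sample path
```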

en cs.PF, math.PR
arXiv Open Access 2019
A Processor-Sharing model for the Performance of Virtualized Network Functions

Fabrice Guillemin, Veronica Quintuna Rodriguez, Alain Simonian

The parallel execution of requests in a Cloud Computing platform, as for Virtualized Network Functions, is modeled by an $M^{[X]}/M/1$ Processor-Sharing (PS) system, where each request is seen as a batch of unit jobs. The performance of such a parallelized system can then be measured by the quantiles of the batch sojourn time distribution. In this paper, we address the evaluation of this distribution for the $M^{[X]}/M/1$-PS queue with batch arrivals and geometrically distributed batch sizes. General results on the residual busy period (after a tagged batch arrival time) and on the number of unit jobs served during this residual busy period are first derived. This enables us to provide an approximation for the distribution tail of the batch sojourn time, whose accuracy is confirmed by simulation.

en cs.PF
arXiv Open Access 2018
A Preliminary Study of Neural Network-based Approximation for HPC Applications

Wenqian Dong, Anzheng Guolu, Dong Li

Machine learning, as a tool to learn and model complicated (non)linear relationships between input and output data sets, has shown preliminary success in some HPC problems. Using machine learning, scientists are able to augment existing simulations by improving accuracy while significantly reducing latency. Our ongoing research aims to create a general framework for applying neural network-based models to HPC applications. In particular, we want to use a neural network to approximate and replace code regions within an HPC application to improve its performance (i.e., reduce its execution time). In this paper, we present our preliminary study and results. Using two applications (the Newton-Raphson method and the Lennard-Jones (LJ) potential in LAMMPS) as our case study, we achieve speedups of up to 2.7x and 2.46x, respectively.
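For reference, the two kernels named in the case study are easy to state directly; the sketch below shows plain Python versions of them (the neural approximation itself, and any framework code, are not reproduced here):

```python
def newton_raphson(f, fprime, x0, tol=1e-12, max_iter=50):
    """Root finding via the iteration x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """LJ pair potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Example: sqrt(2) as the root of x^2 - 2; the LJ minimum sits at
# r = 2**(1/6) * sigma, where V equals -epsilon.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
r_min = 2.0 ** (1.0 / 6.0)
```

Such small, repeatedly invoked numerical kernels are exactly the kind of code region the paper proposes to approximate with a trained network.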

en cs.PF, cs.LG
arXiv Open Access 2018
Queuing Theoretic Models for Multicast and Coded-Caching in Downlink Wireless Systems

Mahadesh Panju, Ramkumar Raghu, Vinod Sharma et al.

We consider a server connected to $L$ users over a shared finite capacity link. Each user is equipped with a cache. File requests at the users are generated as independent Poisson processes according to a popularity profile from a library of $M$ files. The server has access to all the files in the library. Users can store parts of the files or full files from the library in their local caches. The server should send missing parts of the files requested by the users. The server attempts to fulfill the pending requests with minimal transmissions exploiting multicasting and coding opportunities among the pending requests. We study the performance of this system in terms of queuing delays for the naive multicasting and several coded multicasting schemes proposed in the literature. We also provide approximate expressions for the mean queuing delay for these models and establish their effectiveness with simulations.

en cs.PF
arXiv Open Access 2018
Enabling Cross-Event Optimization in Discrete-Event Simulation Through Compile-Time Event Batching

Marc Leinweber, Hannes Hartenstein, Philipp Andelfinger

A discrete-event simulation (DES) involves the execution of a sequence of event handlers dynamically scheduled at runtime. As a consequence, a priori knowledge of the control flow of the overall simulation program is limited. In particular, powerful optimizations supported by modern compilers can only be applied within the scope of individual event handlers, which frequently comprise only a few lines of code. We propose a method that extends the scope for compiler optimizations in discrete-event simulations by generating batches of multiple events that are subjected to compiler optimizations as contiguous procedures. A runtime mechanism executes suitable batches at negligible overhead. Our method does not require any compiler extensions and introduces only minor additional effort during model development. The feasibility and potential performance gains of the approach are illustrated using an idealized proof-of-concept model as an example. We believe that the applicability of the approach extends to general event-driven programs.

arXiv Open Access 2016
An ECM-based energy-efficiency optimization approach for bandwidth-limited streaming kernels on recent Intel Xeon processors

Johannes Hofmann, Dietmar Fey

We investigate an approach that uses low-level analysis and the execution-cache-memory (ECM) performance model, in combination with tuning of hardware parameters, to lower the energy requirements of memory-bound applications. The ECM model is extended appropriately to deal with software optimizations such as non-temporal stores. Using incremental steps and the ECM model, we analytically quantify the impact of various single-core optimizations and pinpoint microarchitectural improvements that are relevant to energy consumption. Using a 2D Jacobi solver as an example that can serve as a blueprint for other memory-bound applications, we evaluate our approach on the four most recent Intel Xeon E5 processors (Sandy Bridge-EP, Ivy Bridge-EP, Haswell-EP, and Broadwell-EP). We find that chip energy consumption can be reduced by factors of 2.0-2.4$\times$ on the examined processors.
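A 2D Jacobi solver is a standard example of a memory-bound streaming kernel: each sweep reads a full grid and writes another, so performance is dominated by data traffic rather than arithmetic. A minimal pure-Python sketch of one sweep (the ECM and energy analysis themselves are not reproduced here):

```python
def jacobi_sweep(grid):
    """One Jacobi iteration on a 2D grid: each interior point becomes the
    average of its four neighbours (Dirichlet boundaries held fixed)."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new

# Toy example: a fixed hot boundary on top, cold everywhere else.
n = 8
grid = [[1.0] * n] + [[0.0] * n for _ in range(n - 1)]
for _ in range(100):
    grid = jacobi_sweep(grid)
```

In an optimized C implementation, the two-array update pattern is where non-temporal stores (as discussed in the abstract) can avoid polluting the cache with the output grid.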

en cs.PF
arXiv Open Access 2016
Automating Large-Scale Simulation and Data Analysis with OMNeT++: Lessons Learned and Future Perspectives

Antonio Virdis, Carlo Vallati, Giovanni Nardini

Simulation is widely adopted in the study of modern computer networks. In this context, OMNeT++ provides a set of very effective tools that span from the definition of the network to the automation of simulation execution and quick result representation. However, as network models become more and more complex to cope with the evolution of network systems, the number of simulation factors, the number of simulated nodes, and the size of the results grow accordingly, leading to larger-scale simulations. In this work, we perform a critical analysis of the tools provided by OMNeT++ in the case of such large-scale simulations. We then propose a unified and flexible software architecture to support simulation automation.

en cs.PF, cs.DC
arXiv Open Access 2016
Cultivating Software Performance in Cloud Computing

Li Chen, Colin Cunningham, Pooja Jain et al.

There exist multitudes of cloud performance metrics, including workload performance, application placement, software/hardware optimization, scalability, capacity, reliability, and agility. In this paper, we consider jointly optimizing the performance of software applications in the cloud. The challenges lie in bringing a diversity of raw data into a tidy data format, unifying performance data from multiple systems based on timestamps, and assessing the quality of the processed performance data. Even after verifying the quality of cloud performance data, additional challenges remain in optimizing cloud computing. We identify these challenges from the perspectives of the computing environment, data collection, performance analytics, and the production environment.

en cs.PF, cs.DC
arXiv Open Access 2015
Benchmarking Big Data Systems: State-of-the-Art and Future Directions

Rui Han, Zhen Jia, Wanling Gao et al.

The great prosperity of big data systems such as Hadoop in recent years has made benchmarking these systems crucial for both the research and industry communities. The complexity, diversity, and rapid evolution of big data systems give rise to various new challenges in how we design generators to produce data with the 4V properties (i.e., volume, velocity, variety, and veracity), as well as how we implement application-specific yet still comprehensive workloads. However, most existing big data benchmarks can be described as attempts to solve specific problems in benchmarking systems. This article investigates the state of the art in benchmarking big data systems, along with the future challenges to be addressed to realize a successful and efficient benchmark.

en cs.PF
arXiv Open Access 2015
On the Maximal Shortest Path in a Connected Component in V2V

Michel Marot, Adel Mounir Saïd, Hossam Afifi

In this work, a VANET (Vehicular Ad-hoc NETwork) is considered to operate on a single lane, without infrastructure. Vehicle arrivals are assumed to be general, without restrictive traffic or speed assumptions. The vehicles communicate through the shortest path. In this paper, we study the probability distribution of the number of hops on the maximal shortest path in a connected component of vehicles. The general formulation is given for any assumption on road traffic. It is then applied to calculate the z-transform of this distribution for medium and dense networks in the Poisson case. Our model is validated with the Madrid road traces of the Universitat Politècnica de Catalunya. These results may be useful, for example, when evaluating diffusion protocols through the shortest path in a VANET, where not only the mean but also the other moments are needed to derive accurate results.

en cs.PF, cs.NI
arXiv Open Access 2014
On Time-Sensitive Revenue Management and Energy Scheduling in Green Data Centers

Huangxin Wang, Jean X. Zhang, Fei Li

In this paper, we design an analytically and experimentally better online energy and job scheduling algorithm, with the objective of maximizing net profit for a service provider in green data centers. We first study the previously known algorithms and conclude that these online algorithms have provably poor performance in their worst-case scenarios. To guarantee an online algorithm's performance in hindsight, we design a randomized algorithm to schedule energy and jobs in the data centers and prove the algorithm's expected competitive ratio in various settings. Our algorithm is theoretically sound, and it outperforms the previously known algorithms in many settings using both real traces and simulated data. An optimal offline algorithm is also implemented as an empirical benchmark.

en cs.PF
arXiv Open Access 2014
Investigation of the relationship between code change set n-grams and change in energy consumption

Stephen Romansky

The amount of software running on mobile devices is constantly growing as consumers and industry purchase more battery-powered devices. On the other hand, tools that provide developers with feedback on how their software changes affect battery life are not widely available. This work employs Green Mining, the study of the relationship between energy consumption and software changesets, together with n-gram language models, to evaluate whether source code changeset perplexity correlates with change in energy consumption. A correlation between perplexity and change in energy consumption would permit the development of a tool that predicts the impact a code changeset may have on a software application's energy consumption. The case study results show that there is weak to no correlation between cross entropy and change in energy consumption. Therefore, future areas of investigation are proposed.
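As a rough illustration of the underlying metric (with toy token streams standing in for tokenized changesets), an add-alpha-smoothed bigram model assigns a cross-entropy in bits per token to a new sequence; perplexity is 2 raised to that cross-entropy, and sequences unlike the training data score higher:

```python
import math
from collections import Counter

def bigram_cross_entropy(train, test, alpha=1.0):
    """Cross-entropy (bits/token) of a test token sequence under an
    add-alpha-smoothed bigram model fit on the training sequence."""
    vocab = set(train) | set(test)
    bigrams = Counter(zip(train, train[1:]))
    unigrams = Counter(train[:-1])
    h = 0.0
    for prev, cur in zip(test, test[1:]):
        p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * len(vocab))
        h -= math.log2(p)
    return h / (len(test) - 1)

# Illustrative token streams standing in for tokenized code changesets.
train = "for i in range n : total += i".split()
similar = "for j in range n : count += j".split()
different = "select name from users where id = 1".split()

h1 = bigram_cross_entropy(train, similar)    # familiar-looking change
h2 = bigram_cross_entropy(train, different)  # unfamiliar change
# Perplexity = 2 ** cross-entropy; h2 > h1, so `different` is more surprising.
```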

en cs.PF, cs.SE
arXiv Open Access 2013
On the Catalyzing Effect of Randomness on the Per-Flow Throughput in Wireless Networks

Florin Ciucu, Jens Schmitt

This paper investigates the throughput capacity of a flow crossing a multi-hop wireless network, whose geometry is characterized by general randomness laws including Uniform, Poisson, Heavy-Tailed distributions for both the nodes' densities and the number of hops. The key contribution is to demonstrate \textit{how} the \textit{per-flow throughput} depends on the distribution of 1) the number of nodes $N_j$ inside hops' interference sets, 2) the number of hops $K$, and 3) the degree of spatial correlations. The randomness in both $N_j$'s and $K$ is advantageous, i.e., it can yield larger scalings (as large as $Θ(n)$) than in non-random settings. An interesting consequence is that the per-flow capacity can exhibit the opposite behavior to the network capacity, which was shown to suffer from a logarithmic decrease in the presence of randomness. In turn, spatial correlations along the end-to-end path are detrimental by a logarithmic term.

en cs.PF
arXiv Open Access 2011
Expression Templates Revisited: A Performance Analysis of the Current ET Methodology

Klaus Iglberger, Georg Hager, Jan Treibig et al.

In the last decade, Expression Templates (ET) have gained a reputation as an efficient performance optimization tool for C++ codes. This reputation builds on several ET-based linear algebra frameworks focused on combining both elegant and high-performance C++ code. However, on closer examination the assumption that ETs are a performance optimization technique cannot be maintained. In this paper we demonstrate and explain the inability of current ET-based frameworks to deliver high performance for dense and sparse linear algebra operations, and introduce a new "smart" ET implementation that truly allows the combination of high performance code with the elegance and maintainability of a domain-specific language.

en cs.PF, cs.PL
arXiv Open Access 2010
Performance Evaluation of Components Using a Granularity-based Interface Between Real-Time Calculus and Timed Automata

Karine Altisen, Yanhong Liu, Matthieu Moy

To analyze complex and heterogeneous real-time embedded systems, recent works have proposed interface techniques between real-time calculus (RTC) and timed automata (TA), in order to take advantage of the strengths of each technique for analyzing various components. But the time to analyze a state-based component modeled by TA may be prohibitively high, due to the state space explosion problem. In this paper, we propose a framework of granularity-based interfacing to speed up the analysis of a TA modeled component. First, we abstract fine models to work with event streams at coarse granularity. We perform analysis of the component at multiple coarse granularities and then based on RTC theory, we derive lower and upper bounds on arrival patterns of the fine output streams using the causality closure algorithm. Our framework can help to achieve tradeoffs between precision and analysis time.

en cs.PF
arXiv Open Access 2009
The Multi-Branched Method of Moments for Queueing Networks

Giuliano Casale

We propose a new exact solution algorithm for closed multiclass product-form queueing networks that is several orders of magnitude faster and less memory-consuming than established methods for multiclass models, such as the Mean Value Analysis (MVA) algorithm. The technique is an important generalization of the recently proposed Method of Moments (MoM) which, unlike MVA, recursively computes higher-order moments of queue lengths instead of mean values. The main contribution of this paper is to prove that the information used in the MoM recursion can be increased by considering multiple recursive branches that evaluate models with different numbers of queues. This reformulation yields a simpler matrix difference equation, which leads to large computational savings with respect to the original MoM recursion. Computational analysis shows several cases where the proposed algorithm is between 1,000 and 10,000 times faster and less memory-consuming than the original MoM, thus extending the range of multiclass models for which exact solutions are feasible.
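For context, the classical MVA baseline that MoM is compared against can be sketched compactly in the single-class case; the recursion below iterates over population sizes, applying Little's law at each step (the MoM recursion itself is not reproduced here):

```python
def mva(visits, service_times, population):
    """Exact single-class Mean Value Analysis for a closed product-form
    network of queueing stations (no delay stations, no think time).
    Returns the system throughput and per-station mean queue lengths."""
    K = len(visits)
    q = [0.0] * K  # mean queue lengths with n - 1 customers in the network
    for n in range(1, population + 1):
        # Residence time at station k: arriving customer sees q[k] others.
        r = [visits[k] * service_times[k] * (1.0 + q[k]) for k in range(K)]
        x = n / sum(r)                     # system throughput at population n
        q = [x * r[k] for k in range(K)]   # Little's law per station
    return x, q

# Illustrative two-station network (visit counts and service times assumed).
x, q = mva(visits=[1.0, 2.0], service_times=[0.05, 0.03], population=10)
```

The multiclass version of this recursion is what becomes exponentially expensive in the number of classes, which is the scalability problem MoM and its multi-branched variant address.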

en cs.PF
arXiv Open Access 2008
Getting in the Zone for Successful Scalability

Jim Holtman, Neil J. Gunther

The universal scalability law (USL) is an analytic model used to quantify application scaling. It is universal because it subsumes Amdahl's law and Gustafson linearized scaling as special cases. Using simulation, we show: (i) that the USL is equivalent to synchronous queueing in a load-dependent machine repairman model and (ii) how USL, Amdahl's law, and Gustafson scaling can be regarded as boundaries defining three scalability zones. Typical throughput measurements lie across all three zones. Simulation scenarios provide deeper insight into queueing effects and thus provide a clearer indication of which application features should be tuned to get into the optimal performance zone.
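The USL itself is a simple closed form: relative capacity $C(N) = N / (1 + \sigma(N-1) + \kappa N(N-1))$, where $\sigma$ models contention and $\kappa$ models coherency delay. Setting $\kappa = 0$ recovers Amdahl's law, and $\sigma = \kappa = 0$ recovers linear scaling. A minimal sketch of the three zones:

```python
def usl_capacity(n, sigma, kappa):
    """Universal Scalability Law: relative capacity C(N) for N processors,
    with contention coefficient sigma and coherency coefficient kappa."""
    return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

# The three zones the abstract describes, for N = 64 (coefficients assumed):
linear = usl_capacity(64, sigma=0.0, kappa=0.0)    # linear-scaling boundary
amdahl = usl_capacity(64, sigma=0.05, kappa=0.0)   # Amdahl (contention) zone
usl = usl_capacity(64, sigma=0.05, kappa=0.001)    # coherency adds retrograde loss
```

Because the $\kappa N(N-1)$ term grows quadratically, USL curves eventually peak and decline, which is what distinguishes the synchronous-queueing zone from pure Amdahl contention.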

en cs.PF, cs.DC
arXiv Open Access 2005
Sequential File Programming Patterns and Performance with .NET

Peter Kukol, Jim Gray

Programming patterns for sequential file access in the .NET Framework are described and the performance is measured. The default behavior provides excellent performance on a single disk - 50 MBps both reading and writing. Using large request sizes and doing file pre-allocation when possible have quantifiable benefits. When one considers disk arrays, .NET unbuffered IO delivers 800 MBps on a 16-disk array, but buffered IO delivers about 12% of that performance. Consequently, high-performance file and database utilities are still forced to use unbuffered IO for maximum sequential performance. The report is accompanied by downloadable source code that demonstrates the concepts and code that was used to obtain these measurements.
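The pattern generalizes beyond .NET: sequential throughput depends strongly on the request size per write call. A Python analogue of the measurement loop is sketched below (absolute numbers are machine- and cache-dependent, so only the structure, not any particular MB/s figure, is meaningful):

```python
import os
import tempfile
import time

def sequential_write(path, total_bytes, request_size):
    """Write total_bytes sequentially in request_size chunks and return
    the achieved throughput in MB/s (buffered IO, OS cache included)."""
    chunk = b"\0" * request_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            f.write(chunk)
            written += request_size
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6

path = os.path.join(tempfile.mkdtemp(), "seqio.bin")
for request_size in (4 * 1024, 64 * 1024, 1024 * 1024):
    mbps = sequential_write(path, 8 * 1024 * 1024, request_size)
    print(f"request size {request_size:>7} B: {mbps:8.1f} MB/s")
```

As in the report, a fair comparison of buffered versus unbuffered IO would additionally bypass the OS cache (e.g. `O_DIRECT` on Linux), which this sketch does not attempt.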

en cs.PF, cs.OS