Dragan Jovcic
Results for "cs.DC"
Showing 20 of ~251696 results · from CrossRef, DOAJ, arXiv
Dc. Christophe Lerebourg, Dc Jean-Sébastien Carrière, Nadine Gobron et al.
Ioannis Korontanis, Antonios Makris, Konstantinos Tserpes
In recent years, there has been a concerted effort in both industry and research to develop new approaches to DevOps. The primary aim is to help developers move their applications to Cloud or Edge platforms using Docker or Kubernetes. This paper presents a tool called Converter, designed to interpret a TOSCA extension called CEAML and convert the descriptions into Kubernetes or KubeVirt definition files. Converter is available as a Python package and is recommended for use by orchestrators as an auxiliary tool for implementing CEAML.
Nick Brown
Funded by the UK ExCALIBUR H&ES exascale programme, since early 2022 we have provided a RISC-V testbed for HPC to offer free access for scientific software developers to experiment with RISC-V for their workloads. Based upon our experiences of providing access to RISC-V for the HPC community, and our involvement with the RISC-V community at large, in this extended abstract we summarise the current state of RISC-V for HPC and consider the high priority areas that should be addressed to help drive adoption.
Yankai Jiang, Rohan Basu Roy, Baolin Li et al.
This work introduces ECOLIFE, the first carbon-aware serverless function scheduler to co-optimize carbon footprint and performance. ECOLIFE builds on the key insight of intelligently exploiting multi-generation hardware to achieve high performance and a lower carbon footprint. ECOLIFE designs multiple novel extensions to Particle Swarm Optimization (PSO) in the context of a serverless execution environment to achieve high performance while effectively reducing the carbon footprint.
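For readers unfamiliar with the optimizer named in this abstract, the following is a minimal sketch of plain, textbook PSO. It does not reproduce ECOLIFE's extensions or its scheduling objective; the function name `pso_minimize`, the parameter defaults, and the toy `sphere` objective are all assumptions made for illustration.

```python
import random


def pso_minimize(f, dim, lo, hi, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lo, hi]^dim with basic particle swarm optimization.

    Hypothetical helper: w is the inertia weight, c1 and c2 are the cognitive
    and social coefficients. Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends inertia, pull toward the particle's own best,
                # and pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, value = pso_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
    print(best, value)
```

In a scheduler setting, the position vector would presumably encode a placement or keep-alive decision and `f` would combine carbon and latency terms, but that mapping is not described in the abstract and is not attempted here.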
Zheng Li, Francisco Millar-Bilbao
Cloud latency critically influences the success of cloud applications. Therefore, characterizing cloud network performance is crucial for analyzing and satisfying different latency requirements. By focusing on the cloud's outbound network latency, this case study on Google App Engine confirms the necessity of optimizing application deployment. More importantly, our modeling effort has established a divide-and-conquer framework to address the complexity of understanding and investigating cloud latency.
David Noel, Elizabeth Graham, Liyuan Liu
With endless amounts of data and very limited bandwidth, fast data compression is one solution for the growing data-sharing problem. Compression helps lower transfer times and save memory, but if the compression takes too long, it no longer seems viable. Multi-core processors enable parallel data compression; however, parallelizing the algorithms is anything but straightforward since compression is inherently serial. This paper explores techniques to parallelize three compression schemes: Huffman coding, LZSS, and MP3 coding.
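To make the "inherently serial" remark concrete, here is a minimal serial sketch of Huffman coding, the first of the three schemes named in this abstract. It is a textbook version, not the paper's parallel technique, and the helper names `huffman_codes`, `encode`, and `decode` are assumptions. Note that the decoder cannot know where a codeword starts until the previous one has been fully consumed, which is one reason a naive split of the bit stream across cores does not work.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Return a dict mapping each symbol in `data` to its Huffman bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, codes):
    """Concatenate the codeword of every symbol in order."""
    return "".join(codes[s] for s in data)


def decode(bits, codes):
    """Walk the bit string left to right; codeword boundaries are only
    discovered as earlier codewords are consumed."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra"
    table = huffman_codes(text)
    bits = encode(text, table)
    print(table, bits, decode(bits, table))
```

A common workaround, stated here only as general background, is to compress fixed-size blocks independently so that each block has a known starting offset.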
Imad Kissami, Ahmed Ratnani
Manapy is a parallel, unstructured, finite-volume based solver for partial differential equations (PDE). The framework is written in Python, is object-oriented, and is organized so that it is easy to understand and modify. In this paper, we present the parallel implementation and scalability of the differential operators used on a general case of PDE. The performance of massively parallel direct and iterative methods for solving large sparse systems of linear equations in plasma physics is evaluated on a recent high-performance computing system, and 3D test cases for plasma physics are presented.
Shir Cohen, Idit Keidar
We formalize Byzantine linearizability, a correctness condition that specifies whether a concurrent object with a sequential specification is resilient against Byzantine failures. Using this definition, we systematically study Byzantine-tolerant emulations of various objects from registers. We focus on three useful objects: reliable broadcast, atomic snapshot, and asset transfer. We prove that there is an $f$-resilient implementation of such objects from registers with $n$ processes when $f<\frac{n}{2}$.
Nhien-An Le-Khac, M-Tahar Kechadi, Joe Carthy
In this paper, we present the ADMIRE architecture, a new framework for developing novel and innovative data mining techniques to deal with very large and distributed heterogeneous datasets in both commercial and academic applications. The main ADMIRE components are detailed, as are the interfaces that allow users to efficiently develop and implement their data mining applications and techniques on a Grid platform such as Globus ToolKit or DGET.
Quinten Stokkink, Harmjan Treep, Johan Pouwelse
The current onion routing implementation of Tribler works as expected but throttles the overall throughput of the Tribler system. This article discusses a measuring procedure to reproducibly profile the tunnel implementation so that further optimizations of the tunnel community can be made. Our work has been integrated into the Tribler ecosystem.
Evgeniy Pluzhnik, Oleg Lukyanchikov, Evgeny Nikulchev et al.
Designing applications for use in a hybrid cloud has many features. These include dynamic virtualization management and an unknown route switching customers. This makes it impossible to evaluate the query and hence the optimal distribution of data. In this paper, we formulate the main challenges of design and simulation, and offer an installation for processing.
Yao Zhu, David F. Gleich
We present a parallel algorithm for the undirected $s,t$-mincut problem with floating-point valued weights. Our overarching algorithm uses an iteratively reweighted least squares framework. This generates a sequence of Laplacian linear systems, which we solve using parallel matrix algorithms. Our overall implementation is up to 30 times faster than a serial solver when using 128 cores.
Tianyi David Han, Tarek S. Abdelrahman
The use of local memory is important to improve the performance of OpenCL programs. However, its use may not always benefit performance, depending on various application characteristics, and there is no simple heuristic for deciding when to use it. We develop a machine learning model to decide if the optimization is beneficial or not. We train the model with millions of synthetic benchmarks and show that it can predict if the optimization should be applied for a single array, in both synthetic and real benchmarks, with high accuracy.
Parul Pandey, Mahshwari Tripathi
One of the traditional mechanisms used in distributed systems for maintaining the consistency of replicated data is voting. A problem with voting mechanisms is the size of the quorums needed on each access to the data. In this paper, we present a novel and efficient distributed algorithm for managing replicated data. We impose a logical wheel structure on the set of copies of an object. The protocol ensures a minimum read quorum size of one, by reading one copy of an object while guaranteeing fault-tolerance of write operations. The wheel structure has a wider application area, as it can be imposed on a network with any number of nodes.
Dmitry N. Kozlov
Motivated by questions in theoretical distributed computing, we develop the combinatorial theory of abstract simplex path subdivisions. Our main application is a short and structural proof of the theorem of Castaneda and Rajsbaum. This theorem in turn implies the solvability of the weak symmetry breaking task in the immediate snapshot wait-free model in the case when the number of processes is not a power of a prime number.
Vivek Chalotra, Anju Bhasin, Anik Gupta et al.
The HEP Analysis Facility is a cluster designed and implemented on Scientific Linux CERN 5.5 to give High Energy Physics researchers a single place to carry out a particular task, or to provide a parallel processing architecture in which CPU resources are shared across a network and all machines function as one large supercomputer.
Gregory Kerr
InfiniBand is a switched fabric interconnect. The InfiniBand specification does not define an API. However, the OFED package, libibverbs, has become the default API on Linux and Solaris systems. Sparse documentation exists for the verbs API. The simplest InfiniBand program provided by OFED, ibv_rc_pingpong, is about 800 lines long. The semantics of using the verbs API in this program are not obvious to the first-time reader. This paper will dissect the ibv_rc_pingpong program in an attempt to make clear to users how to interact with verbs.
Ben Lund, Justin W Smith
We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the previously best reported algorithm. In order to achieve this speedup, we applied a new technique to reduce usage of on-chip shared memory and allow the CUDA scheduler to more effectively hide instruction latency.
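As a point of reference for this abstract, the following is a minimal serial sketch of the Floyd-Warshall algorithm. It is not the paper's CUDA implementation and does not model shared-memory usage or latency hiding; the function name `floyd_warshall` and the sample graph are assumptions. For a fixed intermediate vertex `k`, the updates over all `(i, j)` pairs do not depend on one another, which is the property GPU versions generally exploit.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest paths on a dense weight matrix.

    weights[i][j] is the edge weight from i to j, math.inf if there is no
    edge, and 0 on the diagonal. Returns a new matrix of shortest distances.
    Assumes the graph has no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):                  # allow vertex k as an intermediate hop
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue                # no path i -> k, nothing can improve
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


if __name__ == "__main__":
    INF = math.inf
    graph = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ]
    for row in floyd_warshall(graph):
        print(row)
```

The triple loop does O(n^3) work with a strict ordering only on `k`, so the two inner loops are the natural unit to map onto threads or tiles.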
Peter Tröger
Since the very beginning of hardware development, computer processors have been designed with ever-increasing clock frequencies and sophisticated built-in optimization strategies. Due to physical limitations, this 'free lunch' of speedup has come to an end. The following article gives a summary and bibliography of recent trends and challenges in CMP architectures. It discusses how 40 years of parallel computing research need to be considered in the upcoming multi-core era. We argue that future research must be driven from two sides: a better expression of hardware structures, and a domain-specific understanding of software parallelism.