Results for "cs.DC"

Showing 20 of ~251697 results · from CrossRef, arXiv, DOAJ

arXiv Open Access 2024
Parallel Gaussian process with kernel approximation in CUDA

Davide Carminati

This paper introduces a parallel implementation in CUDA/C++ of the Gaussian process with a decomposed kernel. This recent formulation, introduced by Joukov and Kulić (2022), is characterized by an approximated -- but much smaller -- matrix to be inverted compared to plain Gaussian process. However, it exhibits a limitation when dealing with higher-dimensional samples which degrades execution times. The solution presented in this paper relies on parallelizing the computation of the predictive posterior statistics on a GPU using CUDA and its libraries. The CPU code and GPU code are then benchmarked on different CPU-GPU configurations to show the benefits of the parallel implementation on GPU over the CPU.

en cs.DC
arXiv Open Access 2023
Evaluating The Impact Of Cloud-Based Microservices Architecture On Application Performance

Ganesh Chowdary Desina

The study assesses the impact of cloud-based microservices architectures on application performance. Several aspects of performance evaluation are discussed, including response time, throughput, scalability, and reliability. This article examines the advantages and challenges of adopting a cloud-based approach. It explores potential bottlenecks and issues in a microservices architecture and presents optimization techniques. With the help of case studies and empirical studies, it compares cloud-based microservices architectures with traditional monolithic architectures. Furthermore, the paper examines the challenges of monitoring and troubleshooting distributed microservices. In conclusion, it emphasizes the importance of planning, designing, and testing during the adoption of cloud-based microservices.

en cs.DC
arXiv Open Access 2022
A Creativity Survey of Parallel Sorting Algorithm

Tianyi Yu, Wei Li

Sorting is one of the most fundamental problems in computer science. With the rapid development of manycore processors, designing efficient parallel sorting algorithms for manycore architectures has become increasingly important. This paper studies parallel memory sorting methods on modern hardware and summarizes their research status and progress. We classify the research problems, research methods, and measurement methods of the target papers and references. Finally, we summarize the surveyed research and list directions that remain unexplored and open to innovation. Keywords: Sorting Algorithm, Parallel Algorithm, Parallel Optimization, CPU, GPU, Memory Hierarchy

en cs.DC
arXiv Open Access 2022
Homogenous and Heterogenous Parallel Clustering: An Overview

Ahmed Ibrahim, Rokaya Hassanien

Recent advances in computer architecture and networking have opened the opportunity to parallelize clustering algorithms. This divide-and-conquer strategy often yields better results than centralized clustering, with much-improved time performance. This paper reviews key parallel clustering algorithms and provides insight into their strategies. The review brings together disparate attempts in parallel clustering to provide a comprehensive account of advances in this emerging field.

en cs.DC, cs.LG
arXiv Open Access 2022
A Graph Transformation Strategy for Optimizing SpTRSV

Buse Yılmaz, Abdülkadir Furkan Yıldız

Sparse triangular solve (SpTRSV) is an extensively studied computational kernel. An important obstacle in parallel SpTRSV implementations is that in some parts of a sparse matrix the computation is serial. By transforming the dependency graph, it is possible to increase the parallelism of the parts that lack it. In this work, we present an approach to increase the parallelism degree of a sparse matrix, discuss its limitations and possible improvements, and we compare it to a previous manual approach. The results provide several hints on how to craft a collection of strategies to transform a dependency graph.

en cs.DC
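
To make the serial-versus-parallel structure mentioned in the abstract above concrete, the following minimal sketch computes level sets of a dependency graph for a sparse lower-triangular matrix. Rows in the same level can be solved in parallel, while long chains of single-row levels are the serial parts. This is the standard level-set view of SpTRSV, not the graph transformation proposed in the paper; the function name `level_sets` and the 6x6 sparsity pattern are hypothetical.

```python
from collections import defaultdict


def level_sets(lower_rows):
    """Group the rows of a sparse lower-triangular matrix into levels.

    lower_rows[i] lists the column indices j < i where L[i][j] is nonzero,
    i.e. the rows that row i depends on in a forward solve.
    """
    level = [0] * len(lower_rows)
    for i, deps in enumerate(lower_rows):
        # A row can be solved one step after its latest dependency.
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]


# Hypothetical pattern: rows 0-2 are independent, rows 3-5 form a chain.
pattern = [[], [], [], [0, 1], [3], [4]]
levels = level_sets(pattern)
print(levels)
```

In this example the first level holds three independent rows, while the chain formed by rows 3 to 5 yields one row per level, which is the kind of low-parallelism region a dependency-graph transformation would target.
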
arXiv Open Access 2022
Lessons Learned from a Bare-metal Evaluation of Erasure Coding Algorithms in P2P Networks

Racin Nygaard

We have built a bare-metal testbed in order to perform large-scale, reproducible evaluations of erasure coding algorithms. Our testbed supports at least 1000 Ethereum Swarm peers running on 30 machines. Running experimental evaluation is time-consuming and challenging. Researchers must consider the experimental software's limitations and artifacts. If not controlled, the network behavior may cause inaccurate measurements. This paper shares the lessons learned from a bare-metal evaluation of erasure coding algorithms and how to create a controlled environment in a cluster consisting of 1000 Ethereum Swarm peers.

en cs.DC
arXiv Open Access 2022
Performance Comparisons of Self-stabilizing Algorithms for Maximal Independent Sets

Barton F. Cone, Stephen T. Hedetniemi, Lance C. Ingle et al.

Sensor networks, such as ultra-wideband sensors for the smart warehouse, may need to run distributed algorithms for automatically determining a topological layout. In this paper, we present 5 different self-stabilizing algorithms (their central and distributed counterparts) for determining maximal independent sets. The performance of the algorithms, in terms of time complexity, simulation analysis, and size of the maximal independent sets found, is then compared.

en cs.DC
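
As an illustration of what a self-stabilizing maximal independent set algorithm looks like, here is a minimal sketch of the well-known two-rule scheme run under a central daemon (one node moves at a time): a member with a member neighbour leaves the set, and a non-member with no member neighbour enters. It is not claimed to be one of the five algorithms compared in the paper; the function name `stabilize_mis`, the path graph, and the starting state are assumptions made for the example.

```python
import random


def stabilize_mis(adj, in_set, rng):
    """Run the two-rule self-stabilizing MIS scheme under a central daemon.

    adj maps each node to the set of its neighbours. in_set maps each node
    to a bool and may start in any state, including an invalid one.
    Returns the number of moves made before no node is privileged.
    """
    def privileged(v):
        has_member_neighbour = any(in_set[u] for u in adj[v])
        # Rule 1: a member with a member neighbour should leave.
        # Rule 2: a non-member with no member neighbour should enter.
        return in_set[v] == has_member_neighbour

    moves = 0
    while True:
        candidates = [v for v in adj if privileged(v)]
        if not candidates:
            return moves
        v = rng.choice(candidates)  # the daemon picks one privileged node
        in_set[v] = not in_set[v]
        moves += 1


# Hypothetical 5-node path graph, started from a deliberately invalid state.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
state = {v: True for v in adj}
moves = stabilize_mis(adj, state, random.Random(0))
members = {v for v in adj if state[v]}
print(moves, sorted(members))
```

Under a central daemon a node that enters the set never leaves again, so each node moves at most twice and the loop ends after at most 2n moves, whatever the starting state.
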
arXiv Open Access 2022
Asynchronous global-local non-invasive coupling for linear elliptic problems

Ahmed El Kerim, Pierre Gosselet, Frédéric Magoulès

This paper presents the first asynchronous version of the Global/Local non-invasive coupling, capable of dealing efficiently with multiple, possibly adjacent, patches. We give a new interpretation of the coupling in terms of primal domain decomposition method, and we prove the convergence of the relaxed asynchronous iteration. The asynchronous paradigm lifts many bottlenecks of the Global/Local coupling performance. We illustrate the method on several linear elliptic problems as encountered in thermal and elasticity studies.

en cs.DC, math.NA
arXiv Open Access 2021
Extending Classic Paxos for High-performance Read-Modify-Write Registers

Vasilis Gavrielatos, Antonios Katsarakis, Vijay Nagarajan

In this work we provide a detailed specification of how we extended and implemented Classic Paxos (CP) to execute Read-Modify-Writes (RMWs). In addition, we specify how we implemented All-aboard Paxos over CP and how we use carstamps to also add ABD reads and writes, accelerating the common case where RMWs are not needed. Our specification targets a Key-Value-Store that is deployed within the datacenter, is replicated across 3 to 7 machines, and supports reads, writes and RMWs.

en cs.DC
arXiv Open Access 2020
Concurrent Fixed-Size Allocation and Free in Constant Time

Guy E. Blelloch, Yuanhao Wei

Our goal is to efficiently solve the dynamic memory allocation problem in a concurrent setting where processes run asynchronously. On $p$ processes, we can support allocation and free for fixed-sized blocks with $O(1)$ worst-case time per operation, $Θ(p^2)$ additive space overhead, and using only single-word read, write, and CAS. While many algorithms rely on having constant-time fixed-size allocate and free, we present the first implementation of these two operations that is constant time with reasonable space overhead.

en cs.DC
arXiv Open Access 2020
Incentive-Based Selection and Composition of IoT Energy Services

Amani Abusafia, Athman Bouguettaya, Sajib Mistry

We propose a novel incentive-based framework for composing energy service requests. An incentive model is designed that considers the context of the providers and consumers to determine rewards for sharing wireless energy. We propose a novel priority scheduling approach to compose energy service requests that maximizes the reward of the provider. A set of exhaustive experiments with a dataset and collected IoT users' behavior is conducted to evaluate the proposed approach. Experimental results prove the efficiency of the proposed approach.

en cs.DC
arXiv Open Access 2019
Work Stealing Simulator

Mohammed Khatiri, Denis Trystram, Frédéric Wagner

In this paper we present a lightweight Work Stealing simulator written in Python. The simulator executes an application (a list of tasks with or without dependencies) on a multiprocessor platform whose processors are linked by a specific topology. We first give an overview of the different variants of the work stealing algorithm, then we present the architecture of our lightweight Work Stealing simulator. Its architecture facilitates the development of other types of applications and other topologies for interconnecting the processors. We present the use cases of the simulator and the different types of results.

en cs.DC
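
To give a feel for what such a simulator computes, the sketch below is a toy discrete-time work stealing loop. It is not the simulator described in the abstract above: it assumes independent unit-time tasks, no interconnect topology (victims are chosen uniformly at random), and all work starting on processor 0. The function name `simulate` and its parameters are hypothetical.

```python
import random
from collections import deque


def simulate(num_procs, num_tasks, rng):
    """Simulate independent unit-time tasks; return (makespan, steals)."""
    queues = [deque() for _ in range(num_procs)]
    queues[0].extend(range(num_tasks))  # all work starts on processor 0
    done = steps = steals = 0
    while done < num_tasks:
        steps += 1
        # Processors that start the step with no work will try to steal.
        idle = [p for p in range(num_procs) if not queues[p]]
        for p in range(num_procs):
            if queues[p]:
                queues[p].pop()  # the owner runs a task from the bottom
                done += 1
        for p in idle:
            victim = rng.randrange(num_procs)
            if victim != p and queues[victim]:
                # The thief takes the oldest task from the top of the
                # victim's deque; the task is not run until a later step.
                queues[p].append(queues[victim].popleft())
                steals += 1
    return steps, steals


makespan, steals = simulate(num_procs=4, num_tasks=100, rng=random.Random(1))
print(makespan, steals)
```

Because at most one task runs per processor per step, the makespan can never drop below the task count divided by the processor count, and it can never exceed the task count, since some processor always holds work at the start of a step.
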
arXiv Open Access 2018
Submodular Optimization in the MapReduce Model

Paul Liu, Jan Vondrak

Submodular optimization has received significant attention in both practice and theory, as a wide array of problems in machine learning, auction theory, and combinatorial optimization have submodular structure. In practice, these problems often involve large amounts of data, and must be solved in a distributed way. One popular framework for running such distributed algorithms is MapReduce. In this paper, we present two simple algorithms for cardinality constrained submodular optimization in the MapReduce model: the first is a $(1/2-o(1))$-approximation in 2 MapReduce rounds, and the second is a $(1-1/e-ε)$-approximation in $\frac{1+o(1)}{ε}$ MapReduce rounds.

en cs.DC
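
For context on the approximation ratios quoted in the abstract above, the following sketch shows the classic sequential greedy algorithm for cardinality-constrained monotone submodular maximization, which is known to achieve a $(1-1/e)$-approximation. It is a single-machine baseline, not either of the MapReduce algorithms from the paper. Set coverage is used as the submodular objective, and the function name `greedy_max_coverage` and the input sets are hypothetical.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily by marginal coverage gain."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(
            (i for i in range(len(sets)) if i not in chosen),
            key=lambda i: len(sets[i] - covered),
            default=None,
        )
        if best is None or not sets[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical instance: choose k = 2 sets covering as many elements as possible.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(sets, 2)
print(chosen, sorted(covered))
```

Each round needs the result of the previous one, which is why the greedy loop is inherently sequential and why reducing the number of rounds is the central question in a MapReduce setting.
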
arXiv Open Access 2015
Algorithm for Achieving Consensus Over Conflicting Rumors: Convergence Analysis and Applications

Amine Semma, Ismail Elouafiq

Motivated by the large expansion in the study of social networks, this paper deals with the problem of multiple messages spreading over the same network using gossip algorithms. Given two messages distributed over some nodes of the graph, we first investigate the final distribution of the messages given an initial state. Then, an algorithm is presented to achieve consensus over one of the messages. Finally, a game theoretical application and an analogy with word-of-mouth marketing are outlined.

en cs.DC, cs.SI
arXiv Open Access 2015
Designing Applications in a Hybrid Cloud

Evgeny Nikulchev, Evgeniy Pluzhnik, Dmitry Biryukov et al.

Designing applications for a hybrid cloud involves many particular features, including dynamic virtualization management and route switching. These make it impossible to evaluate the query and hence to determine the optimal distribution of data. In this paper, we formulate the main challenges of design and simulation, and offer an installation for processing.

en cs.DC, cs.DS
arXiv Open Access 2014
Efficient and Scalable Algorithms for Smoothed Particle Hydrodynamics on Hybrid Shared/Distributed-Memory Architectures

Pedro Gonnet

This paper describes a new fast and implicitly parallel approach to neighbour-finding in multi-resolution Smoothed Particle Hydrodynamics (SPH) simulations. This new approach is based on hierarchical cell decompositions and sorted interactions, within a task-based formulation. It is shown to be faster than traditional tree-based codes, and to scale better than domain decomposition-based approaches on hybrid shared/distributed-memory parallel architectures, e.g. clusters of multi-cores, achieving a $40\times$ speedup over the Gadget-2 simulation code.

en cs.DC, astro-ph.IM
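
The idea of cell-based neighbour-finding mentioned in the abstract above can be sketched with a uniform grid: particles are binned into cells of side `h`, so any pair closer than `h` must lie in the same or adjacent cells. This is a deliberately simplified, single-resolution, serial 2D version, not the hierarchical, task-based scheme of the paper; the function name `neighbour_pairs` and the sample points are hypothetical.

```python
from collections import defaultdict
from itertools import product


def neighbour_pairs(points, h):
    """Return all index pairs (i, j) with i < j closer than h in 2D."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // h), int(y // h))].append(idx)

    pairs = set()
    for (cx, cy), members in cells.items():
        # Only the cell itself and its 8 neighbours can hold close particles.
        for dx, dy in product((-1, 0, 1), repeat=2):
            others = cells.get((cx + dx, cy + dy), ())
            for i in members:
                xi, yi = points[i]
                for j in others:
                    if i < j:
                        xj, yj = points[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 < h * h:
                            pairs.add((i, j))
    return pairs


# Hypothetical particle positions and interaction radius.
points = [(0.1, 0.1), (0.4, 0.2), (1.6, 1.5), (1.9, 1.7), (3.0, 0.0)]
pairs = neighbour_pairs(points, h=0.5)
print(sorted(pairs))
```

Compared with checking every pair, the grid restricts distance tests to particles in nearby cells, which is the property that cell decompositions exploit.
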
arXiv Open Access 2014
Resource-Aware Replication on Heterogeneous Multicores: Challenges and Opportunities

Björn Döbel, Robert Muschner, Hermann Härtig

Decreasing hardware feature sizes and increasing heterogeneity in multicore hardware require software that can adapt to these platforms' properties. We implemented ROMAIN, an OS service providing redundant multithreading on top of the FIASCO.OC microkernel to address the increasing unreliability of hardware. In this paper we review challenges and opportunities for ROMAIN to adapt to such multicore platforms in order to decrease execution overhead, resource requirements, and vulnerability against faults.

en cs.DC, cs.OS

Page 19 of 12585