A modification of Amdahl's law for the case of an increasing number of processing elements in a computer system is considered. The coefficient $k$ linking the speedups of parallel and parallel specialized computer systems is determined. The limiting values of the coefficient are investigated and its theoretical maximum is calculated. It is proved that $k > 1$ for any positive increment in the number of processing elements. The obtained formulas are combined into a single method for determining the maximum theoretical speedup of a parallel specialized computer system relative to that of a minimal parallel computer system. The method is tested on the Apriori, k-nearest neighbors, CDF 9/7, fast Fourier transform, and naive Bayes classifier algorithms.
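For reference, the classical form of Amdahl's law that the paper modifies can be stated as follows (standard statement only; the paper's coefficient $k$ and its modified formula are not reproduced here):

```latex
% Amdahl's law: speedup S on n processing elements,
% where p is the parallelizable fraction of the workload.
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}.
```

The limit shows why the serial fraction $(1-p)$ bounds the achievable speedup, which is the quantity the paper's coefficient $k$ compares across the two system types.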
Sebastià Sansó, Carlos Guerrero, Isaac Lera, et al.
This paper presents a platform to facilitate the deployment of applications on Internet of Things (IoT) devices. The platform allows programmers to use a Function-as-a-Service programming paradigm in which functions are managed and configured through a Platform-as-a-Service web tool. The tool also establishes interoperability between the functions of the applications. The proposed platform achieved faster and easier application deployments, and the resource usage of the IoT devices was also lower compared to a deployment process based on Docker containers.
The $k$-center problem is a central optimization problem with numerous applications in machine learning, data mining, and communication networks. Despite extensive study in various scenarios, it surprisingly has not been thoroughly explored in the traditional distributed setting, where the communication graph of a network also defines the distance metric. We initiate the study of the $k$-center problem where the underlying metric is the graph's shortest-path metric, in three canonical distributed models: LOCAL, CONGEST, and CLIQUE. Our results encompass constant-factor approximation algorithms and lower bounds in these models, as well as hardness results for the bi-criteria approximation setting.
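To make the underlying optimization problem concrete, here is the classical sequential 2-approximation for $k$-center (Gonzalez's farthest-first traversal). This is a sketch of the problem only, not the paper's distributed algorithms:

```python
# Gonzalez's farthest-first traversal: a sequential 2-approximation
# for the k-center problem. Illustrates the objective the abstract
# refers to; it is NOT the paper's distributed algorithm.

def k_center_greedy(points, k, dist):
    """Pick k centers so the maximum point-to-center distance
    is at most twice the optimal radius."""
    centers = [points[0]]                      # arbitrary first center
    while len(centers) < k:
        # next center: the point farthest from all chosen centers
        farthest = max(points,
                       key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    radius = max(min(dist(p, c) for c in centers) for p in points)
    return centers, radius
```

For example, on the points `(0,0), (1,0), (10,0), (11,0)` with the Manhattan metric and `k = 2`, the traversal picks one center in each cluster and reports radius 1.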
We uncover the extend-only directed poset (EDP) structure as a unification of recently discussed DAG-based Byzantine-tolerant conflict-free replicated data types (CRDTs). We also show how a key-value map model can be derived from the EDP formulation, and give an outlook on an EDP-based systemic access control CRDT as a formalization of the CRDT used in the Matrix messaging system.
Stig Bosmans, Siegfried Mercelis, Peter Hellinckx, Joachim Denil
With the increase in Internet of Things devices and more decentralized architectures, we see a new type of application gaining importance: one in which local interactions between individual entities lead to a global emergent behavior, so-called Emergent-based IoT (EBI) systems. In this position paper we explore techniques to evaluate this emergent behavior in IoT applications. Because of the required scale and diversity, this is not an easy task. We therefore focus mainly on a distributed simulation approach and provide an overview of techniques that could optimize overall simulation performance. Our focus is on both modeling and simulation technology.
Anas M. Al-Oraiqat, Yuriy O. Ivanov, Aladdein M. Amro
This paper presents an approach to designing software for dynamical systems simulation. An algorithm is proposed to obtain a schedule for calculating each phase variable of a stiff system of differential equations. The problem is classified as fixed-priority pre-emptive scheduling of periodic tasks. The Branch-and-Bound algorithm is modified to minimize the defined utilization function and to optimize the scheduling process for a numerical solver. A program implementing the experimental schedule solves a job-shop problem, demonstrating the effectiveness of the proposed algorithm.
Even in the absence of clocks, time bounds on the duration of actions enable the use of time for distributed coordination. This paper initiates an investigation of coordination in such a setting. A new communication structure called a zigzag pattern is introduced, and shown to guarantee bounds on the relative timing of events in this clockless model. Indeed, zigzag patterns are shown to be necessary and sufficient for establishing that events occur in a manner that satisfies prescribed bounds. We capture when a process can know that an appropriate zigzag pattern exists, and use this to provide necessary and sufficient conditions for timed coordination of events using a full-information protocol in the clockless model.
In this paper, we describe the motivation, innovation, design, a running example, and the future development of a Fault Inject Tool (FIT). The tool enables the controlled injection of cloud platform issues such as resource stress and service or VM outages, so that their effect on deployed applications can be observed. It is designed for use in a DevOps workflow to tighten the correlation between application design and cloud operation, although it is not limited to this usage, and it helps improve the resiliency of data-intensive applications by bringing together fault tolerance, stress testing, and benchmarking in a single tool.
Applications developed over the cloud coordinate several, often anonymous, computational resources, distributed over different execution nodes, within flexible architectures. Coordination models able to represent quantitative data provide a powerful basis for their analysis and validation. This paper extends IMCreo, a semantic model for Stochastic Reo based on interactive Markov chains, to enhance its scalability by regarding each channel, node, and interface component as an independent stochastic process that may (or may not) synchronise with the rest of the coordination circuit.
Contemporary GPUs allow concurrent execution of small computational kernels in order to prevent idling of GPU resources. Despite the potential concurrency between independent kernels, the order in which kernels are issued to the GPU significantly influences application performance. A technique for deriving suitable kernel launch orders is therefore presented, with the aim of reducing the total execution time. Experimental results indicate that the proposed method yields solutions that are well above the 90th percentile in the design space of all possible permutations of the kernel launch sequence.
Krzysztof Kaczmarski, Paweł Rzążewski, Albert Wolant
Motion planning is an important and well-studied field of robotics. A typical approach to finding a route is to construct a {\em cell graph} representing a scene and then to find a path in such a graph. In this paper we present and analyze parallel algorithms for constructing the cell graph on a SIMD-like GPU processor. Additionally, we present a new implementation of the dictionary data type on a GPU device. In contrast to hash tables, which are common in GPU algorithms, it uses a search tree in which all values are kept in leaves. With such a structure we can efficiently perform dictionary operations on a set of long vectors over a limited alphabet.
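A minimal sequential sketch of the underlying layout, a dictionary whose values live only in the leaves of a search tree (a B+-tree-like structure), may help; the paper's data structure is a GPU-parallel variant, and this two-level CPU version only illustrates the idea:

```python
# Dictionary with all values stored in sorted leaves; internal level
# holds only separator keys. Sequential sketch of a B+-tree-like
# layout, NOT the paper's GPU implementation.
from bisect import bisect_right, bisect_left

class LeafSearchTree:
    def __init__(self, items, leaf_size=4):
        pairs = sorted(items)
        # leaves: sorted runs of (key, value) pairs
        self.leaves = [pairs[i:i + leaf_size]
                       for i in range(0, len(pairs), leaf_size)]
        # internal level: smallest key of each leaf acts as a separator
        self.seps = [leaf[0][0] for leaf in self.leaves]

    def get(self, key, default=None):
        if not self.leaves:
            return default
        # route to the unique leaf that could hold the key ...
        i = max(bisect_right(self.seps, key) - 1, 0)
        leaf = self.leaves[i]
        # ... then binary-search inside that leaf
        keys = [k for k, _ in leaf]
        j = bisect_left(keys, key)
        if j < len(keys) and keys[j] == key:
            return leaf[j][1]
        return default
```

Keeping values out of the internal level means routing touches only a small array of separators, which is the property that makes the layout attractive for SIMD-style traversal.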
We present fast deterministic distributed protocols in synchronous networks for leader election and spanning tree construction. The protocols are designed under the assumption that nodes in a network have identifiers but the size of an identifier is unbounded, so the time bounds of the protocols depend on the sizes of the identifiers. We present fast protocols running in time $O(D\log L+L)$, where $L$ is the size of the minimal identifier and $D$ is the diameter of the network.
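For contrast, the textbook synchronous baseline, flooding the minimum identifier for a number of rounds at least the diameter, can be simulated as below. This is only the naive baseline; the paper's protocols additionally exploit identifier sizes to reach the $O(D\log L+L)$ bound:

```python
# Synchronous "flood the minimum identifier" leader election,
# simulated round by round. Naive baseline only; the paper's
# protocols account for identifier sizes.

def elect_leader(adj):
    """adj: dict mapping node id -> list of neighbour ids.
    Returns the elected leader (the minimum id)."""
    known = {v: v for v in adj}          # smallest id seen so far
    for _ in range(len(adj)):            # n rounds >= diameter
        nxt = dict(known)
        for v, neighbours in adj.items():
            for u in neighbours:         # receive neighbours' minima
                nxt[v] = min(nxt[v], known[u])
        known = nxt
    leaders = set(known.values())
    assert len(leaders) == 1             # all nodes agree
    return leaders.pop()
```

On the path `5 - 2 - 7`, for instance, every node converges to the minimum identifier 2 within two rounds.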
The overwhelming growth of stored data has spurred researchers to seek methods to exploit it optimally, most of which face a response-time problem caused by the enormous size of the data. Most approaches suggest materialization as the preferred solution; however, materialization alone cannot deliver real-time answers. In this paper we propose a framework outlining the barriers to, and suggested solutions for, achieving real-time OLAP answers, which are widely used in decision support systems and data warehouses.
Eric Badouel, Loïc Hélouët, Georges-Edouard Kouamou, et al.
This paper presents a purely declarative approach to artifact-centric case management systems, and a decentralization scheme for this model. Each case is presented as a tree-like structure; nodes bear information that combines data and computations. Each node belongs to a given stakeholder, and semantic rules govern the evolution of the tree structure, as well as how data values derive from information stemming from the context of the node. Stakeholders communicate through asynchronous message passing without shared memory, enabling convenient distribution.
Cloud computing is an emerging trend in the IT environment, with huge infrastructure and resource requirements. Load balancing is an important aspect of the cloud computing environment. An efficient load balancing scheme ensures efficient resource utilization by provisioning resources to cloud users on demand in a pay-as-you-go manner. Load balancing may also support prioritizing users by applying appropriate scheduling criteria. This paper presents various load balancing schemes in different cloud environments based on the requirements specified in Service Level Agreements (SLAs).
FDTD codes, such as Sophie developed at CEA/DAM, no longer take advantage of the processor's increased computing power, especially with the recent rise of multicore technology. This is rooted in the fact that low-order numerical schemes need substantial memory bandwidth to load and store the computed fields. The aim of this article is to present a programming method at the software architecture level that improves the memory access pattern in order to reuse data in cache instead of constantly accessing RAM. We exhibit a more than twofold improvement in computing time in practical applications. The target audience of this article consists of computational scientists and electrical engineers who develop simulation codes without specific expertise in computer science or electronics.
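The cache-reuse idea can be sketched on a simplified 1D FDTD-like update: instead of two full array sweeps per time step (one for E, one for H), the sweeps are interleaved block by block so each block of the fields is still in cache when the second update touches it. This is an illustrative sketch, not Sophie's actual scheme; coefficients and boundary handling are simplified:

```python
# Two equivalent ways to perform one time step of a simplified
# 1D FDTD-like update. step_blocked interleaves the E and H sweeps
# per block (with a one-cell lag) so data is reused while hot in
# cache, instead of streaming each full array twice from RAM.

def step_naive(E, H, c):
    N = len(E)
    for i in range(1, N):              # full E sweep
        E[i] += c * (H[i] - H[i - 1])
    for i in range(N - 1):             # full H sweep over whole array
        H[i] += c * (E[i + 1] - E[i])

def step_blocked(E, H, c, B=64):
    N = len(E)
    h_done = 0
    for a in range(0, N, B):
        b = min(a + B, N)
        # update E on the current block (reads only old H values)
        for i in range(max(a, 1), b):
            E[i] += c * (H[i] - H[i - 1])
        # update H one cell behind, while this E block is in cache
        for i in range(h_done, b - 1):
            H[i] += c * (E[i + 1] - E[i])
        h_done = b - 1
```

The one-cell lag ensures every H update sees fully updated E values and every E update sees not-yet-updated H values, so both versions compute identical results while the blocked one traverses memory once per step.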
The paper proposes an alternative proof that Omega, an oracle that outputs a process identifier and guarantees that eventually the same correct process identifier is output at all correct processes, provides minimal information about failures for solving consensus in read-write shared-memory systems: every oracle that gives enough failure information to solve consensus can be used to implement Omega. Unlike the original proof by Chandra, Hadzilacos and Toueg (CHT), the proof presented in this paper builds upon the very fact that 2-process wait-free consensus is impossible. Also, since the oracle that is used to implement Omega can solve consensus, the implementation is allowed to directly access consensus objects. As a result, the proposed proof is shorter and conceptually simpler than the original one.
Fitting complicated models to large datasets is a bottleneck of many analyses. We present GooFit, a library and tool for constructing arbitrarily complex probability density functions (PDFs) to be evaluated on NVIDIA GPUs or on multicore CPUs using OpenMP. The massive parallelisation of dividing event calculations among hundreds of processors can achieve speedup factors of 200-300 in real-world problems.