Results for "cs.PF"

Showing 20 of ~90639 results · from CrossRef, DOAJ, arXiv

CrossRef Open Access 2025
Integrated FRAM-FMEA based on PF-CRITIC and PF-WASPAS for Pandemic Disaster Risk Assessment

Emine Can, Berk Ayvaz, Emin Tarakci

The utilization of trustworthy and optimistic risk evaluation methodologies is crucial in the continuously evolving context of global disasters and pandemics. This study has provided a summary of the suggested integrated methodology. The Functional Resonance Analysis Method (FRAM) and Failure Mode and Effect Analysis (FMEA) are combined in this article to present a novel hybrid approach that is further strengthened by the robustness of the Pythagorean Fuzzy-Criteria Importance Through Intercriteria Correlation (PF-CRITIC) and Pythagorean Fuzzy-Weighted Aggregated Sum Product Assessment (PF-WASPAS) methods. In the first stage of the presented model, we determined the weights of the criteria by using PF-CRITIC. In the second stage, we employed PF-WASPAS to rank failure modes. A case study was done to identify and evaluate the processes and hazards connected with fighting the COVID-19 virus in healthcare facilities. The use of this integrated approach in healthcare facilities has shown how effective it is at identifying complex risk variables and assisting in well-informed decision-making. Beyond conventional linear models, this integrated approach provides a comprehensive perspective that enables stakeholders to prioritize risks and put actions into place while having an improved understanding of systemic weaknesses. Considering changing worldwide challenges, this research lays the path for resilient and more adaptable risk management methods.

arXiv Open Access 2025
Pinching-Antenna Systems For Indoor Immersive Communications: A 3D-Modeling Based Performance Analysis

Yulei Wang, Yalin Liu, Yaru Fu et al.

The emerging pinching antenna (PA) technology has high flexibility to reconfigure wireless channels and combat line-of-sight blockage, thus holding transformative potential for indoor immersive applications in 6G. This paper investigates Pinching-antenna systems (PASS) for indoor immersive communications. Our contributions are threefold: (1) we construct a 3D model to characterize the distribution of users, waveguides, and PAs in the PASS; (2) we develop a general theoretical model on downlink performance of PASS by capturing PA-user relationships and system parameters' impacts; and (3) we conduct comprehensive numerical results of the theoretical model and provide implementation guidelines for PASS deployments.

en cs.PF
arXiv Open Access 2025
Memory Analysis on the Training Course of DeepSeek Models

Ping Zhang, Lei Su

We present a theoretical analysis of GPU memory consumption during the training of DeepSeek models such as DeepSeek-v2 and DeepSeek-v3. Our primary objective is to clarify the device-level memory requirements associated with various distributed training configurations. Specifically, we examine critical factors influencing memory usage, including micro-batch size, activation recomputation policies, 3D parallelism, and ZeRO optimizations. It is important to emphasize that the training policies discussed in this report are not representative of DeepSeek's official configurations. Instead, they are explored to provide a deeper understanding of memory dynamics in the training of large-scale mixture-of-experts models.

en cs.PF, cs.LG
arXiv Open Access 2025
Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension

Hongguang Chen

Stencil computation is essential in high-performance computing, especially for large-scale tasks like liquid simulation and weather forecasting. Optimizing its performance can reduce both energy consumption and computation time, which is critical in disaster prediction. This paper explores optimization techniques for 7-point 3D stencil computation on ARM's Scalable Vector Extension (SVE), using the Roofline model and tools like gem5 and CACTI. We evaluate software optimizations such as vectorization and tiling, as well as hardware adjustments in ARM SVE vector lengths and cache configurations. The study also examines performance, power consumption, and chip area trade-offs to identify optimal configurations for ARM-based systems.

en cs.PF
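
The Roofline model named in the abstract above bounds attainable throughput by the smaller of peak compute and memory bandwidth times arithmetic intensity. The following is a minimal sketch of that bound for a 7-point stencil. The machine figures, the flop count per grid point, and the byte counts are all illustrative assumptions, not values from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_point, bytes_per_point):
    """Attainable GFLOP/s under the Roofline model.

    All arguments are hypothetical inputs chosen by the caller.
    """
    intensity = flops_per_point / bytes_per_point  # FLOP per byte moved
    return min(peak_gflops, bandwidth_gbs * intensity)


# Assumed machine: 100 GFLOP/s peak, 50 GB/s memory bandwidth.
# Assumed kernel: 13 flops per point (7 multiplies + 6 adds), 8-byte doubles.
no_reuse = roofline_gflops(100.0, 50.0, 13, 64)     # 7 loads + 1 store per point
ideal_reuse = roofline_gflops(100.0, 50.0, 13, 16)  # 1 load + 1 store per point

print(no_reuse, ideal_reuse)
```

Under these assumed numbers both variants are memory-bound, which is why cache-oriented changes such as tiling move the bound while a higher compute peak alone would not.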
arXiv Open Access 2025
Size-Aware Dispatching to Fluid Queues

Runhan Xie, Esa Hyytiä, Rhonda Righter

We develop a fluid-flow model for routing problems, where fluid consists of different size particles and the task is to route the incoming fluid to $n$ parallel servers using the size information in order to minimize the mean latency. The problem corresponds to the dispatching problem of (discrete) jobs arriving according to a stochastic process. In the fluid model the problem reduces to finding an optimal path to empty the system in $n$-dimensional space. We use the calculus of variations to characterize the structure of optimal policies. Numerical examples shed further light on the fluid routing problem and the optimal control of large distributed service systems.

en cs.PF
arXiv Open Access 2024
Ten Ways in which Virtual Reality Differs from Video Streaming

Gustavo de Veciana, Sonia Fahmy, George Kesidis et al.

Virtual Reality (VR) applications have a number of unique characteristics that set them apart from traditional video streaming. These characteristics have major implications on the design of VR rendering, adaptation, prefetching, caching, and transport mechanisms. This paper contrasts VR to video streaming, stored 2D video streaming in particular, and discusses how to rethink system and network support for VR.

en cs.PF, cs.MM
arXiv Open Access 2022
Towards Comparing Performance of Algorithms in Hardware and Software

Maja H. Kirkeby, Martin Schoeberl

In this paper, we report on a preliminary investigation of the potential performance gain of programs implemented in field-programmable gate arrays (FPGAs) using a high-level language Chisel compared to ordinary high-level software implementations executed on general-purpose computers and small and cheap computers. FPGAs inherently support parallel evaluations, while sequential computers do not. For this preliminary investigation, we have chosen a highly parallelizable program as a case study to show an upper bound of performance gain. The purpose is to demonstrate whether or not programming FPGAs has the potential for performance optimizations of ordinary programs. We have developed and evaluated Conway's Game of Life for an FPGA, a small and cheap computer Raspberry Pi 4, and a MacBook Pro Laptop. We have compared the performance of programs over different input sizes to decide the relative increase in runtime.

en cs.PF
arXiv Open Access 2020
Staffing for many-server systems facing non-standard arrival processes

M. Heemskerk, M. Mandjes, B. Mathijsen

Arrival processes to service systems often display (i) larger than anticipated fluctuations, (ii) a time-varying rate, and (iii) temporal correlation. Motivated by this, we introduce a specific non-homogeneous Poisson process that incorporates these three features. The resulting arrival process is fed into an infinite-server system, which is then used as a proxy for its many-server counterpart. This leads to a staffing rule based on the square-root staffing principle that acknowledges the three features. After a slight rearrangement of servers over the time slots, we succeed in stabilizing system performance even under highly varying and strongly correlated conditions. We fit the arrival stream model to real data from an emergency department and demonstrate (by simulation) the performance of the novel staffing rule.

en cs.PF, math.PR
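
The square-root staffing principle mentioned in the abstract above sets the number of servers to the offered load plus a safety margin proportional to its square root. The sketch below shows only the textbook form of the rule, applied per time slot. It is not the paper's modified rule, and the function name, the rates, and the quality-of-service parameter `beta` are illustrative assumptions.

```python
import math


def square_root_staffing(arrival_rate, service_rate, beta):
    """Textbook square-root staffing: s = R + beta * sqrt(R), rounded up.

    R = arrival_rate / service_rate is the offered load; beta is a
    hypothetical quality-of-service parameter chosen by the planner.
    """
    offered_load = arrival_rate / service_rate
    return math.ceil(offered_load + beta * math.sqrt(offered_load))


# Assumed time-varying arrival rates (customers per hour) for three slots,
# with each server completing one customer per hour on average.
slot_rates = [25.0, 100.0, 400.0]
staff = [square_root_staffing(rate, 1.0, 1.0) for rate in slot_rates]
print(staff)  # [30, 110, 420]
```

The safety margin grows only with the square root of the load, so the relative overhead shrinks from 20% at a load of 25 to 5% at a load of 400.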
arXiv Open Access 2019
On the sojourn of an arbitrary customer in an $M/M/1$ Processor Sharing Queue

Fabrice Guillemin, Veronica Quintuna Rodriguez

In this paper, we consider the number of both arrivals and departures seen by a tagged customer while in service in a classical $M/M/1$ processor sharing queue. By exploiting the underlying orthogonal structure of this queuing system revealed in an earlier study, we compute the distributions of these two quantities and prove that they are equal in distribution. We moreover derive the asymptotic behavior of this common distribution. The knowledge of the number of departures seen by a tagged customer allows us to test the validity of an approximation, which consists of assuming that the tagged customer is randomly served among those customers in the residual busy period of the queue following the arrival of the tagged customer. Numerical evidence shows that this approximation is reasonable for moderate values of the number of departures, given that the asymptotic behaviors of the distributions are very different even if the exponential decay rates are equal.

en cs.PF
CrossRef Open Access 2018
Cluster Self-Organization of Intermetallic Systems: Metal Clusters Cs and Cs and Metal-Oxide Cluster CsO for the Self-Assembly of the Crystal Structure (Cs)(Cs)(CsO), "Физика и химия стекла"

В. Я. Шевченко, В.А. Блатов, Г.Д. Илюшин

A geometric and topological analysis was carried out for the metal oxide with the lowest known oxygen content, CsO, which forms from an oxygen-containing melt of metallic Cs. Special algorithms for decomposing structural graphs into cluster substructures (the ToposPro software package) were used to identify the precursor clusters of the crystal structures. The precursor clusters involved in the self-assembly of the crystal structures were determined: trioctahedral CsO clusters, octahedral Cs clusters, and tetrahedral Cs clusters. The symmetry and topological codes of the self-assembly of the crystal structures from precursor clusters were reconstructed in the form: primary chain microlayer microframework.

arXiv Open Access 2018
AdaptMemBench: Application-Specific Memory Subsystem Benchmarking

Mahesh Lakshminarasimhan, Catherine Olschanowsky

Optimizing scientific applications to take full advantage of modern memory subsystems is a continual challenge for application and compiler developers. Factors beyond working set size affect performance. A benchmark framework that explores the performance in an application-specific manner is essential to characterize memory performance and at the same time inform memory-efficient coding practices. We present AdaptMemBench, a configurable benchmark framework that measures achieved memory performance by emulating application-specific access patterns with a set of kernel-independent driver templates. This framework can explore the performance characteristics of a wide range of access patterns and can be used as a testbed for potential optimizations due to the flexibility of polyhedral code generation. We demonstrate the effectiveness of AdaptMemBench with case studies on commonly used computational kernels such as triad and multidimensional stencil patterns.

en cs.PF
arXiv Open Access 2018
A Measurement Theory of Locality

Liang Yuan, Chen Ding, Peter Denning et al.

Locality is a fundamental principle used extensively in program and system optimization. It can be measured in many ways. This paper formalizes the metrics of locality into a measurement theory. The new theory includes the precise definition of locality metrics based on access frequency, reuse time, reuse distance, working set, footprint, and the cache miss ratio. It gives the formal relation between these definitions and the proofs of equivalence or non-equivalence. It provides the theoretical justification for four successful locality models in operating systems, programming languages, and computer architectures which were developed empirically.

en cs.PF
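
Two of the locality metrics listed in the abstract above, reuse distance and cache miss ratio, can be related on a small trace. The sketch below assumes one common convention: reuse distance is the number of distinct other addresses touched between two consecutive accesses to the same address, and the cache is fully associative with LRU replacement. The paper's formal definitions may differ, and the function names and the trace are illustrative.

```python
def reuse_distances(trace):
    """Reuse distance of each access; None marks a first (cold) access."""
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses touched strictly between the two accesses.
            distances.append(len(set(trace[last_seen[addr] + 1:i])))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


def lru_miss_ratio(trace, cache_size):
    """Miss ratio of a fully associative LRU cache, derived from reuse distances.

    An access hits exactly when its reuse distance is below the cache size.
    """
    misses = sum(1 for d in reuse_distances(trace) if d is None or d >= cache_size)
    return misses / len(trace)


trace = list("abcaab")
print(reuse_distances(trace))    # [None, None, None, 2, 0, 2]
print(lru_miss_ratio(trace, 2))  # 5 misses out of 6 accesses
```

The quadratic set construction is kept for readability; production tools typically use tree-based or approximate methods to compute the same histogram.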
arXiv Open Access 2017
A note on integrating products of linear forms over the unit simplex

Giuliano Casale

Integrating a product of linear forms over the unit simplex can be done in polynomial time if the number of variables n is fixed (V. Baldoni et al., 2011). In this note, we highlight that this problem is equivalent to obtaining the normalizing constant of state probabilities for a popular class of Markov processes used in queueing network theory. In light of this equivalence, we survey existing computational algorithms developed in queueing theory that can be used for exact integration. For example, under some regularity conditions, queueing theory algorithms can exactly integrate a product of linear forms of total degree N by solving N systems of linear equations.

en cs.PF, math.MG
arXiv Open Access 2016
Delay Bounds for Multiclass FIFO

Yuming Jiang, Vishal Misra

FIFO is perhaps the simplest scheduling discipline. For single-class FIFO, its delay guarantee performance has been extensively studied: The well-known results include a stochastic delay bound for $GI/GI/1$ by Kingman and a deterministic delay bound for $D/D/1$ by Cruz. However, for multiclass FIFO, few such results are available. To fill the gap, we prove delay bounds for multiclass FIFO in this work, considering both deterministic and stochastic cases. Specifically, delay bounds are presented for multiclass D/D/1, GI/GI/1 and G/G/1. In addition, examples are provided for several basic settings to demonstrate the obtained bounds in more explicit forms, which are also compared with simulation results.

en cs.PF
arXiv Open Access 2014
Heavy Traffic Limits for GI/H/n Queues: Theory and Application

Yousi Zheng, Ness Shroff, Prasun Sinha

We consider a GI/H/n queueing system. In this system, there are multiple servers in the queue. The inter-arrival time is general and independent, and the service time follows hyper-exponential distribution. Instead of stochastic differential equations, we propose two heavy traffic limits for this system, which can be easily applied in practical systems. In applications, we show how to use these heavy traffic limits to design a power efficient cloud computing environment based on different QoS requirements.

en cs.PF, cs.DC
arXiv Open Access 2011
Optimization strategies for parallel CPU and GPU implementations of a meshfree particle method

Jose M. Domínguez, Alejandro J. C. Crespo, Moncho Gómez-Gesteira

Much of the current focus in high performance computing (HPC) for computational fluid dynamics (CFD) deals with grid based methods. However, parallel implementations for new meshfree particle methods such as Smoothed Particle Hydrodynamics (SPH) are less studied. In this work, we present optimizations for both central processing unit (CPU) and graphics processing unit (GPU) of an SPH method. These optimization strategies can be further applied to many other meshfree methods. The obtained performance for each architecture and a comparison between the most efficient implementations for CPU and GPU are shown.

en cs.PF, cs.CE
arXiv Open Access 2011
Profit-Aware Server Allocation for Green Internet Services

Michele Mazzucco, Dmytro Dyachuk, Marios Dikaiakos

A server farm is examined, where a number of servers are used to offer a service to impatient customers. Every completed request generates a certain amount of profit, running servers consume electricity for power and cooling, while waiting customers might leave the system before receiving service if they experience excessive delays. A dynamic allocation policy aiming at satisfying the conflicting goals of maximizing the quality of users' experience while minimizing the cost for the provider is introduced and evaluated. The results of several experiments are described, showing that the proposed scheme performs well under different traffic conditions.

en cs.PF, cs.DC
arXiv Open Access 2010
Scaling Turbo Boost to a 1000 cores

Ananth Narayan S, Somsubhra Sharangi, Alexandra Fedorova

The Intel Core i7 processor code named Nehalem provides a feature named Turbo Boost which opportunistically varies the frequencies of the processor's cores. The frequency of a core is determined by core temperature, the number of active cores, the estimated power consumption, the estimated current consumption, and operating system frequency scaling requests. For a chip multi-processor (CMP) that has a small number of physical cores and a small set of performance states, deciding the Turbo Boost frequency to use on a given core might not be difficult. However, we do not know the complexity of this decision making process in the context of a large number of cores, scaling to the 100s, as predicted by researchers in the field.

en cs.PF, cs.OS

Page 9 of 4532