Results for "cs.AR" - JURNALIN

arXiv Open Access 2025

Adding MFMA Support to gem5

Marco Kurzynski, Matthew D. Sinclair

In this work, we enhance gem5's GPU model by adding support for Matrix Core Engines (MCEs). Specifically, on the AMD MI200 and MI300 GPUs that gem5 supports, these MCEs execute Matrix Fused Multiply Add (MFMA) instructions for a variety of precisions. By adding this support, our changes enable running state-of-the-art ML workloads in gem5, as well as examining how MCE optimizations impact the behavior of future systems.

en cs.AR
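
As a rough illustration of what a matrix fused multiply-add computes, the sketch below evaluates D = A × B + C on small matrices in plain Python. The function name `mfma` and the list-of-lists layout are assumptions made for this example; they are not gem5 or AMD identifiers, and real MFMA instructions operate on fixed tile shapes and precisions that are not modeled here.

```python
def mfma(a, b, c):
    """Return D = A x B + C for list-of-lists matrices (illustrative only)."""
    m, k, n = len(a), len(b), len(b[0])
    # Shapes must agree: A is m x k, B is k x n, C is m x n.
    assert all(len(row) == k for row in a)
    assert len(c) == m and all(len(row) == n for row in c)
    d = [[c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = d[i][j]  # start from the accumulator input C
            for p in range(k):
                acc += a[i][p] * b[p][j]
            d[i][j] = acc
    return d
```

For example, `mfma([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]])` adds the all-ones matrix to the ordinary matrix product.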


arXiv Open Access 2025

Functional Stability of Software-Hardware Neural Network Implementation: The NeuroComp Project

Bychkov Oleksii, Senysh Taras

This paper presents an innovative approach to ensuring functional stability of neural networks through hardware redundancy at the individual neuron level. Unlike the classical Dropout method, which is used during training for regularization purposes, the proposed system ensures resilience to hardware failures during network operation. Each neuron is implemented on a separate microcomputer (ESP32), allowing the system to continue functioning even when individual computational nodes fail.

en cs.AR, cs.NE


arXiv Open Access 2023

Reducing the memory usage of Lattice-Boltzmann schemes with a DWT-based compression

Clément Flint, Philippe Helluy

This paper presents a new solution to address the challenge of increasing memory usage in high-performance computing simulations of Lattice-Boltzmann or Finite-Volume schemes. Our approach utilises a lossy compression scheme based on the Discrete Wavelet Transform (DWT) to achieve high compression ratios while preserving the accuracy of the simulation. Our evaluation on two different FV/LBM schemes demonstrates that the approach can reduce memory usage by several orders of magnitude.

en cs.AR, cs.DC
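
The idea of lossy DWT-based compression can be sketched with a single-level Haar transform followed by thresholding of small detail coefficients. The abstract does not say which wavelet or thresholding rule the authors use, so the Haar choice, the function names, and the `threshold` parameter below are all assumptions made for illustration.

```python
def haar_forward(x):
    """One level of an unnormalized Haar DWT on an even-length list."""
    avg = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    det = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return avg, det


def haar_inverse(avg, det):
    """Rebuild the signal from averages and details."""
    out = []
    for a, d in zip(avg, det):
        out.extend([a + d, a - d])
    return out


def compress(x, threshold):
    """Zero out detail coefficients smaller than the threshold (the lossy step)."""
    avg, det = haar_forward(x)
    det = [d if abs(d) >= threshold else 0.0 for d in det]
    return avg, det
```

Zeroed coefficients need not be stored, which is where the memory saving would come from; the reconstruction error is bounded by the threshold at this level.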


arXiv Open Access 2022

Fast Efficient Fixed-Size Memory Pool: No Loops and No Overhead

Ben Kenwright

In this paper, we examine a ready-to-use, robust, and computationally fast fixed-size memory pool manager with no loops and no memory overhead that is highly suited to time-critical systems such as games. The algorithm achieves this by exploiting the unused memory slots for bookkeeping in combination with a trouble-free indexing scheme. We explain how it works using straightforward step-by-step examples. Furthermore, we compare how much faster the memory pool manager is than a system allocator (e.g., malloc) over a range of allocations and sizes.

en cs.AR
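
The bookkeeping trick described in the abstract, storing the free list inside the unused blocks themselves, can be sketched as follows. This is a Python simulation written for this listing, not the paper's code: the class name `FixedPool`, its fields, and the use of a Python list to stand in for raw block memory are all assumptions. Blocks are linked into the free list lazily, one per allocation, so no initialization loop is needed.

```python
class FixedPool:
    """Fixed-size block pool; each free block stores the index of the next free block."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.slots = [None] * num_blocks  # stands in for raw block memory
        self.num_free = num_blocks
        self.num_initialized = 0          # blocks linked into the free list so far
        self.next_free = 0 if num_blocks > 0 else None  # head of the free list

    def allocate(self):
        # Lazily link one more block per call, so setup needs no loop.
        if self.num_initialized < self.num_blocks:
            self.slots[self.num_initialized] = self.num_initialized + 1
            self.num_initialized += 1
        if self.num_free == 0:
            return None  # pool exhausted
        index = self.next_free
        self.num_free -= 1
        # The free block's own storage holds the next free index.
        self.next_free = self.slots[index] if self.num_free > 0 else None
        self.slots[index] = None  # block now belongs to the caller
        return index

    def deallocate(self, index):
        # Reuse the freed block's storage as the free-list link.
        # No double-free or range checking is done in this sketch.
        if self.next_free is None:
            self.slots[index] = self.num_blocks  # end-of-list marker
        else:
            self.slots[index] = self.next_free
        self.next_free = index
        self.num_free += 1
```

Both operations run in constant time, and the only per-pool state outside the blocks is a handful of counters.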


arXiv Open Access 2022

Democratizing Domain-Specific Computing

Yuze Chi, Weikang Qiao, Atefeh Sohrabizadeh et al.

In the past few years, domain-specific accelerators (DSAs), such as Google's Tensor Processing Units, have been shown to offer significant performance and energy-efficiency gains over general-purpose CPUs. An important question is whether typical software developers can design and implement their own customized DSAs, with affordability and efficiency, to accelerate their applications. This article presents our answer to this question.

en cs.AR, cs.PL


CrossRef Open Access 2021

Identification of Abnormal Patterns in AR(1) Process Using CS-SVM

Hongshuo Zhang, Bo Zhu, Kaimin Pang et al.

2 citations en


CrossRef Open Access 2019

Total synthesis of (+)-ar-macrocarpene

Arindam Khatua, Sovan Niyogi, Vishnumaya Bisai

First total synthesis of a naturally occurring sesquiterpenoid, (+)-ar-macrocarpene, has been achieved via a key [3,3]-sigmatropic rearrangement effecting reductive transposition through allylic diazene rearrangement (ADR).

20 citations en


arXiv Open Access 2017

The Effect of Temperature on Amdahl Law in 3D Multicore Era

Leonid Yavits, Amir Morad, Ran Ginosar

This work studies the influence of temperature on the performance and scalability of 3D Chip Multiprocessors (CMP) from the perspective of Amdahl's law. We find that a 3D CMP may reach its thermal limit before reaching its maximum power. We show that a high level of parallelism may lead to high peak temperatures even in small-scale 3D CMPs, thus limiting 3D CMP scalability and calling for different, in-memory computing architectures.

en cs.AR
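
For reference, the classical Amdahl's law that the abstract builds on can be sketched as below. This is only the standard, temperature-agnostic formula, speedup = 1 / ((1 - p) + p / n); the thermal model studied in the paper is not reproduced here, and the function name `amdahl_speedup` is a hypothetical choice.

```python
def amdahl_speedup(parallel_fraction, num_cores):
    """Classical Amdahl speedup for a workload with the given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_cores)
```

The serial fraction caps the achievable speedup at 1 / (1 - p) no matter how many cores are added.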


CrossRef Open Access 2016

Differential Patterns of Persistent Opioid Use in Patients with Cancer and Non-Cancer Pain

AR Sani, CS Zin, ZM Noor et al.

en


arXiv Open Access 2016

Epiphany-V: A 1024 processor 64-bit RISC System-On-Chip

Andreas Olofsson

This paper describes the design of a 1024-core processor chip in 16nm FinFET technology. The chip ("Epiphany-V") contains an array of 1024 64-bit RISC processors, 64MB of on-chip SRAM, three 136-bit wide mesh Networks-On-Chip, and 1024 programmable IO pins. The chip has taped out and is being manufactured by TSMC. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

en cs.AR


arXiv Open Access 2015

FPGA Implementation of High Speed Baugh-Wooley Multiplier using Decomposition Logic

Ananda Kiran, Navdeep Prashar

The Baugh-Wooley algorithm is a well-known iterative algorithm for performing multiplication in digital signal processing applications. Decomposition logic is used with the Baugh-Wooley algorithm to enhance the speed and to reduce the critical path delay. In this paper, a high-speed multiplier is designed and implemented using decomposition logic and the Baugh-Wooley algorithm. The result is compared with a Booth multiplier. An FPGA-based architecture is presented, and the design has been implemented using a Xilinx 12.3 device.

en cs.AR
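
The arithmetic behind a Baugh-Wooley multiplier can be checked in software. The sketch below follows the commonly described modified Baugh-Wooley form for n-bit two's-complement operands: partial products that involve exactly one sign bit are complemented, and two constant correction bits are added. It is a bit-level model written for this listing, not the authors' FPGA design, and it does not cover their decomposition logic; the function name and the default width `n=4` are assumptions.

```python
def baugh_wooley(a, b, n=4):
    """Multiply two n-bit two's-complement integers via Baugh-Wooley partial products."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    assert lo <= a <= hi and lo <= b <= hi
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            bit = abits[i] & bbits[j]
            # Partial products involving exactly one sign bit are complemented.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    # Correction constants that replace the negative weights.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1  # keep the low 2n bits
    # Reinterpret the 2n-bit result as a signed value.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

Because every partial product is non-negative, a hardware version can sum them with a regular adder array, which is the usual motivation for this formulation.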


arXiv Open Access 2015

Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP 2015)

Frank Hannig, Dirk Koch, Daniel Ziener

This volume contains the papers accepted at the Second International Workshop on FPGAs for Software Programmers (FSP 2015), held in London, United Kingdom, September 1st, 2015. FSP 2015 was co-located with the International Conference on Field Programmable Logic and Applications (FPL).

en cs.AR, cs.DC


arXiv Open Access 2013

Advances in computer architecture

Irfan Uddin

In the past, efforts were made to improve the performance of a processor via frequency scaling. However, industry has reached the limits of increasing the frequency, and therefore concurrent execution of instructions on multiple cores seems the only possible option. It is not enough for the hardware to provide concurrent execution; software also has to introduce concurrency in order to exploit the parallelism.

en cs.AR


arXiv Open Access 2013

On the Performance Potential of Speculative Execution based on Branch and Value Prediction

Pece Mitrevski, Marjan Gusev

Fluid Stochastic Petri Nets are used to capture the dynamic behavior of an ILP processor, and discrete-event simulation is applied to assess the performance potential of predictions and speculative execution in boosting the performance of ILP processors that fetch, issue, execute and commit a large number of instructions per cycle.

en cs.AR


arXiv Open Access 2013

A Technique for Efficiently Managing SRAM-NVM Hybrid Cache

Sparsh Mittal

In this paper, we present an SRAM-PCM hybrid cache design, along with a cache replacement policy named dead fast block (DFB) to manage the hybrid cache. This design aims to leverage the best features of both SRAM and PCM devices. Compared to a PCM-only cache, the hybrid cache with the DFB policy provides superior results on all relevant evaluation metrics, viz. cache lifetime, performance and energy efficiency. Also, using the DFB policy to manage the hybrid cache provides better results than the LRU replacement policy on all the evaluation metrics.

en cs.AR


arXiv Open Access 2011

Multi-core processors - An overview

Balaji Venu

Microprocessors have revolutionized the world we live in, and continuous efforts are being made to manufacture not only faster chips but also smarter ones. A number of techniques, such as data-level parallelism, instruction-level parallelism and hyper-threading (Intel's HT), already exist and have dramatically improved the performance of microprocessor cores. This paper briefly reviews the evolution of multi-core processors, then introduces the technology and its advantages in today's world. The paper concludes by detailing the challenges currently faced by multi-core processors and how the industry is trying to address these issues.

en cs.AR


arXiv Open Access 2010

Associative control processor with a rigid structure

Isa Magomedov, Omar Khazamov

An approach that applies an associative processor to decision-making problems is proposed. It focuses on hardware implementations of fuzzy processing systems, with associativity as an effective management basis for a fuzzy processor. The structural approach is developed further, resulting in a quite simple and compact parallel associative memory unit (PAMU). A comparison of memory cost and speed for processors with rigid and soft-variable structures is given. An example of PAMU flashing is also considered.

en cs.AR, cs.AI


CrossRef Open Access 2009

PSY26 ECONOMIC EVALUATION OF THE ADDITION OF RITUXIMAB TO CVP FOR ADVANCED FOLLICULAR LYMPHOMA IN ROMANIA

AR Lupu, PC Radu, JA Ray et al.

en


CrossRef Open Access 1989

Adsorption of Cs and its effects on the oxidation of the Ar+ sputtered Si(100)2 × 1 substrate

C.A. Papageorgopoulos, M. Kamaratos

36 citations en


arXiv Open Access 2007

Hotspot Prevention Through Runtime Reconfiguration in Network-On-Chip

G. M. Link, N. Vijaykrishnan

Many existing thermal management techniques focus on reducing the overall power consumption of the chip, and do not address location-specific temperature problems referred to as hotspots. We propose the use of dynamic runtime reconfiguration to shift the hotspot-inducing computation periodically and make the thermal profile more uniform. Our analysis shows that dynamic reconfiguration is an effective technique in reducing hotspots for NoCs.

en cs.AR

