Results for "cs.AR"

Showing 20 of ~184910 results · from arXiv, CrossRef

arXiv Open Access 2026
LUTstructions: Self-loading FPGA-based Reconfigurable Instructions

Philippos Papaphilippou

General-purpose processors feature a limited number of instructions based on an instruction set. They can be numerous, such as with vector extensions that include hundreds or thousands of instructions, but this comes at a cost; they are often unable to express arbitrary tasks efficiently. This paper explores the concept of having reconfigurable instructions by incorporating reconfigurable areas in a softcore. It follows a relatively recently proposed computer architecture concept for seamlessly loading instruction implementation-carrying bitstreams from main memory. The resulting softcore is entirely evaluated on an FPGA, essentially having an FPGA-on-an-FPGA for the instruction implementations, with no notable operating frequency overhead. This is achieved with a custom FPGA architecture called LUTstruction, which is tailored towards low latency for custom instructions and wide reconfiguration, as well as a soft implementation for the purposes of architectural exploration.

en cs.AR
arXiv Open Access 2024
Wavelet Based Frequency Detection Using FPGAs

Caleb Hill, Darshika G. Perera

In the realm of signal processing, frequency and spectrum detection are fundamental tasks that can be computationally intensive. This project leverages the power of FPGAs to perform wavelet analysis on an input signal. The goal is to detect the presence of a specific frequency component, in this case 6 kHz. Our experiments demonstrate that wavelet-based spectral detection is both possible and easily implemented using an FPGA.
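As a software illustration of the idea of wavelet-based detection of a single frequency, the following minimal sketch correlates a signal with one complex Morlet wavelet tuned to 6 kHz. It is not the authors' FPGA design: the 48 kHz sample rate, the wavelet width (`cycles`), and the function names are all assumptions made for this example.

```python
import cmath
import math


def morlet_response(signal, sample_rate_hz, target_hz, cycles=6.0):
    """Magnitude of the correlation between `signal` and one complex Morlet wavelet.

    The wavelet is centred on the middle of the signal. `cycles` (an assumed
    parameter) sets the Gaussian standard deviation as a number of periods
    of `target_hz`.
    """
    sigma_s = cycles / target_hz              # Gaussian std-dev in seconds
    centre = (len(signal) - 1) / 2.0
    acc = 0j
    weight = 0.0
    for n, x in enumerate(signal):
        t = (n - centre) / sample_rate_hz
        g = math.exp(-0.5 * (t / sigma_s) ** 2)
        acc += x * g * cmath.exp(-2j * math.pi * target_hz * t)
        weight += g
    return abs(acc) / weight                  # normalised by the window weight


def tone(freq_hz, sample_rate_hz, n_samples):
    """Unit-amplitude sine test signal."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n_samples)]


FS = 48_000                                   # assumed sample rate
with_6k = morlet_response(tone(6_000, FS, 480), FS, 6_000)
without_6k = morlet_response(tone(1_000, FS, 480), FS, 6_000)
print(with_6k, without_6k)
```

A unit-amplitude tone at the target frequency yields a response near 0.5, while a tone far from it yields a response near zero, so a simple threshold can flag the presence of the 6 kHz component.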

en cs.AR
arXiv Open Access 2024
Testing Resource Isolation for System-on-Chip Architectures

Philippe Ledent, Radu Mateescu, Wendelin Serwe

Ensuring resource isolation at the hardware level is a crucial step towards more security inside the Internet of Things. Even though there is still no generally accepted technique to generate appropriate tests, it became clear that tests should be generated at the system level. In this paper, we illustrate the modeling aspects in test generation for resource isolation, namely modeling the behavior and expressing the intended test scenario. We present both aspects using the industrial standard PSS and an academic approach based on conformance testing.

en cs.AR, cs.CR
arXiv Open Access 2024
Mexican Computers: A Brief Technical and Historical Overview

Daniel Ortiz-Arroyo

The emergence of the microprocessor in the early 1970s allowed the design of computers that did not require the substantial economic resources of large computer companies of that era. Shortly after this event, a variety of computers based on microprocessors appeared in the United States and other developed countries. Unlike in those countries, where small and large companies developed most personal computers, in Mexico, the first microprocessor-based computers were designed within academic institutions. It is little known that Mexican computers of that era included a variety of systems ranging from purpose-specific research and teaching-oriented computers to high-performance personal computers. The goal of this article is to describe in detail some of these Mexican computers designed between the late 1970s and mid-1980s.

en cs.AR
arXiv Open Access 2024
L2R-CIPU: Efficient CNN Computation with Left-to-Right Composite Inner Product Units

Malik Zohaib Nisar, Mohammad Sohail Ibrahim, Muhammad Usman et al.

This paper proposes a composite inner-product computation unit based on left-to-right (LR) arithmetic for the acceleration of convolutional neural networks (CNNs) on hardware. The efficacy of the proposed L2R-CIPU method has been shown on the VGG-16 network, and assessment is done on various performance metrics. The L2R-CIPU design achieves 1.06x to 6.22x greater performance, 4.8x to 15x more TOPS/W, and 4.51x to 53.45x higher TOPS/mm² than prior architectures.
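To illustrate what left-to-right (most-significant-bit-first) processing means for an inner product, here is a minimal software sketch. It is a generic bit-serial formulation and not the L2R-CIPU architecture itself. The function name, the unsigned operands, and the 8-bit width are assumptions.

```python
def l2r_inner_product(a, b, bits=8):
    """Dot product of unsigned integers, consuming the bits of `b` MSB-first."""
    assert len(a) == len(b)
    assert all(0 <= y < (1 << bits) for y in b)
    acc = 0
    for j in range(bits - 1, -1, -1):          # most significant bit first
        # Sum the elements of `a` whose partner in `b` has bit j set.
        partial = sum(x for x, y in zip(a, b) if (y >> j) & 1)
        acc = (acc << 1) + partial             # shift, then add this bit-plane
    return acc


print(l2r_inner_product([3, 5, 7], [2, 4, 6]))
```

Because the most significant bit-planes are accumulated first, the running value of `acc` is a progressively refined approximation of the final result.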

en cs.AR
arXiv Open Access 2024
Hardware for converting floating-point to the microscaling (MX) format

Danila Gorodecky, Leonel Sousa

This paper proposes hardware converters for the microscaling format (MX-format), a reduced representation of floating-point numbers. We present an algorithm and a memory-free hardware model for converting 32 single-precision floating-point numbers to MX-format. The proposed model supports six different types of MX-format: E5M2, E4M3, E3M2, E2M3, E2M1, and INT8. The conversion process consists of three steps: calculating the maximum absolute value among 32 inputs, generating a shared scale, and producing 32 outputs in the selected MX-format type. The hardware converters were implemented in FPGA, and experimental results demonstrate.
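The three conversion steps described above can be sketched in software for the INT8 element type. This is a simplified illustration, not the authors' hardware model. It assumes a power-of-two shared scale derived from the block maximum and an 8-bit element with 6 fractional bits. The function names and the saturation range are assumptions for this example.

```python
import math

BLOCK = 32                 # inputs per block, as in the abstract
INT8_FRACTION_BITS = 6     # assumption: element value = integer / 2**6


def to_mx_int8(values):
    """Convert 32 floats to (shared_exponent, list of 8-bit integer elements)."""
    assert len(values) == BLOCK
    # Step 1: maximum absolute value among the 32 inputs.
    max_abs = max(abs(v) for v in values)
    # Step 2: shared power-of-two scale. frexp gives max_abs = m * 2**e with
    # 0.5 <= m < 1, so floor(log2(max_abs)) == e - 1.
    shared_exp = math.frexp(max_abs)[1] - 1 if max_abs > 0 else 0
    scale = 2.0 ** shared_exp
    # Step 3: scale each input and round it to the element grid, saturating.
    elements = []
    for v in values:
        q = round(v / scale * (1 << INT8_FRACTION_BITS))  # ties round to even
        elements.append(max(-127, min(127, q)))
    return shared_exp, elements


def from_mx_int8(shared_exp, elements):
    """Decode the block back to floats."""
    scale = 2.0 ** shared_exp
    return [e / (1 << INT8_FRACTION_BITS) * scale for e in elements]


block = [0.5 * i for i in range(-16, 16)]      # -8.0 .. 7.5
exp, elems = to_mx_int8(block)
print(exp, from_mx_int8(exp, elems) == block)
```

The floating-point element types listed in the abstract (E5M2, E4M3, and so on) would replace only step 3. The maximum search and shared-scale steps stay the same in this sketch.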

en cs.AR
arXiv Open Access 2023
A Bit-Parallel Deterministic Stochastic Multiplier

Sairam Sri Vatsavai, Ishan Thakkar

This paper presents a novel bit-parallel deterministic stochastic multiplier, which improves the area-energy-latency product by up to 10.6 × 10⁴, while improving the computational error by 32.2%, compared to three prior stochastic multipliers.
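For readers unfamiliar with deterministic stochastic computing, the sketch below shows one well-known bit-serial formulation. One unary stream is repeated while the other is held, and an AND gate multiplies them exactly. It is a generic illustration, not the bit-parallel design proposed in the paper, and the function name and stream encoding are assumptions.

```python
def deterministic_sc_multiply(x, y, n):
    """Return (ones, length) such that ones / length == (x / n) * (y / n) exactly."""
    assert 0 <= x <= n and 0 <= y <= n
    ones = 0
    for i in range(n * n):
        a_bit = (i % n) < x        # unary stream of x, repeated n times
        b_bit = (i // n) < y       # unary stream of y, each bit held for n cycles
        ones += a_bit and b_bit    # the AND gate acts as the multiplier
    return ones, n * n


print(deterministic_sc_multiply(3, 5, 8))
```

Every bit of one stream meets every bit of the other exactly once, so the count of ones equals `x * y` with no random fluctuation. The cost is a stream of length `n * n`.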

en cs.AR, cs.ET
arXiv Open Access 2022
Predictable Sharing of Last-level Cache Partitions for Multi-core Safety-critical Systems

Zhuanhao Wu, Hiren Patel

Last-level cache (LLC) partitioning is a technique to provide temporal isolation and low worst-case latency (WCL) bounds when cores access the shared LLC in multicore safety-critical systems. A typical approach to cache partitioning involves allocating a separate partition to a distinct core. A central criticism of this approach is its poor utilization of cache storage. Today's trend of integrating a larger number of cores exacerbates this issue such that we are forced to consider shared LLC partitions for effective deployments. This work presents an approach to share LLC partitions among multiple cores while being able to provide low WCL bounds.

en cs.AR
arXiv Open Access 2021
Elastic Silicon Interconnects: Abstracting Communication in Accelerator Design

John Demme

Communication is an important part of accelerator design, though it is under-researched and under-developed. Today, designers often face relatively low-level communication tools requiring them to design straightforward but error-prone plumbing. In this paper, we argue that raising the level of abstraction could yield correctness, productivity, and performance benefits not only for RTL-level designers but also for high-level language developers.

en cs.AR, cs.PL
arXiv Open Access 2021
Content Addressable Parallel Processors on a FPGA

Ayush Salik, Manor Askenazi, Edward Rietman

In this short article, we report on the implementation of a Content Addressable Parallel Processor using an FPGA. While content addressable memories have been implemented in FPGAs, to our knowledge this is the first implementation in an FPGA of Caxton C. Foster's vision of parallel processing, particularly the notions of parallel write as well as the combining of output values, which are usually missing in more typical CAM implementations, such as the ones designed for network routing. The resulting CAPP is made accessible to a host computer over a USB/UART interface, using a straightforward serial protocol that is demonstrated using a Python-based driver.
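The operations named above (masked search, parallel write, and combining of output values) can be modelled in a few lines of software. This is a hypothetical behavioural model for illustration only. The class, its method names, and the choice of bitwise OR as the combining function are assumptions, and it does not reflect the authors' FPGA design or their serial protocol.

```python
class CAPP:
    """Toy model of a content addressable parallel processor."""

    def __init__(self, words):
        self.cells = list(words)
        self.tags = [False] * len(self.cells)

    def search(self, comparand, mask):
        """Tag every cell that equals `comparand` on the bits set in `mask`."""
        self.tags = [((c ^ comparand) & mask) == 0 for c in self.cells]
        return sum(self.tags)                  # number of responders

    def multiwrite(self, value, mask):
        """Write the masked bits of `value` into all tagged cells at once."""
        self.cells = [((c & ~mask) | (value & mask)) if t else c
                      for c, t in zip(self.cells, self.tags)]

    def combined_read(self):
        """Combine the tagged cells with bitwise OR (an assumed combiner)."""
        out = 0
        for c, t in zip(self.cells, self.tags):
            if t:
                out |= c
        return out


capp = CAPP([0x12, 0x1F, 0x22, 0x13])
print(capp.search(0x10, 0xF0))     # cells whose high nibble is 0x1
capp.multiwrite(0x05, 0x0F)        # set the low nibble of all responders
print([hex(c) for c in capp.cells])
```

In hardware every cell compares and writes simultaneously. The list comprehensions here stand in for that parallelism.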

en cs.AR
arXiv Open Access 2021
Dynamic Lockstep Processors for Applications with Functional Safety Relevance

Hans Dermot Doran, Timo Lang

Lockstep processing is a recognized technique for helping to secure functional-safety relevant processing against, for instance, single upset errors that might cause faulty execution of code. Lockstepping processors does however bind processing resources in a fashion not beneficial to architectures and applications that would benefit from multi-core/-processors. We propose a novel on-demand synchronizing of cores/processors for lock-step operation featuring post-processing resource release, a concept that facilitates the implementation of modularly redundant core/processor arrays. We discuss the fundamentals of the design and some implementation notes on work achieved to date.

en cs.AR
CrossRef Open Access 2021
The role of circumferential strain in the differential diagnosis of cardiomyopathies with left ventricular hypertrabeculation

A Szucs, ZS Gregor, AR Kiss et al.

Funding Acknowledgements. Type of funding sources: Public grant(s) – National budget only. Main funding source(s): Supported by the ÚNKP-19-3-II New National Excellence Program of the Ministry for Innovation and Technology. Dilated (DCM), hypertrophic (HCM) and noncompaction cardiomyopathy (NCMP) are genetically and morphologically overlapping diseases; however, they differ in clinical manifestation, treatment and prognosis. Cardiac MRI feature-tracking might help to differentiate between these cardiomyopathies with left ventricular (LV) hypertrabeculation. We aimed to describe the differences in the functional and strain parameters of NCMP patients with good LV ejection fraction (EF, NCMP-G) compared with patients with HCM, and NCMP patients with reduced EF (NCMP-R) compared with patients with DCM. We included 62 NCMP patients, of whom 31 had good LV function and 31 had decreased LV-EF. The NCMP-G group was compared with an HCM population (n = 31) and the NCMP-R group was compared with a DCM group (n = 31) matching in age and sex (age, EF; NCMP-G 46.0 ± 13.0 years, 65.5 ± 5.3% vs. HCM 47.2 ± 14.4 years, 74.8 ± 6.3%; NCMP-R 54.5 ± 12.1 years, 32.8 ± 10.1% vs. DCM 50.8 ± 16.7 years, 34.0 ± 8.2%). 1.5 T Philips Achieva and Siemens Aera MRI machines were used for the scans, the Medis Suite program was used for analysis and MedCalc software for statistics; p < 0.05 was considered statistically significant. Significant differences were found between the functional parameters of HCM and NCMP-G patients, while the DCM and NCMP-R groups differed only in the trabecular mass values (LV-trab, NCMP-G vs. HCM: 26.2 ± 7.5 vs. 30.7 ± 7.0 g/m², NCMP-R vs. DCM: 48.2 ± 13.2 vs. 42.1 ± 10.1 g/m², p < 0.05). The global longitudinal strain values of the studied populations were not significantly different; however, the global circumferential strain (GCS) values were significantly better in patients with HCM and DCM compared with the NCMP groups (GCS, NCMP-G vs. HCM: -31.2 ± 4.9 vs. -43.0 ± 8.4%, NCMP-R vs. DCM: -11.7 ± 7.3 vs. -16.9 ± 6.1%). The average circumferential strain values of the LV basal, mid and apical parts were significantly better in the HCM and DCM groups compared with the NCMP groups (NCMP-G vs. HCM: -35.7 ± 9.5 vs. -50.5 ± 14.1%, NCMP-R vs. DCM: -29.5 ± 13.2 vs. -15.6 ± 6.7%). We assessed the cut-off point of the average LV apical circumferential strain to differentiate the studied populations (HCM vs. NCMP-G cut-off: -47.3%, sens.: 83.9%, spec.: 67.7%, AUC: 0.81; DCM vs. NCMP-R cut-off: -19.3%, sens.: 83.9%, spec.: 83.9%, AUC: 0.86). The diverse circumferential strain values of the hypertrabeculated LV apical third could help the differential diagnosis of NCMP, DCM and HCM.

CrossRef Open Access 2020
Obesity and Multisite Pain in the Lower Limbs: Data from the Osteoarthritis Initiative

Vishal Vennu, Aqeel M. Alenazi, Tariq A. Abdulrahman et al.

Background. Although several studies have investigated the relationship between obesity, osteoarthritis, and pain, no study has examined the association between obesity and multijoint pain in the lower limbs. The purpose of this study was to address this gap. Method. This cross-sectional study was performed in Riyadh, Saudi Arabia, between March and April 2019. A total of 4,661 adults aged 45–79 years with or at high risk for knee osteoarthritis were included from the Osteoarthritis Initiative. Persons who had an elevated risk of developing symptoms of knee osteoarthritis during the study were defined as high risk for knee osteoarthritis. According to body mass index, participants were categorized into three groups: normal weight (n = 1,068), overweight (n = 1,832), and obese (n = 1,761). Logistic regression was used to examine the association between obesity and multisite pain. Results. The odds of multisite pain, relative to no or single-site pain, were significantly (p<0.001) higher, by a factor of 1.36, with obesity than with normal weight, even after adjusting for sociodemographic and health variables. Conclusion. Obesity is associated with an increased likelihood of multisite pain in the lower limbs. The results enable clinicians to adopt better standards of practice for the prevention and screening of multisite pain in this community.

12 citations en
arXiv Open Access 2020
A Ring Router Microarchitecture for NoCs

Wo-Tak Wu

Network-on-Chip (NoC) has become a popular choice for connecting a large number of processing cores in chip multiprocessor design. In a conventional NoC design, most of the area in the router is occupied by the buffers and the crossbar switch. These two components also consume the majority of the router's power. Much of the research in NoC has been based on the conventional router microarchitecture. We propose a novel router microarchitecture that treats the router itself as a small network of the ring topology. It eliminates the large crossbar switch in the conventional design. In addition, network latency is much reduced. Simulation and circuit synthesis show that the proposed microarchitecture can reduce the latency, area and power by 53%, 34% and 27%, respectively, compared to the conventional design.

en cs.AR
arXiv Open Access 2020
Comparing quaternary and binary multipliers

Daniel Etiemble

We compare the implementation of an 8x8-bit multiplier with two different implementations of a 4x4 quaternary digit multiplier. Interfacing this binary multiplier with quaternary-to-binary decoders and binary-to-quaternary encoders leads to a 4x4 multiplier that outperforms the best direct implementation of a 4x4 quaternary multiplier. The far greater complexity of the 1-digit multipliers and 1-digit adders used in this direct implementation compared to the binary 1-bit multipliers and full adders cannot be compensated by the reduced count of quaternary operators. As the best quaternary multiplier includes the corresponding binary one, there is no opportunity to get fewer interconnects, less chip area, or less power dissipation with the quaternary multiplier.
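The decoder, binary multiplier, encoder arrangement can be modelled at the arithmetic level as follows. This is a behavioural sketch only, with hypothetical function names. It says nothing about transistor counts or circuit performance, which are the paper's actual subject.

```python
def quaternary_to_binary(digits):
    """Decode quaternary digits (most significant first); each digit is 2 bits."""
    value = 0
    for d in digits:
        assert 0 <= d <= 3
        value = (value << 2) | d
    return value


def binary_to_quaternary(value, n_digits):
    """Encode an integer as `n_digits` quaternary digits, most significant first."""
    return [(value >> (2 * i)) & 0b11 for i in range(n_digits - 1, -1, -1)]


def quaternary_multiply(a_digits, b_digits):
    """4x4-digit quaternary multiply built around an 8x8-bit binary multiply."""
    product = quaternary_to_binary(a_digits) * quaternary_to_binary(b_digits)
    return binary_to_quaternary(product, len(a_digits) + len(b_digits))


print(quaternary_multiply([0, 0, 1, 2], [0, 0, 0, 3]))   # 6 * 3 = 18
```

Four quaternary digits carry exactly eight bits, which is why a 4x4-digit quaternary multiplier corresponds to an 8x8-bit binary one.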

en cs.AR
arXiv Open Access 2020
Best implementations of quaternary adders

Daniel Etiemble

The implementation of a quaternary 1-digit adder composed of a 2-bit binary adder, quaternary-to-binary decoders and binary-to-quaternary encoders is compared with several recent implementations of quaternary adders. This simple implementation outperforms all other implementations using only one power supply. It is equivalent to the best other implementation using three power supplies. Since the best quaternary adder uses a 2-bit binary adder, the interface circuits between quaternary and binary levels are just overhead compared to the binary adder. This result shows that the quaternary approach for adders uses more transistors, more chip area and more power dissipation than the corresponding binary ones.
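At the arithmetic level, a quaternary 1-digit adder built on a 2-bit binary adder behaves as in the sketch below. The function names and the ripple-carry chaining are assumptions for this illustration, and the circuit-level comparison made in the paper is not modelled.

```python
def quaternary_digit_add(a, b, carry_in=0):
    """Add two quaternary digits via a 2-bit binary add; returns (digit, carry_out)."""
    assert 0 <= a <= 3 and 0 <= b <= 3 and carry_in in (0, 1)
    total = a + b + carry_in          # 2-bit operands give a 3-bit result
    return total & 0b11, total >> 2   # low 2 bits are the digit, bit 2 is the carry


def quaternary_add(a_digits, b_digits):
    """Ripple-carry add of equal-length digit lists, most significant digit first."""
    assert len(a_digits) == len(b_digits)
    carry, out = 0, []
    for a, b in zip(reversed(a_digits), reversed(b_digits)):
        digit, carry = quaternary_digit_add(a, b, carry)
        out.append(digit)
    out.append(carry)
    return out[::-1]


print(quaternary_add([3, 3], [0, 1]))   # 15 + 1 = 16
```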

en cs.AR
arXiv Open Access 2018
Hardware realization of residue number system algorithms by Boolean functions minimization

Danila Gorodecky, Tiziano Villa

Residue number systems (RNS) represent numbers by their remainders modulo a set of relatively prime numbers. This paper proposes an efficient hardware implementation of modular multiplication and of the modulo function (X(mod P)), based on Boolean minimization. We report experiments showing a performance advantage of up to 30 times for our approach vs. the results obtained by state-of-the-art industrial tools.
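A small software example may help make the RNS representation concrete. The sketch below converts integers to residues, multiplies them channel by channel, and reconstructs the result with the Chinese Remainder Theorem. The moduli (7, 11, 13) and the function names are hypothetical choices for illustration. The paper's Boolean-minimization hardware is not modelled.

```python
from math import prod

MODULI = (7, 11, 13)   # hypothetical pairwise-coprime moduli; dynamic range 1001


def to_rns(x):
    """Represent x by its remainders modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    """Multiply two RNS numbers; each channel is an independent modular multiply."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)   # modular inverse (Python 3.8+)
    return total % big_m


print(to_rns(25), from_rns(rns_mul(to_rns(25), to_rns(37))))
```

Each channel works only on small residues and needs no carries from the other channels, which is what makes per-modulus multiplication and the X mod P function natural targets for compact combinational logic.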

en cs.AR

Page 7 of 9246