In this paper, our objective is to design, build, and verify a closed-loop environmental control system tailored for small-scale agriculture applications. This project aims to develop a low-cost, safety-critical embedded solution using the Nuvoton NUC140 microcontroller to automate temperature regulation. The goal was to mitigate crop yield losses caused by environmental fluctuations in a greenhouse. Our final implemented system successfully meets all design specifications, demonstrating robust temperature regulation through a PID control loop and ensuring hardware safety through galvanic isolation
Adeel Ahmad, Ahmad Tameem Kamal, Nouman Amir
et al.
This project enables RISC-V microkernel support in IREE, an MLIR-based machine learning compiler and runtime. The approach begins by enabling the lowering of MLIR linalg dialect contraction ops to linalg.mmt4d op for the RISC-V64 target within the IREE pass pipeline, followed by the development of optimized microkernels for RISC-V. The performance gains are compared with upstream IREE and Llama.cpp for the Llama-3.2-1B-Instruct model.
Mastering computational architectures is essential for developing fast and power-efficient programs. Our advanced simulator empowers both IT students and professionals to grasp the fundamentals of superscalar RISC-V processors, HW/SW co-design and HPC optimization techniques. With customizable processor and memory architecture, full C compiler support, and detailed runtime statistics, this tool offers a comprehensive learning experience. Enjoy the convenience of a modern, web-based GUI to enhance your understanding and skills.
Veryl, a hardware description language based on SystemVerilog, offers optimized syntax tailored for logic design, ensuring synthesizability and simplifying common constructs. It prioritizes interoperability with SystemVerilog, allowing for smooth integration with existing projects while maintaining high readability. Additionally, Veryl includes a comprehensive set of development support tools, such as package managers and real-time checkers, to boost productivity and streamline the design process. These features empower designers to conduct high-quality hardware design efficiently.
Many scientific computing problems can be reduced to Matrix-Matrix Multiplications (MMM), making the General Matrix Multiply (GEMM) kernels in the Basic Linear Algebra Subroutine (BLAS) of interest to the high-performance computing community. However, these workloads have a wide range of numerical requirements. Ill-conditioned linear systems require high-precision arithmetic to ensure correct and reproducible results. In contrast, emerging workloads such as deep neural networks, which can have millions up to billions of parameters, have shown resilience to arithmetic tinkering and precision lowering.
AbstractEarthquakes, floods, and landslides are frequent disasters in Indonesia. They can happen anytime. Recently Department of Architecture Faculty of Engineering Universitas Indonesia, together with the Indonesian Red Cross in Bogor Regency and its volunteers squat (SIBAT) had conducted disaster role-playing as a preparedness program for elementary school students. The role-play was aimed to introduce a quick response for elementary school students by the time the disaster happened at their school. Two kinds of disasters were chosen based on the school location's characteristics: flooding and earthquake. The method involved in this study is action-research. In the end, the students were asked to draw their spatial experience in one piece of paper as a reflection. Additionally, the team also gave feedback questions or quiz to measure the starting point of the role-playing which expected to shape their new knowledge. For in-depth aspects, the students also received a badge as the inauguration sign of “Agent of responsive to disaster”. This paper outlines the fact-finding during the role-playing as the students' response to the environment. The findings are useful as the baseline for a disaster quick response preparedness module for elementary students' development.
GPUs rely on large register files to unlock thread-level parallelism for high throughput. Unfortunately, large register files are power hungry, making it important to seek for new approaches to improve their utilization. This paper introduces a new register file organization for efficient register-packing of narrow integer and floating-point operands designed to leverage on advances in static analysis. We show that the hardware/software co-designed register file organization yields a performance improvement of up to 79%, and 18.6%, on average, at a modest output-quality degradation.
In-order scalar RISC architectures have been the dominant paradigm in FPGA soft processor design for twenty years. Prior out-of-order superscalar implementations have not exhibited competitive area or absolute performance. This paper describes a new way to build fast and area-efficient out-of-order superscalar soft processors by utilizing an Explicit Data Graph Execution (EDGE) instruction set architecture. By carefully mapping the EDGE microarchitecture, and in particular, its dataflow instruction scheduler, we demonstrate the feasibility of an out-of-order FPGA architecture. Two scheduler design alternatives are compared.
Caches only exploit spatial and temporal locality in a set of address referenced in a program. Due to dynamic construction of linked data-structures, they are difficult to cache as the spatial locality between the nodes is highly dependent on the data layout. Prefetching can play an important role in improving the performance of linked data-structures. In this project, a pointer chase mechanism along with compiler hints is adopted to design a prefetcher for linked data-structures. The design is evaluated against the baseline of processor with cache in terms of performance, area and energy
We describe the computing tasks involved in autonomous driving, examine existing autonomous driving computing platform implementations. To enable autonomous driving, the computing stack needs to simultaneously provide high performance, low power consumption, and low thermal dissipation, at low cost. We discuss possible approaches to design computing platforms that will meet these needs.
Power consumption, off-chip memory bandwidth, chip area and Network on Chip (NoC) capacity are among main chip resources limiting the scalability of Chip Multiprocessors (CMP). A closed form analytical solution for optimizing the CMP cache hierarchy and optimally allocating area among hierarchy levels under such constrained resources is developed. The optimization framework is extended by incorporating the impact of data sharing on cache miss rate. An analytical model for cache access time as a function of cache size is proposed and verified using CACTI simulation.
Jorge I. Galván-Tejada, José M. Celaya-Padilla, Victor Treviño
et al.
In this work, the potential of X-ray based multivariate prognostic models to predict the onset of chronic knee pain is presented. Using X-rays quantitative image assessments of joint-space-width (JSW) and paired semiquantitative central X-ray scores from the Osteoarthritis Initiative (OAI), a case-control study is presented. The pain assessments of the right knee at the baseline and the 60-month visits were used to screen for case/control subjects. Scores were analyzed at the time of pain incidence (T-0), the year prior incidence (T-1), and two years before pain incidence (T-2). Multivariate models were created by a cross validated elastic-net regularized generalized linear models feature selection tool. Univariate differences between cases and controls were reported by AUC,C-statistics, and ODDs ratios. Univariate analysis indicated that the medial osteophytes were significantly more prevalent in cases than controls:C-stat 0.62, 0.62, and 0.61, at T-0, T-1, and T-2, respectively. The multivariate JSW models significantly predicted pain: AUC = 0.695, 0.623, and 0.620, at T-0, T-1, and T-2, respectively. Semiquantitative multivariate models predicted paint withC-stat = 0.671, 0.648, and 0.645 at T-0, T-1, and T-2, respectively. Multivariate models derived from plain X-ray radiography assessments may be used to predict subjects that are at risk of developing knee pain.
This paper shows the usage of C-Slow Retiming (CSR) in safety critical and low power applications. CSR generates C copies of a design by reusing the given logic resources in a time sliced fashion. When all C design copies are stimulated with the same input values, then all C design copies should behave the same way and will therefore create a redundant system. The paper shows that this special method of using CSR offers great benefits when used in safety critical and low power applications. Additional optimization techniques towards reducing register count are shown and an on-the-fly recovery mechanism is discussed.
We describe the formal language MASC, based on a subset of SystemC and intended for modeling algorithms to be implemented in hardware. By means of a special-purpose parser, an algorithm coded in SystemC is converted to a MASC model for the purpose of documentation, which in turn is translated to ACL2 for formal verification. The parser also generates a SystemC variant that is suitable as input to a high-level synthesis tool. As an illustration of this methodology, we describe a proof of correctness of a simple 32-bit radix-4 multiplier.
It is intended in this document to introduce a handy systematic method for enumerating all possible data dependency cases that could occur between any two instructions that might happen to be processed at the same time at different stages of the pipeline. Given instructions of the instruction set, specific information about operands of each instruction and when an instruction reads or writes data, the method could be used to enumerate all possible data hazard cases and to determine whether forwarding or stalling is suitable for resolving each case.
Vladimir Hahanov, Eugenia Litvinova, Wajeb Gharibi
et al.
An algebra-logical repair method for FPGA functional logic blocks on the basis of solving the coverage problem is proposed. It is focused on implementation into Infrastructure IP for system-on-a chip and system-in-package. A method is designed for providing the operability of FPGA blocks and digital system as a whole. It enables to obtain exact and optimal solution associated with the minimum number of spares needed to repair the FPGA logic components with multiple faults.
Linlin Zhang, Anne Claire Legrand, Virginie Fresse
et al.
An adaptive FPGA architecture based on the NoC (Network-on-Chip) approach is used for the multispectral image correlation. This architecture must contain several distance algorithms depending on the characteristics of spectral images and the precision of the authentication. The analysis of distance algorithms is required which bases on the algorithmic complexity, result precision, execution time and the adaptability of the implementation. This paper presents the comparison of these distance computation algorithms on one spectral database. The result of a RGB algorithm implementation was discussed.
Pipelining is a well understood and often used implementation technique for increasing the performance of a hardware system. We develop several SystemC/C++ modeling techniques that allow us to quickly model, simulate, and evaluate pipelines. We employ a small domain specific language (DSL) based on resource usage patterns that automates the drudgery of boilerplate code needed to configure connectivity in simulation models. The DSL is embedded directly in the host modeling language SystemC/C++. Additionally we develop several techniques for parameterizing a pipeline's behavior based on policies of function, communication, and timing (performance modeling).