When a single core is scaled up to m cores occupying the same chip area and executing the same (parallelizable) task, the achievable speedup is √m, power is reduced by a factor of √m, and energy (power × time) is therefore reduced by a factor of m. Thus, many-core architectures can efficiently outperform both single-core and small-count multi-core architectures.
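The scaling factors above can be checked numerically. The sketch below assumes Pollack's rule (a single core's performance grows roughly as the square root of its area), which is the usual basis for the √m speedup; the power factor is taken directly from the text rather than derived. The function name `manycore_scaling` is hypothetical.

```python
import math


def manycore_scaling(m: int) -> tuple[float, float, float]:
    """Return (speedup, relative power, relative energy) for m cores
    sharing the chip area of one large core.

    Assumes Pollack's rule: a core's performance scales as sqrt(area).
    """
    per_core_perf = math.sqrt(1 / m)   # each small core gets 1/m of the area
    speedup = m * per_core_perf        # m cores working in parallel -> sqrt(m)
    rel_power = 1 / math.sqrt(m)       # power reduction as stated in the text
    rel_time = 1 / speedup             # the same task finishes sqrt(m) times faster
    rel_energy = rel_power * rel_time  # energy = power x time -> 1/m
    return speedup, rel_power, rel_energy


for m in (4, 16, 64):
    s, p, e = manycore_scaling(m)
    print(f"m={m:3d}: speedup={s:.2f}, power x{p:.3f}, energy x{e:.4f}")
```

The energy factor is simply the product of the power factor and the time factor, which is why it falls as 1/m rather than 1/√m.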
With the continuous development of modern science and technology, radar detection, video surveillance, and perimeter alarm systems are increasingly widely used in the field of public security. This paper describes video surveillance and perimeter alarm systems in detail, including their mathematical modeling and key technologies, analyzes the status of their fusion and application, and offers suggestions in light of the future development trend of intelligent security systems.
This paper introduces a novel 32-bit microprocessor based on the RISC-V instruction set architecture, designed to use a dynamic clock source to achieve high efficiency by overcoming the limitations of hardware delays. The microprocessor is also designed to execute both base (32-bit) instructions and 16-bit compressed instructions. The design was tested using ModelSim, with ideal results.
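Supporting both base and compressed instructions means the fetch stage must determine each instruction's length. Under the standard RISC-V encoding, a 16-bit parcel whose two lowest bits are not `11` is a compressed instruction, while `11` (with bits [4:2] not all ones) marks a 32-bit instruction. The sketch below, with hypothetical function names, splits a little-endian code buffer accordingly; it is not the paper's hardware, and it simply rejects the reserved longer encodings that a core with only the base and compressed instructions does not use.

```python
def instruction_length(parcel: int) -> int:
    """Return the length in bytes of the instruction whose first 16-bit parcel is given."""
    if (parcel & 0b11) != 0b11:
        return 2  # compressed (RVC) instruction
    if ((parcel >> 2) & 0b111) != 0b111:
        return 4  # base 32-bit instruction
    raise ValueError(f"unsupported instruction length (parcel 0x{parcel:04x})")


def split_instructions(code: bytes) -> list[tuple[int, int, int]]:
    """Split a little-endian code buffer into (pc, length, encoding) tuples."""
    pc, out = 0, []
    while pc < len(code):
        parcel = int.from_bytes(code[pc:pc + 2], "little")
        length = instruction_length(parcel)
        if pc + length > len(code):
            raise ValueError(f"truncated instruction at pc={pc}")
        word = int.from_bytes(code[pc:pc + length], "little")
        out.append((pc, length, word))
        pc += length
    return out


# c.nop (0x0001) followed by the canonical 32-bit nop, addi x0, x0, 0 (0x00000013)
code = bytes([0x01, 0x00, 0x13, 0x00, 0x00, 0x00])
for pc, length, word in split_instructions(code):
    print(f"pc={pc}: {length * 8}-bit instruction 0x{word:0{length * 2}x}")
```

Because the length is decided from the low bits of the first parcel alone, compressed and base instructions can be freely interleaved, and a 32-bit instruction may start on any 16-bit boundary.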
Muntjac is an open-source collection of components which can be used to build a multicore, Linux-capable system-on-chip. This includes a 64-bit RISC-V core, a cache subsystem, and TileLink interconnect allowing cache-coherent multicore configurations. Each component is easy to understand, verify, and extend, with most being configurable enough to be useful across a wide range of applications.
This paper presents a brief journey through the evolution of computer hardware and software, underlines that the shift to multicore technology is a natural part of that evolution, and highlights the various laws governing the advancement of the computer industry. In light of these laws, it appears that the HW-SW industry trend can be represented by a mathematical model from which future developments can be predicted. Finally, the paper establishes that the future of the computer industry lies in placing greater emphasis on software, exploiting the parallelism available in software to utilize the heterogeneity of multicore processors.
The Softmax activation layer is a very popular Deep Neural Network (DNN) component for multi-class prediction problems. However, in DNN accelerator implementations it creates additional complexity because the exponential must be computed for each of its inputs. In this brief we propose a simplified activation unit for accelerators in which only a comparator unit produces the classification result, by choosing the maximum among its inputs. Because Softmax preserves the ordering of its inputs (the exponential is strictly increasing and all outputs share the same normalizing sum), we show that this result is always identical to the classification produced by the Softmax layer.
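The equivalence can be checked directly: every Softmax output is exp(x_i) divided by the same positive sum, and exp is strictly increasing, so the largest input always yields the largest probability. The sketch below compares Softmax-then-argmax with a comparator-only classifier modeled as a binary tree of two-input comparators; the function names and the tie-breaking rule (lowest index wins) are assumptions, not details from the brief. Note that in floating point, two nearly equal inputs can round to the same Softmax output, whereas comparing the raw inputs avoids that rounding altogether.

```python
import math


def softmax(x: list[float]) -> list[float]:
    """Numerically stable Softmax: subtract the max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: list[float]) -> int:
    """Index of the largest value; ties resolve to the lowest index."""
    return max(range(len(values)), key=values.__getitem__)


def comparator_tree_argmax(x: list[float]) -> int:
    """Comparator-only classifier modeled as a binary tree of 2-input comparators."""
    nodes = list(enumerate(x))  # (class index, score) pairs at the leaves
    while len(nodes) > 1:
        next_level = []
        for k in range(0, len(nodes) - 1, 2):
            a, b = nodes[k], nodes[k + 1]
            next_level.append(a if a[1] >= b[1] else b)  # ties keep the lower index
        if len(nodes) % 2:
            next_level.append(nodes[-1])  # an odd node passes through unchanged
        nodes = next_level
    return nodes[0][0]


for logits in ([2.0, 1.0, 0.1], [-3.0, 5.5, 5.4], [0.0, -1.0, -2.0, 3.0, 2.9]):
    assert argmax(softmax(logits)) == comparator_tree_argmax(logits) == argmax(logits)
    print(logits, "->", comparator_tree_argmax(logits))
```

A tree of n - 1 comparators with depth ⌈log2 n⌉ replaces n exponentials, n - 1 additions, and n divisions.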
In future data centers, applications will make heavy use of far memory (including disaggregated memory pools and NVM). The access latency of far memory is distributed much more widely than that of local memory. This reduces the efficiency of the traditional blocking load/store instructions used in most general-purpose processors. Therefore, this work proposes an in-core asynchronous memory access unit.
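To see why widely distributed latencies hurt blocking loads, consider a toy timing model, which is an illustrative assumption and not the proposed unit's design: a blocking core waits for each access to finish before issuing the next, whereas an asynchronous unit keeps up to `max_outstanding` requests in flight. The latencies below are made-up cycle counts, and the function names are hypothetical.

```python
import heapq


def blocking_time(latencies: list[int]) -> int:
    """Blocking load/store: each access starts only after the previous one completes."""
    return sum(latencies)


def async_time(latencies: list[int], max_outstanding: int, issue_interval: int = 1) -> int:
    """Asynchronous accesses: issue one request every `issue_interval` cycles,
    stalling only when `max_outstanding` requests are already in flight."""
    in_flight: list[int] = []  # min-heap of completion times
    t = finish = 0
    for lat in latencies:
        if len(in_flight) == max_outstanding:
            t = max(t, heapq.heappop(in_flight))  # wait for the earliest completion
        heapq.heappush(in_flight, t + lat)
        finish = max(finish, t + lat)
        t += issue_interval
    return finish


# Hypothetical far-memory latencies (cycles) with a long tail
lat = [100, 300, 120, 800, 90]
print("blocking:", blocking_time(lat))
print("async, 4 outstanding:", async_time(lat, max_outstanding=4))
```

With `max_outstanding=1` the model reduces to the blocking case; with more outstanding requests, total time is dominated by the slowest access rather than the sum of all of them.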
This whitepaper proposes a unified framework for hardware design tools to ease the development and interoperability of these tools. By creating a large ecosystem of hardware development tools across vendors, academia, and the open-source community, we hope to significantly increase much-needed productivity in hardware design.
This paper reviews developments in instruction-level parallelism (ILP). It covers the major changes introduced to speed up execution. It discusses the drawbacks and dependencies caused by pipelining, together with the solutions proposed to overcome them. The final section explains where new research is heading.
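As an example of the dependencies that pipelining introduces, the sketch below flags read-after-write (RAW) hazards. It assumes, for illustration, a classic five-stage pipeline without forwarding, in which an instruction that reads a register written by one of the two preceding instructions must stall; the `Instr` type and `raw_hazards` function are hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]    # register written, or None (e.g. a store)
    srcs: tuple[str, ...]  # registers read


def raw_hazards(program: list[Instr], window: int = 2) -> list[tuple[int, int, str]]:
    """Return (producer, consumer, register) for each RAW dependency whose
    producer is among the `window` instructions preceding the consumer."""
    hazards = []
    for j, consumer in enumerate(program):
        for reg in dict.fromkeys(consumer.srcs):  # de-duplicate, keep order
            # Walk backwards to the most recent writer of `reg` inside the window.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if program[i].dest == reg:
                    hazards.append((i, j, reg))
                    break
    return hazards


program = [
    Instr("lw  r1, 0(r2)", "r1", ("r2",)),
    Instr("add r3, r1, r4", "r3", ("r1", "r4")),
    Instr("sub r5, r6, r7", "r5", ("r6", "r7")),
    Instr("and r8, r3, r1", "r8", ("r3", "r1")),
]
for i, j, reg in raw_hazards(program):
    print(f"{program[j].text!r} reads {reg} written by {program[i].text!r}")
```

Here `and` also reads `r1`, but its producer is three instructions back, so under the assumed pipeline the value is already in the register file and no stall is needed.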
Sudarshan Sharma, Dhruv Thapar, Nikhil Bhelave et al.
Physically Unclonable Functions (PUFs) are lightweight cryptographic primitives that generate unique signatures from minuscule manufacturing variations. In this work, we present a lightweight, area-efficient, low-power analog PUF design based on an adaptive multi-bit SRAM-topology Current Mirror Array (CMA), for securing sensor nodes and for authentication and key generation. The proposed strong PUF increases the complexity of machine-learning attacks, making them more difficult for an adversary. The design is based on the scl180 library.
FPGA overlay architectures have mainly been proposed to improve design productivity, circuit portability, and system debugging. In this paper, we address the use of overlay architectures for building fault-tolerant SRAM-based FPGA systems and discuss the main features and design challenges of a reliability-aware overlay architecture.
Modern System-on-Chip (SoC) platforms typically consist of multiple processors and a communication interconnect between them. Network-on-Chip (NoC) has emerged as a scalable, reusable, and efficient interconnect for these systems. In such SoC platforms, multicast communication is widely used by parallel applications; examples include cache coherency in distributed shared memory, clock synchronization, replication, and barrier synchronization. This paper presents an overview of research on NoCs with multicast support and delineates the major issues addressed so far by the scientific community in this research area.
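One reason multicast support matters is link usage. The sketch below compares sending a separate unicast packet to each destination with tree-based multicast on a 2D mesh, assuming deterministic XY (dimension-order) routing. Because all XY routes from one source share common prefixes, their union forms a tree in which each link carries the packet only once. The mesh coordinates and function names are illustrative assumptions.

```python
Node = tuple[int, int]    # (x, y) router coordinates in a 2D mesh
Link = tuple[Node, Node]  # directed link between neighboring routers


def xy_route(src: Node, dst: Node) -> list[Link]:
    """Links traversed under XY routing: move along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_link_traversals(src: Node, dests: list[Node]) -> int:
    """Total link traversals when a separate packet is sent to each destination."""
    return sum(len(xy_route(src, d)) for d in dests)


def tree_multicast_link_traversals(src: Node, dests: list[Node]) -> int:
    """Link traversals when routers replicate one packet along the union of XY routes."""
    tree: set[Link] = set()
    for d in dests:
        tree.update(xy_route(src, d))
    return len(tree)


src, dests = (0, 0), [(3, 0), (3, 2), (2, 1)]
print("unicast:", unicast_link_traversals(src, dests))
print("tree multicast:", tree_multicast_link_traversals(src, dests))
```

The saving grows with the number of destinations that share route prefixes, which is typical of the coherence and barrier traffic mentioned above.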
This volume contains the papers accepted at the First International Workshop on FPGAs for Software Programmers (FSP 2014), held in Munich, Germany, on September 1, 2014. FSP 2014 was co-located with the International Conference on Field Programmable Logic and Applications (FPL).
Christelle Hobeika, Claude Thibeault, Jean-François Boland
In "Functional Constraint Extraction From Register Transfer Level for ATPG," currently submitted to TVLSI, we proposed an automatic functional constraint extractor that can be applied at the register-transfer (RT) level. These functional constraints are used to generate pseudo-functional test patterns with ATPG tools. The patterns are then used to improve the verification process. This technical report complements that work: it contains the implementation details of the proposed methodology and shows the detailed intermediate and final results of applying this methodology to a concrete example.
Digital mobile systems must operate with low power, small size and weight, and low cost. High-performance desktop microprocessors with built-in floating-point hardware are not suitable in these cases. For embedded systems, it can be advantageous to implement numerical calculations in fixed-point arithmetic instead. We present FpSynt, an automated fixed-point data path synthesis tool for designing embedded applications in the fixed-point domain with accuracy sufficient for most applications. FpSynt is available under the GNU General Public License from the following GitHub repository: http://github.com/izhbannikov/FPSYNT
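For readers unfamiliar with the fixed-point domain, the sketch below shows the basic operations such a data path performs: values are stored as integers scaled by 2^FRAC_BITS, and a product must be shifted back by FRAC_BITS with rounding. The Q8.8-style format, 16-bit saturation, and function names are assumptions for illustration; this is not FpSynt's output or API.

```python
FRAC_BITS = 8   # assumed Q8.8 format: 8 fractional bits
WORD_BITS = 16  # assumed 16-bit signed data path
Q_MIN, Q_MAX = -(1 << (WORD_BITS - 1)), (1 << (WORD_BITS - 1)) - 1


def saturate(q: int) -> int:
    """Clamp to the signed 16-bit range instead of wrapping on overflow."""
    return max(Q_MIN, min(Q_MAX, q))


def to_fixed(x: float) -> int:
    """Quantize a real value to the nearest representable fixed-point integer."""
    return saturate(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    return q / (1 << FRAC_BITS)


def fixed_add(a: int, b: int) -> int:
    return saturate(a + b)


def fixed_mul(a: int, b: int) -> int:
    """The raw product has 2*FRAC_BITS fractional bits; rescale with
    round-to-nearest (ties toward +infinity) via an arithmetic right shift."""
    return saturate((a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)


a, b = to_fixed(1.5), to_fixed(-2.25)
print(to_float(fixed_mul(a, b)))  # prints -3.375 (exactly representable)
print(to_float(to_fixed(0.1)))    # prints 0.1015625 (0.1 is not representable)
```

Choosing FRAC_BITS per signal is the accuracy/width trade-off that a synthesis tool has to make: more fractional bits reduce quantization error but leave fewer integer bits before saturation.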
Nasif Muslim, Md. Tanvir Adnan, Mohammad Zahidul Kabir et al.
In this paper, we design a digital clock in which a microcontroller serves as the timing controller; fonts for the Bangla digits are designed and programmed into the microcontroller. The design is cost-effective, simple, and easy to maintain.
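The timekeeping logic of such a clock can be sketched in a few lines. The version below assumes a one-second tick from the microcontroller's timer and a 24-hour display, and it uses the Unicode Bengali digits (U+09E6 to U+09EF) as a stand-in for the custom digit fonts described above; the class and function names are hypothetical.

```python
BENGALI_ZERO = 0x09E6  # BENGALI DIGIT ZERO; digits 0-9 are contiguous code points


def to_bangla_digits(s: str) -> str:
    """Replace ASCII digits 0-9 with Bengali digits; leave other characters as-is."""
    return "".join(chr(BENGALI_ZERO + ord(c) - ord("0")) if "0" <= c <= "9" else c for c in s)


class BanglaClock:
    """24-hour clock advanced by a one-second tick (e.g. from a timer interrupt)."""

    def __init__(self, h: int = 0, m: int = 0, s: int = 0) -> None:
        self.h, self.m, self.s = h, m, s

    def tick(self) -> None:
        self.s += 1
        if self.s == 60:
            self.s = 0
            self.m += 1
            if self.m == 60:
                self.m = 0
                self.h = (self.h + 1) % 24

    def display(self) -> str:
        return to_bangla_digits(f"{self.h:02d}:{self.m:02d}:{self.s:02d}")


clock = BanglaClock(23, 59, 58)
for _ in range(3):
    clock.tick()
    print(clock.display())
```

On the actual device, `display` would index a glyph table for each digit rather than produce Unicode text, but the carry logic from seconds to minutes to hours is the same.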
Reversible logic is experiencing renewed interest as we approach the limits of CMOS technologies. While physical implementations of reversible gates have yet to materialize, it is safe to assume that they will rely on individual components that are prone to faults. In this work, we present a method to provide fault tolerance to a reversible circuit based on invariant relationships.
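The abstract does not specify which invariants are used, so the sketch below illustrates the general idea with one well-known invariant: a Fredkin (controlled-swap) gate never changes the number of 1s on its lines. In a circuit built only from Fredkin gates, a single bit-flip fault therefore always changes the output weight and can be detected by comparing input and output weights. The gate list, fault-injection parameters, and function names are illustrative assumptions, not the authors' method.

```python
from typing import Optional

Gate = tuple[int, int, int]  # Fredkin gate: (control, target_a, target_b), distinct lines


def apply_fredkin(state: list[int], gate: Gate) -> None:
    """Controlled swap: exchange the two targets when the control line is 1."""
    c, a, b = gate
    if state[c]:
        state[a], state[b] = state[b], state[a]


def run(circuit: list[Gate], inputs: list[int],
        fault: Optional[tuple[int, int]] = None) -> list[int]:
    """Simulate the circuit; `fault=(gate_index, line)` flips `line` after that gate."""
    state = list(inputs)
    for k, gate in enumerate(circuit):
        apply_fredkin(state, gate)
        if fault is not None and fault[0] == k:
            state[fault[1]] ^= 1  # inject a single bit-flip fault
    return state


def weight_invariant_holds(inputs: list[int], outputs: list[int]) -> bool:
    """Fredkin gates conserve Hamming weight, so any single bit flip breaks this check."""
    return sum(inputs) == sum(outputs)


circuit = [(0, 1, 2), (1, 0, 2), (2, 0, 1)]
x = [1, 0, 1]
good = run(circuit, x)
bad = run(circuit, x, fault=(0, 2))
print(good, weight_invariant_holds(x, good))
print(bad, weight_invariant_holds(x, bad))
```

A weight check detects but does not locate or correct the fault, and circuits that use non-conservative gates such as Toffoli need different invariants.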