Results for "Computer engineering. Computer hardware"

Showing 20 of ~8,499,895 results · from DOAJ, Semantic Scholar, arXiv, CrossRef

S2 Open Access 2024
Artificial Neural Networks for Space and Safety-Critical Applications: Reliability Issues and Potential Solutions

Paolo Rech

Machine learning is among the greatest advancements in computer science and engineering and is today used to classify or detect objects, a key feature in autonomous vehicles. Since neural networks are heavily used in safety-critical applications, such as automotive and aerospace, their reliability is of paramount importance. However, the reliability evaluation of neural network systems is extremely challenging due to the complexity of the software, which is composed of hundreds of layers, and the underlying hardware, typically a parallel device or an embedded accelerator. This article reviews fundamental concepts of artificial intelligence, deep neural networks, and parallel computing device reliability. Then, the reliability studies that consider the radiation effects in the hardware, their propagation through the computing architecture, and their final impact on the software output are summarized. A detailed survey of the available strategies to measure the sensitivity of neural network frameworks and observe fault propagation is given, together with a summary of the data obtained so far. Finally, a discussion on how to use the experimental evaluation to design effective and efficient hardening solutions for artificial neural networks is provided. The available hardening solutions are critically reviewed, highlighting their benefits and drawbacks.
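
A minimal sketch of the kind of fault-injection campaign such reliability studies rely on: flip one random bit in a weight and check whether the classification output changes. The tiny two-layer model and the single-bit-upset fault model here are illustrative assumptions, not the survey's exact methodology.

```python
# Weight bit-flip fault injection: a common technique for probing a
# network's sensitivity to radiation-induced upsets. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def forward(w1, w2, x):
    return np.maximum(x @ w1, 0.0) @ w2  # tiny ReLU MLP (assumption)

def flip_bit(value, bit):
    """Flip one bit of a float32 value, mimicking a single-event upset."""
    as_int = np.float32(value).view(np.uint32)
    return (as_int ^ np.uint32(1 << bit)).view(np.float32)

w1 = rng.standard_normal((8, 16)).astype(np.float32)
w2 = rng.standard_normal((16, 4)).astype(np.float32)
x = rng.standard_normal((32, 8)).astype(np.float32)
golden = forward(w1, w2, x).argmax(axis=1)  # fault-free reference output

mismatches, trials = 0, 1000
for _ in range(trials):
    faulty = w1.copy()
    i = rng.integers(faulty.size)   # random weight ...
    bit = rng.integers(32)          # ... and random bit position
    faulty.flat[i] = flip_bit(faulty.flat[i], bit)
    if (forward(faulty, w2, x).argmax(axis=1) != golden).any():
        mismatches += 1

print(f"output-corrupting upsets: {mismatches}/{trials}")
```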

51 citations · en
DOAJ Open Access 2025
Powertan: a Revolution for the Tanning Leather Sector

Omar Salmi, Fabio Pastori, Marco Marinsalta et al.

Powertan is a disruptive new approach to leather tanning in which penetration of the tanning agent into the hide is enhanced by an externally applied electric field. Penetration is thus no longer controlled by the Fickian diffusion mechanism but by ion migration. The result is a dramatic decrease in process time, from the almost 24 h of the traditional drum operation to a few minutes. Moreover, the electric field reduces the need for ancillary operations such as pickling and basification, and cuts the bath/leather ratio from about 20 L/kg to roughly one tenth of that value. Here, the first small-scale batch tests are presented together with a preliminary modeling interpretation.
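
A back-of-envelope timescale comparison clarifies why field-driven migration beats diffusion; the numbers below are generic illustrative assumptions, not Powertan data.

```latex
% Diffusion across a hide of thickness L versus field-driven ion drift
% (illustrative orders of magnitude, not the paper's measurements):
\[
  t_{\text{diff}} \sim \frac{L^{2}}{D},
  \qquad
  t_{\text{drift}} \sim \frac{L}{\mu E},
  \qquad
  \frac{t_{\text{diff}}}{t_{\text{drift}}} \sim \frac{\mu E L}{D}.
\]
% With, say, L ~ 3 mm and D ~ 1e-10 m^2/s, t_diff ~ 9e4 s (about 25 h),
% while a drift velocity \mu E of a few \mu m/s gives t_drift of minutes:
% a speed-up on the order of 10^2, consistent with the reported change
% from ~24 h to a few minutes.
```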

Chemical engineering, Computer engineering. Computer hardware
arXiv Open Access 2025
Computing-In-Memory Dataflow for Minimal Buffer Traffic

Choongseok Song, Doo Seok Jeong

Computing-In-Memory (CIM) offers a potential solution to the memory wall issue and can achieve high energy efficiency by minimizing data movement, making it a promising architecture for edge AI devices. Lightweight models like MobileNet and EfficientNet, which utilize depthwise convolution for feature extraction, have been developed for these devices. However, CIM macros often face challenges in accelerating depthwise convolution, including underutilization of CIM memory and heavy buffer traffic. The latter, in particular, has been overlooked despite its significant impact on latency and energy consumption. To address this, we introduce a novel CIM dataflow that significantly reduces buffer traffic by maximizing data reuse and improving memory utilization during depthwise convolution. The proposed dataflow is grounded in solid theoretical principles, fully demonstrated in this paper. When applied to MobileNet and EfficientNet models, our dataflow reduces buffer traffic by 77.4-87.0%, leading to a total reduction in data traffic energy and latency by 10.1-17.9% and 15.6-27.8%, respectively, compared to the baseline (conventional weight-stationary dataflow).
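
A toy accounting of input-buffer traffic makes the reuse argument concrete: without on-array reuse every K x K window is refetched per output pixel, whereas perfect reuse fetches each input element once. The counts below are a deliberately simplified model, not the paper's actual dataflow.

```python
# Per-channel input fetches for a stride-1 depthwise convolution,
# comparing zero reuse against ideal full reuse (simplified model).
H = W = 56          # input feature-map size per channel (assumption)
K = 3               # depthwise kernel size (assumption)
Ho = Wo = H - K + 1 # output size at stride 1, no padding

no_reuse = Ho * Wo * K * K   # refetch the whole window per output element
full_reuse = H * W           # fetch each input element exactly once

print(f"fetches without reuse: {no_reuse}")
print(f"fetches with full reuse: {full_reuse}")
print(f"traffic reduction: {1 - full_reuse / no_reuse:.1%}")
```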

en cs.AR, cs.AI
arXiv Open Access 2025
Computing Linear Regions in Neural Networks with Skip Connections

Johnny Joyce, Jan Verschelde

Neural networks are important tools in machine learning. Representing piecewise linear activation functions with tropical arithmetic enables the application of tropical geometry. Algorithms are presented to compute the regions on which a neural network is a linear map. Through computational experiments, we provide insights into the difficulty of training neural networks, in particular into the problem of overfitting and the benefits of skip connections.
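
An empirical sketch of the underlying notion: on any input, the sign pattern of the ReLU pre-activations determines which affine piece of the network applies, so counting distinct patterns over a grid lower-bounds the number of linear regions. This sampling heuristic is an illustration, not the paper's tropical-geometry algorithm; the small network and skip connection are assumptions.

```python
# Count distinct activation patterns of a small ReLU net with a skip
# connection over a 2-D input grid: a crude lower bound on its number
# of linear regions.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((2, 8)); b1 = rng.standard_normal(8)
W2 = rng.standard_normal((8, 8)); b2 = rng.standard_normal(8)

def activation_pattern(x):
    h1 = x @ W1 + b1
    a1 = np.maximum(h1, 0.0)
    h2 = a1 @ W2 + b2 + x @ W1   # skip connection reusing the first map
    return np.concatenate([(h1 > 0), (h2 > 0)], axis=-1)

grid = np.stack(np.meshgrid(np.linspace(-2, 2, 300),
                            np.linspace(-2, 2, 300)), axis=-1).reshape(-1, 2)
patterns = {tuple(p) for p in activation_pattern(grid)}
print(f"distinct linear regions found on the grid: {len(patterns)}")
```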

en cs.LG, cs.SC
arXiv Open Access 2025
Fix: externalizing network I/O in serverless computing

Yuhan Deng, Akshay Srivatsan, Sebastian Ingino et al.

We describe a system for serverless computing where users, programs, and the underlying platform share a common representation of a computation: a deterministic procedure, run in an environment of well-specified data or the outputs of other computations. This representation externalizes I/O: data movement over the network is performed exclusively by the platform. Applications can describe the precise data needed at each stage, helping the provider schedule tasks and network transfers to reduce starvation. The design suggests an end-to-end argument for outsourced computing, shifting the service model from "pay-for-effort" to "pay-for-results."
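
A loose illustration of that shared representation: name a computation by its deterministic procedure plus the content hashes of its inputs, so equal computations deduplicate and the platform decides when and where data moves. The names and structure here are hypothetical sketches, not the Fix system's actual API.

```python
# A computation as (procedure, content-hashed inputs): illustrative only.
import hashlib
import json

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def computation_id(procedure: str, input_hashes: list[str]) -> str:
    """Identity is 'what runs on which data', so the platform can cache
    results and schedule the network transfers itself."""
    return content_hash(json.dumps([procedure, input_hashes]).encode())

blob_a = b"input dataset shard"
blob_b = b"model parameters"
cid = computation_id("train_step",
                     [content_hash(blob_a), content_hash(blob_b)])
print(f"computation id: {cid[:16]}...")
```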

en cs.OS, cs.DC
DOAJ Open Access 2024
Non-Motorized License Plate Recognition and Localization Method Based on Semantic Alignment and Hierarchical Optimization

TAN Ruoqi, DONG Minggang, ZHAO Weixiao, WU Tianhao

Holding non-motorized vehicles accountable for legal violations effectively enhances urban traffic safety. Non-motorized vehicle license plates are characterized by small size, dense distribution, and susceptibility to occlusion, which leads to significant feature information loss during detection in traditional deep learning-based methods. A non-motorized vehicle license plate recognition and localization method based on semantic alignment and hierarchical optimization is proposed. In this method, a semantic alignment module is designed for low-level information fusion. During upsampling, low-level target information is used to guide the downward fusion of high-level semantics, addressing the loss of small-target features caused by conflicts between high- and low-level semantics. Subsequently, a hierarchical optimization module is constructed within the CSP structure to replace the deep ELAN module. This module uses a small stack of convolutional kernel modules to extract target information, reducing the number of network layers and preventing the loss of feature information at deeper levels. In the final stage, the K-Means++ algorithm is employed to cluster initial anchor boxes suited to non-motorized license plates, reducing matching error during training. This approach aims to improve the accuracy of small-object recognition and localization. The experimental results demonstrate that the proposed method achieves a recognition and localization accuracy of 90.95% on a non-motorized vehicle license plate dataset. Compared with representative methods such as YOLOv7 and YOLOv8, it improves the accuracy by at least 3.58%. The proposed approach is effective for non-motorized vehicle license plate recognition and localization.
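
A sketch of the anchor-initialization step: cluster the (width, height) pairs of training boxes with k-means++ seeding to obtain initial anchors. The synthetic box sizes stand in for a real license-plate dataset, and plain Euclidean k-means is used here for brevity; YOLO-style pipelines often cluster with an IoU distance instead.

```python
# K-Means++ clustering of box sizes to produce initial anchors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Small, densely packed plates -> small widths/heights in pixels (assumed).
boxes = np.column_stack([rng.normal(48, 10, 500).clip(8),   # widths
                         rng.normal(16, 4, 500).clip(4)])   # heights

kmeans = KMeans(n_clusters=9, init="k-means++", n_init=10, random_state=0)
kmeans.fit(boxes)
# Sort anchors by area, as detector configs conventionally list them.
order = np.argsort(kmeans.cluster_centers_.prod(axis=1))
print("initial anchor boxes (w, h):")
print(np.round(kmeans.cluster_centers_[order], 1))
```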

Computer engineering. Computer hardware, Computer software
DOAJ Open Access 2024
New Insights into Gas-in-Oil-Based Fault Diagnosis of Power Transformers

Felipe M. Laburú, Thales W. Cabral, Felippe V. Gomes et al.

The dissolved gas analysis of insulating oil in power transformers can provide valuable information about fault diagnosis. Power transformer datasets are often imbalanced, worsening the performance of machine learning-based fault classifiers. A critical step is choosing the proper evaluation metric to select features, models, and oversampling techniques. However, no clear-cut, thorough guidance on that choice is available to date. In this work, we shed light on this subject by introducing new tailored evaluation metrics. Our results and discussions bring fresh insights into which learning setups are more effective for imbalanced datasets.
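
A generic illustration of why the metric choice matters on imbalanced fault classes (not the paper's tailored metrics): a majority-class guesser scores high accuracy yet is useless, which a macro-averaged F1 exposes immediately.

```python
# Accuracy vs. macro F1 on an imbalanced two-class fault dataset.
from sklearn.metrics import accuracy_score, f1_score

# 95 "normal" samples vs 5 "fault" samples (assumed class ratio).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100            # classifier that always predicts "normal"

print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}")   # 0.95
print(f"macro F1: {f1_score(y_true, y_pred, average='macro', zero_division=0):.2f}")  # ~0.49
```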

DOAJ Open Access 2024
Fast Exploring Literature by Language Machine Learning for Perovskite Solar Cell Materials Design

Lei Zhang, Yiru Huang, Leiming Yan et al.

Making computers automatically extract latent scientific knowledge from literature is highly desirable for future materials and chemical research in the artificial intelligence era. Herein, natural language processing (NLP)-based machine learning is employed to build language models and automatically extract hidden information regarding perovskite solar cell (PSC) materials from 29,060 publications. The concept that PSCs contain light-absorbing materials, electron-transporting materials, and hole-transporting materials is successfully learned by the NLP-based machine learning model without a time-consuming human expert training process. The NLP model highlights a hole-transporting material that has received insufficient attention in the literature, which is then examined via density functional theory calculations to provide an atomistic view of the perovskite/hole-transporting layer heterostructures and their optoelectronic properties. Finally, the above results are confirmed by device experiments. The present study demonstrates the viability of NLP as a universal machine learning tool for extracting useful information from existing publications.
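
A sketch of the core NLP idea: embeddings trained on abstracts place materials used in similar roles near each other, so similarity queries can surface candidate hole-transporting materials. The three-sentence corpus below is a toy stand-in for the 29,060 real abstracts, and word2vec is one plausible choice of language model, not necessarily the authors' exact setup.

```python
# Train word embeddings on (toy) abstracts, then query for materials
# that appear in contexts similar to a known hole transporter.
from gensim.models import Word2Vec

corpus = [
    "spiro-ometad serves as hole transporting material in perovskite cells".split(),
    "ptaa is an efficient hole transporting material for perovskite devices".split(),
    "tio2 acts as electron transporting layer in perovskite solar cells".split(),
]
model = Word2Vec(corpus, vector_size=32, window=5, min_count=1,
                 epochs=200, seed=0)
print(model.wv.most_similar("spiro-ometad", topn=3))
```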

Computer engineering. Computer hardware, Control engineering systems. Automatic machinery (General)
arXiv Open Access 2024
Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators

Mika Markus Müller, Alexander Richard Manfred Borst, Konstantin Lübeck et al.

Artificial Intelligence (AI) has witnessed remarkable growth, particularly through the proliferation of Deep Neural Networks (DNNs). These powerful models drive technological advancements across various domains. However, to harness their potential in real-world applications, specialized hardware accelerators are essential. This demand has sparked a market for parameterizable AI hardware accelerators offered by different vendors. Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The decision involves choosing the right hardware and configuring a suitable set of parameters. However, comparing different accelerator design alternatives remains a complex task. Often, engineers rely on data sheets, spreadsheet calculations, or slow black-box simulators, which only offer a coarse understanding of the performance characteristics. The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams, which helps to communicate computer architecture on different abstraction levels and allows for inferring performance characteristics. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
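
A generic Python stand-in, written to suggest the flavor of the approach rather than ACADL itself (whose actual syntax is not shown here): components of a block diagram carry latency and throughput attributes, and a coarse performance estimate falls out of their composition. All numbers are made-up illustration values.

```python
# Coarse timing estimate from an abstract block diagram (not ACADL syntax).
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    latency_cycles: int    # fixed startup latency
    throughput_macs: int   # MACs retired per cycle once streaming

def pipeline_cycles(blocks: list[Block], total_macs: int) -> int:
    startup = sum(b.latency_cycles for b in blocks)
    bottleneck = min(b.throughput_macs for b in blocks)  # slowest stage
    return startup + total_macs // bottleneck

accel = [Block("dma_in", 100, 512),
         Block("pe_array", 10, 256),
         Block("dma_out", 100, 512)]
layer_macs = 3 * 3 * 64 * 64 * 56 * 56   # a 3x3, 64->64 conv on 56x56
print(f"estimated cycles: {pipeline_cycles(accel, layer_macs):,}")
```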

en cs.AR, cs.AI
arXiv Open Access 2024
EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology

Qizhe Wu, Yuchen Gui, Zhichen Zeng et al.

Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the demand for tensor computations has also increased significantly. To meet this demand, several research institutions have started developing dedicated hardware for tensor computations. To further improve the computational performance of tensor process units, we have reexamined the issue of computation reuse that was previously overlooked in existing architectures. As a result, we propose a novel EN-T architecture that can reduce chip area and power consumption. Furthermore, our method is compatible with existing tensor processing units. We evaluated our method on prevalent microarchitectures; the results demonstrate an average improvement in area efficiency of 8.7%, 12.2%, and 11.0% for tensor computing units at computational scales of 256 GOPS, 1 TOPS, and 4 TOPS, respectively. Similarly, there were energy efficiency enhancements of 13.0%, 17.5%, and 15.5%.

en cs.AR
arXiv Open Access 2024
Analyzing and Improving Hardware Modeling of Accel-Sim

Rodrigo Huerta, Mojtaba Abaie Shoushtary, Antonio González

GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU architectures, each SM/core is typically composed of several sub-cores, each with its own independent pipeline. Simulators are a key tool for investigating novel concepts in computer architecture. They must be performance-accurate and model the target hardware faithfully in order to expose the relevant bottlenecks. This paper presents a wide-ranging analysis of different parts of Accel-sim, a popular GPGPU simulator, along with several improvements to its model. First, we focus on the front-end and develop a more realistic model. Then, we analyze the way the result bus works and develop a more realistic one. Next, we describe the current memory pipeline model and propose a model for a more cost-effective design. Finally, we discuss other areas of improvement for the simulator.

en cs.AR
DOAJ Open Access 2023
Limitation of the single-domain numerical approach: Comparisons of analytical and numerical solutions for a forced convection heat transfer problem in a composite duct

Andrey V. Kuznetsov

The aim of this paper is to establish the bounds of applicability of the single-domain numerical approach for computations of convection in composite porous/fluid domains. The large number of papers that have utilized this numerical approach motivates this research. The popularity of this approach is due to the simplicity of its numerical formulation. Since the utilization of the single-domain numerical approach does not require the explicit imposing of any boundary conditions at the porous/fluid interface, the aim of this research is to investigate whether this method always produces accurate numerical solutions.
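
A sketch of the single-domain idea in a standard Brinkman-type formulation (generic notation, not necessarily the paper's exact model): one momentum equation holds everywhere, with the Darcy term switched off in the clear-fluid region, which is why no explicit interface conditions are needed.

```latex
% Single-domain formulation: one equation for both subdomains.
\[
  \frac{\mu}{\varepsilon}\nabla^{2}\mathbf{u}
  \;-\; \frac{\mu}{K(\mathbf{x})}\,\mathbf{u}
  \;-\; \nabla p \;=\; 0,
  \qquad
  K(\mathbf{x}) =
  \begin{cases}
    K_{p}, & \text{porous region},\\[2pt]
    \infty, & \text{clear fluid},
  \end{cases}
\]
% so the porous/fluid interface carries no explicitly imposed conditions;
% whether this remains accurate is precisely the question the paper probes.
```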

Computer engineering. Computer hardware, Mechanics of engineering. Applied mechanics
DOAJ Open Access 2023
Stability of a circular ring in postcritical equilibrium states with two deformation-dependent loads and geometrical imperfections

Imre Kozák, Tamas Szabó

The circular ring is linearly elastic and its cross-section is rectangular. Two deformation-dependent distributed loads (follower loads) are applied simultaneously on the outer surface of the ring. The first load is a uniform pressure on the whole outer surface. The second load is a uniform normal traction exerted on two surface parts situated in axially symmetric positions. Both loads are self-equilibrated independently of each other. A nonlinear FE program with 3D elements is used for the numerical analysis of a geometrically perfect ring and two imperfect rings. Displacement control is used in the equilibrium iterations. Equilibrium surfaces are determined in the space of three parameters: one characteristic displacement coordinate and two load factors. The stability analysis is performed using the knowledge of the equilibrium surfaces.

Computer engineering. Computer hardware, Mechanics of engineering. Applied mechanics
arXiv Open Access 2023
Elementary Quantum Recursion Schemes That Capture Quantum Polylogarithmic Time Computability of Quantum Functions

Tomoyuki Yamakami

Quantum computing has been studied over the past four decades based on two computational models, quantum circuits and quantum Turing machines. To capture quantum polynomial-time computability, a new recursion-theoretic approach was recently taken by Yamakami [J. Symb. Logic 80, pp. 1546-1587, 2020] by way of recursion-schematic definition, which consists of six initial quantum functions and three construction schemes: composition, branching, and multi-qubit quantum recursion. Taking a similar approach, we look into quantum polylogarithmic-time computability and further explore the expressive power of elementary schemes designed for such quantum computation. In particular, we introduce an elementary form of the quantum recursion, called the fast quantum recursion, and formulate EQS (elementary quantum schemes) of "elementary" quantum functions. This class EQS captures exactly quantum polylogarithmic-time computability, which forms the complexity class BQPOLYLOGTIME. We also demonstrate the separation of BQPOLYLOGTIME from NLOGTIME and PPOLYLOGTIME. As a natural extension of EQS, we further consider an algorithmic procedural scheme that implements the well-known divide-and-conquer strategy. This divide-and-conquer scheme helps compute the parity function, but it cannot be realized within our system EQS.
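
The classical skeleton of the divide-and-conquer recursion the abstract credits with computing parity: split the input in half, recurse, and XOR the halves, giving O(log n) recursion depth, which mirrors the polylogarithmic-time setting. The quantum scheme itself operates on qubits; this sketch shows only the classical shape of the recursion.

```python
# Divide-and-conquer parity with logarithmic recursion depth.
def parity(bits: list[int]) -> int:
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    return parity(bits[:mid]) ^ parity(bits[mid:])

print(parity([1, 0, 1, 1]))  # 1: an odd number of ones
```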

en cs.CC, quant-ph
arXiv Open Access 2023
Optimizing Distributed Networking with Big Data Scheduling and Cloud Computing

Wenbo Zhu

With the rapid transformation of computer hardware and algorithms, mobile networking has evolved from low data-carrying capacity and high latency to better-optimized networks, achieved either by enhancing the digital network or by using different approaches to reduce network traffic. This paper discusses big data applications and scheduling in distributed networking and analyzes the opportunities and challenges of data management systems. The analysis shows that big data scheduling in the cloud computing environment provides the most efficient way to transfer and synchronize data. Since scheduling problems and cloud models are very complex to analyze in different settings, we restrict attention to typical software-defined networks. The development of cloud management models and coflow scheduling algorithms is shown to be a priority for the future development of digital communications and networks.

en cs.NI, cs.DC
S2 Open Access 2022
Software Engineering Approaches for TinyML based IoT Embedded Vision: A Systematic Literature Review

Shashank Bangalore Lakshman, Nasir U. Eisty

Internet of Things (IoT) has catapulted human ability to control our environments through ubiquitous sensing, communication, computation, and actuation. Over the past few years, IoT has joined forces with Machine Learning (ML) to embed deep intelligence at the far edge. TinyML (Tiny Machine Learning) has enabled the deployment of ML models for embedded vision on extremely lean edge hardware, bringing the power of IoT and ML together. However, TinyML powered embedded vision applications are still in a nascent stage, and they are just starting to scale to widespread real-world IoT deployment. To harness the true potential of IoT and ML, it is necessary to provide product developers with robust, easy-to-use software engineering (SE) frameworks and best practices that are customized for the unique challenges faced in TinyML engineering. Through this systematic literature review, we aggregated the key challenges reported by TinyML developers and identified state-of-the-art SE approaches in large-scale Computer Vision, Machine Learning, and Embedded Systems that can help address key challenges in TinyML based IoT embedded vision. In summary, our study draws synergies between SE expertise that embedded systems developers and ML developers have independently developed to help address the unique challenges in the engineering of TinyML based IoT embedded vision.

14 citations · en · Computer Science

Page 6 of 424,995