Results for "Computer engineering. Computer hardware"

CrossRef Open Access 2025

A Double Approximate Neural Hardware Accelerator

Samira Nazari, Ali Azarpeyvand, Tara Ghasempouri

en

DOAJ Open Access 2025

Improved Harmonic Performance in a Reduced Switch Matrix Converter with Space Vector Modulation

Abdul Sattar Larik, Abdul Hameed Soomro, Saad Khan Baloch et al.

This study introduces a new three-phase modified matrix converter that uses fewer semiconductor switches, with the aim of improving efficiency, simplifying the circuit architecture, and reducing the overall implementation cost. Traditional direct matrix converters (DMCs) typically need 18 bidirectional switches, which results in complicated control programs and elevated power losses. The reduced-switch topology converts AC to AC directly, eliminating the intermediate DC link, which simplifies the architecture and offers higher reliability. With fewer active components, the design achieves lower switching losses and reduced hardware complexity. The proposed converter is thoroughly compared with standard voltage source inverters (VSIs), current source inverters (CSIs), and typical DMCs. The key performance indicators are total harmonic distortion (THD), system efficiency, and control complexity. Simulation results show that the new structure matches the performance of existing solutions while offering the major benefits of compactness and ease of management, which makes it applicable to motor drives, renewable energy systems, and other space-limited applications.

Electronic computers. Computer science, Computer engineering. Computer hardware

arXiv Open Access 2025

Quantum Computing Architecture and Hardware for Engineers -- Step by Step -- Volume II

Hiu Yung Wong

After publishing my book "Quantum Computing Architecture and Hardware for Engineers: Step by Step" [1] (which I now call Volume I), in which spin qubit and superconducting qubit quantum computers were covered, I decided to write a second volume to cover the trapped ion qubit quantum computer, which was also taught in my EE274 class. I follow the same structure as in Volume I by discussing the physics, mathematics, and their connection to laser pulses and electronics based on how they fulfill the five DiVincenzo criteria. I also think it is a good idea to share the second volume on arXiv so that more people can read it for free and I can continue to update the contents. As of July 2025, I have finished the trapped ion quantum computer part. In the future, I plan to write about more critical topics in a step-by-step manner to help engineers who did not receive rigorous training in physics enter the quantum computing world.

en quant-ph

arXiv Open Access 2025

Ten Simple Rules for Catalyzing Collaborations and Building Bridges between Research Software Engineers and Software Engineering Researchers

Nasir U. Eisty, Jeffrey C. Carver, Johanna Cohoon et al.

In the evolving landscape of scientific and scholarly research, effective collaboration between Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) is pivotal for advancing innovation and ensuring the integrity of computational methodologies. This paper presents ten strategic guidelines aimed at fostering productive partnerships between these two distinct yet complementary communities. The guidelines emphasize the importance of recognizing and respecting the cultural and operational differences between RSEs and SERs, proactively initiating and nurturing collaborations, and engaging within each other's professional environments. They advocate for identifying shared challenges, maintaining openness to emerging problems, ensuring mutual benefits, and serving as advocates for one another. Additionally, the guidelines highlight the necessity of vigilance in monitoring collaboration dynamics, securing institutional support, and defining clear, shared objectives. By adhering to these principles, RSEs and SERs can build synergistic relationships that enhance the quality and impact of research outcomes.

en cs.SE

arXiv Open Access 2025

Unitho: A Unified Multi-Task Framework for Computational Lithography

Qian Jin, Yumeng Liu, Yuqi Jiang et al.

Reliable, generalizable data foundations are critical for enabling large-scale models in computational lithography. However, essential tasks (mask generation, rule violation detection, and layout optimization) are often handled in isolation, hindered by scarce datasets and limited modeling approaches. To address these challenges, we introduce Unitho, a unified multi-task large vision model built upon the Transformer architecture. Trained on a large-scale industrial lithography simulation dataset with hundreds of thousands of cases, Unitho supports end-to-end mask generation, lithography simulation, and rule violation detection. By enabling agile and high-fidelity lithography simulation, Unitho further facilitates the construction of robust data foundations for intelligent EDA. Experimental results validate its effectiveness and generalizability, with performance substantially surpassing academic baselines.

en cs.LG

arXiv Open Access 2024

Multi-Objective Hardware Aware Neural Architecture Search using Hardware Cost Diversity

Nilotpal Sinha, Peyman Rostami, Abd El Rahman Shabayek et al.

Hardware-aware Neural Architecture Search (HW-NAS) approaches automate the design of deep learning architectures tailored specifically to a given target hardware platform. Yet, these techniques demand substantial computational resources, primarily due to the expensive process of assessing the performance of identified architectures. To alleviate this problem, a recent direction in the literature has employed a representation similarity metric for efficiently evaluating architecture performance. Nonetheless, since it is inherently a single-objective method, it requires multiple runs to identify the optimal architecture set satisfying the diverse hardware cost constraints, thereby increasing the search cost. Furthermore, simply converting the single-objective approach into a multi-objective one results in an under-explored architectural search space. In this study, we propose a multi-objective method to address the HW-NAS problem, called MO-HDNAS, to identify the trade-off set of architectures in a single run with low computational cost. This is achieved by optimizing three objectives: maximizing the representation similarity metric, minimizing hardware cost, and maximizing the hardware cost diversity. The third objective, i.e., hardware cost diversity, is used to facilitate a better exploration of the architecture search space. Experimental results demonstrate the effectiveness of our proposed method in efficiently addressing the HW-NAS problem across six edge devices for the image classification task.
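The notion of a "trade-off set" in the abstract above can be illustrated with a small non-dominated (Pareto) filter. This is only a sketch and not MO-HDNAS itself: it assumes just two objectives per candidate (a similarity score to maximize and a hardware cost to minimize), and the names `dominates`, `pareto_front`, and the sample architectures and values are hypothetical.

```python
from typing import List, Tuple

# Each candidate is (name, similarity_score, hardware_cost).
# Assumption: higher similarity is better, lower hardware cost is better.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Hypothetical values, chosen only to show one dominated candidate.
candidates = [
    ("arch_a", 0.90, 12.0),
    ("arch_b", 0.85, 8.0),
    ("arch_c", 0.80, 9.0),  # worse than arch_b on both objectives
    ("arch_d", 0.70, 5.0),
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

A quadratic scan like this is adequate for small candidate sets; evolutionary multi-objective methods typically use faster non-dominated sorting.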

en cs.LG, cs.AI

arXiv Open Access 2024

Emulating a computing grid in a local environment for feature evaluation

Jananga Kalawana, Malith Dilshan, Kaveesha Dinamidu et al.

The necessity for complex calculations in high-energy physics and large-scale data analysis has led to the development of computing grids, such as the ALICE computing grid at CERN. These grids outperform traditional supercomputers but present challenges in directly evaluating new features, as changes can disrupt production operations and require comprehensive assessments, entailing significant time investments across all components. This paper proposes a solution to this challenge by introducing a novel approach for emulating a computing grid within a local environment. This emulation, resembling a mini clone of the original computing grid, encompasses its essential components and functionalities. Local environments provide controlled settings for emulating grid components, enabling researchers to evaluate system features without impacting production environments. This investigation contributes to the evolving field of computing grids and distributed systems, offering insights into the emulation of a computing grid in a local environment for feature evaluation.

en cs.DC

DOAJ Open Access 2023

Analog Ion‐Slicing LiNbO3 Memristor Based on Hopping Transport for Neuromorphic Computing

Jiejun Wang, Huizhong Zeng, Yiduo Xie et al.

Inspired by the human brain, the emerging analog‐type memristor employed in neuromorphic computing systems has attracted tremendous interest. However, existing analog memristors are still far from accurate tuning of multiple conductance states, which is crucial from the device‐level view. Herein, a reliable analog memristor based on ion‐slicing single‐crystalline LiNbO3 (LNO) thin film is demonstrated. The highly ordered LNO crystal structure provides a stable pathway for oxygen vacancy migration, which contributes to a stable Mott variable‐range hopping process in trap sites. Excellent analog switching characteristics with high reliability and repeatability, including long retention and great endurance with small fluctuation (within 0.22%), a large dynamic range of two orders of magnitude, hundreds of distinguishable conductance states with tunable linearity, and ultralow cyclic variances for multiple weight updating (down to 0.75%), are realized with the proposed memristor. As a result, a multilayer perceptron with a high recognition accuracy of 95.6% on the Modified National Institute of Standards and Technology dataset is realized. The proposed analog memristive devices based on ion‐slicing single‐crystalline thin films offer a novel strategy for fabricating high‐performance memristors that combine linear tunability and long‐term repeatability, opening a novel avenue for neuromorphic computing applications.

Computer engineering. Computer hardware, Control engineering systems. Automatic machinery (General)

DOAJ Open Access 2023

Evaluation of the threshold values for the propagation of a fatigue crack starting at a V-notch

Jan Klusák, Zdeněk Knésl

This paper presents a simple method for evaluating the threshold value for fatigue cracks that emanate from a V-notch. The proposed method is based on the similarities between the elastic-stress fields around the tip of a crack and the tip of a V-notch. Threshold values for fatigue cracks that emanate from a V-notch are expressed by means of the threshold value for the propagation of a high-cycle-fatigue crack and the opening angle of the V-notch. The corresponding calculations were performed by the finite-element method.

Computer engineering. Computer hardware, Mechanics of engineering. Applied mechanics

DOAJ Open Access 2023

Modeling and performance study of CZTS solar cell with novel cupric oxide (CuO) as a bilayer absorber

A. A. Md. Monzur-Ul-Akhir, Saiful Islam, Md. Touhidul Imam et al.

Kesterite materials such as CZTS attract researchers with their tunable bandgap and high optical absorption coefficient above 10^4 cm^−1 for solar cells. These features make CZTS a suitable material for single-junction solar cells and increase its acceptance. In this paper, SCAPS-1D software is used to run comparative numerical simulations of a regular base structure (a CZTS absorber layer with a CdS buffer layer, a ZnO window layer, and a transparent n-ITO conducting layer) and a proposed structure in which the CZTS absorber layer is replaced by a CZTS and CuO bilayer, in order to optimize the efficiency. In addition, the thickness, defect densities, and doping concentrations of the absorber layers, as well as the temperature, were varied to observe the responses of the open-circuit voltage (VOC), short-circuit current (JSC), fill factor (FF), and efficiency (η) of the solar cell. Among the three basic loss mechanisms studied for kesterite materials, we focus on mitigating back-contact interface recombination through an absorber bilayer combination of CZTS and CuO, which increases VOC, quantum efficiency, and carrier generation efficiency by approximately 50 %, 8.94 %, and 34 %, respectively, and raises the efficiency of the proposed structure to 19.92 %.
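The four output quantities named in the abstract are linked by the standard relation η = (VOC × JSC × FF) / Pin. The sketch below only illustrates that relation; it assumes an incident power density of 100 mW/cm² (the usual one-sun value), and the function name and input values are hypothetical rather than taken from the paper.

```python
def efficiency_percent(voc_v: float, jsc_ma_cm2: float, ff: float,
                       pin_mw_cm2: float = 100.0) -> float:
    """Power conversion efficiency in percent.

    voc_v       open-circuit voltage in volts
    jsc_ma_cm2  short-circuit current density in mA/cm^2
    ff          fill factor as a fraction between 0 and 1
    pin_mw_cm2  incident power density in mW/cm^2 (assumed one-sun default)
    """
    if not 0.0 <= ff <= 1.0:
        raise ValueError("fill factor must be a fraction between 0 and 1")
    # V * mA/cm^2 gives mW/cm^2, the same unit as the incident power density.
    return 100.0 * voc_v * jsc_ma_cm2 * ff / pin_mw_cm2


# Hypothetical inputs for illustration only.
eta = efficiency_percent(voc_v=0.8, jsc_ma_cm2=30.0, ff=0.8)
print(f"efficiency = {eta:.2f} %")
```

Because η is a product of the three terms, a relative gain in VOC carries over to η in the same proportion only if JSC and FF stay unchanged.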

Electric apparatus and materials. Electric circuits. Electric networks, Computer engineering. Computer hardware

arXiv Open Access 2023

Software engineering to sustain a high-performance computing scientific application: QMCPACK

William F. Godoy, Steven E. Hahn, Michael M. Walsh et al.

We provide an overview of the software engineering efforts and their impact in QMCPACK, a production-level ab-initio Quantum Monte Carlo open-source code targeting high-performance computing (HPC) systems. Aspects included are: (i) strategic expansion of continuous integration (CI) targeting CPUs, using GitHub Actions runners, and NVIDIA and AMD GPUs in pre-exascale systems, using self-hosted hardware; (ii) incremental reduction of memory leaks using sanitizers, (iii) incorporation of Docker containers for CI and reproducibility, and (iv) refactoring efforts to improve maintainability, testing coverage, and memory lifetime management. We quantify the value of these improvements by providing metrics to illustrate the shift towards a predictive, rather than reactive, sustainable maintenance approach. Our goal, in documenting the impact of these efforts on QMCPACK, is to contribute to the body of knowledge on the importance of research software engineering (RSE) for the sustainability of community HPC codes and scientific discovery at scale.

en cs.SE, cs.DC

arXiv Open Access 2023

Towards a Better Understanding of the Computer Vision Research Community in Africa

Abdul-Hakeem Omotayo, Mai Gamal, Eman Ehab et al.

Computer vision is a broad field of study that encompasses different tasks (e.g., object detection). Although computer vision is relevant to African communities in various applications, computer vision research is under-explored on the continent and constitutes only 0.06% of top-tier publications in the last ten years. In this paper, our goal is to gain a better understanding of the computer vision research conducted in Africa and to provide pointers on whether there is equity in research or not. We do this through an empirical analysis of the African computer vision publications that are Scopus indexed, where we collect around 63,000 publications over the period 2012-2022. We first study the opportunities available for African institutions to publish in top-tier computer vision venues. We show that African publishing trends in top-tier venues over the years do not exhibit consistent growth, unlike other continents such as North America or Asia. Moreover, we study all computer vision publications beyond top-tier venues in different African regions and find that mainly Northern and Southern Africa are publishing in computer vision, with 68.5% and 15.9% of publications, respectively. Nonetheless, we highlight that both Eastern and Western Africa are exhibiting a promising increase, with the last two years closing the gap with Southern Africa. Additionally, we study the collaboration patterns in these publications and find that most of them exhibit international collaborations rather than African ones. We also show that most of these publications include an African author who is a key contributor as the first or last author. Finally, we present the most recurring keywords in computer vision publications per African region.

en cs.CV

DOAJ Open Access 2022

Research on Text Representation of Video Content Based on Multi-Modal Fusion and Multi-Layer Attention

ZHAO Hong, GUO Lan, CHEN Zhiwen, ZHENG Houze

To address the single-text representation and low accuracy of existing video content text-representation models, a video content text-representation model that integrates frame-level image and audio information is proposed. The network structure of the model includes a single-mode embedding layer based on a self-attention mechanism, which learns single-mode feature parameters. Two schemes, joint representation and cooperative representation, are adopted to fuse the high-dimensional feature vectors output from the single-mode embedding layer, so that the model can focus on different objects in the video and their interaction, thereby generating a richer and more accurate video text representation. The model is pretrained on large-scale datasets, and representation information, such as the video frames and audio carried by the video, is extracted and sent to the coder to realize the text representation of the video content. The experimental results on the MSR-VTT and LSMDC datasets show that the BLEU4, METEOR, ROUGEL, and CIDEr scores of the proposed model are 0.386, 0.250, 0.609 and 0.463, respectively. Compared with the model released by IIT Delhi in the MSR-VTT challenge, the proposed model improves these metrics by 0.082, 0.037, 0.115 and 0.257, respectively. The model in this study can effectively improve the accuracy of video content text representation.

Computer engineering. Computer hardware, Computer software

DOAJ Open Access 2022

Investigating the Impact of Consensus Algorithm on Scalability in Blockchain Systems

Kashif Mehboob Khan, Muhammad Abdullah Hayat, Rana Muhammad Ibrahim

In the current era, blockchain has emerged as one of the best and most promising technologies. Cryptocurrencies, which are based on blockchain technology, have also gained a lot of popularity around the globe. Blockchain provides a distributed architecture in which transactions are verified by different validators using different algorithms and then stored in a distributed ledger. Transactions are verified using consensus algorithms, through which distributed nodes working in a peer-to-peer network confirm that an incoming transaction is correct and reliable. Consensus algorithms ensure the integrity and security of the blockchain. Various types of consensus algorithms are used in blockchain technology, depending on the architecture and usage; examples include Proof of Work (PoW) and Proof of Stake (PoS). The Proof of Work algorithm is the most widely used across the globe. It is used by many popular cryptocurrency networks such as Litecoin and Bitcoin, and it requires greater computational power to verify transactions. The selection of a consensus algorithm is one of the most important parts of a blockchain, as the consensus mechanism is considered to be the core of the network. It is easier to predict and guarantee the security, reliability, fault tolerance, and recoverability of the system if the correct consensus protocol is selected. A single algorithm can never fulfill all the requirements; there is always a tradeoff in the selection of consensus algorithms. Therefore, it is very important to select the best-suited consensus algorithm for the network, as the consensus mechanism validates transactions without any third-party platform and prevents malicious activities in the network. This paper compares types of consensus algorithms and investigates their effectiveness and viability.

Electronic computers. Computer science, Computer engineering. Computer hardware

arXiv Open Access 2022

Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads

René Caspart, Sebastian Ziegler, Arvid Weyrauch et al.

With the rise of AI in recent years and the increase in complexity of the models, the growing demand for computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly potent accelerators and the use of large compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, energy efficiency plays an important role for AI model developers and infrastructure operators alike. The energy consumption of AI workloads depends on the model implementation and the utilized hardware. Therefore, accurate measurements of the power draw of AI workflows on different types of compute nodes are key to algorithmic improvements and the design of future compute clusters and hardware. To this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes. Our results indicate that (1) deriving energy consumption directly from runtime is not accurate; the composition of the compute node needs to be taken into account; (2) neglecting accelerator hardware on mixed nodes results in overproportional inefficiency regarding energy consumption; and (3) energy consumption of model training and inference should be considered separately: while training on GPUs outperforms all other node types regarding both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer, enabling an easy transfer to other workloads alongside a rise in user awareness of energy consumption.
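The first finding, that runtime alone is a poor proxy for energy, can be illustrated by integrating sampled power draw over time and comparing it with a runtime-times-constant-power estimate. This is a minimal sketch under stated assumptions, not the authors' measurement setup: the function name, the trapezoidal integration, and the sample trace are all hypothetical.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical trace: the node ramps up, runs at load, then ramps down.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [100.0, 300.0, 300.0, 300.0, 100.0]

measured = energy_joules(t, p)
# Naive estimate: runtime multiplied by an assumed constant 300 W draw.
naive = 300.0 * (t[-1] - t[0])
print(f"measured = {measured} J, runtime-based estimate = {naive} J")
```

On this made-up trace the runtime-based estimate overshoots because it ignores the ramp phases; on a real node the error also depends on which components (CPU, GPU, memory) are active.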

en cs.DC, cs.AI

DOAJ Open Access 2021

An optimized knight traversal technique to detect multiple faults and Module Sequence Graph based reconfiguration of microfluidic biochip

Basudev Saha, Mukta Majumder

Conventional biomedical analysers are being replaced by digital microfluidic biochips, which can integrate the different biomedical functions essential for diverse bioassay operations. Over the last decade, microfluidic biochips have gained wide acceptance in healthcare sectors such as DNA analysis, drug discovery, and clinical diagnosis. These devices also play a vital role in safety-critical applications such as food safety testing and air quality monitoring. Because these devices are used in safety-critical applications, clinical diagnosis, and real‐time biomolecular assay operations, they must be precise, reliable, and robust. To be accepted for such critical purposes, a microfluidic device must demonstrate its precision and robustness through a rigorous testing strategy. Here, an optimized droplet traversal technique is proposed to detect multiple defective electrodes of a digital microfluidic biochip by combining boundary-cum-row traversal with a KNIGHT traversal procedure (based on the famous Knight's Tour problem). The proposed approach also computes the traversal time for a fault‐free biochip. In addition to identifying the faulty electrodes, a Module Sequencing Graph based reconfiguration technique is proposed to restore the device to normal bioassay operation.

Computer engineering. Computer hardware, Electronic computers. Computer science

DOAJ Open Access 2021

Energy Efficiency for Green Internet of Things (IoT) Networks: A Survey

Laith Farhan, Rasha Subhi Hameed, Asraa Safaa Ahmed et al.

The last decade has witnessed the proliferation of Internet-enabled devices. The Internet of Things (IoT) is becoming ever more pervasive in everyday life, connecting an ever-greater array of diverse physical objects. The key vision of the IoT is to bring a massive number of smart devices together in integrated and interconnected heterogeneous networks, making the Internet even more useful. This paper therefore begins with a brief introduction to the history and evolution of the Internet. It then presents the IoT, followed by a list of application domains and enabling technologies. The wireless sensor network (WSN) is identified as one of the important elements in IoT applications, and the paper describes the relationship between WSNs and the IoT. This research is concerned with developing energy-efficiency techniques for WSNs that enable the IoT. After identifying sources of energy wastage, the paper reviews the literature on the most relevant methods for minimizing the energy exhaustion of the IoT and WSNs. We also identify gaps in the existing literature in terms of energy preservation measures that could be researched in future work. The survey gives a near-complete and up-to-date view of the IoT in the energy field. It provides a summary and recommendations covering a large range of energy-efficiency methods proposed in the literature, which will help and support future researchers. Please note that the manuscript is an extended version based on the summary of a Ph.D. thesis. The paper gives researchers an introduction, from scratch, to what they need to know and understand about networks, WSNs, and IoT applications. Thus, the fundamental purpose of this paper is to introduce research trends and recent work on the use of IoT technology, together with the conclusions reached as a result of undertaking the Ph.D. study.

Computer engineering. Computer hardware, Electronic computers. Computer science

arXiv Open Access 2021

Reinforcement Learning for Load-balanced Parallel Particle Tracing

Jiayi Xu, Hanqi Guo, Han-Wei Shen et al.

We explore an online reinforcement learning (RL) paradigm to dynamically optimize parallel particle tracing performance in distributed-memory systems. Our method combines three novel components: (1) a work donation algorithm, (2) a high-order workload estimation model, and (3) a communication cost model. First, we design an RL-based work donation algorithm. Our algorithm monitors workloads of processes and creates RL agents to donate data blocks and particles from high-workload processes to low-workload processes to minimize program execution time. The agents learn the donation strategy on the fly based on reward and cost functions designed to consider processes' workload changes and data transfer costs of donation actions. Second, we propose a workload estimation model, helping RL agents estimate the workload distribution of processes in future computations. Third, we design a communication cost model that considers both block and particle data exchange costs, helping RL agents make effective decisions with minimized communication costs. We demonstrate that our algorithm adapts to different flow behaviors in large-scale fluid dynamics, ocean, and weather simulation data. Our algorithm improves parallel particle tracing performance in terms of parallel efficiency, load balance, and costs of I/O and communication for evaluations with up to 16,384 processors.

en cs.GR, cs.AI

arXiv Open Access 2021

Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition

HanQin Cai, Zehan Chao, Longxiu Huang et al.

We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for large-scale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tensor Fiber CUR decomposition to dramatically lower the computational complexity. The performance advantage of RTCUR is empirically verified against the state of the art on synthetic datasets and is further demonstrated on real-world applications such as color video background subtraction.

en cs.LG, cs.CV

DOAJ Open Access 2020

CNN based lane detection with instance segmentation in edge-cloud computing

Wei Wang, Hui Lin, Junshu Wang

At present, the number of vehicle owners is increasing, and cars with autonomous driving functions have attracted more and more attention. Lane detection combined with cloud computing can effectively overcome the drawbacks of traditional lane detection, which relies on feature extraction and high definition, but it also faces the problem of excessive computation. At the same time, cloud data processing combined with edge computing can effectively reduce the computing load of the central nodes. In this work, the traditional lane detection method is improved, and the currently popular convolutional neural network (CNN) is used to build a dual model based on instance segmentation. In the image acquisition and processing stages, the distributed computing architecture provided by edge-cloud computing is used to improve data processing efficiency. The lane fitting process generates a variable matrix to achieve effective detection when the slope changes, which improves the real-time performance of lane detection. The method proposed in this paper achieves good recognition results for lanes in different scenarios, and its lane recognition efficiency is much better than that of other lane recognition models.

Computer engineering. Computer hardware, Electronic computers. Computer science
