Results for "Computer engineering. Computer hardware"

Showing 20 of ~8,502,558 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

arXiv Open Access 2026
InterPUF: Distributed Authentication via Physically Unclonable Functions and Multi-party Computation for Reconfigurable Interposers

Ishraq Tashdid, Tasnuva Farheen, Sazadur Rahman

Modern system-in-package (SiP) platforms increasingly adopt reconfigurable interposers to enable plug-and-play chiplet integration across heterogeneous multi-vendor ecosystems. However, this flexibility introduces severe trust challenges, as traditional authentication schemes fail to scale or adapt in decentralized, post-fabrication programmable environments. This paper presents InterPUF, a compact and scalable authentication framework that transforms the interposer into a distributed root of trust. InterPUF embeds a route-based differential delay physically unclonable function (PUF) across the reconfigurable interconnect and secures authentication using multi-party computation (MPC), ensuring raw PUF signatures are never exposed. Our hardware evaluation shows only 0.23% area and 0.072% power overhead across diverse chiplets while preserving authentication latency within tens of nanoseconds. Simulation results using pyPUF confirm strong uniqueness, reliability, and modeling resistance under process, voltage, and temperature variations. By combining interposer-resident PUF primitives with cryptographic hashing and collaborative verification, InterPUF enforces a minimal-trust authentication model without relying on a centralized anchor.
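
A minimal sketch of the two ideas this abstract combines — a differential-delay PUF response derived from accumulated per-stage delay differences, and hashing so raw signatures are never exposed. Everything here (stage count, Gaussian delay model, SHA-256 standing in for the MPC protocol) is an illustrative assumption, not InterPUF's actual design:

```python
import hashlib
import random

def puf_response(challenge: int, seed: int, n_stages: int = 64) -> int:
    """Toy differential-delay PUF: each challenge bit selects one of two
    per-stage delays; the sign of the accumulated delay difference gives
    the response bit. `seed` stands in for fabrication randomness."""
    rng = random.Random(seed)
    # Per-stage delay-difference pairs, fixed at "fabrication" time.
    deltas = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_stages)]
    diff = 0.0
    for i in range(n_stages):
        bit = (challenge >> i) & 1
        diff += deltas[i][bit]
    return 1 if diff > 0 else 0

def hashed_signature(seed: int, challenges: list[int]) -> str:
    """Never expose raw responses: hash them first. A plain hash here,
    where InterPUF would use MPC-based collaborative verification."""
    bits = "".join(str(puf_response(c, seed)) for c in challenges)
    return hashlib.sha256(bits.encode()).hexdigest()

challenges = list(range(128))
print(hashed_signature(seed=1, challenges=challenges))  # chiplet A
print(hashed_signature(seed=2, challenges=challenges))  # chiplet B differs
```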

en cs.CR, cs.AR
arXiv Open Access 2026
Designing and Implementing a Comprehensive Research Software Engineer Career Ladder: A Case Study from Princeton University

Ian A. Cosden, Elizabeth Holtz, Joel U. Bretheim

Research Software Engineers (RSEs) have become indispensable to computational research and scholarship. The rapid rise of RSEs in higher education, combined with universities' tendency to be slow in creating or adopting models for new technology roles, has left a lack of structured career pathways that recognize technical mastery, scholarly impact, and leadership growth. In response to immense demand for RSEs at Princeton University, and with dedicated funding to grow the RSE group at least two-fold, Princeton had to define job descriptions cohesive enough to support rapid hiring of RSE positions yet flexible enough to recognize the unique nature of each individual position. This case study describes our design and implementation of a comprehensive RSE career ladder spanning Associate through Principal levels, with parallel team-lead and managerial tracks. We outline the guiding principles, competency framework, Human Resources (HR) alignment, and implementation process, including engagement with external consultants and mapping to a standard job-leveling framework using market benchmarks. We share early lessons learned and outcomes, including improved hiring efficiency, clearer promotion pathways, and positive reception among staff.

en cs.SE
DOAJ Open Access 2025
Generalizations of ChiChi: Families of Low-Latency Permutations in Any Even Dimension

Samuele Andreoli, Gregor Leander, Enrico Piccione et al.

At Eurocrypt '25, Belkheyar et al. introduced a new non-linear layer called ChiChi, or χχ. ChiChi is built on Daemen's χ function but, crucially, gives a permutation in even dimensions divisible by four. In this work, we generalize their construction in multiple ways, prove their open conjecture regarding the algebraic degree of the inverse of ChiChi, and investigate its properties against key-recovery attacks.
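
For reference, a small sketch of Daemen's χ map that ChiChi builds on, illustrating why even dimensions are the interesting case: χ itself is a permutation only in odd dimension. The bit-tuple encoding is an assumption made for illustration:

```python
from itertools import product

def chi(x: tuple) -> tuple:
    """Daemen's chi: y[i] = x[i] XOR (NOT x[i+1] AND x[i+2]), indices mod n."""
    n = len(x)
    return tuple(x[i] ^ ((x[(i + 1) % n] ^ 1) & x[(i + 2) % n]) for i in range(n))

def is_permutation(n: int) -> bool:
    # chi is bijective on n bits iff it reaches all 2^n outputs.
    return len({chi(x) for x in product((0, 1), repeat=n)}) == 2 ** n

print(is_permutation(5))  # True:  chi is a permutation in odd dimension
print(is_permutation(6))  # False: not a permutation in even dimension,
                          # which ChiChi-style constructions work around
```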

Computer engineering. Computer hardware
DOAJ Open Access 2024
Passive Non-Line-of-Sight Imaging Based on Diffuse Reflection

WU Cuicui, WANG Weidong

Non-Line-of-Sight (NLOS) imaging combines imaging with computational reconstruction to recover hidden scenes from scattered or reflected information, without imaging the scene directly. NLOS imaging is still in the early stages of its development, and systematic research methods for scene modeling and target-information reconstruction are lacking. To address these issues, an NLOS imaging method for unobstructed and non-self-luminous scenes is proposed. Based on optical radiation theory, the relationship between the imaging of diffuse reflection surfaces in the scene and the shape of hidden objects is analyzed to determine the NLOS imaging model and reconstruction targets. A Diffuse-reflection full-Shadow passive NLOS (DS-NLOS) dataset that resembles physical reality is generated by combining rendering software with the Moving Picture Experts Group 7 (MPEG-7) dataset. A passive NLOS Reconstruction network model (Re-NLOS) is constructed using a Vision Transformer (ViT) structure in combination with a Generative Adversarial Network (GAN) to extract global features from captured diffuse-reflection surface images and recover the shape of hidden objects. Experimental results on the DS-NLOS dataset demonstrate that this method can recover the shape information of hidden objects from diffusely reflecting surfaces. Compared with the diffuse-reflection full-shadow images, the average Peak Signal-to-Noise Ratio (PSNR) for the 20 object categories in the test set increases by 5.85 dB, and the average Structural SIMilarity (SSIM) increases by 0.0381. The method also demonstrates restoration capability in real indoor scenes.
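
The two reported metrics can be computed with scikit-image; a hedged evaluation stub, where image shapes and the [0, 1] data range are assumptions rather than the paper's setup:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred: np.ndarray, target: np.ndarray) -> tuple:
    """Score a reconstructed hidden-object image against ground truth:
    PSNR in dB, SSIM in [0, 1]."""
    psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)
    ssim = structural_similarity(target, pred, data_range=1.0)
    return psnr, ssim

# Illustrative call on random stand-ins for reconstruction / ground truth.
rng = np.random.default_rng(0)
target = rng.random((64, 64))
pred = np.clip(target + rng.normal(0.0, 0.05, size=(64, 64)), 0.0, 1.0)
print(evaluate(pred, target))
```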

Computer engineering. Computer hardware, Computer software
arXiv Open Access 2024
Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision

Tim J. M. Jaspers, Ronald L. P. D. de Jong, Yasmina Al Khalil et al.

Over the past decade, computer vision applications in minimally invasive surgery have increased rapidly. Despite this growth, the impact of surgical computer vision remains limited compared to other medical fields like pathology and radiology, primarily due to the scarcity of representative annotated data. Whereas transfer learning from large annotated datasets such as ImageNet has conventionally been the norm for achieving high-performing models, recent advancements in self-supervised learning (SSL) have demonstrated superior performance. In medical image analysis, in-domain SSL pretraining has already been shown to outperform ImageNet-based initialization. Although unlabeled data in the field of surgical computer vision is abundant, the diversity within this data is limited. This study investigates the role of dataset diversity in SSL for surgical computer vision, comparing procedure-specific datasets against a more heterogeneous general surgical dataset across three different downstream surgical applications. The results show that using solely procedure-specific data can lead to substantial improvements of 13.8%, 9.5%, and 36.8% compared to ImageNet pretraining. However, extending this data with more heterogeneous surgical data further increases performance by an additional 5.0%, 5.2%, and 2.5%, suggesting that increasing diversity within SSL data is beneficial for model performance. The code and pretrained model weights are made publicly available at https://github.com/TimJaspers0801/SurgeNet.
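
A sketch of what the compared initializations might look like in PyTorch: an ImageNet baseline versus an in-domain SSL checkpoint such as the released SurgeNet weights. The backbone choice, checkpoint filename, and class count are assumptions, not the paper's exact setup:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

NUM_CLASSES = 7  # placeholder, e.g., surgical phase labels

# Baseline: ImageNet initialization.
baseline = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# In-domain SSL initialization: load a checkpoint such as the SurgeNet
# weights from the linked repository (filename here is hypothetical).
ssl_model = resnet50(weights=None)
state = torch.load("surgenet_ssl_checkpoint.pth", map_location="cpu")
ssl_model.load_state_dict(state, strict=False)  # tolerate SSL-head keys

# Identical downstream head on both models for a fair comparison.
for m in (baseline, ssl_model):
    m.fc = torch.nn.Linear(m.fc.in_features, NUM_CLASSES)
```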

arXiv Open Access 2024
Back-Projection Diffusion: Solving the Wideband Inverse Scattering Problem with Diffusion Models

Borong Zhang, Martín Guerra, Qin Li et al.

We present Wideband Back-Projection Diffusion, an end-to-end probabilistic framework for approximating the posterior distribution induced by the inverse scattering map from wideband scattering data. This framework produces highly accurate reconstructions, leveraging conditional diffusion models to draw samples, and also honors the symmetries of the underlying physics of wave-propagation. The procedure is factored into two steps: the first step, inspired by the filtered back-propagation formula, transforms data into a physics-based latent representation, while the second step learns a conditional score function conditioned on this latent representation. These two steps individually obey their associated symmetries and are amenable to compression by imposing the rank structure found in the filtered back-projection formula. Empirically, our framework has both low sample and computational complexity, with its number of parameters scaling only sub-linearly with the target resolution, and has stable training dynamics. It provides sharp reconstructions effortlessly and is capable of recovering even sub-Nyquist features in the multiple-scattering regime.
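
In generic conditional-diffusion notation (assumed for illustration; the paper's exact operators and parametrization may differ), the two steps amount to forming a physics-based latent and then training a conditional score model by denoising score matching:

```latex
% (1) physics-based latent z from wideband data \Lambda via a
%     back-projection-style operator B;
% (2) conditional score model s_\theta trained on noised media x_t.
\begin{align}
  z &= B(\Lambda), \\
  \mathcal{L}(\theta) &= \mathbb{E}_{x_0,\, t,\, x_t}
    \bigl\lVert s_\theta(x_t, t, z)
    - \nabla_{x_t} \log p_t(x_t \mid x_0) \bigr\rVert_2^2 .
\end{align}
```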

en cs.LG, math.NA
arXiv Open Access 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu et al.

Text-to-video generation enhances content creation but is highly computationally intensive: the computational cost of Diffusion Transformers (DiTs) scales quadratically in the number of pixels. This makes minute-length video generation extremely expensive, limiting most existing models to videos of only 10-20 seconds in length. We propose a Linear-complexity text-to-video Generation (LinGen) framework whose cost scales linearly in the number of pixels. For the first time, LinGen enables high-resolution minute-length video generation on a single GPU without compromising quality. It replaces the computationally dominant, quadratic-complexity block, self-attention, with a linear-complexity block called MATE, which consists of an MA-branch and a TE-branch. The MA-branch targets short-to-long-range correlations, combining a bidirectional Mamba2 block with our token rearrangement method, Rotary Major Scan, and our review tokens developed for long video generation. The TE-branch is a novel TEmporal Swin Attention block that focuses on temporal correlations between adjacent tokens and medium-range tokens. The MATE block addresses the adjacency-preservation issue of Mamba and significantly improves the consistency of generated videos. Experimental results show that LinGen outperforms DiT (with a 75.6% win rate) in video quality with up to 15$\times$ (11.5$\times$) FLOPs (latency) reduction. Furthermore, both automatic metrics and human evaluation demonstrate that our LinGen-4B yields video quality comparable to state-of-the-art models (with 50.5%, 52.1%, and 49.1% win rates with respect to Gen-3, LumaLabs, and Kling, respectively). This paves the way to hour-length movie generation and real-time interactive video generation. We provide 68s video generation results and more examples on our project website: https://lineargen.github.io/.
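
A back-of-the-envelope illustration of the quadratic-versus-linear scaling claim; the frame rate and tokens-per-frame figures are made-up placeholders:

```python
def video_tokens(seconds: float, fps: int = 16, tokens_per_frame: int = 1024) -> int:
    """Illustrative token count for a latent video; all numbers are assumptions."""
    return int(seconds * fps * tokens_per_frame)

def relative_cost(seconds: float, quadratic: bool) -> float:
    n = video_tokens(seconds)
    return float(n * n) if quadratic else float(n)

# Going from a 15 s clip to a 60 s clip: 4x tokens means 16x self-attention
# cost, but only 4x for a linear-complexity block like MATE.
for s in (15, 60):
    print(s,
          relative_cost(s, True) / relative_cost(15, True),    # quadratic
          relative_cost(s, False) / relative_cost(15, False))  # linear
```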

en cs.CV, cs.AI
arXiv Open Access 2024
Hardware-Friendly Implementation of Physical Reservoir Computing with CMOS-based Time-domain Analog Spiking Neurons

Nanako Kimura, Ckristian Duran, Zolboo Byambadorj et al.

This paper introduces an analog spiking neuron that utilizes time-domain information, i.e., the time interval between two signal transitions and a pulse width, to construct a spiking neural network (SNN) for hardware-friendly physical reservoir computing (RC) on a complementary metal-oxide-semiconductor (CMOS) platform. A leaky integrate-and-fire neuron is realized by employing two voltage-controlled oscillators (VCOs) with opposite sensitivities to the internal control voltage, and the connection structure is restricted to the 4 neighboring neurons on the 2-dimensional plane so that a regular network topology can feasibly be constructed. Such a system lets us compose an SNN with a counter-based readout circuit, which simplifies the hardware implementation. Another technical advantage of the bottom-up integration is the capability to dynamically capture every neuron state in the network, which can contribute significantly to finding guidelines for enhancing performance on various temporal information-processing tasks. Despite the simple network structure, the diverse nonlinear physical dynamics needed for RC can be realized through collective behavior arising from dynamic interaction between neurons, as in coupled oscillators. With behavioral system-level simulations, we demonstrate physical RC on short-term-memory and exclusive-OR tasks, as well as a spoken-digit recognition task with an accuracy of 97.7%. Our system is highly feasible for practical applications and can also serve as a useful platform for studying the mechanisms of physical RC.
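
A minimal leaky integrate-and-fire model in Python, abstracting the paper's VCO-pair neuron as a single leaky state variable; the threshold, leak rate, and input statistics are arbitrary assumptions:

```python
import numpy as np

def lif_spikes(inputs, v_th: float = 1.0, leak: float = 0.95, dt: float = 1.0):
    """Minimal leaky integrate-and-fire: the membrane state leaks each step,
    integrates the input, and emits a spike (with reset) at threshold."""
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x * dt   # leaky integration
        if v >= v_th:
            spikes.append(1)
            v = 0.0             # reset after firing
        else:
            spikes.append(0)
    return spikes

rng = np.random.default_rng(0)
print(lif_spikes(rng.uniform(0.0, 0.4, size=30)))
```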

en cs.NE, cs.AR
arXiv Open Access 2024
Distributed computing for physics-based data-driven reduced modeling at scale: Application to a rotating detonation rocket engine

Ionut-Gabriel Farcas, Rayomand P. Gundevia, Ramakanth Munipalli et al.

High-performance computing (HPC) has revolutionized our ability to perform detailed simulations of complex real-world processes. A prominent contemporary example is from aerospace propulsion, where HPC is used for rotating detonation rocket engine (RDRE) simulations in support of the design of next-generation rocket engines; however, these simulations take millions of core hours even on powerful supercomputers, which makes them impractical for engineering tasks like design exploration and risk assessment. Data-driven reduced-order models (ROMs) aim to address this limitation by constructing computationally cheap yet sufficiently accurate approximations that serve as surrogates for the high-fidelity model. This paper contributes a distributed-memory algorithm that achieves fast and scalable construction of predictive physics-based ROMs trained from sparse datasets of extremely large state dimension. The algorithm learns structured physics-based ROMs that approximate the dynamical systems underlying those datasets. This enables model reduction for problems at a scale and complexity that exceed the capabilities of standard, serial approaches. We demonstrate our algorithm's scalability using up to $2,048$ cores on the Frontera supercomputer at the Texas Advanced Computing Center. We focus on a real-world three-dimensional RDRE for which one millisecond of simulated physical time requires one million core hours on a supercomputer. Using a training dataset of $2,536$ snapshots, each of state dimension $76$ million, our distributed algorithm enables the construction of a predictive data-driven reduced model in just $13$ seconds on $2,048$ cores on Frontera.
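
The kind of structured, physics-based ROM learning described here can be sketched, serially and at toy scale, as a least-squares fit of linear-plus-quadratic operators to reduced states (operator inference); the distributed-memory parallelism is the paper's contribution and is not reproduced in this sketch:

```python
import numpy as np

def fit_quadratic_rom(Shat: np.ndarray, dShat_dt: np.ndarray):
    """Fit ds/dt ~ A s + H (s (x) s) by least squares, given reduced states
    Shat (r x k) and their time derivatives dShat_dt (r x k)."""
    r, k = Shat.shape
    quad = np.einsum("ik,jk->ijk", Shat, Shat).reshape(r * r, k)
    D = np.vstack([Shat, quad])                      # (r + r^2) x k data matrix
    O, *_ = np.linalg.lstsq(D.T, dShat_dt.T, rcond=None)
    A, H = O.T[:, :r], O.T[:, r:]                    # linear / quadratic operators
    return A, H

rng = np.random.default_rng(1)
Shat = rng.standard_normal((5, 200))    # r = 5 reduced states, k = 200 snapshots
dShat = rng.standard_normal((5, 200))   # placeholder derivatives
A, H = fit_quadratic_rom(Shat, dShat)
print(A.shape, H.shape)                 # (5, 5) (5, 25)
```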

en math.NA, cs.DC
arXiv Open Access 2024
An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project

Ben Arie Tanay, Lexy Arinze, Siddhant S. Joshi et al.

Background: Large Language Models (LLMs) such as ChatGPT and CoPilot are influencing software engineering practice. Software engineering educators must teach future software engineers how to use such tools well. As of yet, few studies have reported on the use of LLMs in the classroom. It is, therefore, important to evaluate students' perception of LLMs and possible ways of adapting the computing curriculum to these shifting paradigms. Purpose: The purpose of this study is to explore computing students' experiences and approaches to using LLMs during a semester-long software engineering project. Design/Method: We collected data from a senior-level software engineering course at Purdue University. This course uses a project-based learning (PBL) design. The students used LLMs such as ChatGPT and Copilot in their projects. A sample of these student teams was interviewed to understand (1) how they used LLMs in their projects; and (2) whether and how their perspectives on LLMs changed over the course of the semester. We analyzed the data to identify themes related to students' usage patterns and learning outcomes. Results/Discussion: When computing students utilize LLMs within a project, their use cases cover both technical and professional applications. These students also perceive LLMs to be efficient tools for obtaining information and completing tasks. However, there were concerns about using LLMs responsibly without harming their own learning outcomes. Based on our findings, we recommend that future research investigate the use of LLMs in lower-level computer engineering courses to understand whether and how LLMs can be integrated as a learning aid without hurting learning outcomes.

en cs.SE, cs.HC
arXiv Open Access 2024
Shem: A Hardware-Aware Optimization Framework for Analog Computing Systems

Yu-Neng Wang, Sara Achour

As the demand for efficient data processing escalates, reconfigurable analog hardware, which implements novel analog compute paradigms, is promising for energy-efficient computing at the sensing and actuation boundaries. These analog computing platforms embed information in physical properties and then use the physics of materials, devices, and circuits to perform computation. Such hardware platforms are more sensitive to nonidealities, such as noise and fabrication variations, than their digital counterparts, and they accrue high resource costs when programmable elements are introduced. Identifying resource-efficient analog system designs that mitigate these nonidealities is done manually today. While design-optimization frameworks have been enormously successful in other fields, such as photonics, they typically either target linear dynamical systems that have closed-form solutions or target a specific differential-equation system and derive the solution through hand analysis. In both cases, time-domain simulation is no longer needed to predict hardware behavior. In contrast, the analog hardware platforms described here have nonlinear, time-evolving dynamics that vary substantially from design to design, lack closed-form solutions, and require the optimizer to consider time explicitly. We present Shem, an optimization framework for analog systems. Shem leverages differentiation methods recently popularized for training neural ODEs to enable the optimization of analog systems that exhibit nonlinear dynamics, noise and mismatch, and discrete behavior. We evaluate Shem on oscillator-based pattern-recognition, CNN edge-detection, and transmission-line security-primitive design case studies and demonstrate that it can improve designs. To our knowledge, the latter two design problems have not previously been optimized with automated methods.
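
The core trick — differentiating a design loss through a time-domain simulation of nonlinear dynamics — can be sketched with the torchdiffeq package's differentiable ODE solver. The oscillator, target waveform, and loss below are invented stand-ins, not Shem's actual formulation:

```python
import torch
from torchdiffeq import odeint  # differentiable ODE solver used for neural ODEs

class DampedOscillator(torch.nn.Module):
    """Toy analog block: a damped oscillator whose damping and frequency
    are design parameters tuned by gradient descent through the solver."""
    def __init__(self):
        super().__init__()
        self.damping = torch.nn.Parameter(torch.tensor(0.3))
        self.omega = torch.nn.Parameter(torch.tensor(1.5))

    def forward(self, t, state):
        x, v = state[..., 0], state[..., 1]
        return torch.stack([v, -self.omega ** 2 * x - self.damping * v], dim=-1)

sys = DampedOscillator()
opt = torch.optim.Adam(sys.parameters(), lr=0.05)
t = torch.linspace(0.0, 10.0, 100)
target = torch.sin(2.0 * t)  # desired time-domain behavior (illustrative)

for _ in range(50):
    traj = odeint(sys, torch.tensor([0.0, 2.0]), t)  # shape (100, 2)
    loss = ((traj[:, 0] - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()   # gradients flow through the whole simulated trajectory
    opt.step()
```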

en cs.ET, cs.CE
DOAJ Open Access 2023
Reinforcement learning based task scheduling for environmentally sustainable federated cloud computing

Zhibao Wang, Shuaijun Chen, Lu Bai et al.

The significant energy consumption of data centers is an essential contributor to global energy consumption and carbon emissions; reducing both therefore plays a crucial role in sustainable development. Traditional cloud computing has reached a bottleneck, primarily due to high energy consumption. The emerging federated cloud approach can reduce the energy consumption and carbon emissions of cloud data centers by leveraging the geographical differences among the multiple data centers in a federated cloud. In this paper, we propose Eco-friendly Reinforcement Learning in Federated Cloud (ERLFC), a framework that uses reinforcement learning for task scheduling in a federated cloud environment. ERLFC aims to intelligently consider the state of each data center and effectively exploit the variations in energy and carbon-emission ratios across geographically distributed cloud data centers. We build ERLFC using the Actor-Critic algorithm, which selects the appropriate data center for each task based on factors such as energy consumption, cooling method, task waiting time, energy type, emission ratio, the total energy consumption of the current data center, and the details of the next task. To demonstrate the effectiveness of ERLFC, we conducted simulations based on real-world task execution data; the results show that ERLFC can effectively reduce energy consumption and emissions during task execution. Compared with the Round Robin, Random, SO, and GJO algorithms, ERLFC achieves energy savings and emission reductions of 1.09, 1.08, 1.21, and 1.26 times, respectively.
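
A toy actor-critic scheduling head in PyTorch to make the setup concrete; the feature layout, network sizes, and reward are placeholders rather than ERLFC's actual state and reward design:

```python
import torch

class Scheduler(torch.nn.Module):
    """Tiny actor-critic head: maps a data-center/task feature vector to a
    distribution over data centers (actor) and a value estimate (critic)."""
    def __init__(self, n_features: int, n_centers: int):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Linear(n_features, 64), torch.nn.ReLU())
        self.actor = torch.nn.Linear(64, n_centers)
        self.critic = torch.nn.Linear(64, 1)

    def forward(self, state):
        h = self.body(state)
        return (torch.distributions.Categorical(logits=self.actor(h)),
                self.critic(h))

# One step: sample a data center; reward would be -(energy + emissions) cost.
policy = Scheduler(n_features=6, n_centers=4)
state = torch.randn(6)        # e.g., energy price, cooling, queue wait, ...
dist, value = policy(state)
action = dist.sample()
reward = torch.tensor(-1.0)   # placeholder for the measured cost
advantage = reward - value.detach()
loss = -dist.log_prob(action) * advantage + (reward - value) ** 2
loss.mean().backward()        # then step the optimizer as usual
```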

Computer engineering. Computer hardware, Electronic computers. Computer science
arXiv Open Access 2023
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency

Matteo Perotti, Samuel Riedel, Matheus Cavalcante et al.

The ever-increasing computational and storage requirements of modern applications and the slowdown of technology scaling pose major challenges to designing and implementing efficient computer architectures. To mitigate the bottlenecks of typical processor-based architectures on both the instruction and data sides of the memory, we present Spatz, a compact 64-bit floating-point-capable vector processor based on RISC-V's Vector Extension Zve64d. Using Spatz as the main Processing Element (PE), we design an open-source dual-core vector processor architecture based on a modular and scalable cluster sharing a Scratchpad Memory (SCM). Unlike typical vector processors, whose Vector Register Files (VRFs) are hundreds of KiB large, we prove that Spatz can achieve peak energy efficiency with a latch-based VRF of only 2 KiB. An implementation of the Spatz-based cluster in GlobalFoundries' 12LPP process with eight double-precision Floating Point Units (FPUs) achieves an FPU utilization just 3.4% lower than the ideal upper bound on a double-precision floating-point matrix multiplication. The cluster reaches 7.7 FMA/cycle, corresponding to 15.7 DP-GFLOPS and 95.7 DP-GFLOPS/W at 1 GHz and nominal operating conditions (TT, 0.80 V, 25 °C), with more than 55% of the power spent on the FPUs. Furthermore, the optimally balanced Spatz-based cluster reaches 95.0% FPU utilization (7.6 FMA/cycle), 15.2 DP-GFLOPS, and 99.3 DP-GFLOPS/W (61% of the power spent in the FPUs) on a 2D workload with a 7x7 kernel, resulting in an outstanding area/energy efficiency of 171 DP-GFLOPS/W/mm2. At equal area, the computing cluster built upon compact vector processors reaches a 30% higher energy efficiency than a cluster with the same FPU count built upon scalar cores specialized for stream-based floating-point computation.
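
The FMA-to-GFLOPS conversion behind these figures is simple to check: each fused multiply-add counts as two floating-point operations.

```python
def dp_gflops(fma_per_cycle: float, freq_ghz: float = 1.0) -> float:
    """GFLOPS = FMA/cycle * 2 FLOP/FMA * frequency (GHz)."""
    return fma_per_cycle * 2.0 * freq_ghz

print(dp_gflops(7.6))  # 15.2 DP-GFLOPS, matching the 7x7-kernel figure above
```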

en cs.AR
DOAJ Open Access 2022
Pedestrian Re-Identification Model Combining Semantic Segmentation and Attention Mechanism

ZHOU Dongming, ZHANG Canlong, TANG Yanping, LI Zhixin

Pedestrian re-identification results are easily affected by changes in pedestrian posture, illumination, viewing angle, background, and other factors. To reduce such interference, existing pedestrian re-identification models usually divide the pedestrians in a dataset into several parts to extract local features of the image and improve identification accuracy, but this introduces new problems, such as mismatches between local features of the human body and the loss of contextual clues from non-human parts. To solve these problems, an improved pedestrian re-identification model is proposed. By aligning the local features of the human semantic parsing network, the semantic segmentation model can better handle arbitrary contours of pedestrians in the image. A local attention network is also used to capture the lost contextual clues of non-human body parts. The experimental results show that the proposed model achieves an average accuracy of 83.5% on Market-1501, 80.8% on DukeMTMC, and 92.4% on CUHK03, with a Rank-1 value of 90.2% on the DukeMTMC dataset. Compared with pedestrian re-identification models based on an attention mechanism, a pedestrian semantic parsing network, or the Partial Alignment Network (PAN), the proposed model has higher robustness and transferability.

Computer engineering. Computer hardware, Computer software
DOAJ Open Access 2021
Research on Game Algorithm for Power and Resource Allocation in Heterogeneous Cellular Network

SUN Chen, ZHANG Bo

In Device-to-Device (D2D) and relay heterogeneous cellular networks, resource reuse can improve system performance, but it also complicates interference in the networks. To address this problem, a Power and Resource Allocation Game (PRAG) algorithm is proposed, which performs interference coordination in D2D and relay heterogeneous cellular networks through power control and resource allocation. The optimal transmit power of the D2D and relay links is derived by maximizing a utility function based on a cost parameter. On this basis, the resulting utility matrix is used in the game to choose suitable cellular users for resource reuse. Simulation results show that the proposed algorithm achieves higher system throughput with less power than the Equal Power Allocation Random (EPAR) algorithm.
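
One common pricing-based form of such a utility (an assumption for illustration; the paper's exact function may differ) is a rate reward minus a linear power cost, with the best response obtained by maximizing over the feasible power range:

```latex
\begin{equation}
  U_i(p_i) = W \log_2\!\bigl(1 + \mathrm{SINR}_i(p_i)\bigr) - \alpha\, p_i,
  \qquad
  p_i^{\ast} = \arg\max_{0 \le p_i \le p_{\max}} U_i(p_i).
\end{equation}
```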

Computer engineering. Computer hardware, Computer software
arXiv Open Access 2021
Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images

Ashia Lewis, Evanjelin Mahmoodi, Yuyue Zhou et al.

The evaluation of infectious disease processes on radiologic images is an important and challenging task in medical image analysis. Pulmonary infections can often be best imaged and evaluated through computed tomography (CT) scans, which are often unavailable in low-resource environments and difficult to obtain for critically ill patients. X-ray, by contrast, is an inexpensive imaging procedure, often available at the bedside and far more widely accessible, but it offers only a simpler, two-dimensional image. We show that by relying on a model that learns to synthetically generate CT images from X-rays, we can improve automatic disease classification accuracy and provide clinicians with a different view of the pulmonary disease process. Specifically, we investigate Tuberculosis (TB), a deadly bacterial infectious disease that predominantly affects the lungs, but also other organ systems. We show that relying on synthetically generated CT improves TB identification by 7.50% and distinguishes TB properties up to 12.16% better than the X-ray baseline.

en eess.IV, cs.CV
arXiv Open Access 2021
Wisdom for the Crowd: Discoursive Power in Annotation Instructions for Computer Vision

Milagros Miceli, Julian Posada

Developers of computer vision algorithms outsource some of the labor involved in annotating training data through business process outsourcing companies and crowdsourcing platforms. Many data annotators are situated in the Global South and are considered independent contractors. This paper focuses on the experiences of Argentinian and Venezuelan annotation workers. Through qualitative methods, we explore the discourses encoded in the task instructions that these workers follow to annotate computer vision datasets. Our preliminary findings indicate that annotation instructions reflect worldviews imposed on workers and, through their labor, on datasets. Moreover, we observe that for-profit goals drive task instructions and that managers and algorithms ensure annotations are done according to requesters' commands. This configuration presents a form of commodified labor that perpetuates power asymmetries and reinforces social inequalities, and that is compelled to reproduce them in datasets and, subsequently, in computer vision systems.

en cs.CV, cs.CY
DOAJ Open Access 2020
Solar Thermal Integration With and Without Energy Storage: The Cases of Bioethanol and a Dairy Plant

Amanda Lucero Fuentes-Silva, Daniel Velázquez-Torres, Martín Picón-Núñez et al.

This paper looks at the combination of working-fluid inlet temperature and heat-storage capacity as a means of increasing the solar fraction and the number of hours for which renewable energy can be supplied to a background process. The portion of the process heat duty that can effectively be supplied from solar thermal energy, or solar fraction, increases with the inlet temperature of the solar field. Increasing the inlet temperature is beneficial since it reduces the size of the solar field; higher inlet temperatures are achieved by means of surplus heat stored during sunny hours. To further extend operation beyond sunny hours, thermal storage must be increased. In this work, two case studies are considered, and it is found that the integration of solar thermal plants is a cost-effective alternative for energy conservation and pollution reduction. The payback time reveals that doubling the time over which the solar system delivers the required temperature is still profitable.
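
For reference, the standard definition of the solar fraction used above: the share of the process heat duty met by the solar field (including storage) rather than the auxiliary heater.

```latex
\begin{equation}
  f_{\mathrm{solar}} =
    \frac{Q_{\mathrm{solar}}}{Q_{\mathrm{solar}} + Q_{\mathrm{aux}}}
\end{equation}
```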

Chemical engineering, Computer engineering. Computer hardware

Page 24 of 425,128