As an emerging type of AI computing accelerator, SRAM Computing-In-Memory (CIM) accelerators feature high energy efficiency and throughput. However, the variety of CIM designs and under-explored mapping strategies impede full exploration of the compute-storage balance in SRAM-CIM accelerators, potentially leading to significant performance degradation. To address this issue, we propose CIM-Tuner, an automatic tool that finds a balanced hardware configuration and an optimal mapping strategy under an area constraint via hardware-mapping co-exploration. It ensures universality across various CIM designs through a matrix abstraction of CIM macros and a generalized accelerator template. For efficient mapping across different hardware configurations, it employs fine-grained two-level strategies comprising accelerator-level scheduling and macro-level tiling. Compared to prior CIM mapping approaches, CIM-Tuner's extended strategy space achieves 1.58$\times$ higher energy efficiency and 2.11$\times$ higher throughput. Applied to SOTA CIM accelerators with an identical area budget, CIM-Tuner also delivers comparable improvements. The simulation accuracy is silicon-verified, and the CIM-Tuner tool is open-sourced at https://github.com/champloo2878/CIM-Tuner.git.
Pedro Garcia Lopez, Marina López Alet, Usama Benabdelkrim Zakan, et al.
In recent years, Asia's rapid growth in research output has been reshaping the computing research landscape. What was once a two-block system (America and Europe) is evolving into a multipolar world with three major hubs: America, Europe, and Asia. To study these pivotal changes and evaluate international diversity, we have analyzed the past 13 years of 13 international systems research conferences: ASPLOS, NSDI, OSDI, SIGCOMM, ATC, EuroSys, ICDCS, Middleware, SoCC, CCGRID, IC2E, IEEE Cloud and EuroPar. Our analysis focuses on accepted papers and participation in the Program Committee, grouping the results by region (America, Europe, and Asia). Surprisingly, we find a pronounced historical imbalance in international diversity among top-tier systems conferences (ASPLOS, OSDI, NSDI, SIGCOMM). While most other conferences have progressively reflected Asia's growing research presence over the past decades, this group has shown a noticeable adjustment only in the recent four years. We also identify persistent rigidities in how program committee (PC) diversity adapts to shifts in accepted paper origins, with a consistent under-representation of researchers from Asian organizations in many PCs.
We present the first systematic cross-vendor analysis of GPU instruction set architectures spanning all four major GPU vendors: NVIDIA (PTX ISA v1.0 through v9.2, Fermi through Blackwell), AMD (RDNA 1 to 4 and CDNA 1 to 4), Intel (Gen11, Xe-LP, Xe-HPG, Xe-HPC), and Apple (G13, reverse-engineered). Drawing on official ISA reference manuals, architecture whitepapers, patent filings, and community reverse-engineering efforts totaling over 5,000 pages of primary sources across 16 distinct microarchitectures, we identify ten hardware-invariant computational primitives that appear across all four architectures, six parameterizable dialects where vendors implement identical concepts with different parameters, and six true architectural divergences representing fundamental design disagreements. Based on this analysis, we propose an abstract execution model for a vendor-neutral GPU ISA grounded in the physical constraints of parallel computation. We validate our model with benchmark results on NVIDIA T4 and Apple M1 hardware, the two most architecturally distant platforms in our study. On five of six benchmark-platform pairs, the abstract model matches or exceeds native vendor-optimized performance. The single outlier (parallel reduction on NVIDIA, 62.5% of native) reveals that intra-wave shuffle must be a mandatory primitive, a finding that refines our proposed model.
To address the challenge of inter-individual variability and improve the generality of gesture recognition technology, this study proposes a transfer learning strategy based on a Multi-Parallel Convolutional Neural Network (MPCNN), which aims to achieve efficient gesture recognition from surface electromyogram (sEMG) signals through a parallel architecture and an optimized transfer learning mechanism. With this parallel architecture and optimized transfer learning mechanism, MPCNN handles physiological differences between individuals more effectively than previous CNN transfer learning frameworks, improving the model's adaptability to new users and its recognition accuracy. In addition, MPCNN significantly enhances the practicality of the system by reducing model training time and improving generalization ability. Through multiple sets of experiments, including cross-validation, ablation studies, and robustness tests, this study validates the effectiveness of the proposed strategy in several respects. The experimental results demonstrate that MPCNN significantly improves gesture recognition accuracy compared to traditional CNN models: the proposed MPCNN transfer learning strategy achieves a recognition rate of 94.95% on Ninapro DB7, an improvement of 4.38 percentage points over previous CNN transfer learning frameworks, with training time reduced by more than 50%. These experiments confirm the advantages of the MPCNN transfer model in reducing the training burden, enhancing generalization, and improving interference resistance. Its human-computer interaction capability is validated on an experimental platform, demonstrating promising potential for myoelectric control applications.
Precise tuning of dielectric constants (εr) in oxide glasses is critical for high‐frequency devices in 5G/6G systems, where εr directly governs signal propagation efficiency. A machine learning framework combining data augmentation and physicochemical descriptor integration is developed to address data scarcity. Validated pseudo‐labels are generated via ensemble learning, expanding the dataset from 1503 to 11,029 compositions without distributional shift. The XGBoost model trained on the augmented dataset achieved superior accuracy, with an R2 of 0.96 and an MSE of 0.14. For prediction tasks on unseen data, it reduced the error rate by 48% compared to the non‐augmented model and improved generalization performance by 43% over GlassNet. B2O3 and SiO2 are identified as εr suppressors and BaO and TiO2 as enhancers through SHAP analysis, aligning with network former/modifier roles. Cation‐specific polarizabilities are derived via Clausius–Mossotti regression (R2 = 0.909). Integration of physicochemical descriptors (coordination number and bond strength) enables transferable predictions for Y2O3‐ and La2O3‐containing glasses, with mean deviations of 2.46%–4.76%. Crucially, structural descriptors dominate polarizability with 69.9% feature importance, establishing network engineering as the optimal design paradigm. A data‐driven pathway for rational dielectric glass development is thus established.
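The Clausius–Mossotti regression mentioned above rests on the standard relation linking relative permittivity to molar volume and polarizability; in the additive form commonly used for oxide glasses (a sketch of the textbook relation, not necessarily the paper's exact parameterization):

\[
\frac{\varepsilon_r - 1}{\varepsilon_r + 2}\, V_m \;=\; \frac{4\pi N_A}{3}\sum_i x_i \alpha_i ,
\]

where $V_m$ is the molar volume, $x_i$ the molar fraction of cation species $i$, and $\alpha_i$ its polarizability. Regressing measured $\varepsilon_r$ and composition data against this form yields the cation-specific polarizabilities $\alpha_i$ reported above.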
Materials of engineering and construction. Mechanics of materials, Computer engineering. Computer hardware
Agile software development relies on self-organized teams, underlining the importance of individual responsibility. How developers take responsibility and build ownership is influenced by external factors such as architecture and development methods. This paper examines the existing literature on ownership in software engineering and in psychology, and argues that a more comprehensive view of ownership in software engineering has great potential to improve software teams' work. Initial positions on the issue are offered for discussion and to lay the foundations for further research.
Abstract clones serve as an algebraic presentation of the syntax of a simple type theory. From the perspective of universal algebra, they define algebraic theories like those of groups, monoids, and rings. This link allows one to study the language of simple type theory from the viewpoint of universal algebra. Programming languages, however, are much more complicated than simple type theory. Many useful features, like reading, writing, and exception handling, involve interacting with the environment; these are called side-effects. Algebraic presentations for languages with the appropriate syntax for handling effects are given by premulticategories and effectful multicategories. We study these structures with the aim of defining a suitable notion of an algebra. To achieve this goal, we proceed in two steps. First, we define a tensor on $[\to,\category{Set}]$, and show that this tensor, along with the cartesian product, gives the category a duoidal structure. Second, we introduce the novel notion of a multicategory enriched in a duoidal category, which generalizes the traditional notion of a multicategory. Further, we prove that an effectful multicategory is the same as a multicategory enriched in the duoidal category $[\to,\category{Set}]$. This result places multicategories and effectful multicategories on a similar footing, and provides a mechanism for transporting concepts from the theory of multicategories (which model pure computation) to the theory of effectful multicategories (which model effectful computation). As an example, we generalize the definition of a 2-morphism for multicategories to the duoidally enriched case. Our equivalence result then gives a natural definition of a 2-morphism for effectful multicategories, which we use to define the notion of an algebra.
Rudrajit Choudhuri, Ambareesh Ramakrishnan, Amreeta Chatterjee, et al.
Generative AI (genAI) tools (e.g., ChatGPT, Copilot) have become ubiquitous in software engineering (SE). As SE educators, it behooves us to understand the consequences of genAI usage among SE students and to create a holistic view of where these tools can be successfully used. Through 16 reflective interviews with SE students, we explored their academic experiences of using genAI tools to complement SE learning and implementations. We uncover the contexts where these tools are helpful and where they pose challenges, along with examining why these challenges arise and how they impact students. We validated our findings through member checking and triangulation with instructors. Our findings provide practical considerations of where and why genAI should (not) be used in the context of supporting SE students.
Jun LUO, Qingwei GAO, Yi TAN, Dawei ZHAO, Yixiang LU, Dong SUN
Label-specific features are a research hotspot in multi-label learning, which uses label-specific feature extraction to handle instances associated with multiple class labels. Existing research on multi-label classification usually considers only the correlation between labels and ignores the local manifold structure of the original data, which reduces classification accuracy. In addition, when modeling label correlation, the structural relationship between features and labels, as well as the inherent causal relationships between labels, are often overlooked. To address these issues, this study proposes a multi-label learning algorithm based on double Laplace regularization and causal inference. Linear regression models establish a basic multi-label classification framework, which is combined with causal learning to explore the inherent causal relationships between labels and thereby mine the essential connections between them. To fully exploit the structural relationship between features and labels, double Laplace regularization is added to mine local label-association information and effectively preserve the local manifold structure of the original data. The effectiveness of the proposed algorithm is verified on public multi-label datasets. The experimental results show that, compared to algorithms such as LLSF, ML-KNN, and LIFT, the proposed algorithm achieved average performance improvements of 8.82%, 4.98%, 9.43%, 16.27%, 12.19%, and 3.35% in terms of Hamming Loss (HL), Average Precision (AP), One Error (OE), Ranking Loss (RL), Coverage, and AUC, respectively.
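A typical objective of this kind (an illustrative sketch under standard manifold-regularization assumptions, not necessarily the paper's exact formulation) combines a least-squares regression term with two graph-Laplacian penalties and a sparsity term:

\[
\min_{W}\; \|XW - Y\|_F^2
\;+\; \alpha\, \operatorname{tr}\!\big((XW)^{\top} L_x\, XW\big)
\;+\; \beta\, \operatorname{tr}\!\big(W L_y W^{\top}\big)
\;+\; \lambda \|W\|_1 ,
\]

where $X$ and $Y$ are the feature and label matrices, $W$ the regression coefficients, $L_x$ the Laplacian of an instance-similarity graph (preserving the local manifold structure of the data), and $L_y$ the Laplacian of a label-correlation graph.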
The increasing popularity of applications like the Metaverse has led to the exploration of new, more effective ways of communication. Semantic communication, which focuses on the meaning behind transmitted information, represents a departure from traditional communication paradigms. As mobile devices become increasingly prevalent, it is important to explore the potential of edge computing to aid the semantic encoding/decoding process, which requires significant computing power and storage capabilities. However, establishing knowledge bases (KBs) for domain-oriented communication can be time-consuming. To address this challenge, this paper proposes a semantic caching model in an edge computing system that caches domain-specialized general models and user-specific individual models. This approach has the potential to reduce the time and resources required to establish individual KBs while accurately capturing the semantics behind users' messages, ultimately leading to more efficient and accessible semantic communication.
Compute-in-memory accelerators built upon non-volatile memory devices excel in energy efficiency and latency when performing deep neural network (DNN) inference, thanks to their in-situ data processing capability. However, the stochastic nature and intrinsic variations of non-volatile memory devices often result in performance degradation during DNN inference. Introducing these non-ideal device behaviors in DNN training enhances robustness, but drawbacks include limited accuracy improvement, reduced prediction confidence, and convergence issues. This arises from a mismatch between the deterministic training and non-deterministic device variations, as such training, though considering variations, relies solely on the model's final output. In this work, inspired by control theory, we propose Negative Feedback Training (NeFT), a novel concept supported by theoretical analysis, to more effectively capture the multi-scale noisy information throughout the network. We instantiate this concept with two specific instances, oriented variational forward (OVF) and intermediate representation snapshot (IRS). Based on device variation models extracted from measured data, extensive experiments show that our NeFT outperforms existing state-of-the-art methods with up to a 45.08% improvement in inference accuracy while reducing epistemic uncertainty, boosting output confidence, and improving convergence probability. These results underline the generality and practicality of our NeFT framework for increasing the robustness of DNNs against device variations. The source code for these two instances is available at https://github.com/YifanQin-ND/NeFT_CIM
Magnetic resonance imaging (MRI) scanners have recently been used for magnetic actuation of robots for minimally invasive medical operations. Due to MRI's high soft‐tissue selectivity, it is possible to obtain 3D images of hard‐to‐reach cavities in the human body, where wireless miniature magnetic robots powered by MRI could be employed for high‐precision targeted operations, such as drug delivery, stem cell therapy, and hyperthermia. However, the state‐of‐the‐art fast magnetic robot‐tracking methods in MRI are limited to scales above one millimeter, which restricts the potential target regions inside the human body. Herein, a fast 1D projection‐based MRI approach that can track magnetic particles down to 300 μm diameter (1.17 × 10−2 emu) is reported. The technique reduces the trackable magnetic particle size in MRI‐powered navigation fivefold compared with previous fast‐tracking methods. A closed‐loop MRI‐powered navigation with 0.78 ± 0.03 mm trajectory‐following accuracy in millimeter‐sized in vitro 2D channels and a 3D cavity setup using the tracking method is demonstrated. Furthermore, the feasibility of submillimeter magnetic robot tracking in ex vivo pig kidneys (N = 2) with a 3.6 ± 1.1 mm accuracy is demonstrated. Such a fast submillimeter‐scale mobile robot‐tracking approach can unlock new opportunities in minimally invasive medical operations.
Computer engineering. Computer hardware, Control engineering systems. Automatic machinery (General)
This publication presents a method for counting, tracking, and monitoring visitors inside a building. The site examined is Manos Hatzidakis' House in Xanthi. Specifically, we conducted a study that provides recommendations regarding the installation of sensors in the building. We also present the communication protocols of the computer network used to ensure efficient communication between the space examined and the sensor network. Finally, we describe the process of creating a website designed to store and display the data.
Time synchronization is a key technology for underwater sensor networks. Due to the high propagation delay and Doppler frequency shift of underwater acoustic communication in the ocean, land-based time synchronization algorithms designed for radio-frequency communication cannot be applied directly to the underwater environment. Based on the principle of Doppler velocity measurement and the mobility of underwater nodes, this paper proposes a new time synchronization algorithm, CD-Sync. A cluster model is used to select a reasonable cluster head node, which synchronizes with the water-surface beacon node within the cluster. During synchronization, the synchronizing node uses the Doppler principle to estimate the relative speed between nodes, and from it calculates the propagation delay between them. Experimental results show that, compared with the clustering-based MU-Sync algorithm and the distributed NU-Sync algorithm, the proposed algorithm narrows the time offset between nodes and accelerates synchronization convergence, while effectively improving time synchronization accuracy.
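The delay-estimation step described above can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation: the function names, the first-order (linear) Doppler approximation, and the assumption that the relative speed is purely radial and constant over the message flight are ours.

```python
SOUND_SPEED = 1500.0  # nominal speed of sound in seawater, m/s (assumption)

def relative_speed_from_doppler(f_sent, f_received, c=SOUND_SPEED):
    """Estimate the relative radial speed between two nodes from the
    observed Doppler shift, using the first-order approximation
    v ~= c * (f_r - f_s) / f_s (positive means the nodes are closing)."""
    return c * (f_received - f_sent) / f_sent

def propagation_delay(range_at_send, rel_speed, c=SOUND_SPEED):
    """One-way acoustic propagation delay when the receiver moves
    radially toward the sender at rel_speed (m/s).  Solves
    c * t = range_at_send - rel_speed * t  for t."""
    return range_at_send / (c + rel_speed)
```

With the delay corrected for node motion this way, the clock offset recovered from a two-way message exchange is no longer biased by the nodes drifting apart during the exchange, which is what lets the synchronization converge faster.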
Nayef Alawadhi, Imad Al Shaikhli, Abdulrahman Alkandari, et al.
Global attention to open data technology has increased in recent years: governments have started to explore open data in the public and private sectors and to weigh its advantages and disadvantages. In the Arab world, however, and especially in Kuwait, there is no solid, structured attention to the technology in either sector. In this paper, we therefore try to determine whether business owners in Kuwait have sufficient knowledge of the open data (OD) concept and whether they are willing to use it to enhance their services. The purpose of this research is to measure the acceptance of OD technology in Kuwait and to gather business owners' opinions about adopting the OD concept. We gathered different reactions and points of view through online and hard-copy surveys, focusing on the private sector and targeting people who own a business and wish to offer better services to their customers. Overall, the results reveal a clear picture of open data technology in Kuwait and a substantial need for education and awareness of its importance. The results of this study may positively and directly affect the level of motivation for other related studies.
We live in exceptional times in which the entire world is witnessing the exponential spread of a pandemic, which requires us to adopt new habits of mind and behavior. In this paper, I introduce the term exponential competence, which encompasses these cognitive and social skills, and describe a course for computer science and software engineering students in which emphasis is placed on exponential competence. I argue that exponential competence is especially important for computer science and software engineering students, since many of them will most likely be required to deal with exponential phenomena in their future professional careers.
Modern radars represent information by acquiring digital signals. In tracking radars, the display is updated on the order of tenths of a millisecond, so the acquisition, processing, and display stages must meet the timing requirement imposed by the radar. The objective of this work is the real-time visualization of the information for a radar of this type using the Odroid XU4 board. The solution uses parallel programming: threads created in the Qt Creator integrated development environment and the pipeline ("cauce") parallel programming pattern. The result was an efficient use of computing resources, with reduced execution time and greater speedup with respect to the sequential variant.
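The pipeline pattern described above can be sketched in a few lines. This is an illustrative Python sketch (the work itself uses C++ threads in Qt Creator); the stage functions and queue layout are our stand-ins for the acquisition, processing, and display stages.

```python
import queue
import threading

def stage(fn, q_in, q_out):
    """One pipeline ('cauce') stage: pull an item, transform it, push it on.
    A None item is a poison pill that shuts the stage down and propagates."""
    while True:
        item = q_in.get()
        if item is None:
            if q_out is not None:
                q_out.put(None)
            break
        out = fn(item)
        if q_out is not None:
            q_out.put(out)

q1, q2, results = queue.Queue(), queue.Queue(), []

# Two concurrent stages: "processing" doubles each sample,
# "display" collects the processed samples in arrival order.
process = threading.Thread(target=stage, args=(lambda s: s * 2, q1, q2))
display = threading.Thread(target=stage,
                           args=(lambda s: results.append(s) or s, q2, None))
process.start()
display.start()

for sample in range(5):   # stand-in for acquired radar samples
    q1.put(sample)
q1.put(None)              # end of stream

process.join()
display.join()
```

Because each stage runs in its own thread, sample *n* can be displayed while sample *n+1* is still being processed, which is where the speedup over the sequential variant comes from.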
Biologically inspired recurrent neural networks, such as reservoir computers, are of interest for designing spatio-temporal data processors in hardware, due to their simple learning scheme and deep connections to Kalman filters. In this work, through in-depth simulation studies, we discuss a way to construct hardware reservoir computers using an analog stochastic neuron cell built from a low-energy-barrier-magnet-based magnetic tunnel junction and a few transistors. This allows us to implement a physical embodiment of the mathematical model of reservoir computers. Compact implementation of reservoir computers using such devices may enable building compact, energy-efficient signal processors for standalone or in-situ machine cognition in edge devices.
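The "simple learning scheme" of a reservoir computer can be made concrete with a minimal echo state network sketch: the recurrent weights stay fixed and random, and only a linear readout is trained, by ridge regression in closed form. This is a generic software sketch of the mathematical model, not the magnetic-tunnel-junction hardware; all sizes and the toy task are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N_RES, N_IN = 100, 1                      # reservoir and input sizes (arbitrary)
W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
W = rng.normal(0.0, 1.0, (N_RES, N_RES))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

def run_reservoir(inputs):
    """Drive the fixed random reservoir with a 1-D input sequence
    and return the sequence of internal states."""
    x = np.zeros(N_RES)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))
        states.append(x.copy())
    return np.array(states)

# Toy spatio-temporal task: predict a sine wave one step ahead.
t = np.linspace(0, 8 * np.pi, 400)
u, y = np.sin(t[:-1]), np.sin(t[1:])
X = run_reservoir(u)

# Train only the readout: closed-form ridge regression.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N_RES), X.T @ y)
pred = X @ W_out
```

Only `W_out` is learned; in a hardware reservoir the fixed random dynamics would be supplied by the physical device itself, which is what makes the approach attractive for compact, low-energy implementations.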