For decades, Moore's Law has served as a steadfast pillar of computer architecture and system design, promoting a clean abstraction between hardware and software. This Moore's-Law-driven computing paradigm widened the separation between the two, enabling software developers to achieve near-exponential performance gains, often without delving deeply into hardware-specific optimizations. Yet today, Moore's Law -- its once relentless performance gains now diminished to incremental improvements -- faces inevitable physical barriers. This stagnation forces a reevaluation of the conventional, decoupled system design philosophy: strict abstractions between hardware and software are increasingly obsolete. The once-clear boundary between the two is rapidly dissolving, replaced by co-design. It is imperative for the computing community to intensify its commitment to hardware-software co-design, elevating system abstractions to first-class citizens and reimagining design principles to satisfy the insatiable appetite of modern computing. Hardware-software co-design is not a recent innovation. To illustrate its historical evolution, I classify its development into five relatively distinct ``epochs''. This post also highlights the growing influence of the architecture community in interdisciplinary teams -- particularly alongside ML researchers -- and explores why current co-design paradigms are struggling in today's computing landscape. Additionally, I examine the concept of the ``hardware lottery'' and explore directions to mitigate its constraining influence on the next era of computing innovation.
Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware real estate of a modern computing system. All of this becomes worse as modern and emerging applications grow more data-intensive (as we readily witness in, e.g., machine learning, genome analysis, graph processing, and data analytics), making the memory system an even larger bottleneck. In this paper, we discuss two major challenges that greatly affect computing system performance and efficiency: 1) memory technology & capacity scaling (at the lower device and circuit levels) and 2) system and application performance & energy scaling (at the higher levels of the computing stack). We demonstrate that both types of scaling have become extremely difficult, wasteful, and costly due to the dominant processor-centric design & execution paradigm of computers, which treats memory as a dumb and inactive component that cannot perform any computation. We show that moving to a memory-centric design & execution paradigm can solve these major challenges while enabling multiple other potential benefits. In particular, we demonstrate that: 1) memory technology scaling problems (e.g., RowHammer, RowPress, Variable Read Disturbance, data retention, and other issues yet to be discovered) can be handled much more easily and efficiently by enabling memory to autonomously manage itself; 2) system and application performance & energy efficiency can, at the same time, be improved by orders of magnitude by enabling computation capability in memory chips and structures (i.e., processing in memory). We discuss the challenges of adopting memory-centric computing and describe how we can get there step by step via an evolutionary path.
Neural systems use the same underlying computational substrate to carry out analog filtering and signal processing operations, as well as discrete symbol manipulation and digital computation. Inspired by the computational principles of canonical cortical microcircuits, we propose a framework for using recurrent spiking neural networks to seamlessly and robustly switch between analog signal processing and categorical and discrete computation. We provide theoretical analysis and practical neural network design tools to formally determine the conditions for inducing this switch. We demonstrate the robustness of this framework experimentally with hardware soft Winner-Take-All and mixed-feedback recurrent spiking neural networks, implemented by appropriately configuring the analog neuron and synapse circuits of a mixed-signal neuromorphic processor chip.
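To make the soft Winner-Take-All motif concrete, the following is a minimal rate-based sketch in Python. It is only an illustration of the dynamics, not the authors' mixed-signal neuromorphic implementation, and the gains, time constant, and population size are assumed values chosen for demonstration.

```python
import numpy as np

def soft_wta(inputs, alpha=1.2, beta=2.0, tau=20.0, dt=1.0, steps=500):
    """Rate-based soft Winner-Take-All: each unit excites itself (alpha)
    and is suppressed by the total population activity (beta). With strong
    enough positive feedback the network stops acting as an analog filter
    of its inputs and instead makes a categorical selection of the largest
    input, illustrating the analog-to-discrete switch described above."""
    rates = np.zeros(len(inputs))
    for _ in range(steps):
        drive = inputs + alpha * rates - beta * rates.sum()
        rates += (dt / tau) * (-rates + np.maximum(drive, 0.0))
    return rates

# The unit receiving the largest input ends up dominating; the others are
# driven to zero by the shared inhibition.
print(soft_wta(np.array([0.8, 1.0, 1.2, 0.9])).round(3))
```

In the spiking, mixed-feedback setting of the paper, the same qualitative switch is controlled by the strength of the recurrent excitatory and inhibitory circuits rather than by the explicit parameters above.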
Timo Kehrer, Robert Haines, Guido Juckeland
et al.
Anecdotal evidence suggests that Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) often use different terminologies for similar concepts, creating communication challenges. To better understand these divergences, we have started investigating how SE fundamentals from the SER community are interpreted within the RSE community, identifying aligned concepts, knowledge gaps, and areas for potential adaptation. Our preliminary findings reveal opportunities for mutual learning and collaboration, and our systematic methodology for terminology mapping provides a foundation for a crowd-sourced extension and validation in the future.
Commodity operating systems often lack sufficient security mechanisms to defend against sophisticated attacks, leaving applications vulnerable to attacks that compromise sensitive data, which in turn requires additional protection layers that increase software complexity and costs. To address these challenges, I introduce HBSP (Hypervisor-Based Software Protector), a lightweight and flexible solution that leverages Intel's VT (Virtualization Technology) to provide enhanced security. HBSP operates entirely outside the host OS environment, using advanced memory-hiding techniques to protect sensitive data and application code from both the host OS and potential malicious actors. Unlike traditional approaches, HBSP requires no modifications to existing operating systems or applications. Its dynamic concealment of the hypervisor makes it harder for attackers to bypass protection mechanisms. Performance evaluations show minimal overhead (0.25% impact on application performance), making HBSP suitable for real-time and performance-critical applications. Moreover, it is extensible across various hardware virtualization platforms, ensuring broad applicability across diverse environments. HBSP offers a scalable, practical solution for improving software security without significant infrastructure changes or performance trade-offs.
LinkedIn is the largest professional network in the world. As such, it can serve to build bridges between practitioners, whose daily work is software engineering (SE), and researchers, who work to advance the field of software engineering. We know that such a metaphorical bridge exists: SE research findings are sometimes shared on LinkedIn and commented on by software practitioners. Yet, we do not know what state the bridge is in. Therefore, we quantitatively and qualitatively investigate how SE practitioners and researchers approach each other via public LinkedIn discussions and what both sides can contribute to effective science communication. We found that a considerable proportion (39%) of LinkedIn posts on SE research are written by people who are not the paper authors. Further, 71% of all comments in our dataset come from people in industry, yet only every second post receives any comments at all. Based on our findings, we formulate concrete advice for researchers and practitioners to make sharing new research findings on LinkedIn more fruitful.
Bruno A. Krinski, Daniel V. Ruiz, Rayson Laroca
et al.
Due to the COVID-19 global pandemic, computer-assisted diagnoses of medical images have gained much attention, and robust methods of semantic segmentation of Computed Tomography (CT) images have become highly desirable. In this work, we present a deeper analysis of how data augmentation techniques improve segmentation performance on this problem. We evaluate 20 traditional augmentation techniques on five public datasets. Six different probabilities of applying each augmentation technique on an image were evaluated. We also assess a different training methodology where the training subsets are combined into a single larger set. All networks were evaluated through a 5-fold cross-validation strategy, resulting in over 4,600 experiments. We also propose a novel data augmentation technique based on Generative Adversarial Networks (GANs) to create new healthy and unhealthy lung CT images, evaluating four variations of our approach with the same six probabilities of the traditional methods. Our findings show that GAN-based techniques and spatial-level transformations are the most promising for improving the learning of deep models on this problem, with the StarGANv2 + F with a probability of 0.3 achieving the highest F-score value on the Ricord1a dataset in the unified training strategy. Our code is publicly available at https://github.com/VRI-UFPR/DACov2022
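As an illustration of how a per-image application probability is used in such a pipeline, the sketch below assumes the widely used albumentations library; the specific spatial-level transforms shown and the choice of p=0.3 are illustrative stand-ins rather than the paper's exact configuration.

```python
import numpy as np
import albumentations as A

# Each spatial-level transform is applied independently with probability 0.3,
# one of the per-transform probabilities evaluated in the study.
augment = A.Compose([
    A.HorizontalFlip(p=0.3),
    A.Rotate(limit=15, p=0.3),
    A.ElasticTransform(p=0.3),
    A.GridDistortion(p=0.3),
])

# Image and segmentation mask are transformed together so that the lung and
# lesion annotations stay aligned with the augmented CT slice.
ct_slice = np.random.rand(512, 512).astype(np.float32)  # placeholder CT slice
lung_mask = np.zeros((512, 512), dtype=np.uint8)         # placeholder mask
out = augment(image=ct_slice, mask=lung_mask)
aug_image, aug_mask = out["image"], out["mask"]
```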
Today, many systems use artificial intelligence (AI) to solve complex problems. While this often increases system effectiveness, developing a production-ready AI-based system is a difficult task. Thus, solid AI engineering practices are required to ensure the quality of the resulting system and to improve the development process. While several practices have already been proposed for the development of AI-based systems, detailed practical experiences of applying these practices are rare. In this paper, we aim to address this gap by collecting such experiences during a case study, namely the development of an autonomous stock trading system that uses machine learning functionality to invest in stocks. We selected 10 AI engineering practices from the literature and systematically applied them during development, with the goal of collecting evidence about their applicability and effectiveness. Using structured field notes, we documented our experiences. We also used field notes to document challenges that occurred during development and the solutions we applied to overcome them. Afterwards, we analyzed the collected field notes and evaluated how each practice improved the development. Lastly, we compared our evidence with existing literature. Most applied practices improved our system, albeit to varying extents, and we were able to overcome all major challenges. The qualitative results provide detailed accounts of 10 AI engineering practices, as well as the challenges and solutions associated with such a project. Our experiences therefore enrich the emerging body of evidence in this field, which may be especially helpful for practitioner teams new to AI engineering.
Knowledge graph embedding research has mainly focused on learning continuous representations of knowledge graphs for the link prediction problem. Recently developed frameworks can be effectively applied in research-related applications. Yet, these frameworks do not fulfill many requirements of real-world applications. As the size of the knowledge graph grows, moving computation from a commodity computer to a cluster of computers becomes more challenging in these frameworks. Finding suitable hyperparameter settings w.r.t. time and computational budgets is left to practitioners. In addition, the continual learning aspect of knowledge graph embedding frameworks is often ignored, although continual learning plays an important role in many real-world (deep) learning-driven applications. Arguably, these limitations explain the lack of publicly available knowledge graph embedding models for large knowledge graphs. We developed a framework based on DASK, Pytorch Lightning, and Hugging Face to compute embeddings for large-scale knowledge graphs in a hardware-agnostic manner, able to address real-world challenges pertaining to the scale of real applications. We provide an open-source version of our framework along with a hub of pre-trained models with more than 11.4 B parameters.
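For readers unfamiliar with what such a framework computes, the following is a minimal, self-contained link-prediction scoring sketch in plain PyTorch. DistMult is used purely as an example scoring function; it is not necessarily a model family shipped by the framework described above, and the entity counts and indices are placeholders.

```python
import torch
import torch.nn as nn

class DistMult(nn.Module):
    """Minimal link-prediction model: a triple (head, relation, tail) is
    scored as sum_d e_h[d] * w_r[d] * e_t[d]; higher scores indicate that
    the triple is more likely to hold in the knowledge graph."""
    def __init__(self, num_entities: int, num_relations: int, dim: int = 128):
        super().__init__()
        self.entities = nn.Embedding(num_entities, dim)
        self.relations = nn.Embedding(num_relations, dim)

    def score(self, heads, relations, tails):
        h = self.entities(heads)
        r = self.relations(relations)
        t = self.entities(tails)
        return (h * r * t).sum(dim=-1)

# Score a small batch of triples (all indices are placeholders).
model = DistMult(num_entities=10_000, num_relations=50)
heads = torch.tensor([0, 1])
rels = torch.tensor([3, 7])
tails = torch.tensor([42, 99])
print(model.score(heads, rels, tails))
```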
The Intel Software Guard Extensions (SGX) technology enables applications to run in an isolated SGX enclave environment, with elevated confidentiality and integrity guarantees. Gramine Library OS facilitates execution of existing unmodified applications in SGX enclaves, requiring only an accompanying manifest file that describes the application's security posture and configuration. However, Intel SGX is a CPU-only technology, thus Gramine currently supports CPU-only workloads. To enable a broader class of applications that offload computations to hardware accelerators - GPU offload, NIC offload, FPGA offload, TPM communications - Gramine must be augmented with device-backed mmap support and generic ioctl support. In this paper, we describe the design and implementation of this newly added support, the corresponding changes to the manifest-file syntax and the requisite deep copy algorithm. We evaluate our implementation on Intel Media SDK workloads and discuss the encountered caveats and limitations. Finally, we outline a use case for the presented mmap/ioctl support beyond mere device communication, namely the mechanism to slice the application into the trusted enclave part (where the core application executes) and the untrusted shared-memory part (where insecure shared libraries execute).
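To illustrate the class of host interfaces involved, the sketch below shows the standard Linux device-backed mmap plus ioctl pattern that such offloading applications rely on and that Gramine must now proxy across the enclave boundary. The device path, request code, and argument layout are hypothetical placeholders, not an actual accelerator driver interface.

```python
import fcntl
import mmap
import os
import struct

DEVICE_PATH = "/dev/accel0"   # hypothetical accelerator device node
ACCEL_SUBMIT = 0x4100          # placeholder ioctl request code (real drivers
                               # encode theirs with the _IO/_IOWR macros)

fd = os.open(DEVICE_PATH, os.O_RDWR)

# Device-backed mmap: the driver maps device or DMA memory into the process,
# which is what the newly added mmap support must reproduce for enclaves.
buf = mmap.mmap(fd, 4096, prot=mmap.PROT_READ | mmap.PROT_WRITE)
buf[:4] = b"\x01\x02\x03\x04"  # stage input data for the accelerator

# ioctl with a structured argument (here: offset and length into the mapped
# buffer). Arguments that embed pointers or nested structures are the reason
# a deep copy algorithm is needed when crossing the enclave boundary.
arg = struct.pack("QQ", 0, 4)
fcntl.ioctl(fd, ACCEL_SUBMIT, arg)

buf.close()
os.close(fd)
```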
Thomas M. Conte, Ian T. Foster, William Gropp
et al.
While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlocks new capabilities. For example, recent successes in AI/ML required the synergy of improved algorithms and hardware architectures (e.g., general-purpose graphics processing units). However, unlike in the 20th Century and early 2000s, tomorrow's performance aspirations must be achieved without the continued semiconductor scaling formerly provided by Moore's Law and Dennard Scaling. How will one deliver the next 100x improvement in capability at similar or lower cost to enable great value? Can we make the next AI leap without 100x better hardware? This whitepaper argues for a multipronged effort to develop new computing approaches beyond Moore's Law to advance the foundation that computing provides to US industry, education, medicine, science, and government. This impact extends far beyond the IT industry itself, as IT is now central to providing value across society, for example in semi-autonomous vehicles, tele-education, health wearables, viral analysis, and efficient administration. Herein we draw upon considerable visioning work by CRA's Computing Community Consortium (CCC) and the IEEE Rebooting Computing Initiative (IEEE RCI), enabled by thought leader input from industry, academia, and the US government.
Nadeem Kafi, Zubair Ahmed Shaikh, Muhammad Shahid Shaikh
KG (Knowledge Generation) and understanding have traditionally been human-centric activities. KE (Knowledge Engineering) and KM (Knowledge Management) have tried to augment human knowledge on two separate planes: the former deals with machine interpretation of knowledge, while the latter explores interactions in human networks for KG and understanding. However, both remain computer-centric. Crowdsourced HC (Human Computation) has recently utilized human cognition and memory to generate diverse knowledge streams on specific tasks, which are mostly easy for humans to solve but remain challenging for machine algorithms. The literature shows little work on KM frameworks for citizen crowds that gather input from diverse categories of humans, organize that knowledge with respect to tasks and knowledge categories, and create new knowledge as a computer-centric activity. In this paper, we present an attempt to create such a framework by implementing a simple solution, called ExamCheck, focused on the generation of knowledge, feedback on that knowledge, and the recording of its results in academic settings. Our solution, based on HC, shows that a structured KM framework can address a complex problem in a context that is important for the participants themselves.
The rapid progress and advancement of electronic chip technology provide a variety of new implementation options for system engineers. The choice varies between flexible programs running on a general-purpose processor (GPP) and fixed hardware implementations using an application-specific integrated circuit (ASIC). Many other implementation options exist, for instance, a system combining a RISC processor and a DSP core; further options include graphics processors and microcontrollers. Specialist processors certainly improve performance over general-purpose ones, but this comes at the expense of flexibility. Combining the flexibility of GPPs and the high performance of ASICs leads to the introduction of reconfigurable computing (RC) as a new implementation option that balances versatility and speed. The focus of this chapter is on introducing reconfigurable computers as modern supercomputing architectures. The chapter also investigates the main reasons behind the current advancement in the development of RC-systems. Furthermore, a technical survey of various RC-systems is included, laying common grounds for comparison. In addition, this chapter presents case studies implemented on the MorphoSys RC-system. The selected case studies belong to different areas of application, such as computer graphics and information coding. Parallel versions of the studied algorithms are developed to match the topologies supported by MorphoSys. Performance evaluations and result analyses are included for implementations with different characteristics.
Mordechai Guri, Boris Zadov, Dima Bykhovsky
et al.
Using keyboard LEDs to send data optically was proposed in 2002 by Loughry and Umphress [1] (Appendix A). In this paper we extensively explore this threat in the context of a modern cyber-attack with current hardware and optical equipment. In this type of attack, an advanced persistent threat (APT) uses the keyboard LEDs (Caps-Lock, Num-Lock and Scroll-Lock) to encode information and exfiltrate data from air-gapped computers optically. Notably, this exfiltration channel is not monitored by existing data leakage prevention (DLP) systems. We examine this attack and its boundaries for today's keyboards with USB controllers and sensitive optical sensors. We also introduce smartphone and smartwatch cameras as components of malicious insider and 'evil maid' attacks. We provide the necessary scientific background on optical communication and the characteristics of modern USB keyboards at the hardware and software levels, and present a transmission protocol and modulation schemes. We implement the exfiltration malware, discuss its design and implementation issues, and evaluate it with different types of keyboards. We also test various receivers, including light sensors, remote cameras, 'extreme' cameras, security cameras, and smartphone cameras. Our experiments show that data can be leaked from air-gapped computers via the keyboard LEDs at a maximum bit rate of 3000 bit/sec per LED when a light sensor is used as the receiver, and at more than 120 bit/sec if smartphones are used. The attack does not require any modification of the keyboard at the hardware or firmware level.
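As a conceptual sketch of the modulation idea, the snippet below encodes a payload with simple on-off keying over a single LED. The framing, preamble, and timing are illustrative assumptions and do not reproduce the exact transmission protocol or bit rates reported in the paper; the actual LED toggling is left to a caller-supplied callback so as not to assume any particular OS keyboard-LED interface.

```python
import time

def bits_from_bytes(payload: bytes):
    """Yield the payload as an MSB-first bit stream."""
    for byte in payload:
        for i in range(7, -1, -1):
            yield (byte >> i) & 1

def transmit(payload: bytes, set_led, bit_time=0.01, preamble="10101010"):
    """On-off keying over one keyboard LED: LED on = 1, LED off = 0.
    A fixed preamble lets the receiver (light sensor or camera) lock on
    to the bit timing before the payload starts."""
    for bit in [int(b) for b in preamble] + list(bits_from_bytes(payload)):
        set_led(bool(bit))   # caller toggles e.g. the Scroll-Lock LED here
        time.sleep(bit_time)
    set_led(False)           # return the LED to its idle state

# Dummy callback that only records the LED states, for demonstration.
states = []
transmit(b"hi", states.append, bit_time=0.0)
print("".join("1" if s else "0" for s in states))
```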
Gianluca Reali, Mauro Femminella, Emilia Nunzi
et al.
This paper provides a global picture of the deployment of networked processing services for genomic data sets. Much current research makes extensive use of genomic data, which are massive and rapidly increasing over time. They are typically stored in remote databases, accessible over the Internet. For this reason, a significant issue in effectively handling genomic data through data networks is the set of available network services. A first contribution of this paper consists of identifying the still unexploited features of genomic data that could allow optimizing their networked management. The second and main contribution of this survey consists of a methodological classification of computing and networking alternatives which can be used to offer what we call the Genomic-as-a-Service (GaaS) paradigm. In more detail, we analyze the main genomic processing applications, and classify not only the main computing alternatives to run genomics workflows in either a local machine or a distributed cloud environment, but also the main software technologies available to develop genomic processing services. Since an analysis encompassing only the computing aspects would provide only a partial view of the issues of deploying a GaaS system, we also present the main networking technologies that are available to efficiently support a GaaS solution. We first focus on existing service platforms, and analyze them in terms of service features, such as scalability, flexibility, and efficiency. Then, we present a taxonomy for both wide area and datacenter network technologies that may fit the GaaS requirements. It emerges that virtualization, both in computing and networking, is the key to a successful large-scale exploitation of genomic data, pushing ahead the adoption of the GaaS paradigm. Finally, the paper illustrates a short- and long-term vision of future research challenges in the field.
Davide Fucci, Cristina Palomares, Dolors Costal
et al.
Background: Requirements engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as the number and heterogeneity of stakeholders, motivates the development of methods and tools for improving large-scale requirements engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics and challenges of a platform, as desired by experts, to support requirements engineering for individual stakeholders, based on the current pain points of their organizations when dealing with a large number of requirements. Method: We conducted a multiple case study with three companies in different domains. We collected data through ten semi-structured interviews with experts from these companies. Results: The main pain point for stakeholders is handling the vast amount of data from different sources. The foreseen platform should leverage such data to manage changes in requirements according to customers' and users' preferences. It should also offer stakeholders an estimate of how long a requirements engineering task will take to complete, along with easier identification of requirements dependencies and a strategy for requirements reuse. Conclusions: The findings provide empirical evidence about how practitioners wish to improve their requirements engineering processes and tools. The insights are a starting point for in-depth investigations into the problems and solutions presented. Practitioners can use the results to improve existing practices and tools or to design new ones.
Augmented Reality (AR) applications have been widely used for educational purposes. This study introduced the AR in computer hardware (ARCH) learning media. ARCH is an application prototype that helps students identify computer hardware devices. It was important to measure student acceptance in order to evaluate attitude toward using and intention to use the application. Student acceptance was measured using the Technology Acceptance Model (TAM) approach. The constructs involved were perceived usefulness, perceived ease of use, perceived enjoyment, attitude toward using, and intention to use. The purpose of this study was to investigate the most significant factors that affect attitude toward using and intention to use the ARCH system. The method consisted of collecting data via a questionnaire, converting the responses to a 5-point Likert scale, running reliability and correlation tests, and performing a regression analysis. The results showed that perceived ease of use was the most significant factor with regard to attitude toward using, and perceived enjoyment was the most influential factor with regard to intention to use the ARCH system.
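A minimal sketch of the kind of regression analysis described, using scikit-learn with synthetic placeholder responses in place of the study's questionnaire data (the construct scores and resulting coefficients here are random and carry no empirical meaning):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder 5-point Likert responses (rows = respondents); in the study
# these would come from the questionnaire, here they are random stand-ins.
rng = np.random.default_rng(0)
n_respondents = 60
pu = rng.integers(1, 6, n_respondents)    # perceived usefulness
peou = rng.integers(1, 6, n_respondents)  # perceived ease of use
pe = rng.integers(1, 6, n_respondents)    # perceived enjoyment
att = rng.integers(1, 6, n_respondents)   # attitude toward using

# Regress attitude toward using on the three perception constructs; the
# standardized coefficients indicate which construct weighs most heavily.
X = np.column_stack([pu, peou, pe]).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (att - att.mean()) / att.std()

model = LinearRegression().fit(X, y)
for name, coef in zip(["PU", "PEOU", "PE"], model.coef_):
    print(f"{name}: beta = {coef:.2f}")
```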
Damiano Torre, Giuseppe Procaccianti, Davide Fucci
et al.
Nowadays, software is pervasive in our everyday lives. Its sustainability and environmental impact have become major factors to be considered in the development of software systems. Millennials -- the newer generation of university students -- are particularly keen to learn about and contribute to a more sustainable and green society. The need for training on green and sustainable topics in software engineering has been reflected in a number of recent studies. The goal of this paper is to get a first understanding of the current state of teaching sustainability in the software engineering community, the motivations behind this state, and what can be done to improve it. To this end, we report the findings from a targeted survey of 33 academics on the presence of green and sustainable software engineering in higher education. The major findings from the collected data suggest that sustainability is under-represented in the curricula, while the current focus of teaching is on energy efficiency delivered through a fact-based approach. The reasons vary from lack of awareness, teaching material, and suitable technologies to the high effort required to teach sustainability. Finally, we provide recommendations for educators willing to teach sustainability in software engineering that can help suit millennial students' needs.
Since its inception at the beginning of the twentieth century, quantum mechanics has challenged our conceptions of how the universe ought to work; however, the equations of quantum mechanics can be too computationally difficult to solve using existing computers for even modestly large systems. Here I will show that quantum computers can sometimes be used to address such problems and that quantum computer science can assign formal complexities to learning facts about nature. Hence, computer science should not only be regarded as an applied science; it is also of central importance to the foundations of science.