Empirical research in reverse engineering (RE) and software protection is crucial for evaluating the efficacy of methods designed to protect software against unauthorized access and tampering. However, conducting such studies with professional reverse engineers presents significant challenges, including limited access to professionals and high cost. This paper explores the use of students as participants in empirical reverse engineering experiments, examining their suitability and the necessary training; the design of appropriate challenges; strategies for ensuring the rigor and validity of the research and its results; ways to maintain students' privacy, motivation, and voluntary participation; and data collection methods. We present a systematic literature review of existing reverse engineering experiments and user studies, a discussion of related work from the broader domain of software engineering that applies to reverse engineering experiments, an extensive discussion of our own experience running experiments in the context of a master-level software hacking and protection course, and recommendations based on this experience. Our findings aim to guide future empirical studies in RE, balancing practical constraints with the need for meaningful, reproducible results.
In Internet of Things (IoT) scenarios, data are susceptible to noise during collection and transmission, resulting in outliers and missing values. Existing temporal regularized matrix factorization models typically use the squared loss to measure reconstruction errors, ignoring the fact that the quality of the matrix factorization itself is also a key factor in a model's prediction performance when dealing with multidimensional time series containing anomalous data. This paper therefore proposes a Time-Aware Robust Non-negative Matrix Factorization framework for multidimensional temporal prediction (TARNMF) based on the L<sub>2, log</sub> norm. TARNMF establishes the spatiotemporal correlation of multidimensional time series data through Non-negative Matrix Factorization (NMF) and autoregressive temporal regularization terms with learnable parameters. In the presence of outliers, the data are assumed to obey a Laplace distribution. Under this assumption, the L<sub>2, log</sub> norm is used to estimate the error between the original data and the reconstructed matrices in the robust non-negative matrix factorization, minimizing the interference of anomalous data with the prediction model. The L<sub>2, log</sub> norm is as robust as existing metric functions, approximates the L<sub>1</sub> loss, and reduces the effect of outliers on the objective function by compressing their residuals. The paper also proposes a projected gradient descent-based method to optimize the model. Experiments on a high-dimensional Solar dataset show that TARNMF is scalable and robust, reducing the relative mean absolute error by 8.64% compared with the second-best results. Results on noisy data further verify that TARNMF can efficiently process and predict IoT time series data in the presence of anomalies.
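As an illustration of how such a norm down-weights anomalous columns, the sketch below minimizes a column-wise L2,log reconstruction loss, assumed here to have the form sum_j log(1 + ||e_j||), with projected gradient steps. It omits the paper's autoregressive temporal regularization and is not the authors' exact algorithm.

```python
import numpy as np

def l2log_loss(E):
    """Column-wise L2,log norm: sum_j log(1 + ||E[:, j]||_2) (assumed form)."""
    return float(np.sum(np.log1p(np.linalg.norm(E, axis=0))))

def robust_nmf_l2log(X, rank, lr=0.01, iters=500, seed=0):
    """Projected-gradient NMF minimizing the L2,log reconstruction loss.

    Columns with large (outlier) residuals are down-weighted by
    1 / (||e|| * (1 + ||e||)), which compresses their gradient contribution.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    losses = []
    for _ in range(iters):
        E = X - W @ H
        norms = np.linalg.norm(E, axis=0) + 1e-12
        Ew = E / (norms * (1.0 + norms))          # per-column outlier down-weighting
        W = np.maximum(W + lr * Ew @ H.T, 0.0)    # projected gradient step (W >= 0)
        H = np.maximum(H + lr * W.T @ Ew, 0.0)    # projected gradient step (H >= 0)
        losses.append(l2log_loss(X - W @ H))
    return W, H, losses
```

Compared with the squared loss, an outlier column contributes a bounded gradient here, so a single corrupted sensor stream cannot dominate the factorization.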
Antonina Kashtalian, Sergii Lysenko, Anatoliy Sachenko
et al.
Independent restructuring of the architecture of multicomputer systems during operation is a complex task, since such systems are distributed. One subtask of this restructuring is changing the architecture of the system centers; that is, the system can otherwise be rebuilt without changing its center. However, the specifics of systems for detecting malicious software and computer attacks require an organization that makes it difficult for attackers to understand their behavior. The task addressed in this work is therefore the development of rules for restructuring system centers across different architecture types. The aim of the work is to develop criteria for evaluating potential centralization options in the architecture of multicomputer systems with traps and decoys. To enable such an evaluation, known solutions were analyzed, and the mathematical support for organizing the restructuring of system centers during operation was found to be insufficient. Given the specifics of such systems, no parameters had previously been determined that could be taken into account when restructuring system centers. The analyzed works establish the main types of centralization used in system architectures: centralized, partially centralized, partially decentralized, and decentralized. However, they provide no algorithms or methods for transitioning a system from one type to another during operation. Subject. The work defines characteristic properties that can be used when synthesizing systems. These properties determine the number of potential architecture variants to which the system can switch at the next step when a restructuring decision is made; as the number of characteristic properties increases, so does the number of possible variants.
When approving transition variants, it was necessary to evaluate them taking into account the systems' previous operational experience. Evaluation criteria were therefore developed for assessing potential centralization variants in the system architecture. A feature of these criteria is that they can incorporate prior experience with a centralization variant when it is reused, while also evaluating newly prepared variants offered for the first time. In other words, the evaluation criteria include the previous operational experience of multicomputer systems, which makes it possible to assess a repeated option based on the results of its earlier use and to diversify the choice of system centers. Methods. The work develops an objective function for evaluating the next centralization option in the system architecture. The objective function takes into account four evaluation criteria: operational efficiency, stability, integrity, and security, all focused on evaluating potential system centers. New mathematical models were developed for these four criteria with respect to the system center. Unlike known models for evaluating system centers when selecting the next centralization options, they are presented as analytical expressions that account for the features of the centralization types in the system architecture and for indicators of operational efficiency, stability, integrity, and security with respect to the system center. On this basis, an objective function can be formed for evaluating centralization options in systems whose distinguishing feature is hiding the components containing the system center from detection by attackers. Results. The work analyzes the results of an experiment conducted with a prototype of the system.
The experimental results were shown to converge with those obtained by the theoretical method. Conclusion. The study introduces mathematical models for evaluating system centers based on criteria of operational efficiency, stability, integrity, and security. Unlike existing models, these are presented as analytical expressions that account for the various centralization types within system architectures. The models enable the construction of objective functions for evaluating centralization options, emphasizing the concealment of system center components from attackers. Experimental results with a system prototype confirm the validity of the theoretical models, showing minimal deviations between the function graphs. Significant deviations within specific time intervals are addressed in order to achieve optimal centralization options.
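A scalarized objective of this kind can be sketched as follows. The criterion names follow the abstract, while the weights, scores, variant names, and the experience adjustment are illustrative assumptions, not values from the paper.

```python
# Illustrative scalarized objective over candidate centralization variants.
# The four criteria come from the abstract; everything else is invented
# for this sketch (weights, scores, the 0.7/0.3 experience blend).
CRITERIA = ("efficiency", "stability", "integrity", "security")

def score_variant(variant, weights, history):
    """Weighted sum of criterion scores, blended with prior experience if any."""
    base = sum(weights[c] * variant[c] for c in CRITERIA)
    past = history.get(variant["name"])  # outcome of earlier use, if the variant repeats
    return base if past is None else 0.7 * base + 0.3 * past

def choose_center(variants, weights, history):
    """Pick the next centralization option for the system center."""
    return max(variants, key=lambda v: score_variant(v, weights, history))
```

Recording each chosen variant's observed outcome back into `history` is one simple way to let poor past performance steer the system toward different centers, diversifying center placement over time.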
Balancing the accuracy and the complexity of models is a well-established and ongoing challenge. Models can be misleading if they are not accurate, but models may be incomprehensible if their accuracy depends upon their being complex. In this paper, semilattices are examined as an option for balancing the accuracy and the complexity of machine learning models. This is done with a type of machine learning that is based on semilattices: algebraic machine learning. Unlike trees, semilattices can include connections between elements that are in different hierarchies; trees are a subclass of semilattices, so semilattices have higher expressive potential than trees. The explanation provided here encompasses diagrammatic semilattices, algebraic semilattices, and the interrelationships between them. Machine learning based on semilattices is explained with the practical example of urban food access landscapes, comprising food deserts, food oases, and food swamps. This explanation describes how to formulate an algebraic machine learning model. Overall, it is argued that semilattices are better than trees for balancing the accuracy and complexity of models, and it is explained how algebraic semilattices can be the basis for machine learning models.
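The meet-semilattice structure underlying this argument can be made concrete in a few lines: elements are modeled as attribute sets, the meet (greatest lower bound) is set intersection, and one element may sit below two incomparable parents, which a tree cannot express. The food-access labels are illustrative, and this toy encoding is not algebraic machine learning itself.

```python
# Elements of a meet-semilattice modeled as frozen attribute sets;
# the meet (greatest lower bound) is set intersection.
def meet(a, b):
    return a & b

# Illustrative food-access elements (labels invented for this sketch).
desert = frozenset({"low_access", "low_income"})
swamp  = frozenset({"low_access", "many_fast_food"})
oasis  = frozenset({"fresh_food"})

# "low_access" sits below both desert and swamp: one element, two parents,
# a cross-hierarchy connection that a tree cannot represent.
low_access = meet(desert, swamp)

# Semilattice laws: meet is idempotent, commutative, and associative.
elems = [desert, swamp, oasis, low_access]
assert all(meet(x, x) == x for x in elems)
assert all(meet(x, y) == meet(y, x) for x in elems for y in elems)
assert all(meet(meet(x, y), z) == meet(x, meet(y, z))
           for x in elems for y in elems for z in elems)
```

Because intersection automatically satisfies the three laws, any family of attribute sets closed under intersection forms a meet-semilattice, whereas encoding the same shared element in a tree would force duplicating it under each parent.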
Air pollution is one of the primary challenges in urban environmental governance, with PM<sub>2.5</sub> being a significant contributor affecting air quality. As traditional time-series prediction models for PM<sub>2.5</sub> often lack seasonal-factor analysis and sufficient prediction accuracy, a fusion model based on machine learning, Seasonal Autoregressive Integrated Moving Average (SARIMA)-Support Vector Machine (SVM), is proposed in this paper. The fusion model is a tandem model that splits the data into linear and nonlinear parts. Building on the Autoregressive Integrated Moving Average (ARIMA) model, the SARIMA model adds seasonal-factor extraction parameters to effectively analyze and predict the future linear seasonal trend of PM<sub>2.5</sub> data. Combined with the SVM model, a sliding-step prediction method is used to determine the optimal prediction step size for the residual series, thereby optimizing the residual sequence of the predicted data. The optimal model parameters are further determined through grid search, enabling long-term prediction of PM<sub>2.5</sub> data and improving overall prediction accuracy. Analysis of PM<sub>2.5</sub> monitoring data in Wuhan over the past five years shows that the prediction accuracy of the fusion model is significantly higher than that of any single model. In the same experimental environment, the accuracy of the fusion model is improved by 99%, 99%, and 98% compared with the ARIMA, Auto ARIMA, and SARIMA models, respectively, and the stability of the model is also better, providing a new direction for PM<sub>2.5</sub> prediction.
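The tandem linear-plus-residual idea can be sketched with simplified stand-ins: a least-squares trend-and-seasonal fit in place of SARIMA, and a crude nearest-neighbour residual predictor in place of the SVM. The series and all parameters below are synthetic and illustrative, not the paper's models or data.

```python
import numpy as np

def fit_linear_seasonal(y, period):
    """Least-squares trend + seasonal-dummy fit (simplified stand-in for SARIMA)."""
    t = np.arange(len(y))
    X = np.column_stack([np.ones_like(t, dtype=float), t.astype(float)] +
                        [(t % period == s).astype(float) for s in range(period - 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

def one_nn_residual_correction(resid):
    """Predict each residual from what followed its nearest past neighbour
    (a crude nonlinear stand-in for the SVM residual model)."""
    pred = np.zeros_like(resid)
    for i in range(2, len(resid)):
        past = resid[:i - 1]                         # predecessors with known successors
        j = int(np.argmin(np.abs(past - resid[i - 1])))
        pred[i] = resid[j + 1]                       # reuse what followed the neighbour
    return pred

# Demo: trend + seasonality + a nonlinear (chaotic) component the linear fit misses.
n, period = 600, 12
t = np.arange(n)
r = np.empty(n); r[0] = 0.3
for i in range(1, n):
    r[i] = 3.7 * r[i - 1] * (1 - r[i - 1])           # deterministic nonlinear residual
y = 0.02 * t + 2.0 * np.sin(2 * np.pi * t / period) + r
resid = y - fit_linear_seasonal(y, period)
linear_mse = float(np.mean(resid[2:] ** 2))
hybrid_mse = float(np.mean((resid[2:] - one_nn_residual_correction(resid)[2:]) ** 2))
```

The linear stage captures the trend and seasonal structure, so its residuals carry the nonlinear component; correcting them with the nonlinear stage lowers the in-sample error, which is the essence of the tandem fusion.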
ZHANG Changchang, LÜ Weidong, CAI Zijie, LIU Yankui
To address the lack of sleeping-on-duty datasets, the poor generalization of current classification algorithms, and slow inference speeds, a Sleeping on Duty dataset containing 4,708 images is constructed to verify models' recognition accuracy and generalization ability, and a lightweight image classification algorithm based on domain generalization, Stable_MobileNet, is proposed. First, the input images are padded along the shorter edge to maintain the aspect ratio of the people in the images, followed by image enhancement and random erasure to expand the dataset. Second, the Efficient Channel Attention (ECA) module is introduced to improve the MobileNetv3_large network. Finally, the stable learning method StableNet is applied to enhance the generalization of the model by learning the weights of the training samples, reducing feature dependency and allowing the model to focus on person features rather than environmental factors. Experimental results on the Sleeping on Duty dataset indicate that Stable_MobileNet achieves faster average inference than MobileNetv3_large, with a recognition accuracy of 93.56%, which is 2.23% higher than that of MobileNetv3_large. On a test set whose sample distribution differs from that of the training set, the recognition accuracy of Stable_MobileNet is improved by 2.23%.
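The preprocessing steps described above (padding the shorter side to preserve aspect ratio, then random erasure) can be sketched as follows; the image sizes, fill value, and erased fraction are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def pad_to_square(img, fill=0):
    """Pad the shorter side so people keep their aspect ratio under later resizing."""
    h, w = img.shape[:2]
    size = max(h, w)
    out = np.full((size, size) + img.shape[2:], fill, dtype=img.dtype)
    top, left = (size - h) // 2, (size - w) // 2
    out[top:top + h, left:left + w] = img          # centre the original content
    return out

def random_erase(img, rng, frac=0.25, value=0):
    """Blank a random rectangle (random-erasing augmentation)."""
    out = img.copy()
    h, w = out.shape[:2]
    eh, ew = max(1, int(h * frac)), max(1, int(w * frac))
    top = rng.integers(0, h - eh + 1)
    left = rng.integers(0, w - ew + 1)
    out[top:top + eh, left:left + ew] = value
    return out
```

Padding instead of naive resizing avoids distorting body proportions, and random erasure forces the classifier to rely on more than any single image region.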
As deep neural network (DNN) models get larger and more complicated, the importance of hardware acceleration becomes more and more apparent. This paper discusses various hardware acceleration strategies for deep learning, especially in the area of computer vision. It explores the use of GPUs, FPGAs, and ASICs, detailing their respective strengths and weaknesses in accelerating DNNs. This paper argues that the future of DNN hardware acceleration lies in hybrid approaches that combine the advantages of different architectures. Software advances such as improved compilers and synthesis tools will also play a critical role in making these techniques more accessible. By utilizing the appropriate hardware technology for a given task and continuing to innovate in both hardware and software, computer vision will make significant advances in performance, efficiency, and scalability. This hybrid approach is key to the future of DNN hardware acceleration, offering a path to overcome the limitations of any single type of hardware.
Dmytro Chumachenko, Kseniia Bazilevych, Mykola Butkevych
et al.
The spread of infectious diseases is significantly influenced by emergencies, particularly military conflicts, which disrupt healthcare systems and increase the risk of epidemics. The full-scale Russian invasion of Ukraine has exacerbated these challenges, causing environmental damage, mass displacement, and the breakdown of healthcare services, all of which contribute to the spread of infectious diseases. This study aims to develop a comprehensive methodology for assessing the impact of emergencies on the spread of infectious diseases, focusing on the full-scale invasion of Ukraine. The object of this study is the epidemic threats posed by emergencies, particularly the increased spread of infectious diseases due to war-related disruptions. The subject of this study is the methods and models of infectious disease transmission under emergency conditions, with emphasis on the Russian full-scale invasion of Ukraine. The tasks of this study are to analyze the current state of research and to develop a methodology for assessing the impact of emergencies on the spread of infectious diseases. The proposed methodology includes several key components. Comprehensive data from public health organizations cover infectious disease statistics, demographic shifts, healthcare disruptions, and environmental factors exacerbated by emergencies. Data preprocessing includes removing inconsistencies, standardizing formats, and normalizing for differences in population size. Machine learning models, including convolutional neural networks and recurrent neural networks, have been developed to simulate the spread of diseases based on demographic, environmental, and healthcare-related variables. Deep learning models analyze spatial and temporal patterns, whereas compartmental models such as SIR estimate changes in reproduction numbers (R₀ and Re). Additionally, models of excess mortality incorporate mixed effects to account for regional and temporal variations.
The methodology incorporates real-time monitoring of epidemic threats using data from multiple sources, enabling dynamic assessments of disease spread and facilitating predictive modeling. The models were trained on historical data and validated using cross-validation techniques to ensure robustness and reliability, with a specific focus on the pre- and post-invasion phases in Ukraine. Results: The study provides a comprehensive framework for collecting and processing data on infectious diseases and epidemic threats in emergencies. The proposed approach introduces advanced machine learning and epidemiological models trained on pre- and post-invasion data to analyze disease transmission patterns and forecast future epidemic dynamics. Conclusion: The proposed methodology addresses current gaps in infectious disease modeling during emergencies by integrating real-time data and machine learning techniques. This research improves decision-making in public health management and biosafety during crises, particularly in war-affected regions like Ukraine.
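The compartmental component mentioned above can be sketched with a minimal SIR model; the rates beta and gamma below are illustrative, not estimates from the study.

```python
import numpy as np

def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Explicit-Euler integration of the SIR model (fractions of the population).

    S' = -beta*S*I,  I' = beta*S*I - gamma*I,  R' = gamma*I
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    traj = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt    # new infections this step
        new_rec = gamma * i * dt       # new recoveries this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        traj.append((s, i, r))
    return np.array(traj)

beta, gamma = 0.4, 0.1                 # illustrative transmission/recovery rates
r0 = beta / gamma                      # basic reproduction number R0
traj = simulate_sir(beta, gamma, s0=0.99, i0=0.01, days=120)
re_start = r0 * traj[0, 0]             # effective Re = R0 * S falls as S is depleted
```

A drop in Re below 1 (as susceptibles are depleted or interventions reduce beta) is exactly the kind of change in reproduction numbers the methodology tracks before and after an emergency.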
The fifth generation (5G) communication network is anticipated to satisfy the need for higher data rates and lower latency, with ubiquitous connectivity, minimal energy consumption, and acceptable quality of service (QoS). Massive MIMO and heterogeneous networks (HetNets) have emerged as promising technologies to address these challenges. In the proposed framework, massive MIMO technology at the macro base station and full-duplex (FD) technology at the small cell base stations of a HetNet are investigated. An optimal antenna selection strategy, user association (UA), and power allocation (PA) scheme in a massive MIMO-based HetNet with FD-enabled small cells is proposed to optimize the system sum rate. The objective function of the optimization problem is initially a nonconvex, mixed-integer nonlinear programming problem; a solution is obtained by transforming it into a convex problem using Lagrangian decomposition. Simulation results demonstrate the effect of the number of base station antennas, the number of small cell base stations, the self-interference cancellation factor, and channel state information on the system sum rate. Moreover, the proposed algorithm is shown to be effective in optimizing the system sum rate compared to the conventional maximum signal-to-interference-plus-noise-ratio (SINR) algorithm.
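As a simplified, interference-free stand-in for the power-allocation step, the sketch below computes the sum rate over parallel channels and allocates power by classic water-filling; the paper's actual problem is a nonconvex mixed-integer program solved via Lagrangian decomposition, which this does not reproduce.

```python
import numpy as np

def sum_rate(p, gains, noise=1.0):
    """System sum rate with Shannon capacity per user: sum log2(1 + SNR)."""
    return float(np.sum(np.log2(1.0 + gains * p / noise)))

def water_filling(gains, p_total):
    """Classic water-filling power allocation over parallel channels."""
    inv = np.sort(1.0 / gains)           # inverse channel gains, ascending
    k = len(inv)
    while k > 0:
        mu = (p_total + inv[:k].sum()) / k   # candidate water level with k channels on
        if mu > inv[k - 1]:                  # all k channels get positive power
            break
        k -= 1
    return np.maximum(mu - 1.0 / gains, 0.0)
```

Water-filling pours more power into stronger channels and may switch weak ones off entirely, which is why it outperforms equal-power allocation under the same total power budget.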
John Yang, Carlos E. Jimenez, Alexander Wettig
et al.
Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that enables LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with pass@1 rates of 12.5% and 87.7%, respectively, far exceeding the previous state of the art achieved with non-interactive LMs. Finally, we provide insight into how the design of the ACI can impact agents' behavior and performance.
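The idea of an agent-computer interface can be illustrated with a toy sketch: compact, numbered file windows and guarded range edits with explicit feedback. The class, command names, and window size below are illustrative assumptions and do not reproduce SWE-agent's actual interface.

```python
# Toy agent-computer interface (ACI) for file editing: an agent sees a
# bounded, line-numbered window instead of a whole file, and edits get
# explicit feedback. Names and sizes are invented for this sketch.
class FileACI:
    WINDOW = 100  # lines shown per view, keeping observations short for the LM

    def __init__(self, text):
        self.lines = text.splitlines()

    def view(self, start=1):
        """Return a numbered window of the file starting at line `start`."""
        chunk = self.lines[start - 1:start - 1 + self.WINDOW]
        return "\n".join(f"{start + k}| {line}" for k, line in enumerate(chunk))

    def edit(self, first, last, replacement):
        """Replace lines first..last (1-indexed, inclusive) and echo the result."""
        if not (1 <= first <= last <= len(self.lines)):
            return "error: line range out of bounds"  # feedback instead of silence
        self.lines[first - 1:last] = replacement.splitlines()
        return self.view(max(1, first - 2))           # show the edit in context
```

Echoing the edited region back, with line numbers, is one way an interface can help an agent verify its own edits instead of operating blind.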
Li-ion batteries are widely used as power sources in a continuously increasing number of applications, from portable devices to electric vehicles and even more complex systems. Nonetheless, these components are still characterized by serious safety and stability concerns, which often hinder their more widespread use. In particular, their operation depends strictly on their temperature, which derives from the balance between the heat produced internally during operation and the heat dissipated towards the external environment. Beyond certain temperatures, thermal runaway can occur, with possibly dangerous consequences such as fires and explosions.
In the present paper, 3D simulations have been carried out to investigate the cooling efficiency of an air flow, under different operating conditions, on a cylindrical Li-ion cell located in a whole battery pack. Under the investigated configurations, it was found that, beyond a minimum value of the passing air velocity, it is possible to keep the cell within safe conditions, thus preventing a thermal runaway.
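The underlying heat balance can be illustrated with a lumped-capacitance sketch: the cell temperature settles where internal heat generation equals convective dissipation, and the convection coefficient grows with air velocity. All constants below (heat capacity, generated power, surface area, and the assumed h ∝ v^0.8 law) are illustrative and not taken from the paper's CFD setup.

```python
# Lumped-capacitance sketch of air cooling for a single cylindrical cell:
#   m*c * dT/dt = Q_gen - h(v) * A * (T - T_air),  with h(v) = c0 * v**0.8
# (assumed forced-convection law; all constants are illustrative).
def steady_temperature(v, q_gen=2.0, t_air=25.0, area=4e-3,
                       mc=50.0, c0=25.0, dt=1.0, t_end=5000.0):
    h = c0 * v ** 0.8                  # convection coefficient grows with air speed
    temp = t_air
    for _ in range(int(t_end / dt)):   # explicit Euler to (near) steady state
        temp += dt * (q_gen - h * area * (temp - t_air)) / mc
    return temp
```

The analytic steady state is T_air + Q_gen / (h*A), so raising the air velocity raises h and lowers the cell temperature, mirroring the paper's finding that a minimum air velocity keeps the cell within safe conditions.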
Chemical engineering, Computer engineering. Computer hardware
LI Xuesong, ZHANG Qieshi, SONG Chengqun, KANG Yuhang, CHENG Jun
Trajectory prediction is a key technology in the fields of autonomous driving and intelligent transportation. Accurate prediction of the trajectories of vehicles and moving pedestrians can improve an autonomous driving system's perception of environmental changes, thereby ensuring overall safety. Data-driven trajectory prediction methods accurately capture interaction characteristics between agents, analyze the historical motion and static environment information of all agents within a scene, and predict the agents' future trajectories. The mathematical models of trajectory prediction are introduced and categorized into traditional and data-driven methods. The four main challenges faced by mainstream data-driven trajectory prediction methods are agent interaction modeling, motion behavior intention prediction, trajectory diversity prediction, and the fusion of static environmental information within a scene. Starting from the trajectory prediction datasets in use, the performance evaluation indicators, model characteristics, and other aspects of typical data-driven trajectory prediction methods are analyzed and compared. On this basis, the solutions and application scenarios of these methods for the above challenges are summarized, and future development directions of trajectory prediction technology in autonomous driving are proposed.
Daniela P. Schacherer, Markus D. Herrmann, David A. Clunie
et al.
Background and Objectives: Reproducibility is a major challenge in developing machine learning (ML)-based solutions in computational pathology (CompPath). The NCI Imaging Data Commons (IDC) provides >120 cancer image collections according to the FAIR principles and is designed to be used with cloud ML services. Here, we explore its potential to facilitate reproducibility in CompPath research. Methods: Using the IDC, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets. To assess reproducibility, the experiments were run multiple times with separate but identically configured instances of common ML services. Results: The AUC values of different runs of the same experiment were generally consistent. However, we observed small variations in AUC values of up to 0.045, indicating a practical limit to reproducibility. Conclusions: We conclude that the IDC facilitates approaching the reproducibility limit of CompPath research (i) by enabling researchers to reuse exactly the same datasets and (ii) by integrating with cloud ML services so that experiments can be run in identically configured computing environments.
The concept of the Metaverse has garnered growing interest from both academic and industry circles. The decentralization of both the integrity and security of digital items has spurred the popularity of play-to-earn (P2E) games, where players can earn and own digital assets that they may trade for physical-world currencies. However, these computationally intensive games are hardly playable on resource-limited mobile devices, so the computational tasks have to be offloaded to an edge server. Through mobile edge computing (MEC), users can upload data to the Metaverse Service Provider (MSP) edge servers for computing. Nevertheless, there is a trade-off between user-perceived in-game latency and user visual experience: downlink transmission of lower-resolution video lowers user-perceived latency but reduces visual fidelity and, consequently, users' earnings. In this paper, we design a method to enhance the in-game user experience of Metaverse-based mobile augmented reality (MAR). Specifically, we formulate and solve a multi-objective optimization problem. Given the inherent NP-hardness of the problem, we present a low-complexity algorithm that mitigates the trade-off between delay and earnings. Experimental results show that our method effectively balances user-perceived latency and profitability, thus improving the performance of Metaverse-based MAR systems.
Investigation of crack propagation can be a crucial stage of engineering analysis, and the T-element method presented in this work is a convenient tool for it. In general, T-elements are Trefftz-type finite elements, which can model both continuous material and local cracks or inclusions. The authors propose a special T-element in the form of a pentagon, with shape functions that analytically model the vicinity of the crack tip. This relatively large finite element can be surrounded by even larger standard T-elements, enabling easy modification of the coarse element grid while investigating crack propagation. Numerical examples prove that the "moving pentagon" concept enables easy automatic generation of the T-element mesh, which facilitates the observation of crack propagation even in very complicated structures with many possible crack initiators, as occur, for example, in material fatigue phenomena.
Computer engineering. Computer hardware, Mechanics of engineering. Applied mechanics
Tomislav Maric, Dennis Gläser, Jan-Patrick Lehr
et al.
University research groups in Computational Science and Engineering (CSE) generally lack dedicated funding and personnel for Research Software Engineering (RSE), which, combined with the pressure to maximize the number of scientific publications, shifts the focus away from sustainable research software development and reproducible results. This neglect of RSE in CSE university research groups negatively impacts scientific output: research data, including research software, related to a CSE publication cannot be found, reproduced, or re-used; different ideas are not easily combined into new ones; and published methods must very often be re-implemented to be investigated further. This slows down CSE research significantly, resulting in considerable losses of time and, consequently, public funding. We propose an RSE workflow for CSE that addresses these challenges and improves the quality of research output. Our workflow applies established software engineering practices adapted for CSE: software testing, result visualization, and periodic cross-linking of software with reports/publications and data, timed by milestones in the scientific publication process. The workflow introduces minimal overhead, crucial for university research groups, and delivers modular, tested software linked to publications whose results can easily be reproduced. We define research software quality from the perspective of a pragmatic researcher: the ability to quickly find the publication, data, and software related to a published research idea, to quickly reproduce results, to understand or re-use a CSE method, and finally to extend the method with new research ideas.
Abhishek Moitra, Abhiroop Bhattacharjee, Runcong Kuang
et al.
Spiking Neural Networks (SNNs) are an active research domain for energy-efficient machine intelligence. Compared to conventional Artificial Neural Networks (ANNs), SNNs use temporal spike data and bio-plausible neuronal activation functions, such as Leaky-Integrate-and-Fire / Integrate-and-Fire (LIF/IF), for data processing. However, SNNs incur significant dot-product operations, causing high memory and computation overhead on standard von Neumann computing platforms. In-Memory Computing (IMC) architectures have been proposed to alleviate the "memory wall" bottleneck prevalent in von Neumann architectures. Although recent works have proposed IMC-based SNN hardware accelerators, the following have been overlooked: 1) the adverse effects of crossbar non-idealities on SNN performance due to repeated analog dot-product operations over multiple time-steps, and 2) the hardware overheads of essential SNN-specific components such as the LIF/IF and data communication modules. To this end, we propose SpikeSim, a tool that performs realistic performance, energy, latency, and area evaluation of IMC-mapped SNNs. SpikeSim consists of a practical monolithic IMC architecture, SpikeFlow, for mapping SNNs. Additionally, its non-ideality computation engine (NICE) and energy-latency-area (ELA) engine perform hardware-realistic evaluation of SpikeFlow-mapped SNNs. Based on a 65 nm CMOS implementation and experiments on the CIFAR10, CIFAR100, and TinyImageNet datasets, we find that the LIF/IF neuronal module contributes significantly to area (>11% of the total hardware area). We propose SNN topological modifications leading to 1.24x and 10x reductions in the neuronal module's area and the overall energy-delay product, respectively. Furthermore, we perform a holistic comparison between IMC-implemented ANNs and SNNs and conclude that a lower number of time-steps is key to achieving higher throughput and energy efficiency for SNNs compared to 4-bit ANNs.
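The LIF/IF activation that the abstract refers to can be sketched in a few lines; this is a generic textbook formulation (the leak factor, threshold, and hard reset are illustrative choices), not SpikeSim's hardware model.

```python
def lif_neuron(inputs, leak=0.9, v_th=1.0):
    """Leaky-Integrate-and-Fire: leak, integrate, fire, and reset each time-step.

    With leak=1.0 this degenerates to the Integrate-and-Fire (IF) neuron.
    """
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x          # leaky integration of the input current
        if v >= v_th:             # fire when the membrane crosses the threshold
            spikes.append(1)
            v = 0.0               # hard reset after a spike
        else:
            spikes.append(0)
    return spikes
```

Because the membrane state must persist across time-steps, every LIF/IF unit carries storage and comparison logic in hardware, which is why the neuronal module accounts for a significant share of the accelerator's area.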