Johannes Hentschel, Yannis Rammos, Markus Neuwirth, et al.
Abstract: The present corpus is the outcome of a long-term collaborative effort to produce analytically annotated music scores suitable for the computer-assisted study of European compositions since 1600. With 1283 analytically annotated, symbolically encoded music scores by 36 composers, our corpus amounts to one of the largest published resources of its kind. At the same time, it provides a modular digital infrastructure for the accountable, collaborative curation of annotated scores (“sheet music”). All annotations were created and reviewed by a team of trained music theorists, who collaborated online using the git version control software according to a formally codified workflow. To improve the consistency of analytical practices given the diversity of represented eras and genres, the corpus has been automatically parsed for notational well-formedness and cross-reviewed by annotators for adherence to our music-analytical guidelines. The computational infrastructure has been designed with “data persistence” and open access in mind.
Aspect Term Extraction (ATE) is a critical task in aspect-level sentiment analysis, but its extraction and annotation costs are extremely high. When training and testing samples come from different domains, the performance of traditional methods often degrades significantly owing to the differences between the two domains. Existing methods achieve cross-domain ATE through domain-adaptation techniques that exploit rich semantic information within local contexts. However, they overlook the potential global long-range dependencies of aspect terms within the text, limiting the performance, scalability, and robustness of the models. To address these issues, this study proposes a cross-domain ATE model, CBiLSTM, which requires no additional manual labeling and integrates global and local semantic information. The model uses semantic information as a pivot: it first incorporates external semantic information into word embeddings to construct pivot information for both the source and target domains, and then encodes the global and local contextual semantic information in parallel, thereby better capturing comprehensive semantic features and bridging the gap between the source and target domains. On three benchmark datasets, CBiLSTM achieves an average F1-score of 53.87%, outperforming the current state-of-the-art model by 0.49 percentage points. Experimental results demonstrate the superior performance and lower computational cost of CBiLSTM.
Additive Manufacturing (AM) is transforming the industrial sector by producing complex, customized parts. With Industry 4.0, machine learning (ML) has become a vital tool for enhancing 3D-printing processes. This paper reviews the integration of ML across the stages of additive manufacturing, including design optimization, material property prediction, quality control, and cloud-based manufacturing solutions, with the aim of improving efficiency and quality while reducing production costs. Key methodologies include supervised and unsupervised learning algorithms for defect detection, generative design, and process-parameter optimization. ML-driven approaches have led to significant advancements in predictive maintenance and adaptive manufacturing, but challenges such as data scarcity, model interpretability, and computational complexity persist. The paper discusses possible solutions and future research directions for ML in additive manufacturing, emphasising its potential to change 3D-printing technologies and their industrial uses. Key challenges, including data limitations, real-time monitoring, and model accuracy, are examined to provide insights into future research directions.
Abstract: Steganography embeds and extracts secret information in digital media to enhance information security, and is widely applied to covert communication, copyright and privacy protection, digital forensics, etc. To resist steganalysis detection, generative steganography, which embeds secret information into a generated image, is one of the most promising techniques. Although existing generative steganographic methods perform well at low hiding capacities, most of them encode the secret information in ways that do not preserve the latent distribution, leading to poor security against steganalyzers when more secret information is hidden. Moreover, the secret information is difficult to extract with these methods because their secret-to-image transformations are irreversible. To tackle these issues, this paper proposes a reversible, distribution-preserving generative steganography scheme, composed mainly of a distribution-preserving secret-message mapping strategy and a reversible Glow model. To improve anti-detectability against steganalyzers, the message mapping strategy encodes the secret information into latent vectors that follow the Gaussian distribution typically assumed by image-generation models. The Glow model is then trained, using its reversible transformation, to map the latent vectors into generated stego-images that carry the hidden information. Owing to the distribution preservation and reversibility of the message mapping and the Glow model, the proposed method achieves superior security performance and accurate extraction of the secret message. Extensive experimental results demonstrate that the proposed method outperforms several state-of-the-art methods in information-extraction accuracy and anti-detectability, especially at high hiding capacities (up to 4.0 bpp).
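The distribution-preserving mapping can be illustrated with a minimal sketch (not the paper's actual implementation): secret bits select an equal-probability bin of the standard normal CDF, a latent is drawn uniformly within that bin in CDF space, and the bin index is recovered exactly at extraction time. The bit width of 4 is chosen only to echo the 4.0 bpp regime mentioned above.

```python
import random
from statistics import NormalDist

_GAUSS = NormalDist()  # standard normal distribution

def bits_to_latent(bits, rng=random):
    """Encode a bit string as one Gaussian latent value.

    The standard-normal CDF splits the real line into 2**len(bits)
    equal-probability bins; the bits select a bin and a point is drawn
    uniformly (in CDF space) inside it, so the marginal distribution
    of the latent stays exactly Gaussian (distribution-preserving).
    """
    k = len(bits)
    idx = int(bits, 2)
    lo = idx / 2**k
    u = lo + rng.random() / 2**k          # uniform in [lo, lo + 2**-k)
    return _GAUSS.inv_cdf(u)

def latent_to_bits(z, k):
    """Invert the encoding: identify which CDF bin the latent fell into."""
    idx = min(int(_GAUSS.cdf(z) * 2**k), 2**k - 1)
    return format(idx, f"0{k}b")

# Round trip at 4 bits per latent value
z = bits_to_latent("1011")
assert latent_to_bits(z, 4) == "1011"
```

Because both the mapping above and a normalizing flow such as Glow are invertible, the full bits-to-image pipeline stays reversible end to end, which is the property the abstract attributes to the proposed scheme.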
The paper entitled "Qualitative Methods in Empirical Studies of Software Engineering" by Carolyn Seaman was published in TSE in 1999. It has been chosen as one of the most influential papers from the third decade of TSE's 50-year history. In this retrospective, the authors discuss the evolution of the use of qualitative methods in software engineering research, the impact they have had on research and practice, and reflections on what is coming and deserves attention.
Applications of Large Language Models (LLMs) are rapidly growing in industry and academia for various software engineering (SE) tasks. As these models become more integral to critical processes, ensuring their reliability and trustworthiness becomes essential, and the concept of trust in these systems is becoming increasingly critical. Well-calibrated trust is important: excessive trust can lead to security vulnerabilities and risks, while insufficient trust can hinder innovation. However, the landscape of trust-related concepts in LLMs in SE is relatively unclear, with concepts such as trust, distrust, and trustworthiness lacking clear conceptualizations in the SE community. To bring clarity to the current research status and identify opportunities for future work, we conducted a comprehensive review of 88 papers: a systematic literature review of 18 papers focused on LLMs in SE, complemented by an analysis of 70 papers from the broader trust literature. Additionally, we conducted a survey study with 25 domain experts to gain insights into practitioners' understanding of trust and identify gaps between the existing literature and developers' perceptions. The result of our analysis serves as a roadmap that covers trust-related concepts in LLMs in SE and highlights areas for future exploration.
From its first adoption in the late 1980s, qualitative research has slowly but steadily made a name for itself in what was, and perhaps still is, the predominantly quantitative software engineering (SE) research landscape. As part of our regular column on empirical software engineering (ACM SIGSOFT SEN-ESE), we reflect on the state of qualitative SE research with a focus group of experts. Among other things, we discuss why qualitative SE research is important, how it evolved over time, common impediments faced while practicing it today, and what the future of qualitative SE research might look like. Joining the conversation are Rashina Hoda (Monash University, Australia), Carolyn Seaman (University of Maryland, United States), and Klaas Stol (University College Cork, Ireland). The content of this paper is a faithful account of our conversation from October 25, 2025, which we moderated and edited for our column.
Web3 applications, built on blockchain technology, manage billions of dollars in digital assets through decentralized applications (dApps) and smart contracts. These systems rely on complex software supply chains that introduce significant security vulnerabilities. This paper examines the software supply chain security challenges unique to the Web3 ecosystem, where traditional Web2 software supply chain problems intersect with the immutable and high-stakes nature of blockchain technology. We analyze the threat landscape and propose mitigation strategies to strengthen the security posture of Web3 systems.
Mojtaba Nedaei, Abolfazl Keykhah, Borzo Kamary, et al.
Population growth worldwide in recent decades has increased the demand for power, and geothermal energy provides a reliable and stable source for power generation. This paper proposes integrating a single-flash geothermal cycle with a dual-evaporation organic Rankine cycle (D-ORC) to generate power. The system’s performance is estimated via thermodynamic and thermoeconomic analyses. Five zeotropic mixtures are considered as the D-ORC working fluid, and their performance is compared at the optimum state; perfluoropentane/butene presents the best performance indexes and is selected as the D-ORC’s working fluid. The proposed system provides 7992.29 kW of net power with 62.42% exergetic efficiency, and the exergoeconomic analysis indicates a net present value of about 10.85 million dollars and a payback period of about 3.47 years. The net present value is also estimated for four electricity-sale and geofluid prices, revealing that the product-sale prices influence the system’s economic performance more than the purchase cost. The exergy-destruction distribution across the employed components is shown in a Grassmann diagram: the steam turbine has the highest exergy destruction at about 996 kW, followed by the first expansion valve with 714 kW, while the condensers account for a considerable share, about 26.98% of the total exergy destruction.
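The reported net present value and payback period follow from standard engineering-economics formulas; the sketch below uses hypothetical cash flows (the 20 M$ capital cost and annual revenue are illustrative, not taken from the paper's exergoeconomic model).

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the year-0 (investment) flow."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def payback_period(cash_flows):
    """Years until the cumulative undiscounted cash flow reaches zero,
    interpolating linearly inside the breakeven year."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        if cf > 0 and cumulative + cf >= 0:
            return max(t - 1 + (-cumulative / cf), 0.0)
        cumulative += cf
    return float("inf")  # never pays back within the horizon

# Hypothetical plant: 20 M$ capital cost, then level annual net revenue
flows = [-20e6] + [5.76e6] * 20
assert abs(payback_period(flows) - 20 / 5.76) < 1e-9
assert npv(0.10, flows) > 0   # profitable at a 10% discount rate
```

With level annual flows the payback period is simply investment divided by annual revenue; the sensitivity study in the abstract amounts to re-evaluating `npv` over a grid of electricity-sale and geofluid prices.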
Molecular circuits and devices with temporal signal-processing capability are of great significance for the analysis of complex biological processes. Mapping temporal inputs to binary messages is a history-dependent signal response, which can help in understanding the signal-processing behavior of organisms. Here, we propose a DNA temporal logic circuit based on DNA strand displacement reactions, which maps temporally ordered inputs to corresponding binary message outputs. The presence or absence of the output signal is determined by which substrate reacts with the input, so that different orders of inputs correspond to different binary outputs. We demonstrate that the circuit can be generalized to more complex temporal logic circuits by increasing or decreasing the number of substrates or inputs. We also show that our circuit has excellent responsiveness to temporally ordered inputs, as well as flexibility and expansibility, in a case study of symmetrically encrypted communication. We envision that our scheme can provide new ideas for future molecular encryption, information processing, and neural networks.
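The order-to-binary mapping can be abstracted, far above the level of the actual strand-displacement chemistry, as a toy state machine in which each substrate reacts only once and the second substrate becomes reactive only after the first has been consumed. Substrate names and the encoding below are hypothetical.

```python
def temporal_response(inputs):
    """Toy abstraction of an order-sensitive strand-displacement cascade.

    Substrate S1 reacts with the *first* input strand it sees and is
    then consumed; substrate S2 only becomes reactive after S1 has
    reacted. Each reaction emits a 1 or 0 depending on which input
    strand bound, so the ordered input sequence maps to a binary word.
    """
    s1_free, s2_free = True, False
    out = []
    for strand in inputs:
        if s1_free:                  # first arrival binds S1
            out.append(1 if strand == "A" else 0)
            s1_free, s2_free = False, True
        elif s2_free:                # second arrival binds the now-open S2
            out.append(1 if strand == "B" else 0)
            s2_free = False
    return out

assert temporal_response(["A", "B"]) == [1, 1]  # A before B
assert temporal_response(["B", "A"]) == [0, 0]  # B before A
```

Because each input order yields a distinct code word, adding substrates or inputs extends the reachable code space, mirroring the generalization claim in the abstract.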
In recent years, graph self-supervised learning, represented by graph contrastive learning, has become a hot research topic in the field of graph learning. This learning paradigm does not depend on node labels and has good generalization ability. However, most existing graph self-supervised learning methods use static graph structures to design learning tasks, such as learning node-level or graph-level representations based on structural contrast, without considering how the graph changes over time. To address this problem, this paper proposes a self-supervised dynamic graph representation learning method based on contrastive prediction (DGCP), which utilizes a contrastive loss to induce an embedding space that captures the information most useful for predicting future graph structures. First, each temporal snapshot graph is encoded with a graph neural network to obtain its node representation matrix. Then, an autoregressive model predicts the node representations in the next temporal snapshot. Finally, the model is trained end-to-end using the contrastive loss and a sliding-window mechanism. Experimental results on real graph datasets show that DGCP outperforms baseline methods on the link prediction task.
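Contrastive-prediction objectives of this kind are commonly implemented as an InfoNCE-style loss; the minimal pure-Python sketch below (simplified relative to DGCP's actual training loop, with made-up embeddings) scores the autoregressive prediction against the true next-snapshot embedding and a negative.

```python
import math

def info_nce(pred, pos_idx, candidates, temperature=0.1):
    """InfoNCE-style contrastive loss: the predicted node embedding
    should score its true next-snapshot embedding (the positive at
    pos_idx) above all other candidates (the negatives)."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    logits = [dot(pred, c) / temperature for c in candidates]
    m = max(logits)                                     # for stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[pos_idx]                  # -log softmax

pred = [1.0, 0.0]                   # autoregressive prediction
cands = [[1.0, 0.0], [0.0, 1.0]]    # true future embedding vs. a negative
assert info_nce(pred, 0, cands) < info_nce(pred, 1, cands)
```

Minimizing this loss pulls the predicted embedding toward the true future node representation while pushing it away from the negatives, which is what induces the predictive embedding space described above.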
One of the most time-consuming tasks for developers is the comprehension of new code bases. An effective approach to aid this process is to label source code files with meaningful annotations, which can help developers understand the content and functionality of a code base more quickly. However, most existing solutions for code annotation focus on project-level classification: manually labelling individual files is time-consuming, error-prone and hard to scale. The work presented in this paper aims to automate the annotation of files by leveraging project-level labels, and to use the file-level annotations to annotate items at coarser levels of granularity, for example, packages and the whole project. We propose a novel approach to annotate source code files using weak labelling and a subsequent hierarchical aggregation. We investigate whether this approach is effective in achieving multi-granular annotations of software projects, which can aid developers in understanding the content and functionalities of a code base more quickly. Our evaluation uses a combination of human assessment and automated metrics to evaluate the annotations' quality. Our approach correctly annotated 50% of files and more than 50% of packages. Moreover, the information captured at the file level allowed us to identify, on average, three new relevant labels for any given project. We conclude that the proposed approach is a convenient and promising way to generate noisy (not precise) annotations for files. Furthermore, hierarchical aggregation effectively preserves the information captured at file level and propagates it to packages and the overall project itself.
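One simple way to realize the hierarchical-aggregation step, sketched here with hypothetical file names and labels rather than the paper's actual pipeline, is to promote a weak file-level label to the package level when it covers enough of the package's files:

```python
from collections import Counter

def aggregate(file_labels, threshold=0.5):
    """Aggregate weak file-level labels to one coarser-grained unit.

    A label is kept for the package (or project) when it appears on at
    least `threshold` of the contained files, one straightforward way
    to propagate noisy annotations up the hierarchy.
    """
    counts = Counter(label for labels in file_labels.values()
                     for label in set(labels))
    n = len(file_labels)
    return {label for label, c in counts.items() if c / n >= threshold}

# Hypothetical package with three files and their weak labels
package = {
    "HttpClient.java":  ["networking", "io"],
    "Socket.java":      ["networking"],
    "RetryPolicy.java": ["networking", "error-handling"],
}
assert aggregate(package) == {"networking"}
```

Lowering the threshold keeps rarer labels, which trades precision for recall; the same function applied to per-package label sets yields project-level annotations.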
JIANG Yang-yang, SONG Li-hua, XING Chang-you, ZHANG Guo-min, ZENG Qing-wei
As a typical deception-defense technique, honeypot technology is of great significance for actively trapping attackers. Existing design methods mainly optimize the honeypot's trapping decisions through game models, ignoring the impact of the attacker's beliefs on both sides' game decisions; they therefore suffer from shortcomings such as weak adaptive decision-making and being easy for the attacker to see through and exploit. Therefore, a belief-based honeypot game mechanism (BHGM) is proposed. Based on the multi-round game process of an attacker completing a task, BHGM focuses on the impact of honeypot actions on the attacker's beliefs and the impact of those beliefs on whether the attacker continues to attack. A belief-driven algorithm for solving the optimal attack and defense strategies is also designed based on Upper Confidence bounds applied to Trees (UCT). Simulation results show that the belief-driven attacker strategy can choose to continue the attack or cut losses in time based on the current belief to obtain the maximum profit, while the belief-driven honeypot strategy can reduce the attacker's suspicion as much as possible to lure them into continuing the attack, obtaining greater profit.
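The UCT selection rule at the heart of such an algorithm scores each candidate action by its mean payoff plus an exploration bonus that shrinks with visits; a minimal sketch follows, with illustrative action names and statistics rather than the paper's simulation.

```python
import math

def ucb_select(stats, c=math.sqrt(2)):
    """Pick the action maximizing the UCB1 score used in UCT:
    mean reward plus an exploration bonus for rarely tried actions."""
    total = sum(n for _, n in stats.values())
    def score(action):
        value, visits = stats[action]
        if visits == 0:
            return float("inf")      # always try unvisited actions first
        return value / visits + c * math.sqrt(math.log(total) / visits)
    return max(stats, key=score)

# (reward sum, visit count) per candidate action -- illustrative numbers
stats = {"continue_attack": (7.0, 10), "stop_loss": (4.5, 5)}
assert ucb_select(stats) == "stop_loss"  # higher mean and bigger bonus
```

In a BHGM-style search, the reward statistics would themselves be conditioned on the current belief state, so the same rule can drive either side of the game.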
Traditional target detection algorithms have difficulty adapting to complex environmental changes and are limited in their applicable scenarios, whereas deep-learning-based target detection models can learn features automatically and generalize well. In this article, we choose a single-stage deep-learning-based target detection model for research, based on the model’s real-time processing requirements, and aim to improve the accuracy and robustness of target detection in remote sensing images. We improve the YOLOv4 network and present a new approach. First, we propose a class-wise setting of the non-maximum suppression threshold to increase accuracy without affecting speed. Second, we study the anchor-box allocation problem in YOLOv4 and propose two allocation schemes. The proposed anchor-box schemes also improve detection performance, and experimental results on the DOTA dataset validate their effectiveness.
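A class-dependent non-maximum-suppression threshold, one plausible reading of the proposed threshold setting, can be sketched as follows; the boxes, scores, and class names are hypothetical, not drawn from the DOTA experiments.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(detections, thresholds):
    """Greedy NMS where the suppression threshold depends on the class,
    e.g. looser for densely packed classes such as small vehicles."""
    kept = []
    for det in sorted(detections, key=lambda d: -d["score"]):
        thr = thresholds[det["cls"]]
        if all(d["cls"] != det["cls"] or iou(d["box"], det["box"]) < thr
               for d in kept):
            kept.append(det)
    return kept

dets = [
    {"box": (0, 0, 10, 10),   "score": 0.9, "cls": "plane"},
    {"box": (1, 1, 11, 11),   "score": 0.8, "cls": "plane"},  # overlaps first
    {"box": (50, 50, 60, 60), "score": 0.7, "cls": "plane"},
]
assert len(nms(dets, {"plane": 0.5})) == 2   # the overlap is suppressed
```

Tuning `thresholds` per class changes only this post-processing step, which is why such a scheme can raise accuracy without affecting inference speed.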
Skin cancer has become quite common in modern times, especially in certain geographic areas such as Oceania, and early, accurate identification of cancerous skin lesions is of vital importance in treating this malady. Studies have shown that deep-learning-based intelligent approaches to this problem can be fruitful. In this research, we employed a deep learning approach to distinguish benign from malignant skin lesions. The initial dataset was obtained from Kaggle, after which several preprocessing steps for hair and background removal, image enhancement, selection of the region of interest (ROI), region-based segmentation, morphological gradient, and feature extraction were performed, resulting in histopathological image data with 20 input features based on geometrical and textural properties. A principal component analysis (PCA)-based feature extraction technique was then applied to reduce the dimensionality to 10 input features. Subsequently, we applied our deep learning classifier, SkinNet-16, to detect cancerous lesions accurately at a very early stage. The highest accuracy was obtained with the Adamax optimizer at a learning rate of 0.006, and the model delivered an impressive accuracy of approximately 99.19%.
Muhammad Rizal Rifa'i, Wahyudin Wahyudin, Billy Nugraha
One of the problems in drawing-based lecture assignments is drawing technique. The function of learning software is to serve as the teacher's basic medium for explaining the material. This research aims to analyze the feasibility of learning software, namely computer-aided design (CAD). The research technique applied here draws on reference sources from previous studies. The feasibility test method uses validity techniques that already have parameters for determining the feasibility of the test. The results of this study, compared with the analyses of previous studies, showed 78.6% (which can be categorized as feasible). In addition, the computer-aided design learning software for a 3D mini vise (ragum) has a validation result of 69% (which can be categorized as feasible). It can therefore be concluded that the learning software used today can support the delivery of the material to be taught.
Background. The software architecture recovery method RELAX produces a concern-based architectural view of a software system, graphically and textually, from that system's source code. The method has been implemented in software which can be run on subject systems whose source code is written in Java. Aims. Our aim was to find out whether the availability of architectural views produced by RELAX can help maintainers who are new to a project become productive with development tasks sooner, and how they felt about working in such an environment. Method. We conducted a user study with nine participants. They took part in a controlled experiment in which maintenance success and speed, with and without access to RELAX recovery results, were compared. Results. We observed that the architecture views produced by RELAX helped participants reduce the time needed to get started on maintenance tasks by a factor of 5.38 or more. While most participants were unable to finish their tasks within the allotted time when they did not have recovery results available, all of them finished successfully when they did. Additionally, participants reported that these views were easy to understand, helped them learn the system's structure, and enabled them to compare different versions of the system. Conclusions. In the speedup to the start of maintenance experienced by the participants, as well as in their experience-based opinions, RELAX has shown itself to be a valuable aid that could form the basis for further tools that specifically support the development process with a focus on maintenance.
The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call 'underproduction', which occurs when the supply of software engineering labor falls out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software, as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs. We draw on this application to present two experiments: (1) a demonstration of how our technique can be used to identify at-risk software packages in a large FLOSS repository and (2) a validation of these results using an alternate indicator of package risk. Our analysis demonstrates the utility of our approach and reveals the existence of widespread underproduction in a range of widely installed software components in Debian.
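The notion of misalignment between usage and maintenance can be illustrated with a simple rank-gap heuristic; this is only a toy stand-in for the paper's statistical method, with hypothetical package names and numbers.

```python
def underproduction_ranks(usage, resolution_time):
    """Rank packages by usage (descending) and by median bug-resolution
    time (ascending = best maintained); a package is flagged as
    underproduced when it is heavily used yet poorly maintained, i.e.
    its maintenance rank lags far behind its usage rank."""
    by_usage = sorted(usage, key=usage.get, reverse=True)
    by_maint = sorted(resolution_time, key=resolution_time.get)
    usage_rank = {p: r for r, p in enumerate(by_usage)}
    maint_rank = {p: r for r, p in enumerate(by_maint)}
    return {p: maint_rank[p] - usage_rank[p] for p in usage}

# Hypothetical packages: install counts vs. median days to close a bug
usage = {"libfoo": 90_000, "barutils": 40_000, "bazd": 1_000}
resolution = {"libfoo": 400, "barutils": 12, "bazd": 30}
gaps = underproduction_ranks(usage, resolution)
assert max(gaps, key=gaps.get) == "libfoo"   # heavily used, slow fixes
```

A large positive gap marks a package whose maintenance effort trails its popularity, the "demand exceeds supply" condition the framework formalizes at the scale of Debian's bug history.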