Results for "Norwegian literature"

Showing 20 of ~1,362,392 results · from arXiv, CrossRef

arXiv Open Access 2026
Knowledge Graph Extraction from Biomedical Literature for Alkaptonuria Rare Disease

Giang Pham, Rebecca Finetti, Caterina Graziani et al.

Alkaptonuria (AKU) is an ultra-rare autosomal recessive metabolic disorder caused by mutations in the HGD (Homogentisate 1,2-Dioxygenase) gene, leading to a pathological accumulation of homogentisic acid (HGA) in body fluids and tissues. This leads to systemic manifestations, including premature spondyloarthropathy, renal and prostatic stones, and cardiovascular complications. Being ultra-rare, the amount of data related to the disease is limited, both in terms of clinical data and literature. Knowledge graphs (KGs) can help connect the limited knowledge about the disease (basic mechanisms, manifestations and existing therapies) with other knowledge; however, AKU is frequently underrepresented or entirely absent in existing biomedical KGs. In this work, we apply a text-mining methodology based on PubTator3 for large-scale extraction of biomedical relations. We construct two KGs of different sizes, validate them using existing biochemical knowledge and use them to extract genes, diseases and therapies possibly related to AKU. This computational framework reveals the systemic interactions of the disease, its comorbidities, and potential therapeutic targets, demonstrating the efficacy of our approach in analyzing rare metabolic disorders.
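The triple-to-graph step described above can be sketched minimally. The entity names, relation labels, and triples below are illustrative placeholders, not output from the authors' PubTator3 pipeline:

```python
from collections import defaultdict

# Illustrative relation triples of the kind a text-mining tool such as
# PubTator3 might emit: (subject, relation, object). These examples are
# hypothetical, not actual pipeline output.
triples = [
    ("HGD", "associated_with", "Alkaptonuria"),
    ("Alkaptonuria", "manifests_as", "spondyloarthropathy"),
    ("Alkaptonuria", "manifests_as", "renal stones"),
    ("nitisinone", "treats", "Alkaptonuria"),
]

def build_graph(triples):
    """Collect triples into an adjacency map: node -> list of (relation, node)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph

def neighbors(graph, node):
    """Entities directly linked from `node` in the graph, sorted by name."""
    return sorted({obj for _, obj in graph.get(node, [])})

kg = build_graph(triples)
print(neighbors(kg, "Alkaptonuria"))  # ['renal stones', 'spondyloarthropathy']
```

Querying such a graph for the one- or two-hop neighborhood of "Alkaptonuria" is how genes, diseases, and therapies possibly related to AKU can be surfaced.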

en cs.AI, cs.IR
arXiv Open Access 2026
SciDef: Automating Definition Extraction from Academic Literature with Large Language Models

Filip Kučera, Christoph Mandl, Isao Echizen et al.

Definitions are the foundation for any scientific work, but with a significant increase in publication numbers, gathering definitions relevant to any keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra & DefSim, novel datasets of human-extracted definitions and definition-pairs' similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of definitions from our test-set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as models tend to over-generate them. Code & datasets are available at https://github.com/Media-Bias-Group/SciDef.

en cs.IR, cs.CL
arXiv Open Access 2026
The State of Generative AI in Software Development: Insights from Literature and a Developer Survey

Vincent Gurgul, Robin Gubela, Stefan Lessmann

Generative Artificial Intelligence (GenAI) rapidly transforms software engineering, yet existing research remains fragmented across individual tasks in the Software Development Lifecycle. This study integrates a systematic literature review with a survey of 65 software developers. The results show that GenAI exerts its highest impact in design, implementation, testing, and documentation, where over 70% of developers report at least halving the time for boilerplate and documentation tasks. 79% of survey respondents use GenAI daily, preferring browser-based Large Language Models over alternatives integrated directly in their development environment. Governance is maturing, with two-thirds of organizations maintaining formal or informal guidelines. In contrast, early SDLC phases such as planning and requirements analysis show markedly lower reported benefits. In a nutshell, GenAI shifts value creation from routine coding toward specification quality, architectural reasoning, and oversight, while risks such as uncritical adoption, skill erosion, and technical debt require robust governance and human-in-the-loop mechanisms.

en cs.SE, cs.AI
arXiv Open Access 2025
Machine learning for fraud detection in digital banking: a systematic literature review

Md Zahin Hossain George, Md Khorshed Alam, Md Tarek Hasan

This systematic literature review examines the role of machine learning in fraud detection within digital banking, synthesizing evidence from 118 peer-reviewed studies and institutional reports. Following the PRISMA guidelines, the review applied a structured identification, screening, eligibility, and inclusion process to ensure methodological rigor and transparency. The findings reveal that supervised learning methods, such as decision trees, logistic regression, and support vector machines, remain the dominant paradigm due to their interpretability and established performance, while unsupervised anomaly detection approaches are increasingly adopted to address novel fraud patterns in highly imbalanced datasets. Deep learning architectures, particularly recurrent and convolutional neural networks, have emerged as transformative tools capable of modeling sequential transaction data and detecting complex fraud typologies, though challenges of interpretability and real-time deployment persist. Hybrid models that combine supervised, unsupervised, and deep learning strategies demonstrate superior adaptability and detection accuracy, highlighting their potential as convergent solutions.
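As a toy illustration of the unsupervised anomaly-detection idea the review contrasts with supervised methods, one can flag transactions whose amount deviates far from the sample mean; the amounts and threshold below are fabricated, and real detectors use far richer features and models:

```python
# Toy unsupervised anomaly flagging: mark transactions whose amount lies far
# from the mean in z-score terms. Amounts are made up for illustration only.
def zscore_flags(amounts, threshold):
    n = len(amounts)
    mean = sum(amounts) / n
    std = (sum((a - mean) ** 2 for a in amounts) / n) ** 0.5
    return [abs(a - mean) / std > threshold for a in amounts]

# Eight ordinary transactions and one large outlier; with so few samples the
# outlier inflates the std, so a lowered threshold of 2.0 is used here.
amounts = [20, 25, 22, 19, 24, 21, 23, 20, 5000]
flags = zscore_flags(amounts, threshold=2.0)
print(flags.index(True))  # 8 — only the outlier is flagged
```

This statistical baseline is what the surveyed deep and hybrid models improve on when fraud patterns are novel and labeled data is scarce.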

arXiv Open Access 2025
Scalability and Maintainability Challenges and Solutions in Machine Learning: Systematic Literature Review

Karthik Shivashankar, Ghadi S. Al Hajj, Antonio Martini

This systematic literature review examines the critical challenges and solutions related to scalability and maintainability in Machine Learning (ML) systems. As ML applications become increasingly complex and widespread across industries, the need to balance system scalability with long-term maintainability has emerged as a significant concern. This review synthesizes current research and practices addressing these dual challenges across the entire ML life-cycle, from data engineering to model deployment in production. We analyzed 124 papers to identify and categorize 41 maintainability challenges and 13 scalability challenges, along with their corresponding solutions. Our findings reveal intricate interdependencies between scalability and maintainability, where improvements in one often impact the other. The review is structured around six primary research questions, examining maintainability and scalability challenges in data engineering, model engineering, and ML system development. We explore how these challenges manifest differently across various stages of the ML life-cycle. This comprehensive overview offers valuable insights for both researchers and practitioners in the field of ML systems. It aims to guide future research directions, inform best practices, and contribute to the development of more robust, efficient, and sustainable ML applications across various domains.

en cs.SE, cs.LG
arXiv Open Access 2025
Large Language Models for Unit Testing: A Systematic Literature Review

Quanjun Zhang, Chunrong Fang, Siqi Gu et al.

Unit testing is a fundamental practice in modern software engineering, with the aim of ensuring the correctness, maintainability, and reliability of individual software components. Very recently, with the advances in Large Language Models (LLMs), a rapidly growing body of research has leveraged LLMs to automate various unit testing tasks, demonstrating remarkable performance and significantly reducing manual effort. However, due to ongoing explorations in the LLM-based unit testing field, it is challenging for researchers to understand existing achievements, open challenges, and future opportunities. This paper presents the first systematic literature review on the application of LLMs in unit testing up to March 2025. We analyze the relevant papers from the perspectives of both unit testing and LLMs. We first categorize existing unit testing tasks that benefit from LLMs, e.g., test generation and oracle generation. We then discuss several critical aspects of integrating LLMs into unit testing research, including model usage, adaptation strategies, and hybrid approaches. We further summarize key challenges that remain unresolved and outline promising directions to guide future research in this area. Overall, our paper provides a systematic overview of the research landscape for the unit testing community, helping researchers gain a comprehensive understanding of existing achievements and promoting future research. Our artifacts are publicly available at the GitHub repository: https://github.com/iSEngLab/AwesomeLLM4UT.

en cs.SE
arXiv Open Access 2025
Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework

Varun Kumar, George Em Karniadakis

The engineering design process often demands expertise from multiple domains, leading to complex collaborations and iterative refinements. Traditional methods can be resource-intensive and prone to inefficiencies. To address this, we formalize the engineering design process through a multi-agent AI framework that integrates structured design and review loops. The framework introduces specialized knowledge-driven agents that collaborate to generate and refine design candidates. As an exemplar, we demonstrate its application to the aerodynamic optimization of 4-digit NACA airfoils. The framework consists of three key AI agents: a Graph Ontologist, a Design Engineer, and a Systems Engineer. The Graph Ontologist employs a Large Language Model (LLM) to construct two domain-specific knowledge graphs from airfoil design literature. The Systems Engineer, informed by a human manager, formulates technical requirements that guide design generation and evaluation. The Design Engineer leverages the design knowledge graph and computational tools to propose candidate airfoils meeting these requirements. The Systems Engineer reviews the design and provides both qualitative and quantitative feedback using its own knowledge graph, forming an iterative feedback loop until a design is validated by the manager. The final design is then optimized to maximize performance metrics such as the lift-to-drag ratio. Overall, this work demonstrates how collaborative AI agents equipped with structured knowledge representations can enhance efficiency, consistency, and quality in the engineering design process.

en cs.AI, cs.LG
arXiv Open Access 2025
LLM-as-a-Judge for Software Engineering: Literature Review, Vision, and the Road Ahead

Junda He, Jieke Shi, Terry Yue Zhuo et al.

The rapid integration of Large Language Models (LLMs) into software engineering (SE) has revolutionized tasks like code generation, producing a massive volume of software artifacts. This surge has exposed a critical bottleneck: the lack of scalable, reliable methods to evaluate these outputs. Human evaluation is costly and time-consuming, while traditional automated metrics like BLEU fail to capture nuanced quality aspects. In response, the LLM-as-a-Judge paradigm - using LLMs for automated evaluation - has emerged. This approach leverages the advanced reasoning of LLMs, offering a path toward human-like nuance at automated scale. However, LLM-as-a-Judge research in SE is still in its early stages. This forward-looking SE 2030 paper aims to steer the community toward advancing LLM-as-a-Judge for evaluating LLM-generated software artifacts. We provide a literature review of existing SE studies, analyze their limitations, identify key research gaps, and outline a detailed roadmap. We envision these frameworks as reliable, robust, and scalable human surrogates capable of consistent, multi-faceted artifact evaluation by 2030. Our work aims to foster research and adoption of LLM-as-a-Judge frameworks, ultimately improving the scalability of software artifact evaluation.

en cs.SE
arXiv Open Access 2023
Large Language Models for Software Engineering: A Systematic Literature Review

Xinyi Hou, Yanjie Zhao, Yue Liu et al.

Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review (SLR) on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We select and analyze 395 research papers from January 2017 to January 2024 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application, highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study. Our artifacts are publicly available at https://github.com/xinyi-hou/LLM4SE_SLR.

en cs.SE, cs.AI
arXiv Open Access 2023
Agent-based Learning of Materials Datasets from Scientific Literature

Mehrad Ansari, Seyed Mohamad Moosavi

Advancements in machine learning and artificial intelligence are transforming materials discovery. Yet the availability of structured experimental data remains a bottleneck. The vast corpus of scientific literature presents a valuable and rich resource of such data. However, manual dataset creation from these resources is challenging due to issues in maintaining quality and consistency, scalability limitations, and the risk of human error and bias. Therefore, in this work, we develop a chemist AI agent, powered by large language models (LLMs), to overcome these challenges by autonomously creating structured datasets from natural language text, ranging from sentences and paragraphs to extensive scientific research articles. Our chemist AI agent, Eunomia, can plan and execute actions by leveraging existing knowledge from decades of scientific research articles, scientists, the Internet, and other tools combined. We benchmark the performance of our approach on three different information extraction tasks with various levels of complexity, including solid-state impurity doping, metal-organic framework (MOF) chemical formulas, and property relations. Our results demonstrate that our zero-shot agent, with the appropriate tools, is capable of attaining performance that is either superior or comparable to state-of-the-art fine-tuned materials information extraction methods. This approach simplifies the compilation of machine learning-ready datasets for various materials discovery applications and significantly eases access to advanced natural language processing tools for novice users. The methodology in this work is developed as open-source software at https://github.com/AI4ChemS/Eunomia.

en cs.AI
arXiv Open Access 2023
A systematic literature review on solution approaches for the index tracking problem in the last decade

Julio Cezar Soares Silva, Adiel Teixeira de Almeida Filho

The passive management approach offers conservative investors a way to reduce risk relative to the market. This investment strategy aims at replicating a specific index, such as the NASDAQ Composite or the FTSE 100 index. The problem is that buying all of the index's assets incurs high rebalancing costs, which harms future returns. The index tracking problem concerns building a portfolio that follows a specific benchmark with fewer transaction costs. Since building such a portfolio requires selecting a subset of the index's assets, this class of problems is NP-hard, and in the past years researchers have been studying solution approaches to obtain tracking portfolios more practically. This work brings an analysis, spanning the last decade, of the advances in mathematical approaches for index tracking. The systematic literature review covered important issues, such as the most relevant research areas, solution methods, and model structures. Special attention was given to the exploration and analysis of metaheuristics applied to the index tracking problem.
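A minimal sketch of the quantity these solution methods try to keep small, the tracking error between a sparse candidate portfolio and its benchmark; the return series below are fabricated for illustration:

```python
# Tracking error: the standard deviation of the per-period differences
# between a portfolio's returns and the benchmark index's returns.
# The two return series here are made-up monthly figures.
def tracking_error(index_returns, portfolio_returns):
    diffs = [p - i for p, i in zip(portfolio_returns, index_returns)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return var ** 0.5

index_r = [0.010, -0.005, 0.007, 0.002]   # benchmark returns
port_r  = [0.012, -0.004, 0.005, 0.003]   # subset-portfolio returns

print(round(tracking_error(index_r, port_r), 6))  # about 0.0015
```

The NP-hardness enters one level up: choosing *which* subset of assets (and which weights) minimizes this quantity under a cardinality constraint, which is where the surveyed metaheuristics come in.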

en q-fin.PM, cs.CE
arXiv Open Access 2022
QoS-based Packet Scheduling Algorithms for Heterogeneous LTE-Advanced Networks, Concepts and a Literature Survey

Najem N Sirhan, Manel Martinez-Ramon

The number of LTE users and their applications has increased significantly in the last decade, which has increased the demand on the mobile network. LTE-Advanced comes with many features that can support this increasing demand. LTE-Advanced supports Heterogeneous Network deployments, which consist of a mix of macro-cells, remote radio heads, and low-power nodes such as Pico-cells and Femto-cells. Embedding this mix of base-stations in a macro-cellular network allows for achieving significant gains in coverage, throughput, and system capacity compared to the use of macro-cells only. These base-stations can operate on the same wireless channel as the macro-cellular network, which provides higher spatial reuse via cell splitting. It also allows network operators to support higher data traffic by offloading it to smaller cells, such as Femto-cells. Hence, it enables network operators to provide their growing number of users with the Quality of Service their services demand. In order for network operators to make the best use of the heterogeneous LTE-Advanced network, they need QoS-based packet scheduling algorithms that can efficiently manage the spectrum resources in the heterogeneous deployment. In this paper, we survey Quality of Service based packet scheduling algorithms proposed in the literature for packet scheduling in Heterogeneous LTE-Advanced Networks. We start by explaining the concepts of QoS in LTE, heterogeneous LTE-Advanced networks, and how traffic is classified within a packet scheduling architecture for heterogeneous LTE-Advanced networks. We then summarise the QoS-based packet scheduling algorithms proposed in the literature for heterogeneous LTE-Advanced networks and for Femtocell LTE-Advanced networks, and finally provide some concluding remarks in the last section.

arXiv Open Access 2022
Computational analyses of the topics, sentiments, literariness, creativity and beauty of texts in a large Corpus of English Literature

Arthur M. Jacobs, Annette Kinder

The Gutenberg Literary English Corpus (GLEC, Jacobs, 2018a) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. In this study we address differences among the different literature categories in GLEC, as well as differences between authors. We report the results of three studies providing i) topic and sentiment analyses for six text categories of GLEC (i.e., children and youth, essays, novels, plays, poems, stories) and its >100 authors, ii) novel measures of semantic complexity as indices of the literariness, creativity and book beauty of the works in GLEC (e.g., Jane Austen's six novels), and iii) two experiments on text classification and authorship recognition using novel features of semantic complexity. The data on two novel measures estimating a text's literariness, intratextual variance and stepwise distance (van Cranenburgh et al., 2019) revealed that plays are the most literary texts in GLEC, followed by poems and novels. Computation of a novel index of text creativity (Gray et al., 2016) revealed poems and plays as the most creative categories with the most creative authors all being poets (Milton, Pope, Keats, Byron, or Wordsworth). We also computed a novel index of perceived beauty of verbal art (Kintsch, 2012) for the works in GLEC and predict that Emma is the theoretically most beautiful of Austen's novels. Finally, we demonstrate that these novel measures of semantic complexity are important features for text classification and authorship recognition with overall predictive accuracies in the range of .75 to .97. Our data pave the way for future computational and empirical studies of literature or experiments in reading psychology and offer multiple baselines and benchmarks for analysing and validating other book corpora.
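As a rough analogue (not the exact formulation of van Cranenburgh et al.), an intratextual-variance style measure can be sketched as the spread of a text's sentence vectors around their centroid; the sentences and the bag-of-words representation are simplifications for illustration:

```python
from collections import Counter

# Toy analogue of an intratextual-variance measure: how much the sentences of
# a text vary around the text's average word distribution. Real implementations
# use richer semantic vectors; bag-of-words counts stand in for them here.
def sentence_vectors(sentences, vocab):
    return [[Counter(s.lower().split())[w] for w in vocab] for s in sentences]

def intratextual_variance(sentences):
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vecs = sentence_vectors(sentences, vocab)
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    # mean squared Euclidean distance of each sentence vector from the centroid
    return sum(
        sum((x - c) ** 2 for x, c in zip(v, centroid)) for v in vecs
    ) / len(vecs)

uniform = ["the cat sat", "the cat sat"]
varied = ["the cat sat", "storms rage over distant moors"]
print(intratextual_variance(uniform) < intratextual_variance(varied))  # True
```

A text whose sentences all say roughly the same thing scores low; a text that ranges widely scores high, which is the intuition behind using such spread measures as literariness features.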

en cs.CL
arXiv Open Access 2022
Blockchain and Cryptocurrency in Human Computer Interaction: A Systematic Literature Review and Research Agenda

Michael Fröhlich, Franz Waltenberger, Ludwig Trotter et al.

We present a systematic literature review of cryptocurrency and blockchain research in Human-Computer Interaction (HCI) published between 2014 and 2021. We aim to provide an overview of the field, consolidate existing knowledge, and chart paths for future research. Our analysis of 99 articles identifies six major themes: (1) the role of trust, (2) understanding motivation, risk, and perception of cryptocurrencies, (3) cryptocurrency wallets, (4) engaging users with blockchain, (5) using blockchain for application-specific use cases, and (6) support tools for blockchain. We discuss the focus of the existing research body and juxtapose it to the changing landscape of emerging blockchain technologies to highlight future research avenues for HCI and interaction design. With this review, we identify key aspects where interaction design is critical for the adoption of blockchain systems. Doing so, we provide a starting point for new scholars and designers and help them position future contributions.

arXiv Open Access 2021
Line bundles on perfectoid covers: case of good reduction

Ben Heuer

We study Picard groups and Picard functors of perfectoid spaces which are limits of rigid spaces. For sufficiently large covers that are limits of rigid spaces of good reduction, we show that the Picard functor can be represented by the special fibre. We use our results to answer several open questions about Picard groups of perfectoid spaces from the literature, for example we show that these are not always $p$-divisible. Along the way, we construct a "Hodge--Tate spectral sequence for $\mathbb G_m$" of independent interest.

en math.AG, math.NT
arXiv Open Access 2021
Literature Review of the Pioneering Approaches in Cloud-based Search Engines Powered by LETOR Techniques

Gizem Gezici

Search engines play an essential role in our daily lives. They are also crucial in the enterprise domain for accessing documents from various information sources. Since traditional search systems index documents mainly by the frequency of the words occurring in them, they support keyword search rather than natural language search. Keyword-based search is unlikely to remain sufficient for enterprise data, which is growing extremely fast. Thus, enterprise search is becoming increasingly critical in the corporate domain. In this report, we present an overview of the state-of-the-art technologies in the literature for three main purposes: i) to increase the retrieval performance of a search engine, ii) to deploy a search platform to a cloud environment, and iii) to select the best terms when expanding queries, both to achieve an even higher retrieval performance and to provide good query suggestions for a better user experience.
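The frequency-based indexing described above is typified by the BM25 ranking function; a minimal sketch follows, with a toy corpus and query (the values of k1 and b are conventional defaults):

```python
from math import log

# Minimal BM25 scorer — the classic frequency-based ranking function behind
# many keyword search engines. Corpus and query below are toy examples.
def bm25_scores(query, raw_docs, k1=1.5, b=0.75):
    docs = [d.lower().split() for d in raw_docs]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for d in docs:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for doc in docs if term in doc)   # document frequency
            tf = d.count(term)                           # term frequency
            idf = log((n - df + 0.5) / (df + 0.5) + 1)
            # length-normalized saturation of the term frequency
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "cloud search platform deployment",
    "enterprise document search and ranking",
    "weather report for the weekend",
]
scores = bm25_scores("enterprise search", docs)
print(scores.index(max(scores)))  # 1 — the doc matching both query terms wins
```

Learning-to-rank (LETOR) systems typically use such BM25 scores as one feature among many, rather than as the final ranking.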

en cs.IR
arXiv Open Access 2021
A systematic literature review on state-of-the-art deep learning methods for process prediction

Dominic A. Neu, Johannes Lahann, Peter Fettke

Process mining enables the reconstruction and evaluation of business processes based on digital traces in IT systems. An increasingly important technique in this context is process prediction. Given a sequence of events of an ongoing trace, process prediction allows forecasting upcoming events or performance measurements. In recent years, multiple process prediction approaches have been proposed, applying different data processing schemes and prediction algorithms. This study focuses on deep learning algorithms since they seem to outperform their machine learning alternatives consistently. Whilst sharing a common learning paradigm, they use different data preprocessing techniques, implement a variety of network topologies and focus on various goals such as outcome prediction, time prediction or control-flow prediction. Additionally, the log data, evaluation metrics and baselines used by the authors diverge, making the results hard to compare. This paper attempts to synthesise the advantages and disadvantages of the procedural decisions in these approaches by conducting a systematic literature review.
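A simple frequency baseline makes the next-event prediction task concrete: given a trace prefix, predict the event that most often followed that prefix in the log. The event log below is a toy example; the deep learning models surveyed replace this lookup with learned representations:

```python
from collections import Counter, defaultdict

# Frequency baseline for next-event (control-flow) prediction: map each
# observed trace prefix to a count of the events that followed it.
def train(traces):
    nxt = defaultdict(Counter)
    for trace in traces:
        for i in range(1, len(trace)):
            nxt[tuple(trace[:i])][trace[i]] += 1
    return nxt

def predict(nxt, prefix):
    """Most frequent continuation of `prefix`, or None for unseen prefixes."""
    counts = nxt.get(tuple(prefix))
    return counts.most_common(1)[0][0] if counts else None

# Toy event log: three completed traces of a claims-handling process.
log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
]
model = train(log)
print(predict(model, ["register", "check"]))  # approve
```

Recurrent or convolutional architectures in the reviewed papers generalize beyond this lookup to prefixes never seen verbatim, which is where their advantage lies.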

arXiv Open Access 2019
Software Testing Process Models Benefits & Drawbacks: a Systematic Literature Review

Katarína Hrabovská, Bruno Rossi, Tomáš Pitner

Context: Software testing plays an essential role in product quality improvement. For this reason, several software testing models have been developed to support organizations. However, adoption of testing process models inside organizations is still sporadic, with a need for more evidence about reported experiences. Aim: Our goal is to identify results gathered from the application of software testing models in organizational contexts. We focus on characteristics such as the context of use, practices applied in different testing process phases, and reported benefits & drawbacks. Method: We performed a Systematic Literature Review (SLR) focused on studies about the application of software testing processes, complemented by results from previous reviews. Results: From 35 primary studies and survey-based articles, we collected 17 testing models. Although most of the existing models are described as applicable to general contexts, the evidence obtained from the studies shows that some models are not suitable for all enterprise sizes, and inadequate for specific domains. Conclusion: The SLR evidence can serve to compare different software testing models for applicability inside organizations. Both benefits and drawbacks, as reported in the surveyed cases, allow getting a better view of the strengths and weaknesses of each model.

en cs.SE
arXiv Open Access 2019
Bibliography of Literature on GW Ab Initio Calculations which use Imaginary Time

Vincent Sacksteder

The GW Approximation is an ab initio approach to calculating electronic structure which avoids using the Local Density (LDA) Approximation, the Generalized Gradient (GGA) Approximation, or similar density functionals. It goes beyond the Hartree-Fock approximation by including screening and excited state effects, and shares conceptual similarities with MP2 and RPA calculations. Because GW includes dynamics and time/frequency dependence of the system's screening and excited state behavior, a pivotal issue in any GW calculation is the question of how to numerically represent and manipulate time/frequency dependence. While earlier GW calculations generally used a representation in real time/frequency, many recent calculations have used a representation on the imaginary time axis. Imaginary time is important not only for numerics but also because it can enable additional physics such as systems where self-consistent GW is needed or that are strongly interacting, integration with DMFT, and scaling of GW to very large systems. This current text reviews the research literature on imaginary time GW codes and briefly discusses the possibilities for memory and CPU optimizations.

en cond-mat.mes-hall
arXiv Open Access 2019
Flood Prediction Using Machine Learning Models: Literature Review

Amir Mosavi, Pinar Ozturk, Kwok-wing Chau

Floods are among the most destructive natural disasters and are highly complex to model. Research on the advancement of flood prediction models has contributed to risk reduction, policy suggestions, minimization of the loss of human life, and reduction of the property damage associated with floods. To mimic the complex mathematical expressions of the physical processes of floods, machine learning (ML) methods have, during the past two decades, contributed greatly to the advancement of prediction systems, providing better performance and cost-effective solutions. Due to the vast benefits and potential of ML, its popularity has increased dramatically among hydrologists. Researchers, by introducing novel ML methods and hybridizing existing ones, aim to discover more accurate and efficient prediction models. The main contribution of this paper is to demonstrate the state of the art of ML models in flood prediction and to give insight into the most suitable models. In particular, we investigate the literature in which ML models were benchmarked through a qualitative analysis of robustness, accuracy, effectiveness, and speed, to provide an extensive overview of the various ML algorithms used in the field. The performance comparison of ML models presents an in-depth understanding of the different techniques within the framework of a comprehensive evaluation and discussion. As a result, this paper introduces the most promising prediction methods for both long-term and short-term floods. Furthermore, the major trends in improving the quality of flood prediction models are investigated. Among them, hybridization, data decomposition, algorithm ensembles, and model optimization are reported as the most effective strategies for the improvement of ML methods.

en cs.LG, stat.ML

Page 53 of 68,120