Results for "Norwegian literature"

Showing 20 of ~1362432 results · from arXiv, CrossRef

arXiv Open Access 2026
Graph-Aware Late Chunking for Retrieval-Augmented Generation in Biomedical Literature

Pouria Mortezaagha, Arya Rahgozar

Retrieval-Augmented Generation (RAG) systems for biomedical literature are typically evaluated using ranking metrics like Mean Reciprocal Rank (MRR), which measure how well the system identifies the single most relevant chunk. We argue that for full-text scientific documents, this paradigm is incomplete: it rewards retrieval precision while ignoring retrieval breadth -- the ability to surface evidence from across a document's structural sections. We propose GraLC-RAG, a framework that unifies late chunking with graph-aware structural intelligence, introducing structure-aware chunk boundary detection, UMLS knowledge graph infusion, and graph-guided hybrid retrieval. We evaluate six strategies on 2,359 IMRaD-filtered PubMed Central articles using 2,033 cross-section questions and two metric families: standard ranking metrics (MRR, Recall@k) and structural coverage metrics (SecCov@k, CS Recall). Our results expose a sharp divergence: content-similarity methods achieve the highest MRR (0.517) but always retrieve from a single section, while structure-aware methods retrieve from up to 15.6x more sections. Generation experiments show that KG-infused retrieval narrows the answer-quality gap to delta-F1 = 0.009 while maintaining 4.6x section diversity. These findings demonstrate that standard metrics systematically undervalue structural retrieval and that closing the multi-section synthesis gap is a key open problem for biomedical RAG.

en cs.AI, cs.IR
arXiv Open Access 2026
Automated Extraction of Mechanical Constitutive Models from Scientific Literature using Large Language Models: Applications in Cultural Heritage Conservation

Rui Hu, Yue Wu, Tianhao Su et al.

The preservation of cultural heritage is increasingly transitioning towards data-driven predictive maintenance and "Digital Twin" construction. However, the mechanical constitutive models required for high-fidelity simulations remain fragmented across decades of unstructured scientific literature, creating a "Data Silo" that hinders conservation engineering. To address this, we present an automated, two-stage agentic framework leveraging Large Language Models (LLMs) to extract mechanical constitutive equations, calibrated parameters, and metadata from PDF documents. The workflow employs a resource-efficient "Gatekeeper" agent for relevance filtering and a high-capability "Analyst" agent for fine-grained extraction, featuring a novel Context-Aware Symbolic Grounding mechanism to resolve mathematical ambiguities. Applied to a corpus of over 2,000 research papers, the system successfully isolated 113 core documents and constructed a structured database containing 185 constitutive model instances and over 450 calibrated parameters. The extraction precision reached 80.4%, establishing a highly efficient "Human-in-the-loop" workflow that reduces manual data curation time by approximately 90%. We demonstrate the system's utility through a web-based Knowledge Retrieval Platform, which enables rapid parameter discovery for computational modeling. This work transforms scattered literature into a queryable digital asset, laying the data foundation for the "Digital Material Twin" of built heritage.

en cs.DB
arXiv Open Access 2025
Missing the human touch? A computational stylometry analysis of GPT-4 translations of online Chinese literature

Xiaofang Yao, Yong-Bin Kang, Anthony McCosker

Existing research indicates that machine translations (MTs) of literary texts are often unsatisfactory. MTs are typically evaluated using automated metrics and subjective human ratings, with limited focus on stylistic features. Evidence is also limited on whether state-of-the-art large language models (LLMs) will reshape literary translation. This study examines the stylistic features of LLM translations, comparing GPT-4's performance to human translations in a Chinese online literature task. Computational stylometry analysis shows that GPT-4 translations closely align with human translations in lexical, syntactic, and content features, suggesting that LLMs might replicate the 'human touch' in literary translation style. These findings offer insights into AI's impact on literary translation from a posthuman perspective, where distinctions between machine and human translations become increasingly blurry.

en cs.CL, cs.AI
arXiv Open Access 2025
A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models

Sabrina Kaniewski, Fabian Schmidt, Markus Enzweiler et al.

The increasing adoption of Large Language Models (LLMs) in software engineering has sparked interest in their use for software vulnerability detection. However, the rapid development of this field has resulted in a fragmented research landscape, with diverse studies that are difficult to compare due to differences in, e.g., system designs and dataset usage. This fragmentation makes it difficult to obtain a clear overview of the state-of-the-art or compare and categorize studies meaningfully. In this work, we present a comprehensive systematic literature review (SLR) of LLM-based software vulnerability detection. We analyze 263 studies published between January 2020 and November 2025, categorizing them by task formulation, input representation, system architecture, and techniques. Further, we analyze the datasets used, including their characteristics, vulnerability coverage, and diversity. We present a fine-grained taxonomy of vulnerability detection approaches, identify key limitations, and outline actionable future research opportunities. By providing a structured overview of the field, this review improves transparency and serves as a practical guide for researchers and practitioners aiming to conduct more comparable and reproducible research. We publicly release all artifacts and maintain a living repository of LLM-based software vulnerability detection studies at https://github.com/hs-esslingen-it-security/Awesome-LLM4SVD.

en cs.SE, cs.AI
arXiv Open Access 2025
Large Language Models for Power System Applications: A Comprehensive Literature Survey

Muhammad Sarwar, Muhammad Rizwan, Mubushra Aziz et al.

This comprehensive literature review examines the emerging applications of Large Language Models (LLMs) in power system engineering. Through a systematic analysis of recent research published between 2020 and 2025, we explore how LLMs are being integrated into various aspects of power system operations, planning, and management. The review covers key application areas including fault diagnosis, load forecasting, cybersecurity, control and optimization, system planning, simulation, and knowledge management. Our findings indicate that while LLMs show promising potential in enhancing power system operations through their advanced natural language processing and reasoning capabilities, significant challenges remain in their practical implementation. These challenges include limited domain-specific training data, concerns about reliability and safety in critical infrastructure, and the need for enhanced explainability. The review also highlights emerging trends such as the development of power system-specific LLMs and hybrid approaches combining LLMs with traditional power engineering methods. We identify crucial research directions for advancing the field, including the development of specialized architectures, improved security frameworks, and enhanced integration with existing power system tools. This survey provides power system researchers and practitioners with a comprehensive overview of the current state of LLM applications in the field and outlines future pathways for research and development.

en eess.SY
CrossRef Open Access 2024
The hidden scabies: a rare case of atypical Norwegian scabies, case report and literature review

Angela Mauro, Cristiana Colonna, Silvia Taranto et al.

Background: Norwegian scabies is a rare dermatological manifestation that usually affects the most fragile populations, such as elderly and immunocompromised patients, and its diagnosis is quite complex, due to its low prevalence in the general population and its broad spectrum of manifestations. Case Presentation: Here we describe a rare case of Norwegian scabies that was previously misdiagnosed in a sixteen-year-old patient affected by Down syndrome, and we conducted a non-systematic literature review on this topic. Lesions were atypical, pruritic and associated with periodic desquamation of the palms and soles; after a series of specialist evaluations, she finally underwent topical treatment with complete remission. Conclusion: It is therefore crucial to take into consideration the relation between Down syndrome and community-acquired crusted scabies, to enable preventative measures, early detection, and proper treatment.

4 citations en
arXiv Open Access 2024
SciQu: Accelerating Materials Properties Prediction with Automated Literature Mining for Self-Driving Laboratories

Anand Babu

Assessing different material properties to predict specific attributes, such as band gap, resistivity, Young's modulus, work function, and refractive index, is a fundamental requirement for materials science-based applications. However, the process is time-consuming and often requires extensive literature reviews and numerous experiments. Our study addresses these challenges by leveraging machine learning to analyze material properties with greater precision and efficiency. By automating the data extraction process and using the extracted information to train machine learning models, our developed model, SciQu, optimizes material properties. As a proof of concept, we predicted the refractive index of materials using data extracted from numerous research articles with SciQu, considering input descriptors such as space group, volume, and band gap, with a Root Mean Square Error (RMSE) of 0.068 and an R² of 0.94. Thus, SciQu not only predicts the properties of materials but also plays a key role in self-driving laboratories by optimizing the synthesis parameters to achieve precise shape, size, and phase of the materials subjected to the input parameters.

en cond-mat.mtrl-sci, cs.AI
arXiv Open Access 2024
Software Solutions for Newcomers' Onboarding in Software Projects: A Systematic Literature Review

Italo Santos, Katia Romero Felizardo, Marco A. Gerosa et al.

[Context] Newcomers joining an unfamiliar software project face numerous barriers; therefore, effective onboarding is essential to help them engage with the team and develop the behaviors, attitudes, and skills needed to excel in their roles. However, onboarding can be a lengthy, costly, and error-prone process. Software solutions can help mitigate these barriers and streamline the process without overloading senior members. [Objective] This study aims to identify the state-of-the-art software solutions for onboarding newcomers. [Method] We conducted a systematic literature review (SLR) to answer six research questions. [Results] We analyzed 32 studies about software solutions for onboarding newcomers and yielded several key findings: (1) a range of strategies exists, with recommendation systems being the most prevalent; (2) most solutions are web-based; (3) solutions target a variety of onboarding aspects, with a focus on process; (4) many onboarding barriers remain unaddressed by existing solutions; (5) laboratory experiments are the most commonly used method for evaluating these solutions; and (6) diversity and inclusion aspects primarily address experience level. [Conclusion] We shed light on current technological support and identify research opportunities to develop more inclusive software solutions for onboarding. These insights may also guide practitioners in refining existing platforms and onboarding programs to promote smoother integration of newcomers into software projects.

en cs.SE
arXiv Open Access 2024
Crossing the disciplines -- a starter toolkit for researchers who wish to explore early Irish literature

M. McCarthy, D. P. Curley

The inspiration behind this paper came from both authors' long-term collaboration with our friend and colleague, Professor Ralph Kenna. This connection emerged initially through his interest in Rathcroghan and in our paper, 'Exploring the Nature of the Fráoch Saga', which we concluded with the statement that we believed it 'presents a case that will hopefully ignite conversation between disciplines'. This led us to consider the potential value for researchers of compiling a template list of useful and reliable sources and resources to consult, in other words a type of starter toolkit or guide for any individual from an alternative discipline or background, who might possess, or, in time, develop a personal or professional interest in Early Ireland and Early Irish literature. In doing this, we decided, for ease of illustration, to take the example of the location name Rathcroghan/Cruachan Aí (the prehistoric Royal Site of Connacht in the west of Ireland and the place that we both work in and interact with on a daily basis) as a case study in order to demonstrate an initial methodological approach to not only the types of resources and information available, but also to highlight some potential pitfalls that may arise in the course of an investigation.

en physics.soc-ph
arXiv Open Access 2023
Analyzing the impact of climate change on critical infrastructure from the scientific literature: A weakly supervised NLP approach

Tanwi Mallick, Joshua David Bergerson, Duane R. Verner et al.

Natural language processing (NLP) is a promising approach for analyzing large volumes of climate-change and infrastructure-related scientific literature. However, best-in-practice NLP techniques require large collections of relevant documents (corpus). Furthermore, NLP techniques using machine learning and deep learning techniques require labels grouping the articles based on user-defined criteria for a significant subset of a corpus in order to train the supervised model. Even labeling a few hundred documents with human subject-matter experts is a time-consuming process. To expedite this process, we developed a weak supervision-based NLP approach that leverages semantic similarity between categories and documents to (i) establish a topic-specific corpus by subsetting a large-scale open-access corpus and (ii) generate category labels for the topic-specific corpus. In comparison with a months-long process of subject-matter expert labeling, we assign category labels to the whole corpus using weak supervision and supervised learning in about 13 hours. The labeled climate and NCF corpus enable targeted, efficient identification of documents discussing a topic (or combination of topics) of interest and identification of various effects of climate change on critical infrastructure, improving the usability of scientific literature and ultimately supporting enhanced policy and decision making. To demonstrate this capability, we conduct topic modeling on pairs of climate hazards and NCFs to discover trending topics at the intersection of these categories. This method is useful for analysts and decision-makers to quickly grasp the relevant topics and most important documents linked to the topic.

en cs.LG
arXiv Open Access 2022
Adaptive user interfaces in systems targeting chronic disease: a systematic literature review

Wei Wang, Hourieh Khalajzadeh, Anuradha Madugalla et al.

eHealth technologies have been increasingly used to foster proactive self-management skills for patients with chronic diseases. However, it is challenging to provide each user with their desired support due to the dynamic and diverse nature of the chronic disease and its impact on users. Many such eHealth applications support aspects of 'adaptive user interfaces' -- interfaces that change or can be changed to accommodate the user and usage context differences. To identify the state of the art in adaptive user interfaces in the field of chronic diseases, we systematically located and analysed 48 key studies in the literature with the aim of categorising the key approaches used to date and identifying limitations, gaps and trends in research. Our data synthesis is based on the data sources used for interface adaptation, the data collection techniques used to extract the data, the adaptive mechanisms used to process the data and the adaptive elements generated at the interface. The findings of this review will aid researchers and developers in understanding where adaptive user interface approaches can be applied and the necessary considerations for applying adaptive user interfaces to different chronic disease-related eHealth applications.

arXiv Open Access 2022
Figure and Figure Caption Extraction for Mixed Raster and Vector PDFs: Digitization of Astronomical Literature with OCR Features

J. P. Naiman, Peter K. G. Williams, Alyssa Goodman

Scientific articles published prior to the "age of digitization" in the late 1990s contain figures which are "trapped" within their scanned pages. While progress to extract figures and their captions has been made, there is currently no robust method for this process. We present a YOLO-based method for use on scanned pages, post-Optical Character Recognition (OCR), which uses both grayscale and OCR-features. When applied to the astrophysics literature holdings of the Astrophysics Data System (ADS), we find F1 scores of 90.9% (92.2%) for figures (figure captions) with the intersection-over-union (IOU) cut-off of 0.9 which is a significant improvement over other state-of-the-art methods.

en astro-ph.IM, cs.DL
arXiv Open Access 2021
The Effects of Continuous Integration on Software Development: a Systematic Literature Review

Eliezio Soares, Gustavo Sizilio, Jadson Santos et al.

Context: Continuous integration (CI) is a software engineering technique that proclaims a set of frequent activities to assure the health of the software product. Researchers and practitioners mention several benefits related to CI. However, no systematic study surveys state of the art regarding such benefits or cons. Objective: This study aims to identify and interpret empirical evidence regarding how CI impacts software development. Method: Through a Systematic Literature Review, we search for studies in six digital libraries. Starting from 479 studies, we select 101 empirical studies that evaluate CI for any software development activity (e.g., testing). We thoroughly read and extract information regarding (i) CI environment, (ii) findings related to effects of CI, and (iii) the employed methodology. We apply a thematic synthesis to group and summarize the findings. Results: Existing research has explored the positive effects of CI, such as better cooperation, or negative effects, such as adding technical and process challenges. From our thematic synthesis, we identify six themes: development activities, software process, quality assurance, integration patterns, issues & defects, and build patterns. Conclusions: Empirical research in CI has been increasing over recent years. We found that much of the existing research reveals that CI brings positive effects to the software development phenomena. However, CI may also bring technical challenges to software development teams. Despite the overall positive outlook regarding CI, we still find room for improvements in the existing empirical research that evaluates the effects of CI.

en cs.SE
arXiv Open Access 2021
Identification and Measurement of Technical Debt Requirements in Software Development: a Systematic Literature Review

Ana Melo, Roberta Fagundes, Valentina Lenarduzzi et al.

Context: Technical Debt requirements are related to the distance between the ideal value of the specification and the system's actual implementation, which are consequences of strategic decisions for immediate gains, or unintended changes in context. To ensure the evolution of the software, it is necessary to keep it managed. Identification and measurement are the first two stages of the management process; however, they are little explored in academic research in requirements engineering. Objective: We aimed at investigating which evidence helps to strengthen the process of TD requirements management, including identification and measurement. Method: We conducted a Systematic Literature Review through manual and automatic searches considering 7499 studies from 2010 to 2020, and including 61 primary studies. Results: We identified some causes related to Technical Debt requirements, existing strategies to help in the identification and measurement, and metrics to support the measurement stage. Conclusion: Studies on TD requirements are still preliminary, especially on management tools. Yet, not enough attention is given to interpersonal issues, which are difficulties encountered when performing such activities, and therefore also require research. Finally, the provision of metrics to help measure TD is part of this work's contribution, providing insights into the application in the requirements context.

en cs.SE
arXiv Open Access 2021
Self-citation and its impact on research evaluation: Literature review. Part I

Vladimir Pislyakov

This review summarizes papers which analyze the impact of self-citation on research evaluation. We introduce a generalized definition of self-citation and its variants: author, institutional, country, journal, discipline, publisher self-citation. Formulae of the basic self-citation measures are given, namely self-citing and self-cited rates. World literature on author, institutional, country and journal self-citation is studied in more detail. Current views on the role and impact of self-citation are compiled and analyzed. It is found that there is a general consensus on some points: (a) both excessive self-citation and its total absence are pathological; (b) self-citation has low impact on large science units but may be critical for analysis of individual researchers; (c) the share of self-citations is generally higher for units with low bibliometric performance, while top scientists, institutions, journals receive the majority of their citations from outside. This review also considers how bibliometric instruments and databases respond to the challenge of possible manipulation by self-citations and how they correct the bibliometric indicators they calculate. The first part of the review presented here deals with the fundamental terms and definitions, and the most discussed and studied type of self-citation, author self-citation. The paper was funded by RFBR, project number 20-111-50209.

en cs.DL
arXiv Open Access 2020
Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review

Fadi Salo, MohammadNoor Injadat, Ali Bou Nassif et al.

Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The exponential expansion in the deployment of cloud technology has produced a massive amount of data from a variety of applications, resources and platforms. In turn, the rapid rate and volume of data creation has begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance. In this paper, we conduct a systematic literature review (SLR) of data mining techniques (DMT) used in IDS-based solutions through the period 2013-2018. We employed criterion-based, purposive sampling identifying 32 articles, which constitute the primary source of the present survey. After a careful investigation of these articles, we identified 17 separate DMTs deployed in an IDS context. This paper also presents the merits and disadvantages of the various works of current research that implemented DMTs and distributed streaming frameworks (DSF) to detect and/or prevent malicious attacks in a big data environment.

en cs.CR, cs.AI
arXiv Open Access 2020
Motivations, Benefits, and Issues for Adopting Micro-Frontends: A Multivocal Literature Review

Severi Peltonen, Luca Mezzalira, Davide Taibi

[Context] Micro-Frontends are increasing in popularity, being adopted by several large companies, such as DAZN, Ikea, Starbucks and many others. Micro-Frontends enable splitting of monolithic frontends into independent and smaller micro applications. However, many companies are still hesitant to adopt Micro-Frontends, due to the lack of knowledge concerning their benefits. Additionally, the available online documentation is often confusing and contradictory. [Objective] The goal of this work is to map the existing knowledge on Micro-Frontends, by understanding the motivations of companies when adopting such applications as well as possible benefits and issues. [Method] We conducted a Multivocal Literature Review, analyzing 43 sources, and classifying motivations, benefits and issues. [Results] The results show that existing architectural options to build web applications are cumbersome if the application and development team grows, and if multiple teams need to develop the same frontend application. The application of Micro-Frontends confirmed the expected benefits, and Micro-Frontends were found to provide the same benefits as microservices on the back end side, combining the development team into a fully cross-functional development team that can scale processes when needed. However, Micro-Frontends also showed some issues, such as the increased payload size of the application, increased code duplication and coupling between teams, and monitoring complexity. [Conclusions] Micro-Frontends allow companies to scale development according to business needs in the same way microservices do with the back end side. In addition, ...

en cs.SE
arXiv Open Access 2019
Explaining a prediction in some nonlinear models

Cosimo Izzo

In this article we will analyse how to compute the contribution of each input value to its aggregate output in some nonlinear models. Regression and classification applications, together with related algorithms for deep neural networks are presented. The proposed approach merges two methods currently present in the literature: integrated gradient and deep Taylor decomposition. Compared to DeepLIFT and Deep SHAP, it provides a natural choice of the reference point peculiar to the model at use.

en cs.LG, stat.ML
