Results for "Literature (General)"

Showing 20 of ~14,811,603 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

arXiv Open Access 2026
An Algorithmic Framework for Systematic Literature Reviews: A Case Study for Financial Narratives

Gabin Taibi, Joerg Osterrieder

This paper introduces an algorithmic framework for conducting systematic literature reviews (SLRs), designed to improve efficiency, reproducibility, and selection quality assessment in the literature review process. The proposed method integrates Natural Language Processing (NLP) techniques, clustering algorithms, and interpretability tools to automate and structure the selection and analysis of academic publications. The framework is applied to a case study focused on financial narratives, an emerging area in financial economics that examines how structured accounts of economic events, formed by the convergence of individual interpretations, influence market dynamics and asset prices. Drawing from the Scopus database of peer-reviewed literature, the review highlights research efforts to model financial narratives using various NLP techniques. Results reveal that while advances have been made, the conceptualization of financial narratives remains fragmented, often reduced to sentiment analysis, topic modeling, or their combination, without a unified theoretical framework. The findings underscore the value of more rigorous and dynamic narrative modeling approaches and demonstrate the effectiveness of the proposed algorithmic SLR methodology.

en q-fin.GN, cs.AI
DOAJ Open Access 2025
Decline in activities of daily living in the rarer dementias

Sebastian Crutch, Claire Waddington, Emma Harding et al.

Rarer dementias are associated with atypical symptoms and younger onset, which result in a higher burden of care. We provide a review of the global literature on longitudinal decline in activities of daily living (ADLs) in dementias that account for less than 10% of dementia diagnoses. Published studies were identified through searches conducted in Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica Database (Embase), Excerpta Medica Care (Emcare), PsycINFO, and Cumulative Index in Nursing and Allied Health Literature (CINAHL). The search criteria included terms related to ‘rarer dementias’, ‘activities of daily living’ and ‘longitudinal or cross-sectional studies’, following a predefined, registered protocol. Studies were screened, and those that met the criteria were citation searched. Quality assessments were performed, and relevant data were extracted. Twenty articles were selected, of which 19 focused on dementias within the frontotemporal dementia/primary progressive aphasia spectrum, while one addressed posterior cortical atrophy. Four studies were cross-sectional and 16 studies were longitudinal, with a median duration of 2.2 years. The Disability Assessment for Dementia was used to measure decline in 8 of the 20 studies. The varied sequences of ADL decline reported in the literature reflect variation in diagnostic specificity between studies and within-syndrome heterogeneity. Most studies used Alzheimer’s disease staging scales to measure decline, which cannot capture variant-specific symptoms. To enhance care provision in dementia, ADL scales could be deployed postdiagnosis to aid treatment and planning. This necessitates staging scales that are variant-specific and span the disease course from diagnosis to end of life. PROSPERO registration number: CRD42021283302.

arXiv Open Access 2025
LIFT: Interpretable truck driving risk prediction with literature-informed fine-tuned LLMs

Xiao Hu, Yuansheng Lian, Ke Zhang et al.

This study proposes an interpretable prediction framework with literature-informed fine-tuned (LIFT) LLMs for truck driving risk prediction. The framework integrates an LLM-driven Inference Core that predicts and explains truck driving risk, a Literature Processing Pipeline that filters and summarizes domain-specific literature into a literature knowledge base, and a Result Evaluator that evaluates the prediction performance as well as the interpretability of the LIFT LLM. After fine-tuning on a real-world truck driving risk dataset, the LIFT LLM achieved accurate risk prediction, outperforming benchmark models by 26.7% in recall and 10.1% in F1-score. Furthermore, guided by the literature knowledge base automatically constructed from 299 domain papers, the LIFT LLM produced a variable importance ranking consistent with that derived from the benchmark model, while demonstrating robustness of its interpretation results under various data sampling conditions. The LIFT LLM also identified potentially risky scenarios by detecting key combinations of variables in truck driving risk, which were verified by PERMANOVA tests. Finally, we demonstrated the contribution of the literature knowledge base and the fine-tuning process to the interpretability of the LIFT LLM, and discussed the potential of the LIFT LLM in data-driven knowledge discovery.

en cs.AI, cs.LG
arXiv Open Access 2025
Technical Report on classification of literature related to children speech disorder

Ziang Wang, Amir Aryani

This technical report presents a natural language processing (NLP)-based approach for systematically classifying scientific literature on childhood speech disorders. We retrieved and filtered 4,804 relevant articles published after 2015 from the PubMed database using domain-specific keywords. After cleaning and pre-processing the abstracts, we applied two topic modeling techniques - Latent Dirichlet Allocation (LDA) and BERTopic - to identify latent thematic structures in the corpus. Our models uncovered 14 clinically meaningful clusters, such as infantile hyperactivity and abnormal epileptic behavior. To improve relevance and precision, we incorporated a custom stop word list tailored to speech pathology. Evaluation results showed that the LDA model achieved a coherence score of 0.42 and a perplexity of -7.5, indicating strong topic coherence and predictive performance. The BERTopic model exhibited a low proportion of outlier topics (less than 20%), demonstrating its capacity to classify heterogeneous literature effectively. These results provide a foundation for automating literature reviews in speech-language pathology.

en cs.CL, cs.IR
arXiv Open Access 2025
Science mapping of the Revista General de Informacion y Documentacion (2005-2022)

Carmen Galvez

This study examines the Revista General de Informacion y Documentacion from 2005 to 2022. The objective is to qualify the structure of the research field and assess the trajectory of the thematic areas covered. The methodology applies co-word analysis, the construction of bibliometric networks, and the creation of science maps. A total of 514 documents are extracted from the Web of Science (WoS) database. The keywords assigned by the authors of the documents are selected and divided into three subperiods: 2005-2010, 2011-2016 and 2017-2022. The results yield 1701 author keywords and 37 bibliometric networks. In the period 2005-2010, the structure of the research field is represented on the science map with very few central and specialized topics, indicating an initial and underdeveloped organization. In the period 2011-2016, the structure of the research field is distributed on the science map with a more varied, but still insufficient, number of central and specialized topics, indicating an organization in the process of development. In the period 2017-2022, the structure of the research field is shown on the map with all families of topics (central, specialized, transversal, emerging or disappearing), and is assessed as a dynamic, complex and heterogeneous organization. Regarding the evolution of the thematic areas, the map shows solid progress between the last two periods. The morphology of the thematic field treated in RGID is outlined in three phases: foundation, development and consolidation.

arXiv Open Access 2025
ChEmbed: Enhancing Chemical Literature Search Through Domain-Specific Text Embeddings

Ali Shiraee Kasmaee, Mohammad Khodadad, Mehdi Astaraki et al.

Retrieval-Augmented Generation (RAG) systems in chemistry heavily depend on accurate and relevant retrieval of chemical literature. However, general-purpose text embedding models frequently fail to adequately represent complex chemical terminologies, resulting in suboptimal retrieval quality. Specialized embedding models tailored to chemical literature retrieval have not yet been developed, leaving a substantial performance gap. To address this challenge, we introduce ChEmbed, a domain-adapted family of text embedding models fine-tuned on a dataset comprising chemistry-specific text from the PubChem, Semantic Scholar, and ChemRxiv corpora. To create effective training data, we employ large language models to synthetically generate queries, resulting in approximately 1.7 million high-quality query-passage pairs. Additionally, we augment the tokenizer by adding 900 chemically specialized tokens to previously unused slots, which significantly reduces the fragmentation of chemical entities, such as IUPAC names. ChEmbed also maintains an 8192-token context length, enabling the efficient retrieval of longer passages compared to many other open-source embedding models, which typically have a context length of 512 or 2048 tokens. Evaluated on our newly introduced ChemRxiv Retrieval benchmark, ChEmbed outperforms state-of-the-art general embedding models, raising nDCG@10 from 0.82 to 0.91 (+9 pp). ChEmbed represents a practical, lightweight, and reproducible embedding solution that effectively improves retrieval for chemical literature search.

en cs.IR, cs.CL
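
The ChEmbed abstract reports retrieval quality as nDCG@10. As a reading aid, here is a minimal sketch of the standard nDCG@k definition with linear gain. It is not the authors' evaluation code: the function names and the toy relevance lists are hypothetical, and the sketch assumes the list passed in covers every judged document for the query, so that sorting it yields the ideal ranking.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # Rank 0 is the top result; its discount is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    # Assumption: `relevances` holds the relevance grade of every judged
    # document for the query, in the order the system ranked them.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents at all
    return dcg_at_k(relevances, k) / ideal


# Hypothetical toy rankings (1 = relevant, 0 = not relevant).
perfect = ndcg_at_k([1, 1, 0, 0])
swapped = ndcg_at_k([0, 1])

# The "+9 pp" in the abstract is an absolute difference in percentage points.
gain_pp = round((0.91 - 0.82) * 100)
```

A benchmark score is normally the mean of this per-query value over all queries, which is presumably how the 0.82 and 0.91 figures should be read.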
arXiv Open Access 2025
SciNetBench: A Relation-Aware Benchmark for Scientific Literature Retrieval Agents

Chenyang Shao, Yong Li, Fengli Xu

The rapid development of AI agents has spurred the development of advanced research tools, such as Deep Research. Achieving this requires a nuanced understanding of the relations within scientific literature, which surpasses the scope of keyword-based or embedding-based retrieval. Existing retrieval agents mainly focus on content-level similarities and are unable to decode critical relational dynamics, such as identifying corroborating or conflicting studies or tracing technological lineages, all of which are essential for a comprehensive literature review. Consequently, this fundamental limitation often results in a fragmented knowledge structure, misleading sentiment interpretation, and inadequate modeling of collective scientific progress. To investigate relation-aware retrieval more deeply, we propose SciNetBench, the first Scientific Network Relation-aware Benchmark for literature retrieval agents. Constructed from a corpus of over 18 million AI papers, our benchmark systematically evaluates three levels of relations: ego-centric retrieval of papers with novel knowledge structures, pair-wise identification of scholarly relationships, and path-wise reconstruction of scientific evolutionary trajectories. Through extensive evaluation of three categories of retrieval agents, we find that their accuracy on relation-aware retrieval tasks often falls below 20%, revealing a core shortcoming of current retrieval paradigms. Notably, further experiments on the literature review tasks demonstrate that providing agents with relational ground truth leads to a substantial 23.4% performance improvement in review quality, validating the critical importance of relation-aware retrieval. We publicly release our benchmark at https://anonymous.4open.science/r/SciNetBench/ to support future research on advanced retrieval systems.

en cs.CE, cs.CL
DOAJ Open Access 2024
Big data analytics capability and social innovation: the mediating role of knowledge exploration and exploitation

Nan Wang, Baolian Chen, Liya Wang et al.

While many organizations have successfully leveraged big data analytics capabilities to improve their performance, our understanding is limited on whether and how big data analytics capabilities affect social innovation in organizations. Based on the organizational information processing theory and the organizational learning theory, this study aims to investigate how big data analytics capabilities support social innovation, and how knowledge ambidexterity mediates this relationship. Drawing on a total of 354 high-tech companies in China, this study shows that big data analytics management, big data analytics technology, and big data analytics personnel capabilities all have positive effects on social innovation. In addition, both knowledge exploration and knowledge exploitation play a mediating role in this process. Furthermore, a polynomial regression and response surface analysis shows that social innovation increases when knowledge exploration and knowledge exploitation are highly consistent but declines when knowledge exploration and knowledge exploitation are inconsistent. This study not only provides new perspectives for understanding how big data analytics capabilities contribute to social innovation, complementing the existing literature on big data analytics capabilities and social innovation, but also provides important practical guidance on how organizations can develop big data analytics capabilities to improve social innovation and solve social problems in the digital age.

History of scholarship and learning. The humanities, Social Sciences
arXiv Open Access 2024
AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing

Huawei Ji, Cheng Deng, Bo Xue et al.

With the development of data-centric AI, the focus has shifted from model-driven approaches to improving data quality. Academic literature, as one of the crucial types, is predominantly stored in PDF formats and needs to be parsed into texts before further processing. However, parsing diverse structured texts in academic literature remains challenging due to the lack of datasets that cover various text structures. In this paper, we introduce AceParse, the first comprehensive dataset designed to support the parsing of a wide range of structured texts, including formulas, tables, lists, algorithms, and sentences with embedded mathematical expressions. Based on AceParse, we fine-tuned a multimodal model, named AceParser, which accurately parses various structured texts within academic literature. This model outperforms the previous state-of-the-art by 4.1% in terms of F1 score and by 5% in Jaccard Similarity, demonstrating the potential of multimodal models in academic literature parsing. Our dataset is available at https://github.com/JHW5981/AceParse.

en cs.CL, cs.AI
arXiv Open Access 2024
Was that Sarcasm?: A Literature Survey on Sarcasm Detection

Harleen Kaur Bagga, Jasmine Bernard, Sahil Shaheen et al.

As human beings, we find sarcasm hard to interpret. Being able to interpret sarcasm is often termed a sign of intelligence, given the complex nature of sarcasm. Hence, this is a field of Natural Language Processing which is still complex for computers to decipher. This literature survey delves into different aspects of sarcasm detection to create an understanding of the underlying problems faced during detection, the approaches used to solve them, and the different forms of available datasets for sarcasm detection.

en cs.CL
