Results for "Greek philology and language"

Showing 19 of ~1,458,185 results · from DOAJ, arXiv, CrossRef

arXiv Open Access 2025
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

Bohan Lyu, Siqiao Huang, Zichen Liang et al.

Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as generation and understanding. However, an equally important yet underexplored question is whether LLMs can serve as surrogate models for code execution prediction. To investigate this systematically, we introduce SURGE, a comprehensive benchmark with 1,160 problems covering 8 key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive analysis of 21 open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings reveal important insights about the feasibility of LLMs as efficient surrogates for computational processes. The benchmark and evaluation framework are available at https://github.com/Imbernoulli/SURGE.

en cs.LG, cs.CL
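The surrogate-execution question above can be sketched as a scoring loop: compare a model's predicted output against the snippet's actual output. This is only a minimal illustration of the evaluation idea, not the SURGE framework itself; `predict_output` is a hypothetical stand-in for an LLM call.

```python
import contextlib
import io

def run(code: str) -> str:
    """Ground truth: actually execute the snippet and capture its stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def predict_output(code: str) -> str:
    # Hypothetical stand-in for an LLM "surrogate executor";
    # a trivial heuristic here, purely for the demo.
    return "6" if "2 * 3" in code else ""

def surrogate_accuracy(snippets) -> float:
    # Fraction of snippets where the surrogate matches real execution.
    return sum(predict_output(c) == run(c) for c in snippets) / len(snippets)
```

The efficiency argument is that `predict_output` is cheap relative to `run` when real execution means, e.g., a long scientific computation.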
arXiv Open Access 2025
Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training

Meng Xiao, Xunxin Cai, Qingqing Long et al.

Corpus distillation for biomedical large language models (LLMs) seeks to address the pressing challenge of insufficient quantity and quality in open-source annotated scientific corpora, which remains a bottleneck for effective LLM training in biomedical research. This paper proposes a knowledge-driven, agentic framework for scientific corpus distillation, tailored explicitly for LLM training in the biomedical domain, addressing the challenge posed by the complex hierarchy of biomedical knowledge. Central to our approach is a collaborative multi-agent architecture, where specialized agents, each guided by the Medical Subject Headings (MeSH) hierarchy, work in concert to autonomously extract, synthesize, and self-evaluate high-quality textual data from vast scientific literature. This agentic framework collectively generates and refines domain-specific question-answer pairs, ensuring comprehensive coverage and consistency with biomedical ontologies while minimizing manual involvement. Extensive experimental results show that language models trained on our multi-agent distilled datasets achieve notable improvements in biomedical question-answering tasks, outperforming both strong life sciences LLM baselines and advanced proprietary models. Notably, our AI-Ready dataset enables Llama3-70B to surpass GPT-4 with MedPrompt and Med-PaLM-2, despite their larger scale. Detailed ablation studies and case analyses further validate the effectiveness and synergy of each agent within the framework, highlighting the potential of multi-agent collaboration in biomedical LLM training.

en cs.CL, cs.AI
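The extract-synthesize-self-evaluate loop described above can be caricatured in a few lines. Every function and passage here is a hypothetical stub for illustration, not the paper's implementation; the MeSH topics and filter rule are made up.

```python
def draft_qa(topic: str, passage: str):
    # "Extraction/synthesis" agent: turn a passage into a QA pair.
    return (f"What does the literature say about {topic}?", passage)

def critic_ok(qa) -> bool:
    # "Self-evaluation" agent: keep only answers with some substance.
    question, answer = qa
    return len(answer.split()) >= 3

passages = {
    "Neoplasms": "Tumour growth depends on angiogenesis.",
    "Anatomy": "See figure.",
}
dataset = [qa for topic, text in passages.items()
           if critic_ok(qa := draft_qa(topic, text))]
```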
arXiv Open Access 2025
Exploring Gender Bias in Alzheimer's Disease Detection: Insights from Mandarin and Greek Speech Perception

Liu He, Yuanchao Li, Rui Feng et al.

Gender bias has been widely observed in speech perception tasks, influenced by the fundamental voicing differences between genders. This study reveals a gender bias in the perception of Alzheimer's Disease (AD) speech. In a perception experiment involving 16 Chinese listeners evaluating both Chinese and Greek speech, we identified that male speech was more frequently identified as AD, with this bias being particularly pronounced in Chinese speech. Acoustic analysis showed that shimmer values in male speech were significantly associated with AD perception, while speech portion exhibited a significant negative correlation with AD identification. Although language did not have a significant impact on AD perception, our findings underscore the critical role of gender bias in AD speech perception. This work highlights the necessity of addressing gender bias when developing AD detection models and calls for further research to validate model performance across different linguistic contexts.

en cs.CL, cs.HC
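The kind of acoustic analysis described above correlates a voice-quality measure (shimmer) with listeners' AD ratings. A plain Pearson correlation suffices as a sketch; the values below are made-up examples, not the study's data.

```python
def pearson(x, y) -> float:
    # Pearson correlation coefficient of two equal-length samples.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

shimmer = [0.02, 0.04, 0.06, 0.08]    # illustrative shimmer values
ad_rating = [0.1, 0.3, 0.5, 0.7]      # illustrative listener AD scores
r = pearson(shimmer, ad_rating)       # perfectly linear toy data -> r = 1
```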
arXiv Open Access 2025
Can Vision-Language Models Solve Visual Math Equations?

Monjoy Narayan Choudhury, Junling Wang, Yifan Hou et al.

Despite strong performance in visual understanding and language-based reasoning, Vision-Language Models (VLMs) struggle with tasks requiring integrated perception and symbolic computation. We study this limitation through visual equation solving, where mathematical equations are embedded in images, variables are represented by object icons, and coefficients must be inferred by counting. While VLMs perform well on textual equations, they fail on visually grounded counterparts. To understand this gap, we decompose the task into coefficient counting and variable recognition, and find that counting is the primary bottleneck, even when recognition is accurate. We also observe that composing recognition and reasoning introduces additional errors, highlighting challenges in multi-step visual reasoning. Finally, as equation complexity increases, symbolic reasoning itself becomes a limiting factor. These findings reveal key weaknesses in current VLMs and point toward future improvements in visually grounded mathematical reasoning.

en cs.CL, cs.AI
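The task decomposition above (recognition, then counting, then symbolic solving) can be sketched once perception is replaced by lists of icon labels. The icon lists are hypothetical stand-ins for an image; this shows only the counting-to-coefficient step the abstract identifies as the bottleneck.

```python
from collections import Counter
from fractions import Fraction

def count_coefficients(icons):
    # "Counting" stage: three apple icons mean coefficient 3 for that variable.
    return Counter(icons)

def solve_single_variable(lhs_icons, rhs_value):
    # e.g. [apple, apple, apple] = 12  ->  3x = 12  ->  x = 4
    (var, coeff), = count_coefficients(lhs_icons).items()
    return var, Fraction(rhs_value, coeff)

var, val = solve_single_variable(["apple"] * 3, 12)
```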
DOAJ Open Access 2024
O nieco zapomnianym erotyku rzymskim

Zofia Głombiowska

The article concerns the poem Dirae preserved in the Appendix Vergiliana, or more precisely its second part, which speaks of love for Lydia. The author presents the structure of the work, discusses the way of depicting female beauty and the anthropomorphization of nature (later imitated by Petrarch), assesses the selection of mythological motifs, and indicates their sources (Homer, Hesiod). Regarding the portrayal of love, she judges the differences from the Roman elegy to be significant: the separation of the lovers and the love drama in the poem about Lydia are caused not by the girl's indifference or infidelity, but by political decisions (the confiscation of land in favour of war veterans forces the man to leave the countryside). The poem depicts free, fulfilled love as a transgression. Since the text in the codices is full of errors, the interpretation presented here also includes numerous reflections on manuscript readings and editors' corrections (as in the last lines of the work, for example) or rests on the punctuation of sentence constructions (as in verses 142–143 [L 38–40]).

Philology. Linguistics, Greek language and literature. Latin language and literature
arXiv Open Access 2024
Mitigating Translationese in Low-resource Languages: The Storyboard Approach

Garry Kuwanto, Eno-Abasi E. Urua, Priscilla Amondi Amuok et al.

Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method demonstrates worse accuracy but better fluency in the target language.

en cs.CL
arXiv Open Access 2024
From 'Showgirls' to 'Performers': Fine-tuning with Gender-inclusive Language for Bias Reduction in LLMs

Marion Bartl, Susan Leavy

Gender bias is not only prevalent in Large Language Models (LLMs) and their training data, but also firmly ingrained into the structural aspects of language itself. Therefore, adapting linguistic structures within LLM training data to promote gender-inclusivity can make gender representations within the model more inclusive. The focus of our work are gender-exclusive affixes in English, such as in 'show-girl' or 'man-cave', which can perpetuate gender stereotypes and binary conceptions of gender. We use an LLM training dataset to compile a catalogue of 692 gender-exclusive terms along with gender-neutral variants and from this, develop a gender-inclusive fine-tuning dataset, the 'Tiny Heap'. Fine-tuning three different LLMs with this dataset, we observe an overall reduction in gender-stereotyping tendencies across the models. Our approach provides a practical method for enhancing gender inclusivity in LLM training data and contributes to incorporating queer-feminist linguistic activism in bias mitigation research in NLP.

en cs.CL
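The rewriting idea behind such a fine-tuning corpus can be sketched as a mapping from gender-exclusive terms to neutral variants. The three pairs below are illustrative stand-ins, not entries from the paper's 692-term catalogue, and the naive substitution ignores capitalization and morphology.

```python
import re

NEUTRAL = {
    "showgirl": "performer",
    "chairman": "chairperson",
    "mankind": "humankind",
}

def neutralize(text: str) -> str:
    # Replace whole-word matches of any catalogued term with its variant.
    pattern = re.compile(r"\b(" + "|".join(NEUTRAL) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: NEUTRAL[m.group(1).lower()], text)
```

Applying such a pass over a training corpus before fine-tuning is the gist of building a gender-inclusive dataset like the 'Tiny Heap'.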
arXiv Open Access 2023
This Reads Like That: Deep Learning for Interpretable Natural Language Processing

Claudio Fanconi, Moritz Vandenhirtz, Severin Husmann et al.

Prototype learning, a popular machine learning method designed for inherently interpretable decisions, leverages similarities to learned prototypes for classifying new data. While it is mainly applied in computer vision, in this work, we build upon prior research and further explore the extension of prototypical networks to natural language processing. We introduce a learned weighted similarity measure that enhances the similarity computation by focusing on informative dimensions of pre-trained sentence embeddings. Additionally, we propose a post-hoc explainability mechanism that extracts prediction-relevant words from both the prototype and input sentences. Finally, we empirically demonstrate that our proposed method not only improves predictive performance on the AG News and RT Polarity datasets over a previous prototype-based approach, but also improves the faithfulness of explanations compared to rationale-based recurrent convolutions.

en cs.CL, cs.AI
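The learned weighted similarity described above re-scales embedding dimensions before a cosine comparison, so that informative dimensions dominate. In this sketch the weights are fixed by hand; in the paper they are learned.

```python
import numpy as np

def weighted_cosine(u, v, w) -> float:
    # Cosine similarity after per-dimension re-weighting by sqrt(w),
    # so the weight w_i multiplies the contribution of dimension i.
    uw, vw = u * np.sqrt(w), v * np.sqrt(w)
    return float(np.dot(uw, vw) / (np.linalg.norm(uw) * np.linalg.norm(vw)))

u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 1.0, 0.0])
w = np.array([1.0, 0.1, 0.1])  # down-weight "uninformative" dimensions
```

With uniform weights this reduces to the plain cosine similarity; the non-uniform weights above pull the two vectors closer by discounting the dimensions on which they disagree.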
arXiv Open Access 2023
Fuzzy Temporal Protoforms for the Quantitative Description of Processes in Natural Language

Yago Fontenla-Seco, Alberto Bugarín-Diz, Manuel Lama

In this paper, we propose a series of fuzzy temporal protoforms in the framework of the automatic generation of quantitative and qualitative natural language descriptions of processes. The model includes temporal and causal information from processes and attributes, quantifies attributes in time during the process life-span and recalls causal relations and temporal distances between events, among other features. Through integrating process mining techniques and fuzzy sets within the usual Data-to-Text architecture, our framework is able to extract relevant quantitative temporal as well as structural information from a process and describe it in natural language involving uncertain terms. A real use-case in the cardiology domain is presented, showing the potential of our model for providing natural language explanations addressed to domain experts.
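A fuzzy temporal protoform of the kind described above, e.g. "In most cases, the stay after surgery is short", combines a fuzzy set over durations with a fuzzy quantifier over the resulting proportion. Both membership functions below are illustrative assumptions, not the paper's definitions.

```python
def short_stay(days: float) -> float:
    # Fuzzy set "short": fully short up to 3 days, not short beyond 7.
    if days <= 3:
        return 1.0
    if days >= 7:
        return 0.0
    return (7 - days) / 4

def most(p: float) -> float:
    # Zadeh-style quantifier: truth of "most" for a proportion p.
    if p <= 0.3:
        return 0.0
    if p >= 0.8:
        return 1.0
    return (p - 0.3) / 0.5

stays = [2, 3, 4, 8]  # made-up case durations in days
proportion = sum(short_stay(d) for d in stays) / len(stays)
truth = most(proportion)  # truth degree of "in most cases the stay is short"
```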

arXiv Open Access 2023
Can ChatGPT be Your Personal Medical Assistant?

Md. Rafiul Biswas, Ashhadul Islam, Zubair Shah et al.

The advanced large language model (LLM) ChatGPT has shown its potential across domains, with characteristics that set it apart from other LLMs. This study aims to evaluate the potential of using a fine-tuned ChatGPT model as a personal medical assistant in the Arabic language. To do so, it uses publicly available online question-and-answer datasets in Arabic, comprising almost 430K questions and answers across 20 disease-specific categories. A GPT-3.5-turbo model was fine-tuned on a portion of this dataset, and the fine-tuned model was evaluated through automated and human evaluation. The automated evaluations include perplexity, coherence, similarity, and token count. Native Arabic speakers with medical knowledge rated the generated text for relevance, accuracy, precision, logic, and originality. The overall results suggest that ChatGPT has a bright future in medical assistance.

en cs.CL, cs.SI
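Perplexity, one of the automated metrics named above, is the exponential of the negative mean token log-likelihood. The log-probabilities below are made up for illustration; a real evaluation would take them from the model.

```python
import math

def perplexity(token_logprobs) -> float:
    # exp of the negative average log-likelihood per token;
    # lower values mean the model found the text less surprising.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Four tokens, each assigned probability 0.5 -> perplexity 2.0
ppl = perplexity([math.log(0.5)] * 4)
```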
arXiv Open Access 2023
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

Zhijian Hou, Lei Ji, Difei Gao et al.

In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2023. Essentially, to accurately ground a query in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and further fine-tune the model on annotated data. In addition, we introduce a novel grounding model, GroundNLQ, which employs a multi-modal, multi-scale grounding module for effective video-text fusion across various temporal intervals, which is especially important for long videos. On the blind test set, GroundNLQ achieves 25.67 and 18.18 for R1@IoU=0.3 and R1@IoU=0.5, respectively, and surpasses all other teams by a noticeable margin. Our code will be released at https://github.com/houzhijian/GroundNLQ.

en cs.CV, cs.CL
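The R1@IoU metric reported above counts a prediction as correct when its top-1 temporal interval overlaps the ground-truth interval with IoU above the threshold. Intervals are (start, end) pairs in seconds; the examples below are made up.

```python
def temporal_iou(a, b) -> float:
    # Intersection-over-union of two 1-D intervals (start, end).
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def r1_at_iou(preds, gts, thr) -> float:
    # Fraction of queries whose top-1 prediction clears the IoU threshold.
    hits = sum(temporal_iou(p, g) >= thr for p, g in zip(preds, gts))
    return hits / len(gts)

preds = [(10.0, 20.0), (0.0, 5.0)]
gts = [(12.0, 22.0), (30.0, 35.0)]
score = r1_at_iou(preds, gts, 0.3)  # first query hits, second misses
```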
arXiv Open Access 2023
Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese

Rodrigo Wilkens, Leonardo Zilio, Aline Villavicencio

Much recent effort has been devoted to creating large-scale language models. Nowadays, the most prominent approaches are based on deep neural networks, such as BERT. However, they lack transparency and interpretability, and are often seen as black boxes. This affects not only their applicability in downstream tasks but also the comparability of different architectures or even of the same model trained using different corpora or hyperparameters. In this paper, we propose a set of intrinsic evaluation tasks that inspect the linguistic information encoded in models developed for Brazilian Portuguese. These tasks are designed to evaluate how different language models generalise information related to grammatical structures and multiword expressions (MWEs), thus allowing for an assessment of whether the model has learned different linguistic phenomena. The dataset that was developed for these tasks is composed of a series of sentences with a single masked word and a cue phrase that helps in narrowing down the context. This dataset is divided into MWEs and grammatical structures, and the latter is subdivided into 6 tasks: impersonal verbs, subject agreement, verb agreement, nominal agreement, passive and connectors. The subset for MWEs was used to test BERTimbau Large, BERTimbau Base and mBERT. For the grammatical structures, we used only BERTimbau Large, because it yielded the best results in the MWE task.

en cs.CL
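The cloze-style evaluation described above scores a model's fill for a single masked word against the gold word. `predict` is a hypothetical stand-in for a masked language model such as BERTimbau, and the two items are invented examples, not from the dataset.

```python
def predict(sentence: str) -> str:
    # Stand-in for a masked language model's top prediction;
    # always guesses the preposition "de" in this demo.
    return "de"

items = [
    ("O livro [MASK] Maria está na mesa.", "de"),
    ("Eles [MASK] ao cinema ontem.", "foram"),
]

accuracy = sum(predict(s) == gold for s, gold in items) / len(items)
```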
DOAJ Open Access 2021
Introduction

Mélanie Lucciano

This colloquium received support from the Universities of Turin, Vercelli Piemonte Orientale, and Rouen-Normandie; from the IRIHS (Institut de Recherche Interdisciplinaire Homme Société); from the Université Francoitalienne; and from the SIAC (Société internationale des Amis de Cicéron).

Philology. Linguistics, Greek language and literature. Latin language and literature
DOAJ Open Access 2021
Las edades del hombre en Fulgencio

Julieta Cardigni

In his Expositio Virgilianae continentiae, Fulgentius the Mythographer (late 5th or early 6th century AD) offers an exposition and interpretation of Virgil's Aeneid in an allegorical-etymological key, to which he adds the moral sanction of Scripture to close off the meaning. The most novel aspect of his work (leaving aside the fact, notable in itself, that he is the first Christian commentator on Virgil) is that he transforms the Aeneid into a sort of journey through the stages of human life, making each book of the epic correspond to an age of man. This paper analyzes Fulgentius's interpretive operations, focusing on his reading of the Aeneid as an allegory of human life. In this way, we seek to illuminate Fulgentius's exegetical project, in the conviction that he is guided not by a serious and noble aim but by the intention of mocking and ridiculing the exegetical techniques current in his time, which sought the reconciliation and syncretism of Christianity with the pagan philosophical-literary tradition. In this sense, the author would be responding, on the one hand, to the Menippean mould perceptible in his work and, on the other, to the broader context of Late Antiquity, in which reflection on, and concern with, discourse and its possibilities for constructing meaning were the order of the day.

Philology. Linguistics, Greek language and literature. Latin language and literature
arXiv Open Access 2021
Chebyshev Greeks: Smoothing Gamma without Bias

Andrea Maran, Andrea Pallavicini, Stefano Scoleri

The computation of Greeks is a fundamental task in the risk management of financial instruments. The standard approach to their numerical evaluation is via finite differences. Most exotic derivatives are priced via Monte Carlo simulation: in these cases, it is hard to find a fast and accurate approximation of the Greeks, mainly because of the need to trade off bias against variance. Recent improvements in Greeks computation, such as Adjoint Algorithmic Differentiation, are unfortunately ineffective on second-order Greeks (such as Gamma), which are plagued by the most significant instabilities, so that a viable alternative to standard finite differences is still lacking. We apply Chebyshev interpolation techniques to the computation of spot Greeks, showing how to improve the stability of finite-difference Greeks of arbitrary order in a simple and general way. The increased performance of the proposed technique is analyzed for a number of real payoffs commonly traded by financial institutions.

en q-fin.CP, q-fin.MF
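The idea above can be sketched as: fit a Chebyshev interpolant to prices on a spot grid, then differentiate the smooth interpolant twice to obtain Gamma, rather than applying finite differences directly to noisy Monte Carlo prices. The `price` function below is a deliberately smooth toy stand-in for a pricer, so both methods agree exactly; with simulation noise, the finite difference degrades as the bump shrinks while the interpolant smooths it out.

```python
import numpy as np

def price(spot):
    # Toy smooth pricer standing in for a Monte Carlo valuation.
    return (spot - 100.0) ** 2 / 50.0

def fd_gamma(f, s, h=1.0):
    # Standard second-order central finite difference for Gamma.
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / h ** 2

# Chebyshev alternative: fit once on a spot grid, differentiate the fit.
spots = np.linspace(80.0, 120.0, 21)
cheb = np.polynomial.Chebyshev.fit(spots, price(spots), deg=8)
gamma = cheb.deriv(2)  # callable second derivative of the interpolant
```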
arXiv Open Access 2021
Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

Shuai Bai, Zhedong Zheng, Xiaohan Wang et al.

Vehicle search is a basic task for efficient traffic management in AI City applications. Most existing practice focuses on image-based vehicle matching, including vehicle re-identification and vehicle tracking. In this paper, we apply a new modality, the language description, to search for the vehicle of interest, and explore the potential of this task in real-world scenarios. Natural language-based vehicle search poses a new challenge of fine-grained understanding of both the vision and language modalities. To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model in an end-to-end manner. Beyond the network structure design and the training strategy, several optimization objectives are also revisited in this work. The qualitative and quantitative experiments verify the effectiveness of the proposed method. Our proposed method achieved 1st place in the 5th AI City Challenge, yielding a competitive 18.69% MRR accuracy on the private test set. We hope this work can pave the way for future study on using language descriptions effectively and efficiently for real-world vehicle retrieval systems. The code will be available at https://github.com/ShuaiBai623/AIC2021-T5-CLV.

en cs.CV
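The MRR accuracy quoted above is the mean reciprocal rank of the true vehicle across queries: each query contributes 1/rank of the correct item in its ranked result list. The ranks below are made-up examples.

```python
def mean_reciprocal_rank(ranks) -> float:
    # ranks are 1-based positions of the correct item in each result list.
    return sum(1.0 / r for r in ranks) / len(ranks)

# Three queries: correct vehicle retrieved at ranks 1, 2, and 4.
mrr = mean_reciprocal_rank([1, 2, 4])  # (1 + 1/2 + 1/4) / 3
```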

Page 35 of 72,910