TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models
Reihaneh Iranmanesh, Saeedeh Davoudi, Pasha Abrishamchian
et al.
This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on multiple-choice formats and English-centric metrics that fail to capture Persian's morphological complexity and semantic nuance. Our framework introduces a Persian-specific short-answer evaluation that combines rule-based morphological normalization with a hybrid syntactic and semantic similarity module, enabling robust soft-match scoring beyond exact string overlap. Through systematic evaluation of 15 state-of-the-art open- and closed-source models across three culturally grounded Persian datasets, we demonstrate that our hybrid evaluation improves scoring consistency by +10 compared to exact-match baselines by capturing meaning that surface-level methods cannot detect. Our human evaluation further confirms that the proposed semantic similarity metric achieves higher agreement with human judgments than LLM-based judges. We publicly release our evaluation framework, providing the first standardized benchmark for measuring cultural understanding in Persian and establishing a reproducible foundation for cross-cultural LLM evaluation research.
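The soft-match idea in the abstract — normalize both answers, then score similarity rather than exact string equality — can be sketched minimally. This toy stand-in uses lowercasing and token Jaccard overlap in place of the framework's Persian-specific morphological normalization and hybrid syntactic/semantic module; all names and thresholds here are illustrative assumptions, not the paper's implementation.

```python
def normalize(text: str) -> str:
    """Toy rule-based normalization: lowercase and strip non-alphanumerics."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def soft_match(prediction: str, gold: str, threshold: float = 0.5) -> bool:
    """Accept an answer when normalized token overlap exceeds a threshold,
    so harmless paraphrases are not penalized as exact-match failures."""
    p = set(normalize(prediction).split())
    g = set(normalize(gold).split())
    if not p or not g:
        return p == g
    jaccard = len(p & g) / len(p | g)
    return jaccard >= threshold

# Exact match would reject this paraphrase; soft match accepts it.
print(soft_match("the Nowruz holiday", "Nowruz holiday"))  # True
```

A real semantic-similarity module would replace the Jaccard score with embedding distance, but the surrounding normalize-then-compare scaffold is the same.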
Material 1: Ausschreibungsunterlagen für 2025/26
Susanne Aretz
Greek language and literature. Latin language and literature, Philology. Linguistics
Elementos de ética estoica en la segunda parte de la Monarquía mística de Lorenzo de Zamora
Manuel Andrés Seoane Rodríguez
From very early on, Christianity, in its evangelizing zeal, not only made its own many of the teachings of the different philosophical schools, especially those of the Academics, Neoplatonists, and Stoics, but also strove to present itself as the true philosophy. In this way, theoretical precepts and vital attitudes came to be fully integrated into Christian doctrine in its various manifestations. Monasticism undoubtedly inherited everything relating to the interior life and to the praxis of philosophical exercise. In this paper we set out to show that matters such as self-knowledge and the contemptus mundi reached Fray Lorenzo de Zamora's work entitled Monarquía mística not only through the use of anthologies but also through the monastic rule of the order to which he belonged, the Cistercians.
History of the Greco-Roman World, Greek language and literature. Latin language and literature
Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language Models
Alessandro De Bellis, Salvatore Bufi, Giovanni Servedio
et al.
Inductive link prediction is emerging as a key paradigm for real-world knowledge graphs (KGs), where new entities frequently appear and models must generalize to them without retraining. Predicting links in a KG requires handling previously unseen entities by leveraging generalizable node features such as subgraph structure, type annotations, and ontological constraints. However, explicit type information is often lacking or incomplete. Even when available, type information in most KGs is often coarse-grained, sparse, and prone to errors due to human annotation. In this work, we explore the potential of pre-trained language models (PLMs) to enrich node representations with implicit type signals. We introduce TyleR, a Type-less yet type-awaRe approach for subgraph-based inductive link prediction that leverages PLMs for semantic enrichment. Experiments on standard benchmarks demonstrate that TyleR outperforms state-of-the-art baselines in scenarios with scarce type annotations and sparse graph connectivity. To ensure reproducibility, we share our code at https://github.com/sisinflab/tyler .
Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models
Alex Laitenberger, Christopher D. Manning, Nelson F. Liu
With the rise of long-context language models (LMs) capable of processing tens of thousands of tokens in a single context window, do multi-stage retrieval-augmented generation (RAG) pipelines still offer measurable benefits over simpler, single-stage approaches? To assess this question, we conduct a controlled evaluation for QA tasks under systematically scaled token budgets, comparing two recent multi-stage pipelines, ReadAgent and RAPTOR, against three baselines, including DOS RAG (Document's Original Structure RAG), a simple retrieve-then-read method that preserves original passage order. Despite its straightforward design, DOS RAG consistently matches or outperforms more intricate methods on multiple long-context QA benchmarks. We trace this strength to a combination of maintaining source fidelity and document structure, prioritizing recall within effective context windows, and favoring simplicity over added pipeline complexity. We recommend establishing DOS RAG as a simple yet strong baseline for future RAG evaluations, paired with state-of-the-art embedding and language models, and benchmarked under matched token budgets, to ensure that added pipeline complexity is justified by clear performance gains as models continue to improve.
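The retrieve-then-read baseline the abstract describes — select top-scoring passages under a token budget, then restore their original document order before reading — can be sketched in a few lines. The word-overlap scorer below is a toy stand-in for a real embedding retriever, and all names are illustrative assumptions rather than the paper's code.

```python
def score(query: str, passage: str) -> int:
    """Toy relevance score: shared word count (a real system uses embeddings)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def dos_rag_context(query, passages, token_budget):
    """Select top-ranked passages within the budget, then re-sort them into
    original document order, preserving source structure for the reader."""
    ranked = sorted(range(len(passages)),
                    key=lambda i: score(query, passages[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(passages[i].split())
        if used + cost <= token_budget:
            chosen.append(i)
            used += cost
    # Key step: restore original passage order before reading.
    return [passages[i] for i in sorted(chosen)]
```

The single design decision doing the work is the final `sorted(chosen)`: relevance decides *which* passages enter the context, but the document itself decides *in what order* they are read.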
Zweite Griechischakademie Köln 2024
Susanne Aretz
Greek language and literature. Latin language and literature, Philology. Linguistics
Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models
Bradley P. Allen, Paul T. Groth
Class membership relations, which assign entities to a given class, form a backbone of knowledge graphs. As part of the knowledge engineering process, we propose a new method for evaluating the quality of these relations by processing descriptions of a given entity and class using a zero-shot chain-of-thought classifier that uses a natural language intensional definition of a class. We evaluate the method using two publicly available knowledge graphs, Wikidata and CaLiGraph, and 7 large language models. Using the gpt-4-0125-preview large language model, the method's classification performance achieves a macro-averaged F1-score of 0.830 on data from Wikidata and 0.893 on data from CaLiGraph. Moreover, a manual analysis of the classification errors shows that 40.9% of errors were due to the knowledge graphs, with 16.0% due to missing relations and 24.9% due to incorrectly asserted relations. These results show how large language models can assist knowledge engineers in the process of knowledge graph refinement. The code and data are available on GitHub.
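The abstract's recipe — pair an intensional class definition with an entity description and ask for a step-by-step yes/no judgment — amounts to constructing a prompt like the one below. The exact wording, function name, and example inputs are assumptions for illustration; the actual call to an LLM API is omitted.

```python
def membership_prompt(entity_desc: str, class_name: str,
                      class_definition: str) -> str:
    """Build a zero-shot chain-of-thought prompt that asks whether an entity
    belongs to a class, given the class's intensional definition."""
    return (
        f"Class '{class_name}' is defined as: {class_definition}\n"
        f"Entity description: {entity_desc}\n"
        "Think step by step, then answer YES if the entity is a member "
        "of the class and NO otherwise."
    )

prompt = membership_prompt(
    "Douglas Adams was an English author and screenwriter.",
    "human",
    "any member of the species Homo sapiens",
)
```

Feeding such a prompt to a model and parsing the final YES/NO yields a binary quality judgment for each asserted class membership relation.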
ByteScience: Bridging Unstructured Scientific Literature and Structured Data with Auto Fine-tuned Large Language Model in Token Granularity
Tong Xie, Hanzhi Zhang, Shaozhou Wang
et al.
Natural Language Processing (NLP) is widely used to distill long unstructured text into structured information. However, extracting structured knowledge from scientific text with NLP models remains a challenge because of the domain-specific nature of the data, the complexity of the required preprocessing, and the granularity of multi-layered device-level information. To address this, we introduce ByteScience, a non-profit cloud-based auto fine-tuned Large Language Model (LLM) platform, which is designed to extract structured scientific data and synthesize new scientific knowledge from vast scientific corpora. The platform capitalizes on DARWIN, an open-source, fine-tuned LLM dedicated to natural science. The platform was built on Amazon Web Services (AWS) and provides an automated, user-friendly workflow for custom model development and data extraction. The platform achieves remarkable accuracy with only a small amount of well-annotated articles. This innovative tool streamlines the transition from the scientific literature to structured knowledge and data, and supports advances in natural informatics.
Multi-word Tokenization for Sequence Compression
Leonidas Gee, Leonardo Rigutini, Marco Ernandes
et al.
Large Language Models have proven highly successful at modelling a variety of tasks. However, this comes at a steep computational cost that hinders wider industrial uptake. In this paper, we present MWT: a Multi-Word Tokenizer that goes beyond word boundaries by representing frequent multi-word expressions as single tokens. MWTs produce a more compact and efficient tokenization that yields two benefits: (1) Increase in performance due to a greater coverage of input data given a fixed sequence length budget; (2) Faster and lighter inference due to the ability to reduce the sequence length with negligible drops in performance. Our results show that MWT is more robust across shorter sequence lengths, thus allowing for major speedups via early sequence truncation.
LLAssist: Simple Tools for Automating Literature Review Using Large Language Models
Christoforus Yoga Haryanto
This paper introduces LLAssist, an open-source tool designed to streamline literature reviews in academic research. In an era of exponential growth in scientific publications, researchers face mounting challenges in efficiently processing vast volumes of literature. LLAssist addresses this issue by leveraging Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process. Specifically, it extracts important information from research articles and evaluates their relevance to user-defined research questions. The goal of LLAssist is to significantly reduce the time and effort required for comprehensive literature reviews, allowing researchers to focus more on analyzing and synthesizing information rather than on initial screening tasks. By automating parts of the literature review workflow, LLAssist aims to help researchers manage the growing volume of academic publications more efficiently.
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
Aditya Patil, Vikas Joshi, Purvi Agrawal
et al.
Even with several advancements in multilingual modeling, recognizing multiple languages with a single neural model remains challenging when the input language is unknown, and most multilingual models assume the input language is given. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support switching between the languages, without any language input from the user. The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism. As the language-specific posteriors are combined, the model produces a single posterior probability over all the output symbols, enabling a single beam search decoding and also allowing dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline, achieving relative word error rate reductions of 13.3%, 8.23%, and 1.3% on Hindi, English, and code-mixed test sets, respectively.
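The combination step the abstract describes — mixing language-specific posteriors into one distribution over all output symbols so a single beam search can run — can be sketched with scalar attention weights. This is an illustrative simplification, not the paper's network: the attention logits here are placeholders for scores the model would compute from its hidden states.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def combine_posteriors(post_hi, post_en, attn_logits):
    """Weight two language-specific posteriors by attention scores and mix
    them into a single posterior over the shared symbol set."""
    w_hi, w_en = softmax(attn_logits)
    return [w_hi * p + w_en * q for p, q in zip(post_hi, post_en)]

# Equal attention logits: the mix is the average of the two posteriors.
mixed = combine_posteriors([0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.0, 0.0])
```

Because the mix of two valid distributions with convex weights is itself a valid distribution, the decoder can treat `mixed` exactly like a monolingual posterior, which is what enables dynamic language switching mid-utterance.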
Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language
Alistair Plum, Tharindu Ranasinghe, Christoph Purschke
Relation extraction is essential for extracting and understanding biographical information in the context of digital humanities and related subjects. There is a growing interest in the community in building datasets capable of training machine learning models to extract relationships. However, annotating such datasets can be expensive and time-consuming, in addition to being limited to English. This paper applies guided distant supervision to create a large biographical relationship extraction dataset for German. Our dataset, composed of more than 80,000 instances for nine relationship types, is the largest biographical German relationship extraction dataset. We also create a manually annotated dataset with 2,000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision. We train several state-of-the-art machine learning models on the automatically created dataset and release them as well. Furthermore, we conduct multilingual and cross-lingual experiments whose approach could benefit many low-resource languages.
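Plain distant supervision, which the guided variant builds on, can be sketched in a few lines: any sentence mentioning both entities of a known (subject, relation, object) triple is labeled with that relation. The tiny knowledge base and sentences below are invented for illustration; the paper's "guided" variant adds further filtering on top of this baseline idea.

```python
# Toy knowledge base of known biographical triples (invented example).
KB = {("Goethe", "Frankfurt"): "born_in"}

def label_sentences(sentences):
    """Distant supervision: label any sentence that mentions both entities
    of a KB triple with that triple's relation."""
    labeled = []
    for sent in sentences:
        for (subj, obj), rel in KB.items():
            if subj in sent and obj in sent:
                labeled.append((sent, subj, obj, rel))
    return labeled

data = label_sentences([
    "Goethe was born in Frankfurt in 1749.",
    "Goethe later moved to Weimar.",
])
# Only the first sentence mentions both entities, so one instance is produced.
```

The well-known weakness of this heuristic — a sentence can mention both entities without expressing the relation — is precisely why guidance and manual evaluation sets like the paper's 2,000-instance dataset are needed.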
DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain
Yanis Labrak, Adrien Bazoge, Oumaima El Khettari
et al.
The biomedical domain has sparked significant interest in the field of Natural Language Processing (NLP), which has seen substantial advancements with pre-trained language models (PLMs). However, comparing these models has proven challenging due to variations in evaluation protocols across different models. A fair solution is to aggregate diverse downstream tasks into a benchmark, allowing for the assessment of intrinsic PLM qualities from various perspectives. Although still limited to a few languages, this initiative has been undertaken in the biomedical field, notably for English and Chinese. This limitation hampers the evaluation of the latest French biomedical models, as they are either assessed on a minimal number of tasks with non-standardized protocols or evaluated using general downstream tasks. To bridge this research gap and account for the unique sensitivities of French, we present the first-ever publicly available French biomedical language understanding benchmark called DrBenchmark. It encompasses 20 diversified tasks, including named-entity recognition, part-of-speech tagging, question-answering, semantic textual similarity, and classification. We evaluate 8 state-of-the-art pre-trained masked language models (MLMs) on general and biomedical-specific data, as well as English-specific MLMs to assess their cross-lingual capabilities. Our experiments reveal that no single model excels across all tasks, while generalist models are sometimes still competitive.
M. Fabii Quintiliani Institutionis oratoriae libri XII. Marco Fabio Quintiliano, Sobre la formación del orador, doce libros, trad. y com. Alfonso Ortega Carmona, en el XIX centenario de la muerte de Quintiliano (Años 96-1996), Salamanca, Publicaciones ...
Roberto Heredia Correa
At last we have a modern and complete Castilian translation of Quintilian's Institutio oratoria.
Greek language and literature. Latin language and literature
Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata
Bohui Zhang, Ioannis Reklos, Nitisha Jain
et al.
In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE.
Social Bias Probing: Fairness Benchmarking for Language Models
Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti
et al.
While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections. SoFa expands the analysis beyond the binary comparison of stereotypical versus anti-stereotypical identities to include a diverse range of identities and stereotypes. Comparing our methodology with existing benchmarks, we reveal that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized. Benchmarking LMs on SoFa, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models. Finally, our findings indicate that real-life adversities faced by various groups such as women and people with disabilities are mirrored in the behavior of these models.
PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter
Haoyan Yang, Zhitao Li, Yong Zhang
et al.
The Retrieval Question Answering (ReQA) task employs the retrieval-augmented framework, composed of a retriever and generator. The generator formulates the answer based on the documents retrieved by the retriever. Incorporating Large Language Models (LLMs) as generators is beneficial due to their advanced QA capabilities, but they are typically too large to be fine-tuned under budget constraints, while some are only accessible via APIs. To tackle this issue and further improve ReQA performance, we propose a trainable Pluggable Reward-Driven Contextual Adapter (PRCA), keeping the generator as a black box. Positioned between the retriever and generator in a pluggable manner, PRCA refines the retrieved information with a token-autoregressive strategy, maximizing rewards during the reinforcement learning phase. Our experiments validate PRCA's effectiveness in enhancing ReQA performance on three datasets by up to 20%, fitting black-box LLMs into existing frameworks and demonstrating its considerable potential in the LLM era.
Recensioni
Stefano Costa, Chiara Schürch, Fabio Bellorio
et al.
CICERONE, In difesa di Archia, saggio introduttivo, nuova traduzione e note a cura di Daniele PELLACANI (S. COSTA) 143
Gernot Michael MÜLLER, Jörn MÜLLER (Hgg.), Cicero ethicus. Die Tusculanae disputationes im Vergleich mit De finibus bonorum et malorum (C. Schürch) 147
Christopher DIEZ, Ciceros emanzipatorische Leserführung. Studien zum Verhältnis von dialogisch-rhetorischer Inszenierung und skeptischer Philosophie in De natura deorum (F. BELLORIO) 153
Tommaso GAZZARRI, The Stylus and the Scalpel. Theory and Practice of Metaphors in Seneca’s Prose (A. CASAMENTO) 159
Philology. Linguistics, Greek language and literature. Latin language and literature
Petrarch and the Significance of Dialogue
Aaron Chung, Charles Irwin
The collective imagination often pictures the modern Latin classroom as a teacher writing on a chalkboard while students silently memorise declensions. However, in their search for innovative and effective practices, Latin instructors have consistently looked beyond the traditional parameters of rote memorisation, at least since the pioneering efforts of W.H.D. Rouse, drawing inspiration from novel methods and from the work of their predecessors in hopes of fostering a more engaging learning environment. A close comparative study of modern pedagogical methods in Latin classrooms and the perspective of the Renaissance scholar Petrarch identifies a commonality between the two: an emphasis on dialogue among the members of the classroom and on personal interpretation of preceding authors’ works as a better path to comprehending the content. Grounded in the philosophies of the Socratic method, Petrarch claimed that an important element of the pedagogical tradition finds expression in dialogue, imitation, and the full comprehension of a topic in pursuit of wisdom. Likewise, many institutions in the U.K. and the United States, strengthened by the emergence of dialectic assessment applications during the Covid-19 pandemic, are working towards a new norm. Based on an in-depth interpretation of primary and secondary sources on Petrarch's pedagogy, together with research on its modern developments and applications, the comparison suggests a new direction for the Classics community to consider going forward.
Theory and practice of education, Ancient history
A perspectiva de Rodrigo de Castro sobre as características do sangue menstrual
António Maria Martins Melo, José Sílvio Fernandes, Cristina Santos Pinheiro
The discussion of the characteristics of menstrual blood is a prominent topic in the gynecological work of Rodrigo de Castro. The aim of this article is to analyze the arguments presented there and to explore the key points of the reflection on the menses in the process by which the Lusitanian physician formed his personal opinion; he proves an attentive and critical reader of the tradition, synthesizing the conventional perspectives on this subject in order to elaborate a conceptual framework that values menstrual blood.
History of the Greco-Roman World, Greek language and literature. Latin language and literature