TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models
Reihaneh Iranmanesh, Saeedeh Davoudi, Pasha Abrishamchian
et al.
This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on multiple-choice formats and English-centric metrics that fail to capture Persian's morphological complexity and semantic nuance. Our framework introduces a Persian-specific short-answer evaluation that combines rule-based morphological normalization with a hybrid syntactic and semantic similarity module, enabling robust soft-match scoring beyond exact string overlap. Through systematic evaluation of 15 state-of-the-art open- and closed-source models across three culturally grounded Persian datasets, we demonstrate that our hybrid evaluation improves scoring consistency by +10 compared to exact-match baselines by capturing meaning that surface-level methods cannot detect. Our human evaluation further confirms that the proposed semantic similarity metric achieves higher agreement with human judgments than LLM-based judges. We publicly release our evaluation framework, providing the first standardized benchmark for measuring cultural understanding in Persian and establishing a reproducible foundation for cross-cultural LLM evaluation research.
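The soft-match idea in the abstract — normalize both answers, then score similarity rather than exact string equality — can be sketched minimally. This toy stand-in uses lowercasing and token Jaccard overlap in place of the framework's Persian-specific morphological normalization and hybrid syntactic/semantic module; all names and thresholds here are illustrative assumptions, not the paper's implementation.

```python
def normalize(text: str) -> str:
    """Toy rule-based normalization: lowercase and strip non-alphanumerics."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def soft_match(prediction: str, gold: str, threshold: float = 0.5) -> bool:
    """Accept an answer when normalized token overlap exceeds a threshold,
    so harmless paraphrases are not penalized as exact-match failures."""
    p = set(normalize(prediction).split())
    g = set(normalize(gold).split())
    if not p or not g:
        return p == g
    jaccard = len(p & g) / len(p | g)
    return jaccard >= threshold

# Exact match would reject this paraphrase; soft match accepts it.
print(soft_match("the Nowruz holiday", "Nowruz holiday"))  # True
```

A real semantic-similarity module would replace the Jaccard score with embedding distance, but the surrounding normalize-then-compare scaffold is the same.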
Material 1: Ausschreibungsunterlagen für 2025/26
Susanne Aretz
Greek language and literature. Latin language and literature, Philology. Linguistics
Elementos de ética estoica en la segunda parte de la Monarquía mística de Lorenzo de Zamora
Manuel Andrés Seoane Rodríguez
From very early on, Christianity, in its evangelizing zeal, not only made its own many of the teachings of the different philosophical schools, especially those of the Academics, Neoplatonists, and Stoics, but also strove to present itself as the true philosophy. In this way, theoretical precepts and vital attitudes came to be fully integrated into Christian doctrine in its various manifestations. Monasticism undoubtedly inherited everything relating to the interior life and to the praxis of philosophical exercise. In this paper we set out to show that matters such as self-knowledge and the contemptus mundi reached Fray Lorenzo de Zamora's work entitled Monarquía mística not only through the use of anthologies but also through the monastic rule of the order to which he belonged, the Cistercians.
History of the Greco-Roman World, Greek language and literature. Latin language and literature
Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language Models
Alessandro De Bellis, Salvatore Bufi, Giovanni Servedio
et al.
Inductive link prediction is emerging as a key paradigm for real-world knowledge graphs (KGs), where new entities frequently appear and models must generalize to them without retraining. Predicting links in a KG requires handling previously unseen entities by leveraging generalizable node features such as subgraph structure, type annotations, and ontological constraints. However, explicit type information is often lacking or incomplete. Even when available, type information in most KGs is often coarse-grained, sparse, and prone to errors due to human annotation. In this work, we explore the potential of pre-trained language models (PLMs) to enrich node representations with implicit type signals. We introduce TyleR, a Type-less yet type-awaRe approach for subgraph-based inductive link prediction that leverages PLMs for semantic enrichment. Experiments on standard benchmarks demonstrate that TyleR outperforms state-of-the-art baselines in scenarios with scarce type annotations and sparse graph connectivity. To ensure reproducibility, we share our code at https://github.com/sisinflab/tyler .
Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models
Alex Laitenberger, Christopher D. Manning, Nelson F. Liu
With the rise of long-context language models (LMs) capable of processing tens of thousands of tokens in a single context window, do multi-stage retrieval-augmented generation (RAG) pipelines still offer measurable benefits over simpler, single-stage approaches? To assess this question, we conduct a controlled evaluation for QA tasks under systematically scaled token budgets, comparing two recent multi-stage pipelines, ReadAgent and RAPTOR, against three baselines, including DOS RAG (Document's Original Structure RAG), a simple retrieve-then-read method that preserves original passage order. Despite its straightforward design, DOS RAG consistently matches or outperforms more intricate methods on multiple long-context QA benchmarks. We trace this strength to a combination of maintaining source fidelity and document structure, prioritizing recall within effective context windows, and favoring simplicity over added pipeline complexity. We recommend establishing DOS RAG as a simple yet strong baseline for future RAG evaluations, paired with state-of-the-art embedding and language models, and benchmarked under matched token budgets, to ensure that added pipeline complexity is justified by clear performance gains as models continue to improve.
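The retrieve-then-read baseline the abstract describes — select top-scoring passages under a token budget, then restore their original document order before reading — can be sketched in a few lines. The word-overlap scorer below is a toy stand-in for a real embedding retriever, and all names are illustrative assumptions rather than the paper's code.

```python
def score(query: str, passage: str) -> int:
    """Toy relevance score: shared word count (a real system uses embeddings)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def dos_rag_context(query, passages, token_budget):
    """Select top-ranked passages within the budget, then re-sort them into
    original document order, preserving source structure for the reader."""
    ranked = sorted(range(len(passages)),
                    key=lambda i: score(query, passages[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(passages[i].split())
        if used + cost <= token_budget:
            chosen.append(i)
            used += cost
    # Key step: restore original passage order before reading.
    return [passages[i] for i in sorted(chosen)]
```

The single design decision doing the work is the final `sorted(chosen)`: relevance decides *which* passages enter the context, but the document itself decides *in what order* they are read.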
Zweite Griechischakademie Köln 2024
Susanne Aretz
Greek language and literature. Latin language and literature, Philology. Linguistics
Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models
Bradley P. Allen, Paul T. Groth
Class membership relations, which assign entities to a given class, form a backbone of knowledge graphs. As part of the knowledge engineering process, we propose a new method for evaluating the quality of these relations by processing descriptions of a given entity and class using a zero-shot chain-of-thought classifier that uses a natural language intensional definition of a class. We evaluate the method using two publicly available knowledge graphs, Wikidata and CaLiGraph, and 7 large language models. Using the gpt-4-0125-preview large language model, the method's classification performance achieves a macro-averaged F1-score of 0.830 on data from Wikidata and 0.893 on data from CaLiGraph. Moreover, a manual analysis of the classification errors shows that 40.9% of errors were due to the knowledge graphs, with 16.0% due to missing relations and 24.9% due to incorrectly asserted relations. These results show how large language models can assist knowledge engineers in the process of knowledge graph refinement. The code and data are available on GitHub.
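The abstract's recipe — pair an intensional class definition with an entity description and ask for a step-by-step yes/no judgment — amounts to constructing a prompt like the one below. The exact wording, function name, and example inputs are assumptions for illustration; the actual call to an LLM API is omitted.

```python
def membership_prompt(entity_desc: str, class_name: str,
                      class_definition: str) -> str:
    """Build a zero-shot chain-of-thought prompt that asks whether an entity
    belongs to a class, given the class's intensional definition."""
    return (
        f"Class '{class_name}' is defined as: {class_definition}\n"
        f"Entity description: {entity_desc}\n"
        "Think step by step, then answer YES if the entity is a member "
        "of the class and NO otherwise."
    )

prompt = membership_prompt(
    "Douglas Adams was an English author and screenwriter.",
    "human",
    "any member of the species Homo sapiens",
)
```

Feeding such a prompt to a model and parsing the final YES/NO yields a binary quality judgment for each asserted class membership relation.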
ByteScience: Bridging Unstructured Scientific Literature and Structured Data with Auto Fine-tuned Large Language Model in Token Granularity
Tong Xie, Hanzhi Zhang, Shaozhou Wang
et al.
Natural Language Processing (NLP) is widely used to distill long unstructured text into structured information. However, extracting structured knowledge from scientific text with NLP models remains a challenge because of the domain-specific nature of the data, the complexity of the required preprocessing, and the granularity of multi-layered device-level information. To address this, we introduce ByteScience, a non-profit cloud-based auto fine-tuned Large Language Model (LLM) platform, which is designed to extract structured scientific data and synthesize new scientific knowledge from vast scientific corpora. The platform capitalizes on DARWIN, an open-source, fine-tuned LLM dedicated to natural science. The platform was built on Amazon Web Services (AWS) and provides an automated, user-friendly workflow for custom model development and data extraction. The platform achieves remarkable accuracy with only a small amount of well-annotated articles. This innovative tool streamlines the transition from the scientific literature to structured knowledge and data, and supports advances in natural informatics.
Multi-word Tokenization for Sequence Compression
Leonidas Gee, Leonardo Rigutini, Marco Ernandes
et al.
Large Language Models have proven highly successful at modelling a variety of tasks. However, this comes at a steep computational cost that hinders wider industrial uptake. In this paper, we present MWT: a Multi-Word Tokenizer that goes beyond word boundaries by representing frequent multi-word expressions as single tokens. MWTs produce a more compact and efficient tokenization that yields two benefits: (1) Increase in performance due to a greater coverage of input data given a fixed sequence length budget; (2) Faster and lighter inference due to the ability to reduce the sequence length with negligible drops in performance. Our results show that MWT is more robust across shorter sequence lengths, thus allowing for major speedups via early sequence truncation.
LLAssist: Simple Tools for Automating Literature Review Using Large Language Models
Christoforus Yoga Haryanto
This paper introduces LLAssist, an open-source tool designed to streamline literature reviews in academic research. In an era of exponential growth in scientific publications, researchers face mounting challenges in efficiently processing vast volumes of literature. LLAssist addresses this issue by leveraging Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process. Specifically, it extracts important information from research articles and evaluates their relevance to user-defined research questions. The goal of LLAssist is to significantly reduce the time and effort required for comprehensive literature reviews, allowing researchers to focus more on analyzing and synthesizing information rather than on initial screening tasks. By automating parts of the literature review workflow, LLAssist aims to help researchers manage the growing volume of academic publications more efficiently.
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
Aditya Patil, Vikas Joshi, Purvi Agrawal
et al.
Even with several advancements in multilingual modeling, recognizing multiple languages with a single neural model remains challenging when the input language is unknown, and most multilingual models assume the input language is given. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support switching between the languages, without any language input from the user. The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism. As the language-specific posteriors are combined, the model produces a single posterior probability over all the output symbols, enabling a single beam search decoding and also allowing dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline, achieving relative word error rate reductions of 13.3%, 8.23%, and 1.3% on Hindi, English, and code-mixed test sets, respectively.
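The combination step the abstract describes — mixing language-specific posteriors into one distribution over all output symbols so a single beam search can run — can be sketched with scalar attention weights. This is an illustrative simplification, not the paper's network: the attention logits here are placeholders for scores the model would compute from its hidden states.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def combine_posteriors(post_hi, post_en, attn_logits):
    """Weight two language-specific posteriors by attention scores and mix
    them into a single posterior over the shared symbol set."""
    w_hi, w_en = softmax(attn_logits)
    return [w_hi * p + w_en * q for p, q in zip(post_hi, post_en)]

# Equal attention logits: the mix is the average of the two posteriors.
mixed = combine_posteriors([0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.0, 0.0])
```

Because the mix of two valid distributions with convex weights is itself a valid distribution, the decoder can treat `mixed` exactly like a monolingual posterior, which is what enables dynamic language switching mid-utterance.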
Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language
Alistair Plum, Tharindu Ranasinghe, Christoph Purschke
Relation extraction is essential for extracting and understanding biographical information in the context of digital humanities and related subjects. There is a growing interest in the community in building datasets capable of training machine learning models to extract relationships. However, annotating such datasets can be expensive and time-consuming, in addition to being limited to English. This paper applies guided distant supervision to create a large biographical relationship extraction dataset for German. Our dataset, composed of more than 80,000 instances for nine relationship types, is the largest biographical German relationship extraction dataset. We also create a manually annotated dataset with 2,000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision. We train several state-of-the-art machine learning models on the automatically created dataset and release them as well. Furthermore, we conduct multilingual and cross-lingual experiments whose approach could benefit many low-resource languages.
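Plain distant supervision, which the guided variant builds on, can be sketched in a few lines: any sentence mentioning both entities of a known (subject, relation, object) triple is labeled with that relation. The tiny knowledge base and sentences below are invented for illustration; the paper's "guided" variant adds further filtering on top of this baseline idea.

```python
# Toy knowledge base of known biographical triples (invented example).
KB = {("Goethe", "Frankfurt"): "born_in"}

def label_sentences(sentences):
    """Distant supervision: label any sentence that mentions both entities
    of a KB triple with that triple's relation."""
    labeled = []
    for sent in sentences:
        for (subj, obj), rel in KB.items():
            if subj in sent and obj in sent:
                labeled.append((sent, subj, obj, rel))
    return labeled

data = label_sentences([
    "Goethe was born in Frankfurt in 1749.",
    "Goethe later moved to Weimar.",
])
# Only the first sentence mentions both entities, so one instance is produced.
```

The well-known weakness of this heuristic — a sentence can mention both entities without expressing the relation — is precisely why guidance and manual evaluation sets like the paper's 2,000-instance dataset are needed.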
DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain
Yanis Labrak, Adrien Bazoge, Oumaima El Khettari
et al.
The biomedical domain has sparked significant interest in the field of Natural Language Processing (NLP), which has seen substantial advancements with pre-trained language models (PLMs). However, comparing these models has proven challenging due to variations in evaluation protocols across different models. A fair solution is to aggregate diverse downstream tasks into a benchmark, allowing for the assessment of intrinsic PLM qualities from various perspectives. Although still limited to a few languages, this initiative has been undertaken in the biomedical field, notably for English and Chinese. This limitation hampers the evaluation of the latest French biomedical models, as they are either assessed on a minimal number of tasks with non-standardized protocols or evaluated using general downstream tasks. To bridge this research gap and account for the unique sensitivities of French, we present the first-ever publicly available French biomedical language understanding benchmark called DrBenchmark. It encompasses 20 diversified tasks, including named-entity recognition, part-of-speech tagging, question-answering, semantic textual similarity, and classification. We evaluate 8 state-of-the-art pre-trained masked language models (MLMs) on general and biomedical-specific data, as well as English-specific MLMs to assess their cross-lingual capabilities. Our experiments reveal that no single model excels across all tasks, while generalist models are sometimes still competitive.
M. Fabii Quintiliani Institutionis oratoriae libri XII. Marco Fabio Quintiliano, Sobre la formación del orador, doce libros, trad. y com. Alfonso Ortega Carmona, en el XIX centenario de la muerte de Quintiliano (Años 96-1996), Salamanca, Publicaciones ...
Roberto Heredia Correa
At last we have a modern and complete Castilian translation of Quintilian's Institutio oratoria.
Greek language and literature. Latin language and literature
Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata
Bohui Zhang, Ioannis Reklos, Nitisha Jain
et al.
In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE.
Social Bias Probing: Fairness Benchmarking for Language Models
Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti
et al.
While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections. SoFa expands the analysis beyond the binary comparison of stereotypical versus anti-stereotypical identities to include a diverse range of identities and stereotypes. Comparing our methodology with existing benchmarks, we reveal that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized. Benchmarking LMs on SoFa, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models. Finally, our findings indicate that real-life adversities faced by various groups such as women and people with disabilities are mirrored in the behavior of these models.
PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter
Haoyan Yang, Zhitao Li, Yong Zhang
et al.
The Retrieval Question Answering (ReQA) task employs the retrieval-augmented framework, composed of a retriever and generator. The generator formulates the answer based on the documents retrieved by the retriever. Incorporating Large Language Models (LLMs) as generators is beneficial due to their advanced QA capabilities, but they are typically too large to be fine-tuned under budget constraints, while some are only accessible via APIs. To tackle this issue and further improve ReQA performance, we propose a trainable Pluggable Reward-Driven Contextual Adapter (PRCA), keeping the generator as a black box. Positioned between the retriever and generator in a pluggable manner, PRCA refines the retrieved information with a token-autoregressive strategy, maximizing rewards during the reinforcement learning phase. Our experiments validate PRCA's effectiveness in enhancing ReQA performance on three datasets by up to 20%, fitting black-box LLMs into existing frameworks and demonstrating its considerable potential in the LLM era.
Recensioni
Stefano Costa, Chiara Schürch, Fabio Bellorio
et al.
CICERONE, In difesa di Archia, saggio introduttivo, nuova traduzione e note a cura di Daniele PELLACANI (S. COSTA) 143
Gernot Michael MÜLLER, Jörn MÜLLER (Hgg.), Cicero ethicus. Die Tusculanae disputationes im Vergleich mit De finibus bonorum et malorum (C. Schürch) 147
Christopher DIEZ, Ciceros emanzipatorische Leserführung. Studien zum Verhältnis von dialogisch-rhetorischer Inszenierung und skeptischer Philosophie in De natura deorum (F. BELLORIO) 153
Tommaso GAZZARRI, The Stylus and the Scalpel. Theory and Practice of Metaphors in Seneca’s Prose (A. CASAMENTO) 159
Philology. Linguistics, Greek language and literature. Latin language and literature
Petrarch and the Significance of Dialogue
Aaron Chung, Charles Irwin
The collective imagination often pictures the modern Latin classroom as a teacher writing on a chalkboard while students silently memorise declensions. However, in their search for innovative and effective practices, Latin instructors have consistently looked beyond the traditional parameters of rote memorisation, at least since the pioneering efforts of W.H.D. Rouse, drawing inspiration from novel methods and from the work of their predecessors in hopes of fostering a more engaging learning environment. A close comparative study of modern pedagogical methods in Latin classrooms and the perspective of the Renaissance scholar Petrarch identifies a commonality between the two: an emphasis on dialogue among the members of the classroom and on personal interpretation of preceding authors’ works as a better path to comprehending the content. Grounded in the philosophies of the Socratic method, Petrarch claimed that an important element of the pedagogical tradition finds expression in dialogue, imitation, and the full comprehension of a topic in pursuit of wisdom. Likewise, many institutions in the U.K. and the United States, strengthened by the emergence of dialectic assessment applications during the Covid-19 pandemic, are working towards a new norm. Based on an in-depth interpretation of primary and secondary sources on Petrarch's pedagogy, together with research on its modern developments and applications, the comparison suggests a new direction for the Classics community to consider going forward.
Theory and practice of education, Ancient history
A perspectiva de Rodrigo de Castro sobre as características do sangue menstrual
António Maria Martins Melo, José Sílvio Fernandes, Cristina Santos Pinheiro
The discussion of the characteristics of menstrual blood is a prominent topic in the gynecological work of Rodrigo de Castro. The aim of this article is to analyze the arguments presented there and to explore the key points of the reflection on the menses in the process by which the Lusitanian physician formed his personal opinion; he proves an attentive and critical reader of the tradition, synthesizing the conventional perspectives on this subject in order to elaborate a conceptual framework that values menstrual blood.
History of the Greco-Roman World, Greek language and literature. Latin language and literature