LitBench: A Graph-Centric Large Language Model Benchmarking Tool For Literature Tasks
Andreas Varvarigos, Ali Maatouk, Jiasheng Zhang
et al.
While large language models (LLMs) have become the de facto framework for literature-related tasks, they still struggle to function as domain-specific literature agents due to their inability to connect pieces of knowledge and reason across domain-specific contexts, terminologies, and nomenclatures. This challenge underscores the need for a tool that facilitates such domain-specific adaptation and enables rigorous benchmarking across literature tasks. To that end, we introduce LitBench, a benchmarking tool designed to enable the development and evaluation of domain-specific LLMs tailored to literature-related tasks. At its core, LitBench uses a data curation process that generates domain-specific literature sub-graphs and constructs training and evaluation datasets based on the textual attributes of the resulting nodes and edges. The tool is designed for flexibility, supporting the curation of literature graphs across any domain chosen by the user, whether high-level fields or specialized interdisciplinary areas. In addition to dataset curation, LitBench defines a comprehensive suite of literature tasks, ranging from node and edge level analyses to advanced applications such as related work generation. These tasks enable LLMs to internalize domain-specific knowledge and relationships embedded in the curated graph during training, while also supporting rigorous evaluation of model performance. Our results show that small domain-specific LLMs trained and evaluated on LitBench datasets achieve competitive performance compared to state-of-the-art models like GPT-4o and DeepSeek-R1. To enhance accessibility and ease of use, we open-source the tool along with an AI agent tool that streamlines data curation, model training, and evaluation.
Some Remarks on Marginal Code Languages
Stavros Konstantinidis
A prefix code L satisfies the condition that no word of L is a proper prefix of another word of L. Recently, Ko, Han and Salomaa relaxed this condition by allowing a word of L to be a proper prefix of at most k words of L, for some 'margin' k, thus introducing the class of k-prefix-free languages, as well as the similar classes of k-suffix-free and k-infix-free languages. Here we unify the definitions of these three classes of languages into one uniform definition in two ways: via the method of partial orders and via the method of transducers. Thus, for any known class of code-related languages definable via the transducer method, one gets a marginal version of that class. Building on the techniques of Ko, Han and Salomaa, we discuss the uniform satisfaction and maximality problems for marginal classes of languages.
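The k-prefix-free condition described above can be checked directly on a finite language. The following is a minimal sketch under that assumption (a finite set of words); the function names are hypothetical and are not taken from the paper, which works with partial orders and transducers rather than brute-force counting.

```python
from collections import Counter


def max_proper_prefix_count(language):
    """Return the largest number of words of the language that any single
    word of the language is a proper prefix of (0 for a prefix code)."""
    words = set(language)
    counts = Counter()
    for word in words:
        # Enumerate every proper prefix of `word`, including the empty word.
        for end in range(len(word)):
            prefix = word[:end]
            if prefix in words:
                counts[prefix] += 1
    return max(counts.values(), default=0)


def is_k_prefix_free(language, k):
    """A finite language is k-prefix-free if no word is a proper prefix
    of more than k words of the language."""
    return max_proper_prefix_count(language) <= k


# Hypothetical example languages for illustration.
prefix_code = {"0", "10", "11"}
marginal_language = {"a", "ab", "abc", "b"}
```

With margin k = 0 the check reduces to the ordinary prefix-code condition. In `marginal_language`, the word "a" is a proper prefix of two other words, so the language is 2-prefix-free but not 1-prefix-free.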
Dynamic Membership for Regular Tree Languages
Antoine Amarilli, Corentin Barloy, Louis Jachiet
et al.
We study the dynamic membership problem for regular tree languages under relabeling updates: we fix an alphabet $Σ$ and a regular tree language $L$ over $Σ$ (expressed, e.g., as a tree automaton), we are given a tree $T$ with labels in $Σ$, and we must maintain the information of whether the tree $T$ belongs to $L$ while handling relabeling updates that change the labels of individual nodes in $T$. Our first contribution is to show that this problem admits an $O(\log n / \log \log n)$ algorithm for any fixed regular tree language, improving over known $O(\log n)$ algorithms. This generalizes the known $O(\log n / \log \log n)$ upper bound over words, and it matches the lower bound of $Ω(\log n / \log \log n)$ from dynamic membership to some word languages and from the existential marked ancestor problem. Our second contribution is to introduce a class of regular languages, dubbed almost-commutative tree languages, and show that dynamic membership to such languages under relabeling updates can be decided in constant time per update. Almost-commutative languages generalize both commutative languages and finite languages: they are the analogue for trees of the ZG languages enjoying constant-time dynamic membership over words. Our main technical contribution is to show that this class is conditionally optimal when we assume that the alphabet features a neutral letter, i.e., a letter that has no effect on membership to the language. More precisely, we show that any regular tree language with a neutral letter which is not almost-commutative cannot be maintained in constant time under the assumption that the prefix-U1 problem from (Amarilli, Jachiet, Paperman, ICALP'21) also does not admit a constant-time algorithm.
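To make the problem statement concrete, here is a minimal sketch of dynamic membership in the simpler word setting, using the classical balanced-tree approach with O(log n) work per relabeling. It is not the algorithm of the paper, which handles trees and achieves O(log n / log log n). The class name, the DFA encoding, and the example automaton are assumptions made for illustration.

```python
class DynamicWordMembership:
    """Maintain whether a word belongs to a regular language under
    single-letter relabeling updates, in O(log n) per update.

    `delta` maps each letter to a tuple t, where t[q] is the state reached
    from state q on that letter (a complete DFA with states 0..k-1).
    """

    def __init__(self, word, delta, start, accepting):
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)
        self.num_states = len(next(iter(delta.values())))
        identity = tuple(range(self.num_states))
        # Round the number of leaves up to a power of two; padding leaves
        # hold the identity transformation and do not affect the result.
        self.size = 1
        while self.size < max(1, len(word)):
            self.size *= 2
        self.tree = [identity] * (2 * self.size)
        for i, letter in enumerate(word):
            self.tree[self.size + i] = delta[letter]
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self._compose(self.tree[2 * node], self.tree[2 * node + 1])

    def _compose(self, first, second):
        # Transformation obtained by reading `first`, then `second`.
        return tuple(second[first[q]] for q in range(self.num_states))

    def relabel(self, position, letter):
        """Change the letter at `position` and refresh the ancestors."""
        node = self.size + position
        self.tree[node] = self.delta[letter]
        node //= 2
        while node >= 1:
            self.tree[node] = self._compose(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2

    def is_member(self):
        # The root stores the transformation induced by the whole word.
        return self.tree[1][self.start] in self.accepting


# Hypothetical example DFA: words over {a, b} that contain the factor "ab".
# States: 0 = nothing seen, 1 = last letter was "a", 2 = factor found.
contains_ab = {"a": (1, 1, 2), "b": (0, 2, 2)}
```

Each internal node stores the state transformation of its subword, so a relabeling only recomputes the nodes on one leaf-to-root path.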
NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing
Tim Schopf, Florian Matthes
Scientific literature searches are often exploratory, whereby users are not yet familiar with a particular field or concept but are interested in learning more about it. However, existing systems for scientific literature search are typically tailored to keyword-based lookup searches, limiting the possibilities for exploration. We propose NLP-KG, a feature-rich system designed to support the exploration of research literature in unfamiliar natural language processing (NLP) fields. In addition to a semantic search, NLP-KG allows users to easily find survey papers that provide a quick introduction to a field of interest. Further, a Fields of Study hierarchy graph enables users to familiarize themselves with a field and its related areas. Finally, a chat interface allows users to ask questions about unfamiliar concepts or specific articles in NLP and obtain answers grounded in knowledge retrieved from scientific publications. Our system provides users with comprehensive exploration possibilities, supporting them in investigating the relationships between different fields, understanding unfamiliar concepts in NLP, and finding relevant research literature. Demo, video, and code are available at: https://github.com/NLP-Knowledge-Graph/NLP-KG-WebApp.
Algebraic Language Theory with Effects
Fabian Lenke, Stefan Milius, Henning Urbat
et al.
Regular languages -- the languages accepted by deterministic finite automata -- are known to be precisely the languages recognized by finite monoids. This characterization is the origin of algebraic language theory. In this paper, we generalize the correspondence between automata and monoids to automata with generic computational effects given by a monad, providing the foundations of an effectful algebraic language theory. We show that, under suitable conditions on the monad, a language is computable by an effectful automaton precisely when it is recognizable by (1) an effectful monoid morphism into an effect-free finite monoid, and (2) a monoid morphism into a monad-monoid bialgebra whose carrier is a finitely generated algebra for the monad, the former mode of recognition being conceptually completely new. Our prime application is a novel algebraic approach to languages computed by probabilistic finite automata. Additionally, we derive new algebraic characterizations for nondeterministic probabilistic finite automata and for weighted finite automata over unrestricted semirings, generalizing previous results on weighted algebraic recognition over commutative rings.
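The classical, effect-free correspondence that the abstract starts from can be illustrated in a few lines. The sketch below computes the transition monoid of a deterministic finite automaton and decides membership through the induced monoid morphism. It covers only the standard case, not the effectful generalization of the paper; the function names and the example automaton are assumptions made for illustration.

```python
def transition_monoid(delta, num_states):
    """Return the set of state transformations generated by the letters.

    `delta` maps each letter to a tuple t with t[q] the successor of
    state q; the monoid operation is composition of transformations.
    """
    identity = tuple(range(num_states))
    elements = {identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for generator in delta.values():
            # Apply `current` first, then the letter's transformation.
            product = tuple(generator[current[q]] for q in range(num_states))
            if product not in elements:
                elements.add(product)
                frontier.append(product)
    return elements


def morphism(word, delta, num_states):
    """Map a word to its image in the transition monoid."""
    image = tuple(range(num_states))
    for letter in word:
        generator = delta[letter]
        image = tuple(generator[image[q]] for q in range(num_states))
    return image


def recognizes(word, delta, num_states, start, accepting):
    """A word is in the language iff its monoid image sends the start
    state to an accepting state."""
    return morphism(word, delta, num_states)[start] in accepting


# Hypothetical example: words over {a, b} with an even number of "a".
even_a = {"a": (1, 0), "b": (0, 1)}
```

For `even_a` the transition monoid has two elements (the identity and the swap of the two states), which is the two-element group recognizing the language.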
What We Know So Far: Artificial Intelligence in African Healthcare
Naome Etori, Ebasa Temesgen, Maria Gini
Healthcare in Africa is a complex issue influenced by many factors, including poverty, lack of infrastructure, and inadequate funding. However, artificial intelligence (AI) applied to healthcare has the potential to transform healthcare in Africa by improving the accuracy and efficiency of diagnosis, enabling earlier detection of diseases, and supporting the delivery of personalized medicine. This paper reviews how AI algorithms can be used to improve diagnostics, treatment, and disease monitoring, and how AI can improve access to healthcare in Africa as a low-resource setting. It also discusses some of the critical challenges and opportunities for AI adoption. As such, there is a need for a well-coordinated effort by governments, the private sector, healthcare providers, and international organizations to create sustainable AI solutions that meet the unique needs of the African healthcare system.
Front Matter and TOC
Hiroshi Nara
-
Language and Literature, Japanese language and literature
SQL and NoSQL Databases Software architectures performance analysis and assessments -- A Systematic Literature review
Wisal Khan, Teerath Kumar, Zhang Cheng
et al.
Context: The efficient processing of Big Data is a challenging task for SQL and NoSQL databases, where a competent software architecture plays a vital role. SQL databases are designed for structuring data and supporting vertical scalability. In contrast, NoSQL databases support horizontal scalability and can process sizeable unstructured data efficiently. One can choose the right paradigm according to the organisation's needs; however, making the correct choice can often be challenging. SQL and NoSQL databases follow different architectures, and each category of NoSQL databases follows a mixed model. Hence, data movement across multiple cloud service providers (CSPs) becomes difficult for cloud consumers. In addition, each cloud platform (IaaS, PaaS, SaaS, and DBaaS) also monitors various paradigms. Objective: This systematic literature review (SLR) aims to study the articles associated with SQL and NoSQL database software architectures and to tackle data portability and interoperability among various cloud platforms. The state of the art presents many performance comparison studies of SQL and NoSQL databases that observe scaling, performance, availability, consistency, and sharding characteristics. According to these studies, NoSQL database structures can be the right choice for big data analytics, while SQL databases are suitable for OLTP databases. Researchers propose numerous approaches associated with data movement in the cloud. Platform-based APIs are developed, which makes users' data movement difficult. Therefore, data portability and interoperability issues are noticed during data movement across multiple CSPs. To minimize developer effort and interoperability issues, unified APIs are needed to make data movement more accessible among various cloud platforms.
Semilinearity of Families of Languages
Oscar H. Ibarra, Ian McQuillan
Techniques are developed for creating new and general language families of only semilinear languages, and for showing families only contain semilinear languages. It is shown that for language families L that are semilinear full trios, the smallest full AFL containing L that is also closed under intersection with languages in NCM (where NCM is the family of languages accepted by NFAs augmented with reversal-bounded counters), is also semilinear. If these closure properties are effective, this also immediately implies decidability of membership, emptiness, and infiniteness for these general families. From the general techniques, new grammar systems are given that are extensions of well-known families of semilinear full trios, whereby it is implied that these extensions must only describe semilinear languages. This also implies positive decidability properties for the new systems. Some characterizations of the new families are also given.
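For readers unfamiliar with the terminology: a language is semilinear when its Parikh image (the set of letter-count vectors of its words) is a finite union of linear sets, each of the form c + k1*p1 + ... + km*pm with non-negative integer coefficients. The sketch below illustrates these standard definitions by brute force; it is not one of the techniques of the paper, and the function names and example sets are assumptions made for illustration. Period vectors are assumed to have non-negative entries.

```python
def parikh_vector(word, alphabet):
    """Count the occurrences of each letter of `alphabet` in `word`."""
    return tuple(word.count(letter) for letter in alphabet)


def _reachable(remainder, periods):
    # Decide whether `remainder` is a non-negative integer combination
    # of the given non-zero, non-negative period vectors.
    if not periods:
        return all(r == 0 for r in remainder)
    first, rest = periods[0], periods[1:]
    current = remainder
    while all(r >= 0 for r in current):
        if _reachable(current, rest):
            return True
        current = tuple(r - p for r, p in zip(current, first))
    return False


def in_linear_set(vector, constant, periods):
    """Test membership of `vector` in {constant + sum of k_i * period_i}."""
    remainder = tuple(v - c for v, c in zip(vector, constant))
    if any(r < 0 for r in remainder):
        return False
    # Zero periods contribute nothing and would make the search loop forever.
    nonzero_periods = [p for p in periods if any(p)]
    return _reachable(remainder, nonzero_periods)


def in_semilinear_set(vector, linear_sets):
    """`linear_sets` is a list of (constant, periods) pairs."""
    return any(in_linear_set(vector, c, ps) for c, ps in linear_sets)


# Hypothetical example: the Parikh image of {a^n b^n} over the alphabet "ab"
# is the linear set with constant (0, 0) and the single period (1, 1).
anbn_image = [((0, 0), [(1, 1)])]
```

For example, the Parikh vector of "aabb" lies in `anbn_image`, while that of "aab" does not.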
A Benkei for Every Age: Musashibō Benkei as Palimpsest
Christopher Smith
This article traces the history of Benkei production—the production of texts concerning Musashibō Benkei—to show that the image of Benkei is not stable, but rather has been adapted and modified repeatedly since the fourteenth century according to the social, economic, political, and cultural climate, as well as the narrative needs, of various eras. Each new instance of Benkei production does not erase or overwrite the previous instances, but rather adds another layer to the cultural construct “Benkei.” This article is not intended to be a comprehensive overview of Benkei works, nor is it particularly an attempt to unearth obscure Benkei works. Instead, the article shows how literature and literary characters can be adapted and transformed over a long time frame. It addresses relatively well-known texts, but examines them in the context of a history of Benkei texts that reveal a shifting, changing image of Benkei responsive to historicized cultural environments.
Language and Literature, Japanese language and literature
Implementation of e-Learning Based on Learning Management System Using Discovery Learning Method for Disabilities Students
Roni Amrulloh, Irwan Rahadi, Riyana Rizki Yuliatin
et al.
Inclusive learning for students with disabilities is essential. The purpose of this study is to examine the application of Discovery Learning to students with physical disabilities and mild disabilities in the learning process, especially in learning advanced speaking skills. The method used is the Borg & Gall development model, using 7 stages divided into 3 parts, namely preliminary studies, product development and trials, and finalization. The results of this study are: an Introduction (Rationalization of innovation, Discovery Learning models, Supporting theories of learning model development); a Learning model with agreement on Discovery Learning (Discovery Learning component and learning model component); and Instructions on the implementation of the learning model (Planning, Implementation, and Evaluation), equipped with videos of the implementation of learning activities. In addition, the learning process produces learning videos as a result of the learning activity.
Keywords: Mental Disability, Physical Disability, Discovery Learning, Students, Literary Criticism
Theory and practice of education, Languages and literature of Eastern Asia, Africa, Oceania
Aanspreekliheid (Jaco Fouché)
Chantelle Gray
African languages and literature
Chronicle 2020/2
The editor
Summary of the annual work of the ELTE Institute of East Asian Studies (Távol-keleti Intézet).
Chinese language and literature
Eastern African women writers’ ‘national epics’: A new force in creative fiction?
Annie Gagiano
In this article, I bring five recent, substantial novels by Eastern African women writers together for the first time in a study regarding the texts as modern ‘national epics’, analysing some of their shared characteristics in foregrounding local participation in the making of East African ethno-national histories. I trace the novelists’ implicit, open-eyed moral evaluation of their leaders and peoples, neither sentimentalising nor deriding the often terrible struggles of their peoples against both inside and outside powers that seek to keep them in subjugation. The texts eschew traditional heroic portrayal of single, male leaders in national epics and allow us to grasp diverse, communal contributions to the growth of nationhood, while giving larger, often central roles to women. The texts earn the epithet ‘epic’ by authoritatively demonstrating that their embodied, localised histories matter, testifying to the wide human spectrum of the peoples they portray; as novelistic acts they are impressive and moving bids for recognition. As post-colonial endeavours, the texts effectively decentre colonial interventions. While the chosen novels are shown to be relatable, their individual power of portrayal and aesthetic achievements are scrupulously differentiated.
African languages and literature
State Complexity Investigations on Commutative Languages -- The Upward and Downward Closure, Commutative Aperiodic and Commutative Group Languages
Stefan Hoffmann
We investigate the state complexity of the upward and downward closure and interior operations on commutative regular languages. Then, we systematically study the state complexity of these operations and of the shuffle operation on commutative group languages and commutative aperiodic (or star-free) languages.
An integration by parts formula for the bilinear form of the hypersingular boundary integral operator for the transient heat equation in three spatial dimensions
Raphael Watschinger, Günther Of
While an integration by parts formula for the bilinear form of the hypersingular boundary integral operator for the transient heat equation in three spatial dimensions is available in the literature, a proof of this formula seems to be missing. Moreover, the available formula contains an integral term including the time derivative of the fundamental solution of the heat equation, whose interpretation is difficult at second glance. To fill these gaps we provide a rigorous proof of a general version of the integration by parts formula and an alternative representation of the mentioned integral term, which is valid for a certain class of functions including the typical tensor-product discretization spaces.
INTERACTIVE LEARNING MEDIUM DEVELOPMENT FOR LEARNING HIRAGANA AND KATAKANA
Desak Made Sri Mardani, I Wayan Sadyana, Putu Hendra Suputra
The ability of college students in X university to use Hiragana and Katakana letters is still weak due to the lack of practice in reading and using these letters in a word/sentence. The ability to use Hiragana and Katakana letters is not only about the ability to understand the order of writing and the differences in the strokes, but also to use Hiragana and Katakana in words/sentences. This research was a descriptive study using an R & D design based on the Four-D Model. In this study, three of the model's four stages were carried out. Questionnaires and interviews were used as methods of data collection. Interviews were conducted to determine the needs of teachers and students, while a questionnaire was used for the expert judgement (expert appraisal) process, consisting of learning media expert judgement and content expert judgement. The questionnaire data were analyzed descriptively to determine deficiencies in the media created. After the improvement phase, a limited-scale trial was conducted with 26 students. The questionnaire data on the interactive media produced show that, overall, the media received excellent responses from students, and each assessment indicator also received a positive response. However, further studies are needed to find out how to implement the interactive media for Hiragana and Katakana directly in the classroom learning process.
Japanese language and literature
Proto-Berber phonological reconstruction: An update
Maarten Kossmann
Over the last decades, our insights into the phonological history of Berber and the reconstruction of its earlier stages have greatly evolved. This is thanks to an emergent discussion and to new data on a number of languages that are crucial to reconstructing Proto-Berber, most importantly the works by Catherine Taine-Cheikh on Zenaga. In this article, I will provide an overview of the results and challenges in the reconstruction of Proto-Berber phonology.
African languages and literature
Narrative Inquiry into Online Teaching of Chinese Characters during the Pandemic
Qi Zhang
Against the backdrop of the coronavirus pandemic, a large number of third-level institutes have had to transfer all teaching and learning activities online. This inevitable and urgently needed remote teaching is likely to lead to difficulties in the study of Chinese characters for beginner learners. Due to the pictographic origin and logographic nature of Chinese script, previous research shows the write-to-read effect and the importance of handwriting-to-character recognition. However, the nature of online learning suggests that all pedagogical practices will have to rely on digital input rather than pen and paper, which minimises the opportunities for handwriting. Furthermore, the worldwide crisis has also led to a lack of time and resources needed to develop a well-paced online curriculum that allows beginner learners to acquire characters while developing their character typing skills. Building upon narrative inquiry, this study explores the approach to studying Chinese characters during the pandemic. It first examines the challenges of teaching and learning Chinese characters online, and then reflects on the first-hand experience of teaching characters online among five CFL teachers based in Ireland and the UK. This study finds that a structured approach seems to benefit the teaching of Chinese characters. Knowledge of Chinese characters should also be explicitly incorporated into online teaching. The study will be one of the first contributing to the design and delivery of online teaching of Chinese characters in the context of a crisis scenario.
Chinese language and literature
Rational index of bounded-oscillation languages
Ekaterina Shemetova, Alexander Okhotin, Semyon Grigorev
The rational index of a context-free language $L$ is a function $f(n)$, such that for each regular language $R$ recognized by an automaton with $n$ states, the intersection of $L$ and $R$ is either empty or contains a word shorter than $f(n)$. It is known that the context-free language (CFL-)reachability problem and Datalog query evaluation for context-free languages (queries) with a polynomial rational index are in NC, while these problems are P-complete in the general case. We investigate the rational index of bounded-oscillation languages and show that it is of polynomial order. We obtain upper bounds on the values of the rational index for general bounded-oscillation languages and for some of their previously studied subclasses.