On the status of so-called verbal predicatives in Buamu (a Gur language)
Roland BICABA
This article deals with the morphology of verbal constituents in Buamu, a Gur language spoken in Burkina Faso and Mali. It specifically questions the nature of certain morphemes traditionally regarded as verbal markers. Through a corpus-based methodology, we establish that, as regards the expression of the future, the prospective has no specific marker; it is realized by the simple juxtaposition of a subject term with the non-finite form of the verb. As for the projective and the eventual, they too are expressed not by verbal predicatives but by auxiliary verbs which, semantically, belong to the category of motion verbs. The monemes used to express the various nuances of the present imperfective aspect in Buamu are likewise auxiliary (motion) verbs. The peculiarity of constructions in this aspect, however, lies in the fact that they derive from a non-verbal situational predication structure. Indeed, turning a non-verbal predication into a so-called verbal predication rests simply on deleting the non-verbal situational predicative. It therefore seems legitimate to question the very relevance of the concept of verbal predication in the expression of present aspectual values in Buamu.
African languages and literature
Turn Complexity of Context-free Languages, Pushdown Automata and One-Counter Automata
Giovanni Pighizzini
A turn in a computation of a pushdown automaton is a switch from a phase in which the height of the pushdown store increases to a phase in which it decreases. Given a pushdown or one-counter automaton, we consider, for each string in its language, the minimum number of turns made in accepting computations. We prove that it cannot be decided whether this number is bounded by a constant. Furthermore, we obtain a non-recursive trade-off between pushdown and one-counter automata accepting in a finite number of turns and finite-turn pushdown automata, which are defined by requiring that the constant bound be satisfied by each accepting computation. We prove that there are languages accepted in a sublinear but not constant number of turns with respect to the input length. Furthermore, there exists an infinite proper hierarchy of complexity classes, with the number of turns bounded by different sublinear functions. In addition, there is a language requiring a number of turns which is not constant but grows more slowly than each of the functions defining the above hierarchy.
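The notion of a turn can be illustrated concretely: given the sequence of stack heights along a computation, a turn is each switch from an increasing phase to a decreasing one. The helper below is a hypothetical sketch for illustration only, not part of the paper:

```python
def count_turns(heights):
    """Count turns in a stack-height profile: a turn is a switch
    from a strictly increasing phase to a strictly decreasing one."""
    turns = 0
    rising = False
    for prev, cur in zip(heights, heights[1:]):
        if cur > prev:
            rising = True
        elif cur < prev:
            if rising:          # increasing phase just ended
                turns += 1
            rising = False
    return turns

# A computation for a^n b^n: push n symbols, then pop them -> 1 turn.
print(count_turns([0, 1, 2, 3, 2, 1, 0]))   # 1
# Push, pop, push, pop -> 2 turns.
print(count_turns([0, 1, 0, 1, 0]))         # 2
```

A finite-turn automaton requires every accepting computation to stay below some constant bound on this count, whereas the paper's "accepting in a finite number of turns" condition only constrains the minimum over accepting computations per string.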
MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages
Anri Lombard, Simbarashe Mawere, Temi Aina
et al.
Decoder-only language models can be adapted to diverse tasks through instruction finetuning, but the extent to which this generalizes at small scale for low-resource languages remains unclear. We focus on the languages of South Africa, where we are not aware of a publicly available decoder-only model that explicitly targets all eleven official written languages, nine of which are low-resource. We introduce MzansiText, a curated multilingual pretraining corpus with a reproducible filtering pipeline, and MzansiLM, a 125M-parameter language model trained from scratch. We evaluate MzansiLM on natural language understanding and generation using three adaptation regimes: monolingual task-specific finetuning, multilingual task-specific finetuning, and general multi-task instruction finetuning. Monolingual task-specific finetuning achieves strong performance on data-to-text generation, reaching 20.65 BLEU on isiXhosa and remaining competitive with encoder-decoder baselines more than ten times larger. Multilingual task-specific finetuning benefits closely related languages on topic classification, achieving 78.5% macro-F1 on isiXhosa news classification. While MzansiLM adapts effectively to supervised NLU and NLG tasks, few-shot reasoning remains challenging at this model size, with performance near chance even for much larger decoder-only models. We release MzansiText and MzansiLM to provide a reproducible decoder-only baseline and clear guidance on adaptation strategies for South African languages at small scale.
Human language technology tools for indigenous South African languages and their potential use
Respect Mlambo, Muzi Matfunjwa
Human language technology (HLT) contributes to the development of languages by providing various avenues through which languages can be interrogated. Through HLT, diverse questions can be raised and answered scientifically and objectively. In the context of South African indigenous languages (SAIL), several HLT tools support these languages. However, it seems that some language users are unaware of the availability and capabilities of these tools, which contributes to their underutilisation. This study aims to identify and describe briefly some of the HLT tools that support and analyse SAIL. It presents an overview of the open access HLT tools, namely part-of-speech (POS) taggers, morphological decomposers (MDs), morphological analysers (MAs), isiZulu.net, ZulMorph and Google Translate (GT). These tools are crucial in analysing and understanding SAIL, as well as for advancing these languages in the field of HLT. In this study, the researchers anticipate that by raising awareness of the existence of these tools, more users of indigenous languages will be eager to use them.
Contribution: This study fills the practical gap in the use of HLT to perform linguistic functions for SAIL. There appears to be underutilisation of existing HLT tools for SAIL, which might be attributed to language users being unaware of these tools. Therefore, the study aims to identify and describe some HLT tools that support and analyse SAIL. It presents an overview of the open access HLT tools, namely POS taggers, MDs, MAs, isiZulu.net, ZulMorph and GT. The researchers intend to demonstrate the use of these tools and to raise awareness about their existence.
African languages and literature
Sesotho Language Acquisition by Faculty of Education Students in South Africa: A Systematic Review
Nthabiseng B. Khoalenyane, Patrick Alpheous Nyathi, Precious Moyo
Higher education institutions are increasingly interested in teaching African languages, specifically as third, fourth, or additional languages. Learning Sesotho poses a unique challenge to non-native speakers if introduced at the exit phase. This systematic review aims to identify the challenges students face while learning Sesotho at the exit stages of their educational degrees and to explore how their proficiency in Sesotho can benefit professional teaching practices in different regions of South Africa. Within the scope of this objective, a comprehensive literature search was conducted in Google Scholar, Scopus, and JSTOR. As of 22 September 2024, a total of 73 articles were identified from the databases. During the initial screening of titles and abstracts, 11 duplicates were excluded. Of the remaining 62 articles, 40 were excluded based on relevance, and 22 were downloaded to the digital workspace. Prioritising African languages in education, particularly by studying additional indigenous languages, can result in significant advantages. The study therefore examines the pros and cons of acquiring conversational Sesotho proficiency, particularly in a university setting where isiZulu may be the predominant language. This exploration highlights the broader implications and benefits of introducing linguistic diversity in educational environments at the exit phases. In order to capture nuanced perspectives and experiences, this paper adopts a systematic literature review approach to gain comprehensive knowledge of the challenges, benefits, and implications of learning Sesotho as an additional language in higher education contexts. The findings of this research highlight that student-teachers lack an understanding of the need to learn an additional language and are therefore not motivated to acquire this knowledge.
Language and Literature, Social Sciences
CrossTL: A Universal Programming Language Translator with Unified Intermediate Representation
Nripesh Niketan, Vaatsalya Shrivastva
We present CrossTL, a universal programming language translator enabling bidirectional translation between multiple languages through a unified intermediate representation called CrossGL. Traditional approaches require separate translators for each language pair, leading to exponential complexity growth. CrossTL uses a single universal IR to facilitate translations between CUDA, HIP, Metal, DirectX HLSL, OpenGL GLSL, Vulkan SPIR-V, Rust, and Mojo, with Slang support in development. Our system consists of: language-specific lexers/parsers converting source code to ASTs, bidirectional CrossGL translation modules implementing ToCrossGLConverter classes for importing code and CodeGen classes for target generation, and comprehensive backend implementations handling full translation pipelines. We demonstrate effectiveness through comprehensive evaluation across programming domains, achieving successful compilation and execution across all supported backends. The universal IR design enables adding new languages with minimal effort, requiring only language-specific frontend/backend components. Our contributions include: (1) a unified IR capturing semantics of multiple programming paradigms, (2) a modular architecture enabling extensibility, (3) a comprehensive framework supporting GPU compute, graphics programming, and systems languages, and (4) empirical validation demonstrating practical viability of universal code translation. CrossTL represents a significant step toward language-agnostic programming, enabling write-once, deploy-everywhere development.
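The complexity argument behind the unified IR can be made concrete: with N languages, pairwise translators require N·(N−1) components, while a shared IR needs only one frontend and one backend per language, i.e. 2N. The sketch below illustrates that hub-and-spoke idea with a toy IR and two trivial "languages"; all names here are hypothetical and do not reflect CrossTL's actual API:

```python
# Toy illustration of a shared-IR translator: each language supplies
# one frontend (source -> IR) and one backend (IR -> source), so
# supporting N languages costs 2N components instead of N*(N-1)
# pairwise translators.

frontends = {}   # language name -> parse function (source -> IR)
backends = {}    # language name -> codegen function (IR -> source)

def register(lang, parse, emit):
    frontends[lang] = parse
    backends[lang] = emit

def translate(src, source_lang, target_lang):
    ir = frontends[source_lang](src)      # source -> unified IR
    return backends[target_lang](ir)      # unified IR -> target

# Two trivial "languages" sharing an IR of (op, lhs, rhs) tuples.
register("lispish",
         parse=lambda s: tuple(s.strip("()").split()),
         emit=lambda ir: f"({ir[0]} {ir[1]} {ir[2]})")
register("infix",
         parse=lambda s: (s.split()[1], s.split()[0], s.split()[2]),
         emit=lambda ir: f"{ir[1]} {ir[0]} {ir[2]}")

print(translate("(+ 1 2)", "lispish", "infix"))   # 1 + 2
print(translate("1 + 2", "infix", "lispish"))     # (+ 1 2)
```

Adding a third language to this registry requires only one new parse/emit pair, mirroring the paper's claim that new languages need only language-specific frontend/backend components.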
Graph Rewriting Language as a Platform for Quantum Diagrammatic Calculi
Kayo Tei, Haruto Mishina, Naoki Yamamoto
et al.
Systematic discovery of optimization paths in quantum circuit simplification remains a challenge. Today, ZX-calculus, a computing model for quantum circuit transformation, is attracting attention for its highly abstract graph-based approach. Whereas existing tools such as PyZX and Quantomatic offer domain-specific support for quantum circuit optimization, visualization and theorem-proving, we present a complementary approach using LMNtal, a general-purpose hierarchical graph rewriting language, to establish a diagrammatic transformation and verification platform with model checking. Our methodology shows three advantages: (1) manipulation of ZX-diagrams through native graph transformation rules, enabling direct implementation of basic rules; (2) quantified pattern matching via QLMNtal extensions, greatly simplifying rule specification; and (3) interactive visualization and validation of optimization paths through state space exploration. Through case studies, we demonstrate how our framework helps understand optimization paths and design new algorithms and strategies. This suggests that the declarative language LMNtal and its toolchain could serve as a new platform to investigate quantum circuit transformation from a different perspective.
Polymorphic Records for Dynamic Languages
Giuseppe Castagna, Loïc Peyrot
We define and study "row polymorphism" for a type system with set-theoretic types, specifically union, intersection, and negation types. We consider record types that embed row variables and define a subtyping relation by interpreting types into sets of record values and by defining subtyping as the containment of interpretations. We define a functional calculus equipped with operations for field extension, selection, and deletion, its operational semantics, and a type system that we prove to be sound. We provide algorithms for deciding the typing and subtyping relations. This research is motivated by the current trend of defining static type systems for dynamic languages and, in our case, by an ongoing effort of endowing the Elixir programming language with a gradual type system.
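The three record operations the calculus studies have direct dynamic-language counterparts, which is exactly the behaviour such a type system must capture statically. A hypothetical Python sketch of the untyped operations (illustration only, not the paper's calculus):

```python
def extend(r, field, value):
    """Record extension: a new record with the field added or updated."""
    return {**r, field: value}

def select(r, field):
    """Field selection: look up a field's value."""
    return r[field]

def delete(r, field):
    """Field deletion: a new record without the field."""
    return {k: v for k, v in r.items() if k != field}

point = {"x": 1, "y": 2}
point3d = extend(point, "z", 3)
print(select(point3d, "z"))    # 3
print(delete(point3d, "y"))    # {'x': 1, 'z': 3}
```

Row variables let a type say "these fields, plus whatever else the record carries," so that `extend` and `delete` can be typed without fixing the full field list in advance.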
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana
et al.
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.
Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025)
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson
et al.
The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to provide a forum for researchers to share and discuss their ongoing work on language models (LMs) focusing on low-resource languages, following the recent advancements in neural language models and their linguistic biases towards high-resource languages. LoResLM 2025 attracted notable interest from the natural language processing (NLP) community, resulting in 35 accepted papers from 52 submissions. These contributions cover a broad range of low-resource languages from eight language families and 13 diverse research areas, paving the way for future possibilities and promoting linguistic inclusivity in NLP.
The manifestation of Nigerian-ness in Tanzania's new-generation music industry: key issues to consider
Gervas A Kasiga
It has been established that the Swahili new-generation music industry in Tanzania has been captured by foreign culture, particularly Nigerian culture (Mrikaria, 2007; Kasiga, 2022). Its effects have been identified in several creative elements (Kasiga, 2021). The various motivations driving artists to adopt Nigerian-ness have also been identified (Kasiga, 2022). Fundamentally, this situation threatens Tanzanian identity in the arts. This article therefore offers recommendations for countering Nigerian-ness in the music videos of Swahili new-generation music in Tanzania. The recommendations include: investing in traditional Tanzanian music, establishing dedicated oversight in the media, enacting laws and regulations to protect the authenticity of new-generation music, and setting up seminars, workshops and special classes for artists. In addition, the article recommends a specific national agenda to build patriotism, a focus on Tanzanian artistic styles in music production, speeding up the adoption of a national dress, launching song-analysis programmes, and encouraging artists to use Tanzanian settings in their videos. Further, it recommends establishing music festivals and awards, having educational institutions develop curricula in business, the Swahili language and Tanzanian culture, and creating cohesion between business, culture and technology.
African languages and literature
Realist literature, gender and gullibility in African Pentecostalism: The case of Chiundura Moyo’s Kereke Inofa
Enna S. Gudhlanga, Angeline M. Madongonda, Molly Manyonganise
There is a general consensus among religious scholars that Pentecostalism has risen phenomenally in Africa, and Zimbabwe is no exception. In most cases, Pentecostalism has been presented as a sophisticated brand of Christianity, while members of African Independent churches are shown to be gullible. The newly founded Pentecostal churches are more focused on gospreneurship, while the media is replete with cases of cheating, dishonesty and the sexual abuse of women in these churches. Thus, academic scholars have begun to turn their attention to gullibility in Pentecostalism. Unfortunately, not many scholarly works have looked at literary texts that bring out the gullibility of members of Pentecostal churches in Zimbabwe. This article seeks to bridge this gap by analysing Aaron Chiundura Moyo’s Kereke Inofa [The Church Can Die]. The main purpose is to bring out the significance of literary texts in projecting societal ills, specifically the gender power dynamics in Zimbabwean Pentecostal churches that may be difficult to deal with directly. The focus is on how women and some men are victims of the whims of some Pentecostal church leaders. The article is informed by the socio-historical approach, which states that artists derive the material for their works of art, their subject matter, images and artistic language, from the life experiences of their societies. The socio-historical approach enables the researcher to understand the prevalence of gullibility in Pentecostal churches in Zimbabwe. The article relies heavily on content analysis of the presentation, in Moyo’s Kereke Inofa, of deception and infidelity in Pentecostal churches as perpetrated on members who are projected in the play as ‘gullible’.
Contribution: This article’s contribution lies in its critical analysis of gender and gullibility in African Pentecostalism in Zimbabwe. It is significant as it utilises a literary text to project the ills in Pentecostal churches and women’s sexual vulnerabilities.
The Bible, Practical Theology
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
Cheikh M. Bamba Dione, David Adelani, Peter Nabende
et al.
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the UD (universal dependencies) guidelines. We conducted extensive POS baseline experiments using conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam
et al.
We present the first Africentric SemEval shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval); the dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023. AfriSenti-SemEval is a sentiment classification challenge in 14 African languages: Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá (Muhammad et al., 2023), using data labeled with 3 sentiment classes. We present three subtasks: (1) Task A: monolingual classification, which received 44 submissions; (2) Task B: multilingual classification, which received 32 submissions; and (3) Task C: zero-shot classification, which received 34 submissions. The best performance for Tasks A and B was achieved by the NLNDE team, with 71.31 and 75.06 weighted F1, respectively. UCAS-IIE-NLP achieved the best average score for Task C, with 58.15 weighted F1. We describe the approaches adopted by the top 10 systems.
Proceedings of the 18th International Workshop on Logical Frameworks and Meta-Languages: Theory and Practice
Alberto Ciaffaglione, Carlos Olarte
Logical frameworks and meta-languages form a common substrate for representing, implementing and reasoning about a wide variety of deductive systems of interest in logic and computer science. Their design, implementation and their use in reasoning tasks, ranging from the correctness of software to the properties of formal systems, have been the focus of considerable research over the last two decades. This workshop brings together designers, implementors and practitioners to discuss various aspects impinging on the structure and utility of logical frameworks, including the treatment of variable binding, inductive and co-inductive reasoning techniques and the expressiveness and lucidity of the reasoning process.
Polymorphic Type Inference for Dynamic Languages
Giuseppe Castagna, Mickaël Laurent, Kim Nguyen
We present a type system that combines, in a controlled way, first-order polymorphism with intersection types, union types, and subtyping, and prove its safety. We then define a type reconstruction algorithm that is sound and terminating. This yields a system in which unannotated functions are given polymorphic types (thanks to Hindley-Milner) that can express the overloaded behavior of the functions they type (thanks to the intersection introduction rule) and that are deduced by applying advanced techniques of type narrowing (thanks to the union elimination rule). This makes the system a prime candidate to type dynamic languages.
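The union-elimination-style narrowing described here is the static counterpart of a very common dynamic-language idiom: branching on a runtime check splits a union type into its cases. A loose Python analogue (Python checkers narrow on `isinstance`; the paper's system is far richer, so this is only an informal illustration):

```python
from typing import Union

def double(x: Union[int, str]) -> Union[int, str]:
    # A checker narrows the union on each branch (union elimination):
    # in this branch x has type int, so x * 2 is arithmetic...
    if isinstance(x, int):
        return x * 2
    # ...and here x can only be str, so x + x is concatenation.
    return x + x

print(double(21))      # 42
print(double("ab"))    # abab
```

An intersection type would additionally record the overloaded behavior, i.e. that `double` maps `int` to `int` and `str` to `str`, which the single union-to-union signature above cannot express.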
Drivers' attention detection: a systematic literature review
Luiz G. Véras, Anna K. F. Gomes, Guilherme A. R. Dominguez
et al.
Countless traffic accidents occur because of driver inattention. Many factors can contribute to distraction while driving, ranging from objects or events to physiological conditions, such as drowsiness and fatigue, that do not allow the driver to stay attentive. Technological progress has allowed the development and application of many solutions to detect attention in real situations, attracting the interest of the scientific community in recent years. Commonly, these solutions identify the lack of attention and alert the driver, in order to help her/him recover attention, avoiding serious accidents and preserving lives. Our work presents a Systematic Literature Review (SLR) of the methods and criteria used to detect the attention of drivers at the wheel, focusing on image-based methods. As a result, 50 studies on drivers' attention detection were selected from the literature, of which 22 contain solutions in the desired context. The results of the SLR can be used as a resource in the preparation of new research projects on drivers' attention detection.
Low-Resource Neural Machine Translation for Southern African Languages
Evander Nyoni, Bruce A. Bassett
Low-resource African languages have not fully benefited from the progress in neural machine translation because of a lack of data. Motivated by this challenge we compare zero-shot learning, transfer learning and multilingual learning on three Bantu languages (Shona, isiXhosa and isiZulu) and English. Our main target is English-to-isiZulu translation for which we have just 30,000 sentence pairs, 28% of the average size of our other corpora. We show the importance of language similarity on the performance of English-to-isiZulu transfer learning based on English-to-isiXhosa and English-to-Shona parent models whose BLEU scores differ by 5.2. We then demonstrate that multilingual learning surpasses both transfer learning and zero-shot learning on our dataset, with BLEU score improvements relative to the baseline English-to-isiZulu model of 9.9, 6.1 and 2.0 respectively. Our best model also improves the previous SOTA BLEU score by more than 10.
Sentiment Classification in Swahili Language Using Multilingual BERT
Gati L. Martin, Medard E. Mswahili, Young-Seob Jeong
The evolution of the Internet has increased the amount of information that people express on different platforms. This information can be product reviews, discussions on forums, or posts on social media platforms. The accessibility of these opinions and people's feelings opens the door to opinion mining and sentiment analysis. As language and speech technologies become more advanced, strong models have been obtained for many languages. However, due to linguistic diversity and a lack of datasets, African languages have been left behind. In this study, using the current state-of-the-art model, multilingual BERT, we perform sentiment classification on Swahili datasets. The data was created by extracting and annotating 8.2k reviews and comments from different social media platforms and the ISEAR emotion dataset. The data were classified as either positive or negative. The model was fine-tuned and achieved a best accuracy of 87.59%.
Foreword
Iris Maria da Costa Amâncio, Terezinha Taborda Moreira
Foreword to Abril 25.
Language and Literature, African languages and literature