Tassallah Abdullahi, Macton Mgonzo, Mardiyyah Oduwole
et al.
Current guardian models are predominantly Western-centric and optimized for high-resource languages, leaving low-resource African languages vulnerable to evolving harms, cross-lingual safety failures, and cultural misalignment. Moreover, most guardian models rely on rigid, predefined safety categories that fail to generalize across diverse linguistic and sociocultural contexts. Robust safety, therefore, requires flexible, runtime-enforceable policies and benchmarks that reflect local norms, harm scenarios, and cultural expectations. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts across sensitive fields, including healthcare. From these expert-crafted queries, we derive context-specific safety policies and reference responses that capture culturally grounded risk signals, enabling policy-aligned evaluation of guardian models. We evaluate 13 models, comprising six general-purpose LLMs and seven guardian models across three distinct variants: static, dynamic, and multilingual. Our findings reveal that existing English-centric benchmarks overestimate real-world multilingual safety, cross-lingual transfer provides partial but insufficient coverage, and dynamic models, while better equipped to leverage policies at inference time, still struggle to fully localize African-language contexts. These findings highlight the urgent need for multilingual, culturally grounded safety benchmarks to enable the development of reliable and equitable guardian models for low-resource languages. Our code can be found online.\footnote{Code repository available at https://github.com/hemhemoh/UbuntuGuard.}
Israel Abebe Azime, Jesujoba Oluwadara Alabi, Crystina Zhang
et al.
Assessing the veracity of a claim made online is a complex and important task with real-world implications. When these claims are directed at communities with limited access to information and the content concerns issues such as healthcare and culture, the consequences intensify, especially in low-resource languages. In this work, we introduce AfrIFact, a dataset that covers the necessary steps for automatic fact-checking (i.e., information retrieval, evidence extraction, and fact-checking), in ten African languages and English. Our evaluation results show that even the best embedding models lack cross-lingual retrieval capabilities, and that cultural and news documents are easier to retrieve than healthcare-domain documents, both in large corpora and in single documents. We show that LLMs lack robust multilingual fact-verification capabilities in African languages, while few-shot prompting improves performance by up to 43% in AfriqueQwen-14B, and task-specific fine-tuning further improves fact-checking accuracy by up to 26%. These findings, along with our release of the AfrIFact dataset, encourage work on low-resource information retrieval, evidence retrieval, and fact-checking.
In AI, most evaluations of natural language understanding tasks are conducted in standardized dialects such as Standard American English (SAE). In this work, we investigate how accurately large language models (LLMs) represent African American Vernacular English (AAVE). We analyze three LLMs to compare their usage of AAVE to the usage of humans who natively speak AAVE. We first analyzed interviews from the Corpus of Regional African American Language and TwitterAAE to identify the typical contexts where people use AAVE grammatical features such as ain't. We then prompted the LLMs to produce text in AAVE and compared the model-generated text to human usage patterns. We find that, in many cases, there are substantial differences between AAVE usage in LLMs and humans: LLMs usually underuse and misuse grammatical features characteristic of AAVE. Furthermore, through sentiment analysis and manual inspection, we found that the models replicated stereotypes about African Americans. These results highlight the need for more diversity in training data and the incorporation of fairness methods to mitigate the perpetuation of stereotypes.
Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov
Sarcasm detection poses a fundamental challenge in computational semantics, requiring models to resolve disparities between literal and intended meaning. The challenge is amplified in low-resource languages where annotated datasets are scarce or nonexistent. We present \textbf{Yor-Sarc}, the first gold-standard dataset for sarcasm detection in Yorùbá, a tonal Niger-Congo language spoken by over $50$ million people. The dataset comprises 436 instances annotated by three native speakers from diverse dialectal backgrounds using an annotation protocol specifically designed for Yorùbá sarcasm that takes cultural context into account. This protocol incorporates context-sensitive interpretation and community-informed guidelines and is accompanied by a comprehensive analysis of inter-annotator agreement to support replication in other African languages. Substantial to almost perfect agreement was achieved (Fleiss' $\kappa = 0.7660$; pairwise Cohen's $\kappa = 0.6732$--$0.8743$), with $83.3\%$ unanimous consensus. One annotator pair achieved almost perfect agreement ($\kappa = 0.8743$; $93.8\%$ raw agreement), exceeding several benchmarks reported in English sarcasm research. The remaining $16.7\%$ majority-agreement cases are preserved as soft labels for uncertainty-aware modelling. Yor-Sarc\footnote{https://github.com/toheebadura/yor-sarc} is expected to facilitate research on semantic interpretation and culturally informed NLP for low-resource African languages.
Abdoulaye Diack, Perry Nelson, Kwaku Agbesi
et al.
The advancement of speech technology has predominantly favored high-resource languages, creating a significant digital divide for speakers of most Sub-Saharan African languages. To address this gap, we introduce WAXAL, a large-scale, openly accessible speech dataset for 24 languages representing over 100 million speakers. The collection consists of two main components: an Automated Speech Recognition (ASR) dataset containing approximately 1,250 hours of transcribed, natural speech from a diverse range of speakers, and a Text-to-Speech (TTS) dataset with around 235 hours of high-quality, single-speaker recordings reading phonetically balanced scripts. This paper details our methodology for data collection, annotation, and quality control, which involved partnerships with four African academic and community organizations. We provide a detailed statistical overview of the dataset and discuss its potential limitations and ethical considerations. The WAXAL datasets are released at https://huggingface.co/datasets/google/WaxalNLP under the permissive CC-BY-4.0 license to catalyze research, enable the development of inclusive technologies, and serve as a vital resource for the digital preservation of these languages.
Adetola Emmanuel Babalola, Victor Johnson, Akin Oromakinde
et al.
A factor that impacts the health outcome of individuals is effective communication, and language is an important part of communication. The use of patients' local language during health service delivery has been shown to influence patient satisfaction, compliance and overall health outcomes. A narrative review of existing literature was conducted, and the methodology involved searching through PubMed, Google Scholar, SCOPUS, Directory of Open Access Journals (DOAJ), COCHRANE Library, and African Journals Online (AJOL), focusing on original studies that examined the influence of indigenous languages on healthcare delivery. Only literature published in the English language was considered, and narrative reviews, preprints, opinions, letters, and commentaries were excluded. Twenty studies were reviewed, spanning diverse eligible populations and study designs: sixteen quantitative studies, two systematic reviews, and two mixed-methods studies. The key findings showed that the use of local languages in healthcare delivery improves metrics such as patient satisfaction, compliance with medical instructions, and health improvement. The identified limitations of this study include the restrictions in the criteria for the literature that was reviewed, limited focus on specific healthcare specialties, and possible publication bias. The recommendations include implementing policies that prioritize local language use in healthcare service delivery, community engagement, and promotion of health technologies that support communication in multiple languages.
Senyu Li, Jiayi Wang, Felermino D. M. A. Ali
et al.
Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limited language coverage and poor performance in low-resource settings. While recent efforts, such as AfriCOMET, have addressed some of the issues, they are still constrained by small evaluation sets, a lack of publicly available training data tailored to African languages, and inconsistent performance in extremely low-resource scenarios. In this work, we introduce SSA-MTE, a large-scale human-annotated MT evaluation (MTE) dataset covering 14 African language pairs from the news domain, with over 73,000 sentence-level annotations from a diverse set of MT systems. Based on this data, we develop SSA-COMET and SSA-COMET-QE, improved reference-based and reference-free evaluation metrics. We also benchmark prompting-based approaches using state-of-the-art LLMs like GPT-4o, Claude-3.7 and Gemini 2.5 Pro. Our experimental results show that SSA-COMET models significantly outperform AfriCOMET and are competitive with the strongest LLM evaluated in our study, Gemini 2.5 Pro, particularly on low-resource languages such as Twi, Luo, and Yoruba. All resources are released under open licenses to support future research.
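Learned MT metrics such as these are typically judged by how well their scores correlate with the human annotations at the sentence or segment level. A minimal sketch of that evaluation using Spearman's rank correlation, with made-up toy scores rather than SSA-MTE data (and no tie handling, for brevity):

```python
def rankdata(xs):
    """Assign ranks 1..n in ascending order (no tie handling in this sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = r + 1.0
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mean = (n + 1) / 2  # mean rank for untied ranks 1..n
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = (sum((a - mean) ** 2 for a in rx)
           * sum((b - mean) ** 2 for b in ry)) ** 0.5
    return num / den

human_scores  = [0.2, 0.9, 0.4, 0.7, 0.1]   # hypothetical human judgements
metric_scores = [0.3, 0.8, 0.5, 0.9, 0.2]   # hypothetical metric outputs
print(round(spearman(human_scores, metric_scores), 3))  # -> 0.9
```

A higher rho means the metric ranks translations more like the human annotators do, which is what "outperform" means in metric comparisons of this kind.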
Priscilla Muheki, Mirjana Pović, Somaya Saad
et al.
In preparation for the International Astronomical Union (IAU) General Assembly (GA) 2024, the first GA held in Africa, the African Network of Women in Astronomy (AfNWA) embarked on a visionary project: the creation of an inspiring storytelling book that showcases the remarkable journeys of professional female astronomers in Africa. This book is not merely a collection of biographies; it is a tapestry of resilience, passion, and scientific excellence woven through the lives of women who have ventured into the cosmos from the African continent. The primary aim of this book is twofold. Firstly, it seeks to bring greater visibility to women astronomers in Africa, highlighting their groundbreaking research and the personal stories that have shaped their careers. By shining a light on their achievements and awards, we hope to acknowledge their contributions to the field of astronomy and underscore the importance of diversity in science. Secondly, this book aspires to inspire and empower the next generation of scientists, particularly young women and girls across Africa. Through the personal narratives and professional achievements of these trailblazing astronomers and students in astronomy, we aim to spark curiosity, foster a love for science, and demonstrate that the sky is not the limit but just the beginning for those who dare to dream. As you delve into the stories within these pages, you will encounter a rich array of experiences and insights that reflect the unique challenges and triumphs women face in astronomy. From overcoming societal barriers to making groundbreaking discoveries, these women have carved paths that others can follow, proving that with determination and passion, the stars are within reach for everyone.
Vukosi Marivate, Kayode Olaleye, Sitwala Mundia
et al.
This paper introduces Swivuriso, a 3000-hour multilingual speech dataset developed as part of the African Next Voices project, to support the development and benchmarking of automatic speech recognition (ASR) technologies in seven South African languages. Covering agriculture, healthcare, and general domain topics, Swivuriso addresses significant gaps in existing ASR datasets. We describe the design principles, ethical considerations, and data collection procedures that guided the dataset creation. We present baseline results from training/fine-tuning ASR models with this data and compare them to other ASR datasets for the languages concerned.
Vukosi Marivate, Isheanesu Dzingirai, Fiskani Banda
et al.
The critical lack of structured terminological data for South Africa's official languages hampers progress in multilingual NLP, despite the existence of numerous government and academic terminology lists. These valuable assets remain fragmented and locked in non-machine-readable formats, rendering them unusable for computational research and development. Mafoko addresses this challenge by systematically aggregating, cleaning, and standardising these scattered resources into open, interoperable datasets. We introduce the foundational Mafoko dataset, released under the equitable, Africa-centered NOODL framework. To demonstrate its immediate utility, we integrate the terminology into a Retrieval-Augmented Generation (RAG) pipeline. Experiments show substantial improvements in the accuracy and domain-specific consistency of English-to-Tshivenda machine translation for large language models. Mafoko provides a scalable foundation for developing robust and equitable NLP technologies, ensuring South Africa's rich linguistic diversity is represented in the digital age.
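The terminology-into-RAG idea above can be sketched in miniature: retrieve matching glossary entries for a source sentence, then prepend them to the translation prompt so the model is steered toward the verified target terms. Everything below is illustrative, the glossary keys, the `<term-N>` placeholders, and the helper names are hypothetical and not the actual Mafoko pipeline:

```python
# Hypothetical glossary; target-side entries are placeholders, not
# real Tshivenda terms.
GLOSSARY = {
    "parliament": "<term-1>",
    "budget": "<term-2>",
}

def retrieve_terms(source_sentence, glossary):
    """Return glossary entries whose source term occurs in the sentence
    (naive substring match; a real pipeline would use proper retrieval)."""
    sentence = source_sentence.lower()
    return {term: tgt for term, tgt in glossary.items() if term in sentence}

def build_prompt(source_sentence, glossary):
    """Assemble a translation prompt with the retrieved terminology block."""
    terms = retrieve_terms(source_sentence, glossary)
    term_block = "\n".join(f"- {src} -> {tgt}" for src, tgt in terms.items())
    return (
        "Translate from English to Tshivenda, using these verified terms:\n"
        f"{term_block}\n\nSource: {source_sentence}\nTranslation:"
    )

print(build_prompt("The parliament approved the budget.", GLOSSARY))
```

The design point is that the LLM never has to guess domain terminology: the retrieval step injects the standardised term pairs, which is where the reported gains in domain-specific consistency would come from.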
Achille Mbembe and Felwine Sarr (eds.), translated by Drew Burk. 2023. To Write the Africa World. Cambridge & Hoboken: Polity Press. 324 pp.
Achille Mbembe and Felwine Sarr (eds.), translated by Philip Gerard. 2023. The Politics of Time: Imagining African Becomings. Cambridge & Hoboken: Polity Press. 332 pp.
History of Africa, African languages and literature
The intersection of modern African poetry and folklore is a much-researched topic in African literary scholarship. However, literary scholars have not examined this relationship in the poetry of Christian Otobotekere, a king and poet whose cultural productions have been mostly studied from the perspective of ecocriticism. Therefore, in this article, I look at the place of Ịjọ (also spelt “Ijaw”) folklore in Otobotekere’s A Sailor’s Son 1: In the Wake of Games and Dances (2015). I find that, in this collection, Otobotekere frequently employs the folklore of his people, including dirge, drum poetry, lullabies, moonlight stories, dance patterns, and musical styles, as well as elements of Ịjọ songs such as simplicity, repetition, allusion, dialogue, and direct address. I further discover that Otobotekere’s incorporation of Ịjọ folklore makes his poetry performative and helps it to achieve the quality of what is usually referred to as “written orality”. I argue that Otobotekere makes it his main aim to showcase these aspects of folklore to the non-Ịjọ reader and to document them for future generations in the Ịjọ community in Nigeria’s oil-rich Niger Delta region. This study appeals to scholars in the fields of literature and folklore as it contributes to the decades-long conversation on the interaction between the two disciplines.
Tuka Alhanai, Adam Kasumovic, Mohammad Ghassemi
et al.
Large Language Models (LLMs) have shown remarkable performance across various tasks, yet significant disparities remain for non-English languages, and especially native African languages. This paper addresses these disparities by creating approximately 1 million human-translated words of new benchmark data in 8 low-resource African languages, covering a population of over 160 million speakers of: Amharic, Bambara, Igbo, Sepedi (Northern Sotho), Shona, Sesotho (Southern Sotho), Setswana, and Tsonga. Our benchmarks are translations of Winogrande and three sections of MMLU: college medicine, clinical knowledge, and virology. Using the translated benchmarks, we report previously unknown performance gaps between state-of-the-art (SOTA) LLMs in English and African languages. Finally, using results from over 400 fine-tuned models, we explore several methods to reduce the LLM performance gap, including high-quality dataset fine-tuning (using an LLM-as-an-Annotator), cross-lingual transfer, and cultural appropriateness adjustments. Key findings include average mono-lingual improvements of 5.6% with fine-tuning (with 5.4% average mono-lingual improvements when using high-quality data over low-quality data), 2.9% average gains from cross-lingual transfer, and a 3.0% out-of-the-box performance boost on culturally appropriate questions. The publicly available benchmarks, translations, and code from this study support further research and development aimed at creating more inclusive and effective language technologies.
Studying bias detection and mitigation methods in natural language processing and the particular case of machine translation is highly relevant, as societal stereotypes might be reflected or reinforced by these systems. In this paper, we analyze the state-of-the-art with a particular focus on European and African languages. We show how the majority of the work in this field concentrates on few languages, and that there is potential for future research to cover also the less investigated languages to contribute to more diversity in the research field.
This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training passages, and ColBERT-X as the retrieval model. Additionally, we present a set of unofficial runs that use an alternative training procedure with a similar training setting.
Lungisani Xolani Khumalo, Thabo Ditsele, Christopher Rwodzi
The Use of Official Languages Act (No. 12 of 2012) applies to all national departments and state-owned enterprises (SOEs) in South Africa and stipulates that they should promote multilingualism when interacting with members of the public and/or customers. The main aim of this study was to investigate how two SOEs, that is, the South African Post Office (SAPO) and Passenger Rail Agency of South Africa (PRASA), manage communication with their customers, particularly those who cannot communicate in English and Afrikaans. Data for this study were gathered through a mixed method approach. Quantitative data (i.e., a Likert-type scale) were gathered from 120 participants who were customers of the two SOEs, and qualitative data (i.e., face-to-face interviews) were gathered from 20 interviewees who were drawn from the 120 participants. The researcher was based in Gauteng, and conducted the study in that province because it was convenient and practical. The data were gathered in Tshwane, Ekurhuleni, Johannesburg, and the West Rand. The study found that customers believed that those who could not communicate in English and Afrikaans did not receive adequate information from the SOEs because of this shortcoming. The study also revealed that the marginalisation of Black South African Languages (BSALs) by SOEs was regarded as justified by some respondents because these SOEs provided services to customers who speak different languages. The study also found that other participants felt that it was necessary for SOEs to continue to use English as the main language of communication with customers because it is an international language, which also promotes unity among the people of South Africa, including customers of SOEs.
Contribution: The major contribution of this article to scientific knowledge is that it delves deeper into how customers of the two SOEs who are less proficient in English and Afrikaans felt excluded from communication intended for all customers, and it is the first article to do so. Through this article, there is potential that the SOEs will appreciate that customers who are less proficient in English and Afrikaans want major adjustments to be made so that they too can feel a sense of belonging and fully understand what is being communicated to all customers, regardless of their proficiency in the two languages.
Multiple types can represent the same concept. For example, lists and trees can both represent sets. Unfortunately, this easily leads to incomplete libraries: some set-operations may only be available on lists, others only on trees. Similarly, subtypes and quotients are commonly used to construct new type abstractions in formal verification. In such cases, one often wishes to reuse operations on the representation type for the new type abstraction, but to no avail: the types are not the same. To address these problems, we present a new framework that transports programs via equivalences. Existing transport frameworks are either designed for dependently typed, constructive proof assistants, use univalence, or are restricted to partial quotient types. Our framework (1) is designed for simple type theory, (2) generalises previous approaches working on partial quotient types, and (3) is based on standard mathematical concepts, particularly Galois connections and equivalences. We introduce the notion of partial Galois connections and equivalences and prove their closure properties under (dependent) function relators, (co)datatypes, and compositions. We formalised the framework in Isabelle/HOL and provide a prototype. This is the extended version of "Transport via Partial Galois Connections and Equivalences", 21st Asian Symposium on Programming Languages and Systems, 2023.
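The central notion here, a (monotone) Galois connection between preorders, is a pair of maps $f : A \to B$ and $g : B \to A$ satisfying $f(a) \le_B b \iff a \le_A g(b)$ for all $a, b$. A brute-force check of this adjunction law on finite carriers (a plain illustrative sketch, unrelated to the paper's Isabelle/HOL formalisation):

```python
from itertools import product

def is_galois_connection(A, B, leq_A, leq_B, f, g):
    """Check the adjunction law  f(a) <=_B b  <=>  a <=_A g(b)
    by exhaustive enumeration over finite carriers A and B."""
    return all(leq_B(f(a), b) == leq_A(a, g(b)) for a, b in product(A, B))

# Classic integer example on finite ranges: doubling is left adjoint
# to floor-halving, since 2a <= b  <=>  a <= floor(b / 2).
A = range(-5, 6)
B = range(-10, 11)
f = lambda n: 2 * n
g = lambda m: m // 2   # Python // floors toward negative infinity

leq = lambda x, y: x <= y
print(is_galois_connection(A, B, leq, leq, f, g))  # -> True
```

Perturbing either adjoint (say, `g = lambda m: m // 2 + 1`) makes the check fail, which is exactly the kind of structural property the transport framework relies on to move operations between a representation type and an abstraction.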
Israel Abebe Azime, Sana Sabah Al-Azzawi, Atnafu Lambebo Tonja
et al.
This paper presents our work on the AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For sub-task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For sub-task C, we made use of a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for zero-shot tasks, and the results suggest that adapters can achieve promising results with a limited amount of resources.
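The parameter-efficient adapter idea can be sketched with a minimal bottleneck adapter in NumPy: a small down-projection, a nonlinearity, an up-projection, and a residual connection, inserted into a frozen backbone so only the adapter weights are trained. The dimensions and initialisation below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 768, 64   # hypothetical sizes

# Only these two small matrices would be trainable; the backbone stays frozen.
W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero init: adapter starts as identity

def adapter(h):
    """Apply the bottleneck adapter to hidden states h of shape
    (seq_len, d_model): down-project, ReLU, up-project, add residual."""
    z = np.maximum(h @ W_down, 0.0)
    return h + z @ W_up

h = rng.normal(size=(10, d_model))
out = adapter(h)
print(out.shape)            # (10, 768)
print(np.allclose(out, h))  # True: zero-initialised W_up preserves the input
```

Because only `W_down` and `W_up` (about 100k parameters here, versus hundreds of millions in the backbone) are updated, one adapter can be trained per target language on monolingual text alone, which is what makes the zero-shot transfer recipe cheap.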
Ethiopia is the most prominent example of the late 20th-century adoption of federalism to accommodate diversity and complete state-building. This article explores the implementation of federalism and accommodation of ethnonational diversity in dominant party regimes by using Ethiopia as a case. Drawing on legal documents, literature, news sources and government reports, the article argues that federalism enabled distinctive groups to promote their culture, use their languages and exercise self-rule in their territory. However, the constitutionally proclaimed self-determination rights of ethnonationalities rarely correspond to practice. Although all ethnonationalities have the same constitutional rights, some are still subjugated, and self-rule remains their dream. The dominant party regime in Ethiopia met demands for self-rule and accommodation with suppression and violence. The constitution grants regions legislative powers to accommodate region-specific demands; nevertheless, regions cannot operate outside the narrow framework of the federal ruling party. Thus, regions became repressive agents of the centre rather than genuine agents of self-rule. Insights from Ethiopia have broader implications for states embracing federalism.