Results for "African languages and literature"

Showing 19 of ~2,193,367 results · from DOAJ, CrossRef, arXiv, Semantic Scholar

arXiv Open Access 2026
AfriNLLB: Efficient Translation Models for African Languages

Yasmin Moslem, Aman Kassahun Wassie, Amanuel Gizachew Abebe

In this work, we present AfriNLLB, a series of lightweight models for efficient translation from and into African languages. AfriNLLB supports 15 language pairs (30 translation directions), including Swahili, Hausa, Yoruba, Amharic, Somali, Zulu, Lingala, Afrikaans, Wolof, and Egyptian Arabic, as well as other African Union official languages such as Arabic (MSA), French, Portuguese, and Spanish. Our training data covers bidirectional translation between English and 13 languages, and between French and two languages (Lingala and Wolof). AfriNLLB models are based on NLLB-200 600M, which we compress using iterative layer pruning and quantization. We fine-tune the pruned models on parallel corpora we curated for African languages, employing knowledge distillation from a larger teacher model. Our work aims at enabling efficient deployment of translation models for African languages in resource-constrained settings. Our evaluation results demonstrate that AfriNLLB models achieve performance comparable to the baseline while being significantly faster. We release two versions of the AfriNLLB models, a Transformers version that allows further fine-tuning and a CTranslate2 version for efficient inference. Moreover, we release all the training data that we used for fine-tuning the baseline and pruned models to facilitate further research.

en cs.CL
S2 Open Access 2025
Investigating the Transparency of Language for Place Value Understanding: Comparing Indigenous Southern African Languages and European-based Languages

Kevin Larkin, Pamela Vale, Silke Ladel et al.

In this article we investigate the transparency of language in learning place value in either a Southern African indigenous language (isiXhosa, Setswana, Oshiwambo or Emakhuwa) or a European-based language (Afrikaans, English, German or Portuguese). Since language is a key mediator in developing place value understanding, it is important to investigate the ways in which the transparency of various languages may impact place value learning. A review of pertinent literature and an analysis of literal translations of number words (to thousands) of our eight languages lead us to the conclusion that Southern African indigenous languages are more accessible in their meaning, in relation to place value, than the four European-based languages spoken in Southern Africa, which we analysed. We identified two key advantages in the indigenous languages: (i) there was transparency of the ‘places’ in how numbers are named; and (ii) there was logical alignment between the spoken and symbolic representation of numbers. Despite this, many Southern African learners learn mathematics in English, Afrikaans or Portuguese even though this is not their home language (L1). This means that many learners are denied access to the transparency of the place value concepts that exist in their L1 and must manage learning place value, not only in a yet to be learned ‘foreign’ language, but also in one where they must learn to decode the idiosyncratic ‘irregularities’ of the way those languages name numbers. We conclude this article by discussing the implications of these findings for the teaching of place value in Southern African classrooms, in which indigenous learners are often learning in a European-based language that is not their L1.

S2 Open Access 2025
Self-harm in the public spaces while trying to embrace indigenous languages in South African context

M. Ntshangase, Nkarhi. E. Mathebula

In public debates about language there is general agreement that people need to be enabled to express themselves sufficiently. However, few if any scholars address the social implementation of multi/open language policy in the public sector. This paper adopts critical social theory (CST) to explore the debates about linguistic exclusion in public institutions, which necessitates a multi/open language policy. This qualitative study purposively sampled 5 public hospital nurses and 5 public clinic nurses from two South African provinces in order to thematically analyse the findings that emanate from their experiences of language policy. Findings show that the arguments for multi/open language policy are commonly positively skewed, as scholars ignore its negative aspects relating to implementation failures. This study contributes to the literature by exposing an unpopular view of multi/open language policy with regard to implementation and by exposing the likely negative effects of its informal implementation. Data collected through semi-structured interviews are subjected to thematic analysis, with the objective of showing that multi/open language policy in public hospitals and clinics has negative aspects as well as positives.

S2 Open Access 2025
African Literature: Beyond the Western Gaze

Nogwaja S. Zulu, Thulani Mkhize

Adapted from the Introduction to a collection of commentaries on contemporary issues in African literature in South Africa, this article challenges a ‘Western gaze’: a gaze that configures African literature as a literature of texts or events expressed in the ex-colonial language of English, French, Spanish or Portuguese. Instead, the article pursues African literature as a literature written in African languages as well as other languages about African concerns and experiences.

DOAJ Open Access 2025
Evaluating language policy implementation in South African higher education - three decades of progress and challenges: A scoping review protocol.

Silingene Joyce Ngcobo, Tracy Zhandire, Zamasomi Meyiwa Luvuno et al.

Background: South Africa's higher education institutions (HEIs) continue to face significant challenges in implementing inclusive language policies that integrate indigenous African languages into academic settings, even three decades after apartheid and despite progressive reforms. Objectives: This scoping review aims to evaluate the current state of language policy implementation in South African public HEIs. Specifically, it seeks to: (1) map the integration of multilingual policies into teaching, research, and administrative practices; (2) identify persistent barriers to effective policy implementation; (3) explore successful strategies for promoting multilingualism; (4) assess the extent of African language usage in academic contexts; and (5) identify research gaps to guide future investigations. Methods: The review will adhere to the PRISMA-ScR guidelines and follow the framework outlined by Arksey and O'Malley, ensuring a systematic and transparent approach. A comprehensive search will be conducted in databases including Google Scholar, Scopus, Web of Science, ERIC, and African Journals Online (AJOL), covering studies published from 1994 to the present. This will be supplemented by grey literature from government and institutional sources. Three independent reviewers will screen studies using predefined eligibility criteria, managing and screening articles through Rayyan. Data will be extracted using a standardized form, and thematic analysis will synthesize the findings, with stakeholder consultation to validate results. Expected outcomes: This review will provide a comprehensive assessment of language policy implementation, highlighting successful strategies and persistent challenges across institutions. The findings will inform policy refinement, identify effective practices, and guide future research directions for achieving linguistically inclusive higher education in South Africa, while contributing to a broader understanding of implementing multilingual policies in post-colonial educational contexts. This protocol is preregistered on OSF, available at https://doi.org/10.17605/OSF.IO/AU2SD.

Medicine, Science
arXiv Open Access 2025
From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation

Mardiyyah Oduwole, Oluwatosin Olajide, Jamiu Suleiman et al.

The linguistic diversity across the African continent presents different challenges and opportunities for machine translation. This study explores the effects of data augmentation techniques in improving translation systems in low-resource African languages. We focus on two data augmentation techniques: sentence concatenation with back translation and switch-out, applying them across six African languages. Our experiments show significant improvements in machine translation performance, with a minimum increase of 25% in BLEU score across all six languages. We provide a comprehensive analysis and highlight the potential of these techniques to improve machine translation systems for low-resource languages, contributing to the development of more robust translation systems for under-resourced languages.

en cs.CL
arXiv Open Access 2025
Large Language Models for Sentiment Analysis to Detect Social Challenges: A Use Case with South African Languages

Koena Ronny Mabokela, Tim Schlippe, Matthias Wölfel

Sentiment analysis can aid in understanding people's opinions and emotions on social issues. In multilingual communities, sentiment analysis systems can be used to quickly identify social challenges in social media posts, enabling government departments to detect and address these issues more precisely and effectively. Recently, large language models (LLMs) have become available to the wider public, and initial analyses have shown that they exhibit magnificent zero-shot sentiment analysis abilities in English. However, no work has investigated leveraging LLMs for sentiment analysis on social media posts in South African languages to detect social challenges. Consequently, in this work, we analyse the zero-shot performance of the state-of-the-art LLMs GPT-3.5, GPT-4, Llama 2, PaLM 2, and Dolly 2 to investigate the sentiment polarities of the 10 top emerging topics in English, Sepedi and Setswana social media posts that fall within the jurisdictional areas of 10 South African government departments. Our results demonstrate that there are big differences between the various LLMs, topics, and languages. In addition, we show that a fusion of the outcomes of different LLMs provides large gains in sentiment classification performance, with sentiment classification errors below 1%. Consequently, it is now feasible to provide systems that generate reliable sentiment information to detect social challenges and draw conclusions about possible needs for action on specific topics and within different language groups.

en cs.CL, cs.AI
arXiv Open Access 2025
Automatic Speech Recognition (ASR) for African Low-Resource Languages: A Systematic Literature Review

Sukairaj Hafiz Imam, Tadesse Destaw Belay, Kedir Yassin Husse et al.

ASR has achieved remarkable global progress, yet African low-resource languages remain severely underrepresented, producing barriers to digital inclusion across a continent with more than 2,000 languages. This systematic literature review (SLR) explores research on ASR for African languages with a focus on datasets, models and training methods, evaluation techniques, and challenges, and recommends future directions. We employ the PRISMA 2020 procedures and search DBLP, ACM Digital Library, Google Scholar, Semantic Scholar, and arXiv for studies published between January 2020 and July 2025. We include studies related to ASR datasets, models or metrics for African languages, while excluding non-African, duplicate, and low-quality studies (score <3/5). We screen 71 out of 2,062 records and we record a total of 74 datasets across 111 languages, encompassing approximately 11,206 hours of speech. Fewer than 15% of studies provided reproducible materials, and dataset licensing is not clear. Self-supervised and transfer learning techniques are promising, but are hindered by limited pre-training data, inadequate coverage of dialects, and the availability of resources. Most researchers use Word Error Rate (WER), with very minimal use of linguistically informed scores such as Character Error Rate (CER) or Diacritic Error Rate (DER), which limits applicability to tonal and morphologically rich languages. The existing evidence on ASR systems is inconsistent, hindered by issues like dataset availability, poor annotations, licensing uncertainties, and limited benchmarking. Nevertheless, the rise of community-driven initiatives and methodological advancements indicates a pathway for improvement. Sustainable development for this area will also include stakeholder partnership, creation of ethically well-balanced datasets, use of lightweight modelling techniques, and active benchmarking.

en cs.CL
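
To make the metric discussion in the ASR review above concrete, here is a minimal sketch of how WER and CER are commonly computed from a Levenshtein edit distance. The function names (`edit_distance`, `wer`, `cer`) and the Swahili sample strings are illustrative assumptions and are not taken from the reviewed studies; real evaluations usually also normalise casing and punctuation first.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: a single wrong character inside a long word.
reference = "ninakupenda sana"
hypothesis = "ninakupendo sana"
print(f"WER = {wer(reference, hypothesis):.4f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

In this toy case one wrong character costs a full word error (WER 0.5) but only one of 16 reference characters (CER 0.0625), which is one reason character-level scores can be more informative for morphologically rich languages with long word forms.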
arXiv Open Access 2025
Synthetic Voice Data for Automatic Speech Recognition in African Languages

Brian DeRenzi, Anna Dixon, Mohamed Aymane Farhi et al.

Speech technology remains out of reach for most of the over 2300 languages in Africa. We present the first systematic assessment of large-scale synthetic voice corpora for African ASR. We apply a three-step process: LLM-driven text creation, TTS voice synthesis, and ASR fine-tuning. Eight out of ten languages for which we create synthetic text achieved readability scores above 5 out of 7. We evaluated ASR improvement for three (Hausa, Dholuo, Chichewa) and created more than 2,500 hours of synthetic voice data at below 1% of the cost of real data. Fine-tuned Wav2Vec-BERT-2.0 models trained on 250h real and 250h synthetic Hausa matched a 500h real-data-only baseline, while 579h real and 450h to 993h synthetic data created the best performance. We also present gender-disaggregated ASR performance evaluation. For very low-resource languages, gains varied: Chichewa WER improved about 6.5% relative with a 1:2 real-to-synthetic ratio; a 1:1 ratio for Dholuo showed similar improvements on some evaluation data, but not on others. Investigating intercoder reliability, ASR errors and evaluation datasets revealed the need for more robust reviewer protocols and more accurate evaluation data. All data and models are publicly released to invite further work to improve synthetic data for African languages.

arXiv Open Access 2025
Designing and Contextualising Probes for African Languages

Wisdom Aduah, Francois Meyer

Pretrained language models (PLMs) for African languages are continually improving, but the reasons behind these advances remain unclear. This paper presents the first systematic investigation into probing PLMs for linguistic knowledge about African languages. We train layer-wise probes for six typologically diverse African languages to analyse how linguistic features are distributed. We also design control tasks, a way to interpret probe performance, for the MasakhaPOS dataset. We find PLMs adapted for African languages to encode more linguistic information about target languages than massively multilingual PLMs. Our results reaffirm previous findings that token-level syntactic information concentrates in middle-to-last layers, while sentence-level semantic information is distributed across all layers. Through control tasks and probing baselines, we confirm that performance reflects the internal knowledge of PLMs rather than probe memorisation. Our study applies established interpretability techniques to African-language PLMs. In doing so, we highlight the internal mechanisms underlying the success of strategies like active learning and multilingual adaptation.

en cs.CL
S2 Open Access 2025
Analysing loan blends and code mixing as main strategies to promote African languages in Chimamada Ngozie Adichie's Americanah (2013) and Ngugi Wa Thiong’o's Matigari (1987)

Sènami-Fifa Blandine Araba, Charles Dossou Ligan, Abossèdé Paulette Okpeicha

This research describes how loan blends and code mixing are used, among other strategies, by Ngugi Wa Thiong’o in Matigari (1987) and Chimamanda Ngozi Adichie in Americanah (2013) to valorize and promote their mother tongues. Despite the huge linguistic diversity of the African continent, most literary works are still written in foreign languages. Given this, there is good reason to worry about the threatened depreciation of indigenous African languages. To support the analysis of the effects of bilingualism in the literary works under study, this research draws on key notions and theories developed by linguists such as Hoffmann and Holmes. The study focuses mainly on loan blends and code mixing/code switching. The results show that some African writers, such as Ngugi Wa Thiong’o and Chimamanda Ngozi Adichie, use a variety of strategies, including loan blends and code mixing, to valorize their local languages, thereby helping to preserve the cultural identity of the African continent.

S2 Open Access 2025
Accelerating African Languages Development through Strategic Improvement of Publishing Landscape: Lessons from Luganda Language Realities

Masaazi Fred Masagazi, Edward Masembe, Margarete Nanfuka et al.

Despite concerted efforts by the African Union and the independent African states to develop African languages to the point of becoming languages of instruction in schools, there are still gaps that need to be addressed to reach that level. One gap that requires strategic intervention is the publishing of more literature in the respective African languages. From this perspective, African countries need to learn from each other and, most importantly, to share experiences. In Uganda, Luganda has made progress both in its use in schools and in publishing. Other African languages could learn from this experience to support their own development. In this paper, we discuss the factors that have been central in elevating the Luganda language through publishing. The study used a descriptive research design, analyzing the status, challenges and opportunities of publishing in African languages, using Luganda as a case. We observed that, in order to increase public awareness of the use of African languages, which leads to increased demand for publishing in them, policymakers should support the use of African languages in the education sector. We conclude that the Luganda language could be used to benchmark how African people could be supported to write and publish in their languages.

S2 Open Access 2025
Teaching Methodological Challenges of Indigenous African Languages in the Foundation Phase

Kabelo Ramolula, Prof. Milton Nkoane

The dialectic on the pedagogy of indigenous African languages in the Foundation Phase has occupied academic space lately. African language teachers seem to face challenges in teaching the indigenous languages. The purpose of this study is to explore the challenges of teaching African languages in the Foundation Phase and how they could be solved. The study adopted a constructivist paradigm and a qualitative approach. The paradigm holds that knowledge is a social construction and that there are therefore multiple realities. Data were generated from a critical review of related literature from the past seven years on the teaching of indigenous African languages in the Foundation Phase. It was analysed thematically following Castle and Nolen’s five steps of thematic analysis, namely compiling, disassembling, reassembling, interpreting and concluding, in order to get to the depth of the phenomenon. Deconstruction theory formed the theoretical framework for this study. The theory holds that there are always cracks, new ways of doing things, thus making changes from what is considered the norm. The findings point to insufficient indigenous language teaching materials, which are also archaic, not digitalized, and not user friendly. Poor teacher training in African languages is another finding. The study concludes that there is poor government financial support, which results in insufficient teaching materials. Poor teacher training in African indigenous languages leads to ineffective teaching methods and thus to poor learner performance. The study therefore recommends effective teacher training for the Foundation Phase and that the government fund the digitalization and accessibility of teaching materials.

S2 Open Access 2024
Harmonizing Africa’s linguistic symphony: navigating the complexities of translating African literature using a postcolonial theory

Mlamli Diko

Whereas translation, as a theoretical and professional discipline, has fairly been scrutinized in the African context and elsewhere in the world, it cannot be downplayed that it continues to suffer a great deal of challenges. Many of these challenges could be pinned on postcolonial and post-apartheid dynamics, which prioritize political agendas over the development of this discipline in the African context. Given this reality, this article problematizes and navigates four pertinent challenges pertaining to the translation of African literature, predominantly from European (or European-Africanized) to indigenous African languages. These four challenges, which are named in the second section of this article, are recognized as principal sources of data to provide empirical evidence and are elicited from different African literary discourses. The objective is to underline that the translation challenges concerning African literature, in large part, are intensified by the vestigial elements of colonialism and apartheid in Africa. Owing to this concern, this article applies postcolonial theory to its discussions. Above all, it is important to note that I use the prefix ‘post-’ from postcolonial theory to imply that although many African states officially ceased colonialism and apartheid, it does not denote that colonialism and apartheid are completely dead. For these reasons, the general findings and discussions confirm that translators continually struggle to strike a balance between linguistic transference and retaining the ethnological, spiritual, and historical tenets in African literature. The closing remarks underline the necessity to continue this discourse in a bid to find reasonable solutions to this conundrum.

13 citations en
arXiv Open Access 2024
Cheetah: Natural Language Generation for 517 African Languages

Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

Low-resource African languages pose unique challenges for natural language processing (NLP) tasks, including natural language generation (NLG). In this paper, we develop Cheetah, a massively multilingual NLG language model for African languages. Cheetah supports 517 African languages and language varieties, allowing us to address the scarcity of NLG resources and provide a solution to foster linguistic diversity. We demonstrate the effectiveness of Cheetah through comprehensive evaluations across six generation downstream tasks. In five of the six tasks, Cheetah significantly outperforms other models, showcasing its remarkable performance for generating coherent and contextually appropriate text in a wide range of African languages. We additionally conduct a detailed human evaluation to delve deeper into the linguistic capabilities of Cheetah. The introduction of Cheetah has far-reaching benefits for linguistic diversity. By leveraging pretrained models and adapting them to specific languages, our approach facilitates the development of practical NLG applications for African communities. The findings of this study contribute to advancing NLP research in low-resource settings, enabling greater accessibility and inclusion for African languages in a rapidly expanding digital landscape. We publicly release our models for research.

en cs.CL
arXiv Open Access 2024
Anti-Context-Free languages

Carles Cardó

Context-free languages can be characterized in several ways. This article studies projective linearisations of languages of simple dependency trees, i.e., dependency trees in which a node can govern at most one node with a given syntactic function. We prove that the projective linearisations of local languages of simple dependency trees coincide with the context-free languages. Simple dependency trees suggest alternative dual notions of locality and projectivity, which permits defining a dual language for each context-free language. We call this new class of languages anti-context-free. These languages are related to some linguistic constructions exhibiting the so-called cross-serial dependencies that were historically important for the development of computational linguistics. We propose that this duality could be a relevant linguistic phenomenon.

en cs.FL
S2 Open Access 2024
South African indigenous languages in teaching and learning: policies and the threat of cultural genocide

Johan Beckmann

South Africa is a multilingual country with 10 indigenous languages, English, and Sign Language as official languages. Before 1994, only English and Afrikaans were used as languages of learning and teaching (LOLTs) at all educational levels. Indigenous African languages were only used as LOLTs up to Grade 3. 1994 led to new expectations regarding the use and development of indigenous languages as LOLTs. Government seemingly intends to eventually make English the only LOLT at school and higher education levels. Concerns have surfaced regarding the possible ‘murder’ of indigenous languages and the violation of people’s human rights through language policy implementation. An education law and policy lens was mostly used to examine issues. I wrote the article as a critical analysis of extant literature and used Skutnabb-Kangas and Phillipson’s (1994) concept of linguicism as the theoretical basis of my examination of data. It led to my conclusion that the emergence of English as the juggernaut language in education could probably lead to the revival of colonization, the assimilation (or ‘destruction’) of indigenous languages, and ‘cultural genocide’ called multilingualism. McIlwraith’s (2014) letter of advice to language and development leaders after a 2013 international language conference in South Africa and cited in the conclusion of the article still provides a fitting conclusion resonating with the content of the article.

Page 1 of 109,669