Results for "African languages and literature"

DOAJ Open Access 2026

Contribution de l’alphabétisation à la diffusion des Objectifs de Développement Durable (ODD) en baoulé, dioula et koulango

Djibril Soumahoro, Yeboua Vincent

Achieving the Sustainable Development Goals (SDGs, French: ODD) in Côte d’Ivoire can only be truly effective if policies are embedded in broader processes of dissemination in local languages and give greater consideration to literacy. This requires implementing literacy projects in local languages, which facilitate access to information and help overcome the barrier of the official language (French). Literacy in the mother tongue makes SDG concepts more accessible and comprehensible to a segment of the population. Through the use of local languages, better-informed populations can adopt responsible behaviours relating to education, health, the environment and the fight against poverty. In a context of ownership of the development goals and increased engagement of the population in the country’s sustainable development, this article emphasises learning in local languages.

African languages and literature

arXiv Open Access 2025

Multilingual State Space Models for Structured Question Answering in Indic Languages

Arpita Vats, Rahul Raja, Mrinal Mathur et al.

The diversity and complexity of Indic languages present unique challenges for natural language processing (NLP) tasks, particularly in the domain of question answering (QA). To address these challenges, this paper explores the application of State Space Models (SSMs) to build efficient and contextually aware QA systems tailored for Indic languages. SSMs are particularly suited for this task due to their ability to model long-term and short-term dependencies in sequential data, making them well-equipped to handle the rich morphology, complex syntax, and contextual intricacies characteristic of Indian languages. We evaluated multiple SSM architectures across diverse datasets representing various Indic languages and conducted a comparative analysis of their performance. Our results demonstrate that these models effectively capture linguistic subtleties, leading to significant improvements in question interpretation, context alignment, and answer generation. This work represents the first application of SSMs to question answering tasks in Indic languages, establishing a foundational benchmark for future research in this domain. We propose enhancements to existing SSM frameworks, optimizing their applicability to low-resource settings and multilingual scenarios prevalent in Indic languages.

en cs.CL, cs.AI

arXiv Open Access 2025

Data Caricatures: On the Representation of African American Language in Pretraining Corpora

Nicholas Deas, Blake Vente, Amith Ananthram et al.

With a combination of quantitative experiments, human judgments, and qualitative analyses, we evaluate the quantity and quality of African American Language (AAL) representation in 12 predominantly English, open-source pretraining corpora. We specifically focus on the sources, variation, and naturalness of included AAL texts representing the AAL-speaking community. We find that AAL is underrepresented in all evaluated pretraining corpora compared to US demographics, constituting as few as 0.007% and at most 0.18% of documents. We also find that more than 25% of AAL texts in C4 may be perceived as inappropriate for LLMs to generate and to reinforce harmful stereotypes. Finally, we find that most automated filters are more likely to conserve White Mainstream English (WME) texts over AAL in pretraining corpora.

en cs.CL

arXiv Open Access 2025

FUSE: A Ridge and Random Forest-Based Metric for Evaluating MT in Indigenous Languages

Rahul Raja, Arpita Vats

This paper presents the winning submission of the RaaVa team to the AmericasNLP 2025 Shared Task 3 on Automatic Evaluation Metrics for Machine Translation (MT) into Indigenous Languages of America, where our system ranked first overall based on average Pearson correlation with the human annotations. We introduce the Feature-Union Scorer (FUSE) for Evaluation, which integrates Ridge regression and Gradient Boosting to model translation quality. In addition to FUSE, we explore five alternative approaches leveraging different combinations of linguistic similarity features and learning paradigms. The FUSE score highlights the effectiveness of combining lexical, phonetic, semantic, and fuzzy token similarity with learning-based modeling to improve MT evaluation for morphologically rich and low-resource languages. MT into Indigenous languages poses unique challenges due to polysynthesis, complex morphology, and non-standardized orthography. Conventional automatic metrics such as BLEU, TER, and ChrF often fail to capture deeper aspects like semantic adequacy and fluency. Our proposed framework, formerly referred to as FUSE, incorporates multilingual sentence embeddings and phonological encodings to better align with human evaluation. We train supervised models on human-annotated development sets and evaluate on held-out test data. Results show that FUSE consistently achieves higher Pearson and Spearman correlations with human judgments, offering a robust and linguistically informed solution for MT evaluation in low-resource settings.

en cs.CL
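
The ranking criterion named in the abstract above, Pearson correlation between a metric's scores and human annotations, can be illustrated with a minimal sketch. The function name `pearson` and the toy scores are hypothetical and are not taken from the paper; this is not the FUSE implementation, only the correlation step under those assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (assumption): automatic metric scores and
# human ratings for the same five translations.
metric_scores = [0.2, 0.4, 0.5, 0.7, 0.9]
human_ratings = [1, 2, 2, 4, 5]
r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of `r` close to 1 would indicate that the metric orders and spaces the translations much as the human raters do.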

arXiv Open Access 2025

Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens

Fred Mutisya, Shikoh Gitau, Christine Syovata et al.

Introduction: Existing medical LLM benchmarks largely reflect examination syllabi and disease profiles from high-income settings, raising questions about their validity for African deployment, where malaria, HIV, TB, sickle cell disease and other neglected tropical diseases (NTDs) dominate the burden and national guidelines drive care. Methodology: We systematically reviewed 31 quantitative LLM evaluation papers (Jan 2019 to May 2025), identifying 19 English medical QA benchmarks. Alama Health QA was developed using a retrieval-augmented generation framework anchored on the Kenyan Clinical Practice Guidelines. Six widely used sets (AfriMedQA, MMLU-Medical, PubMedQA, MedMCQA, MedQA-USMLE, and the guideline-grounded Alama Health QA) underwent harmonized semantic profiling (NTD proportion, recency, readability, lexical diversity metrics) and blinded expert rating across five dimensions: clinical relevance, guideline alignment, clarity, distractor plausibility, and language/cultural fit. Results: Alama Health QA captured >40% of all NTD mentions across corpora and the highest within-set frequencies for malaria (7.7%), HIV (4.1%), and TB (5.2%); AfriMedQA ranked second but lacked formal guideline linkage. Global benchmarks showed minimal representation (e.g., sickle cell disease absent in three sets) despite their large scale. Qualitatively, Alama scored highest for relevance and guideline alignment; PubMedQA scored lowest for clinical utility. Discussion: Quantitative medical LLM benchmarks widely used in the literature underrepresent African disease burdens and regulatory contexts, risking misleading performance claims. Guideline-anchored, regionally curated resources such as Alama Health QA and expanded disease-specific derivatives are essential for safe, equitable model evaluation and deployment across African health systems.

en cs.AI

arXiv Open Access 2025

Prompt-oriented Output of Culture-Specific Items in Translated African Poetry by Large Language Model: An Initial Multi-layered Tabular Review

Adeyola Opaluwah

This paper examines the output of cultural items generated by Chat Generative PreTrained Transformer Pro in response to three structured prompts to translate three anthologies of African poetry. The first prompt was broad, the second focused on poetic structure, and the third emphasized cultural specificity. To support this analysis, four comparative tables were created. The first table presents the results of the cultural items produced after the three prompts; the second categorizes these outputs based on Aixela's framework of Proper nouns and Common expressions; the third summarizes the cultural items generated by human translators, a custom translation engine, and a Large Language Model. The final table outlines the strategies employed by Chat Generative PreTrained Transformer Pro following the culture-specific prompt. Compared to the outputs of cultural items from the reference human translation and the custom translation engine in prior studies, the findings indicate that the culture-oriented prompts used with Chat Generative PreTrained Transformer Pro did not yield significant enhancements of cultural items during the translation of African poetry from English to French. Among the fifty-four cultural items, the human translation produced thirty-three cultural items in repetition, the custom translation engine generated thirty-eight cultural items in repetition, while Chat Generative PreTrained Transformer Pro produced forty-one cultural items in repetition. The untranslated cultural items revealed inconsistencies in the Large Language Model's approach to translating cultural items in African poetry from English to French.

en cs.CL

DOAJ Open Access 2024

Intellectualisation of Northern Sotho through English terminology adaptation

Matefu L. Mabela, Thabo Ditsele

This study aimed to investigate and propose a pragmatic approach in the adaptation of English terminologies for scientific purposes into Northern Sotho. This is necessary because of the lack of terminologies to describe and define scientific phenomena in the language. Languages are constantly evolving, and speakers drive their evolution. The study aimed to overcome the scientific terminology development challenge for Northern Sotho by analysing existing data and using corpus linguistics as a method. The Multilingual Natural Science and Technology Dictionary Grade 4–6 (2013) was used to provide illustrative examples and clarify linguistic complexities in scientific terminology. The study revealed the complexity of the linguistic adaptation of Northern Sotho and the challenges and pitfalls linked to the integration of borrowed English terminology into academic discourse, encompassing apprehensions concerning accuracy, clarity and cultural appropriateness. Furthermore, the findings indicated that the puristic term adaptation approach tends to be perplexing. Contribution: This study contributes to the intellectualisation and revitalisation of Northern Sotho by enhancing the language and equipping its speakers to engage more efficiently in scientific contexts, illuminating intricacies and potential misrepresentations inherent in the process of adaptation. Moreover, the research underscored the significance of employing adaptation strategies that are suitable for the Northern Sotho context and are consistent with linguistic patterns and semantics.

African languages and literature

DOAJ Open Access 2024

Die melkweg en die miskruier (Jeanette Ferreira)

Frederick Botha

African languages and literature

arXiv Open Access 2024

Building a Language-Learning Game for Brazilian Indigenous Languages: A Case of Study

Gustavo Polleti

In this paper we discuss a first attempt to build a language learning game for Brazilian indigenous languages and the challenges around it. We present a design for the tool with gamification aspects. Then we describe a process to automatically generate language exercises and questions from a dependency treebank and a lexical database for Tupian languages. We discuss the limitations of our prototype, highlighting ethical and practical implementation concerns. Finally, we conclude that new data gathering processes should be established in partnership with indigenous communities and oriented for educational purposes.

en cs.CL

DOAJ Open Access 2023

A decolonial reading of the Third Chapter of the Gospel of John in Moffat’s Translation of the Catechism into Setswana (1826)

I.D. Mothoagae

The Setswana language is one of the Southern African languages that was “reduced” into a written language through the translation of Christian literature by the London Missionary Society. The introduction of the Setswana spelling book in 1826 epitomised the vernacularisation and standardisation of Setswana. In 1826, Robert Moffat also translated the first Setswana catechism. Rev. William Brown’s Catechism served as a source text. He also added the third chapter of the Gospel of John and the Lord’s Prayer. This paper focuses on the second section of the 1826 Setswana catechism, namely the third chapter of John’s Gospel. It is argued that translation does not happen in a vacuum; rather, it also has the ideological intentions of the translator. Through the translated texts, Moffat performs a technology of power by eroding, dislocating, and disassociating the Batswana from their epistemic and spiritual heritage. The paper applies a decolonial lens to analyse the theme of conversion (metanoia) in the Gospel of John, as translated by Moffat.

Christianity, Practical religion. The Christian life

arXiv Open Access 2023

Clustering an African Hairstyle Dataset Using PCA and K-means

Teffo Phomolo Nicrocia, Owolawi Pius Adewale, Pholo Moanda Diana

Digital transformation has not yet extended to building an African face shape classifier. African women rely on beauty standards recommendations, personal preference, or the newest trends in hairstyles to decide on the appropriate hairstyle for them. In this paper, an approach is presented that uses K-means clustering to classify images of African women. To identify potential facial clusters, Haarcascade is used for feature-based training, and K-means clustering is applied for image classification.

en cs.CV, cs.LG
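
The K-means step mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration only, assuming the images have already been reduced to short numeric feature vectors (for example by a PCA projection); the function `kmeans`, the first-k initialisation, and the toy feature values are hypothetical and are not the authors' pipeline.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples.

    Initial centroids are simply the first k points. That is an assumption
    made here for determinism; real pipelines usually use random or
    k-means++ initialisation.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D feature vectors (assumption), forming two obvious groups.
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Each entry of `labels` is the cluster index assigned to the corresponding feature vector; choosing `k` and the feature extraction are left open here.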

arXiv Open Access 2023

Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec

Atnafu Lambebo Tonja, Christian Maldonado-Sifuentes, David Alejandro Mendoza Castillo et al.

In this paper, we present a parallel Spanish-Mazatec and Spanish-Mixtec corpus for machine translation (MT) tasks, where Mazatec and Mixtec are two indigenous Mexican languages. We evaluated the usability of the collected corpus using three different approaches: transformer, transfer learning, and fine-tuning pre-trained multilingual MT models. Fine-tuning the Facebook M2M100-48 model outperformed the other approaches, with BLEU scores of 12.09 and 22.25 for Mazatec-Spanish and Spanish-Mazatec translations, respectively, and 16.75 and 22.15 for Mixtec-Spanish and Spanish-Mixtec translations, respectively. The findings show that the dataset size (9,799 sentences in Mazatec and 13,235 sentences in Mixtec) affects translation performance and that indigenous languages work better when used as target languages. The findings emphasize the importance of creating parallel corpora for indigenous languages and fine-tuning models for low-resource translation tasks. Future research will investigate zero-shot and few-shot learning approaches to further improve translation performance in low-resource settings. The dataset and scripts are available at https://github.com/atnafuatx/Machine-Translation-Resources

en cs.CL

arXiv Open Access 2023

From Local to Global: Navigating Linguistic Diversity in the African Context

Rashmi Margani, Nelson Ndugu

The focus is on critical problems in NLP related to linguistic diversity and variation across the African continent, specifically with regard to African local dialects and Arabic dialects that have received little attention. We evaluated our various approaches, demonstrating their effectiveness while highlighting the potential impact of the proposed approach on businesses seeking to improve customer experience and product development in African local dialects. The idea of using the model as a teaching tool for product-based instruction is interesting, as it could potentially stimulate interest in learners and trigger techno-entrepreneurship. Overall, our modified approach offers a promising analysis of the challenges of dealing with African local dialects, particularly Arabic dialects, which could have a significant impact on businesses seeking to improve customer experience and product development.

en cs.CL, cs.LG

arXiv Open Access 2022

Introduction to the African Strategy for Fundamental and Applied Physics (ASFAP)

Farida Fassi

Generating scientific and technological knowledge and converting it into innovations that add value to society are key instruments for a society's economic growth and development. As outstanding as these capabilities are for other regions of the world, Africa's science, innovation, education and research infrastructure, particularly in fundamental and applied physics, has over the years been undervalued and under-resourced. To efficiently address the scientific and technological gaps with the rest of the world, Africa's stance needs a radical overhaul. With the ambition to drive a community-wide effort in Africa, the African Strategy for Fundamental and Applied Physics (ASFAP) was founded. The aspiration is to demonstrate the potential benefits of physics for African society, how physics can contribute to the development of technological infrastructure, and how it can provide the trained personnel needed to take advantage of scientific advances. The vision consists in fostering scientific literacy driven by physics-based technologies and their impact on economic growth, including other sciences that draw heavily on advances in physics, in addition to developing and enhancing collaborations and partnerships among Africans in national, regional, and Pan-African organizations. This should help tackle the challenges that Africans face and prioritize educational and research resources, innovation and development. The ASFAP initiative could present a unique opportunity to overcome the complexity of Africa's social and economic challenges, if Africa is to have and maintain its position as a co-leader in the global scientific process and reap the consequent socioeconomic benefits. ASFAP will take a few years, concluding with a final report to inform African policymakers and broader communities.

en physics.soc-ph

arXiv Open Access 2022

@C -- augmented version of C programming language

Iosif Iulian Petrila

An augmented version of the C programming language is presented. The language was completed with a series of low-level and high-level facilities to extend its range of use to various computing systems, operations, and users. Ambiguities and inconsistencies have been resolved by handling problematic and undefined language elements in a manner similar to that used in other C-syntax-based languages. The proposed augmentations, through the @C approach, preserve the spirit of the C language and its basic characteristics through compatibility with the standard version, while also rejuvenating C and bringing it up to the state of the art of present-day programming languages.

en cs.PL, cs.FL

DOAJ Open Access 2021

Dol heuning (S. J. Naudé)

Bibi Burger

African languages and literature

arXiv Open Access 2021

Blockchain for Genomics: A Systematic Literature Review

Mohammed Alghazwi, Fatih Turkmen, Joeri van der Velde et al.

Human genomic data carry unique information about an individual and offer unprecedented opportunities for healthcare. The clinical interpretations derived from large genomic datasets can greatly improve healthcare and pave the way for personalized medicine. Sharing genomic datasets, however, poses major challenges, as genomic data differ from traditional medical data, indirectly revealing information about descendants and relatives of the data owner and carrying valid information even after the owner passes away. Therefore, stringent data ownership and control measures are required when dealing with genomic data. In order to provide secure and accountable infrastructure, blockchain technologies offer a promising alternative to traditional distributed systems. Indeed, research on blockchain-based infrastructures tailored to genomics is on the rise. However, there is no comprehensive literature review that summarizes the current state-of-the-art methods in the applications of blockchain in genomics. In this paper, we systematically look at the existing work, both commercial and academic, and discuss the major opportunities and challenges. Our study is driven by five research questions that we aim to answer in our review. We also present our projections of future research directions, which we hope researchers interested in the area can benefit from.

en cs.CR

DOAJ Open Access 2020

Of Motherhood and Melancholia: Notebook of a Psycho-ethnographer (Lou-Marié Kruger)

Azille Coetzee

African languages and literature

DOAJ Open Access 2020

Die dao van Daan van der Walt (Lodewyk G. du Plessis)

Stefan van Zyl

African languages and literature

DOAJ Open Access 2020

Chronology: Nuruddin Farah

F. Fiona Moolla

African languages and literature
