Results for "Languages and literature of Eastern Asia, Africa, Oceania"

DOAJ Open Access 2026

Du statut des prédicatifs dits verbaux en buamu (Langue gur) [On the status of so-called verbal predicatives in Buamu (Gur language)]

Roland BICABA

This article examines the morphology of verbal constituents in Buamu, a Gur language spoken in Burkina Faso and Mali. It specifically questions the nature of certain morphemes traditionally regarded as verbal markers. Using a corpus-based methodology, we establish that, as far as the expression of the future is concerned, the prospective has no dedicated marker; it is expressed simply by juxtaposing a subject term with the non-finite form of the verb. The projective and the eventual are likewise not expressed by verbal predicatives, but by auxiliary verbs that semantically belong to the category of motion verbs. The monemes used to express the various nuances of the present imperfective aspect in Buamu are also auxiliary verbs (of motion). The distinctive feature of constructions in this aspect, however, is that they derive from a non-verbal predication structure of situation. Indeed, turning a non-verbal predication into a so-called verbal predication relies simply on deleting the non-verbal predicative of situation. Consequently, it seems legitimate to question the very relevance of the concept of verbal predication in the expression of the aspectual values of the present in Buamu.

African languages and literature

arXiv Open Access 2026

SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia

Zhixiang Lu, Chong Zhang, Yulong Li et al.

The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large language models (LLMs) offer a potential solution for translation, their deployment in data-poor contexts faces a dual challenge: the scarcity of high-quality, culturally relevant data and the prohibitive energy costs of training on massive, noisy web corpora. To resolve the tension between digital inclusion and environmental sustainability, we introduce Sustainable Agent-Guided Expert-tuning (SAGE). This framework pioneers an energy-aware paradigm that prioritizes the "right data" over "big data". Instead of carbon-intensive training on unfiltered datasets, SAGE employs a reinforcement learning (RL) agent, optimized via Group Relative Policy Optimization (GRPO), to autonomously curate a compact training set. The agent utilizes a semantic reward signal derived from a small, expert-constructed set of community dialogues to filter out noise and cultural misalignment. We then efficiently fine-tune open-source LLMs on this curated data using Low-Rank Adaptation (LoRA). We applied SAGE to translation tasks between English and seven low-resource languages (LRLs) in Southeast Asia. Our approach establishes new state-of-the-art performance on BLEU-4 and COMET-22 metrics, effectively capturing local linguistic nuances. Crucially, SAGE surpasses baselines trained on full datasets while reducing data usage by 97.1% and training energy consumption by 95.2%. By delivering high-performance models with a minimal environmental footprint, SAGE offers a scalable and responsible pathway to bridge the digital divide in the Global South.

en cs.CL

arXiv Open Access 2026

Assessing the Case for Africa-Centric AI Safety Evaluations

Gathoni Ireri, Cecil Abungu, Jean Cheptumo et al.

Frontier AI systems are being adopted across Africa, yet most AI safety evaluations are designed and validated in Western environments. In this paper, we argue that the portability gap can leave Africa-centric pathways to severe harm untested when frontier AI systems are embedded in materially constrained and interdependent infrastructures. We define severe AI risks as material risks from frontier AI systems that result in critical harm, measured as the grave injury or death of thousands of people or economic loss and damage equivalent to five percent of a country's GDP. To support AI safety evaluation design, we develop a taxonomy for identifying Africa-centric severe AI risks. The taxonomy links outcome thresholds to process pathways that model risk as the intersection of hazard, vulnerability, and exposure. We distinguish severe risks by amplification and suddenness, where amplification requires that frontier AI be a necessary magnifier of latent danger and suddenness captures harms that materialise rapidly enough to overwhelm ordinary coping and governance capacity. We then propose threat modelling strategies for African contexts, surveying reference class forecasting, structured expert elicitation, scenario planning, and system theoretic process analysis, and tailoring them to constraints of limited resources, poor connectivity, limited technical expertise, weak state capacity, and conflict. We also examine AI misalignment risk, concluding that Africa is more likely to expose universal failure modes through distributional shift than to generate distinct pathways of misalignment. Finally, we offer practical guidance for running evaluations under resource constraints, emphasising open and extensible tooling, tiered evaluation pipelines, and sharing methods and findings to broaden evaluation scope.

en cs.CY

arXiv Open Access 2026

AfroScope: A Framework for Studying the Linguistic Landscape of Africa

Sang Yun Kwon, AbdelRahim Elmadany, Muhammad Abdul-Mageed

Language Identification (LID) is the task of determining the language of a given text and is a fundamental preprocessing step that affects the reliability of downstream NLP applications. While recent work has expanded LID coverage for African languages, existing approaches remain limited in (i) the number of supported languages and (ii) their ability to make fine-grained distinctions among closely related varieties. We introduce AfroScope, a unified framework for African LID that includes AfroScope-Data, a dataset covering 713 African languages, and AfroScope-Models, a suite of strong LID models with broad language coverage. To better distinguish highly confusable languages, we propose a hierarchical classification approach that leverages Mirror-Serengeti, a specialized embedding model targeting 29 closely related or geographically proximate languages. This approach improves macro F1 by 4.55 on this confusable subset compared to our best base model. Finally, we analyze cross-linguistic transfer and domain effects, offering guidance for building robust African LID systems. We position African LID as an enabling technology for large-scale measurement of Africa's linguistic landscape in digital text and release AfroScope-Data and AfroScope-Models publicly.

en cs.CL

DOAJ Open Access 2025

Human language technology tools for indigenous South African languages and their potential use

Respect Mlambo, Muzi Matfunjwa

Human language technology (HLT) contributes to the development of languages by providing various avenues through which languages can be interrogated. Through HLT, diverse questions can be raised and answered scientifically and objectively. In the context of South African indigenous languages (SAIL), several HLT tools support these languages. However, it seems that some language users are unaware of the availability and capabilities of these tools, which contributes to their underutilisation. This study aims to identify and describe briefly some of the HLT tools that support and analyse SAIL. It presents an overview of the open access HLT tools, namely part-of-speech (POS) taggers, morphological decomposers (MDs), morphological analysers (MAs), isiZulu.net, ZulMorph and Google Translate (GT). These tools are crucial in analysing and understanding SAIL, as well as for advancing these languages in the field of HLT. In this study, the researchers anticipate that by raising awareness of the existence of these tools, more users of indigenous languages will be eager to use them. Contribution: This study fills the practical gap in the use of HLT to perform linguistic functions for SAIL. It seems that there is underutilisation of existing HLT tools for SAIL, which might be attributed to language users being unaware of these tools. Therefore, the study aims to identify and describe some HLT tools that support and analyse SAIL. It presents an overview of the open access HLT tools, namely POS taggers, MD, MA, isiZulu.net, ZulMorph and GT. The researchers intend to demonstrate the use of these tools and to raise awareness about their existence.

African languages and literature

arXiv Open Access 2025

Counting and Sampling Traces in Regular Languages

Alexis de Colnet, Kuldeep S. Meel, Umang Mathur

In this work, we study the problems of counting and sampling Mazurkiewicz traces that a regular language touches. Fix an alphabet $Σ$ and an independence relation $\mathbb{I} \subseteq Σ\times Σ$. The input consists of a regular language $L \subseteq Σ^*$, given by a finite automaton with $m$ states, and a natural number $n$ (in unary). For the counting problem, the goal is to compute the number of Mazurkiewicz traces (induced by $\mathbb{I}$) that intersect the $n^\text{th}$ slice $L_n = L \cap Σ^n$, i.e., traces that admit at least one linearization in $L_n$. For the sampling problem, the goal is to output a trace drawn from a distribution that is approximately uniform over all such traces. These tasks are motivated by bounded model checking with partial-order reduction, where an \emph{a priori} estimate of the reduced state space is valuable, and by testing methods for concurrent programs that use partial-order-aware random exploration. We first show that the counting problem is #P-hard even when $L$ is accepted by a deterministic automaton, in sharp contrast to counting words of a DFA, which is polynomial-time solvable. We then prove that the problem lies in #P for both NFAs and DFAs, irrespective of whether $L$ is trace-closed. Our main algorithmic contributions are a \emph{fully polynomial-time randomized approximation scheme} (FPRAS) that, with high probability, approximates the desired count within a prescribed accuracy, and a \emph{fully polynomial-time almost uniform sampler} (FPAUS) that generates traces whose distribution is provably close to uniform.
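To make the counting problem concrete, the following brute-force sketch enumerates every word of length $n$, keeps those accepted by a DFA, and counts the distinct traces they represent. It is an exponential-time illustration only, not the FPRAS described in the abstract; the dictionary-based DFA encoding and the names `normal_form`, `accepts` and `count_traces` are assumptions made for this example.

```python
from itertools import product


def normal_form(word, independent):
    """Return the lexicographically least word in the trace of `word`.

    `independent` is a symmetric, irreflexive set of letter pairs.
    """
    rest = list(word)
    out = []
    while rest:
        best = None
        for i, letter in enumerate(rest):
            # A letter can move to the front only if it commutes with
            # every letter that currently precedes it.
            if all((letter, other) in independent for other in rest[:i]):
                if best is None or letter < rest[best]:
                    best = i
        out.append(rest.pop(best))
    return "".join(out)


def accepts(dfa, word):
    """Run a (possibly partial) DFA on `word`."""
    state = dfa["start"]
    for letter in word:
        state = dfa["delta"].get((state, letter))
        if state is None:
            return False
    return state in dfa["accepting"]


def count_traces(dfa, alphabet, independent, n):
    """Count traces with at least one linearization in L_n (brute force)."""
    forms = set()
    for letters in product(sorted(alphabet), repeat=n):
        word = "".join(letters)
        if accepts(dfa, word):
            forms.add(normal_form(word, independent))
    return len(forms)


# Hypothetical example: L is every word over {a, b}; a and b are independent.
all_words = {
    "start": 0,
    "accepting": {0},
    "delta": {(0, "a"): 0, (0, "b"): 0},
}
ab_independent = {("a", "b"), ("b", "a")}
```

In this example every trace of length 3 is determined by how many times `a` occurs, so `count_traces(all_words, {"a", "b"}, ab_independent, 3)` gives 4, whereas an empty independence relation gives all 8 words.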

en cs.FL, cs.CC

arXiv Open Access 2025

Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing

Atharva Mutsaddi, Aditya Choudhary

Plagiarism involves using another person's work or concepts without proper attribution, presenting them as original creations. With the growing amount of data communicated in regional languages such as Marathi -- one of India's regional languages -- it is crucial to design robust plagiarism detection systems tailored for low-resource languages. Language models like Bidirectional Encoder Representations from Transformers (BERT) have demonstrated exceptional capability in text representation and feature extraction, making them essential tools for semantic analysis and plagiarism detection. However, the application of BERT for low-resource languages remains under-explored, particularly in the context of plagiarism detection. This paper presents a method to enhance the accuracy of plagiarism detection for Marathi texts using BERT sentence embeddings in conjunction with Term Frequency-Inverse Document Frequency (TF-IDF) feature representation. This approach effectively captures statistical, semantic, and syntactic aspects of text features through a weighted voting ensemble of machine learning models.

en cs.CL, cs.AI

arXiv Open Access 2025

2025 Southeast Asia Eleven Nations Influence Index Report

Wei Meng

This study constructs a fully data-driven and reproducible Southeast Asia Influence Index (SAII v3) to reduce bias from expert scoring and subjective weighting while mapping hierarchical power structures across the eleven ASEAN nations. We aggregate authoritative open-source indicators across four dimensions (economic, military, diplomatic, socio-technological) and apply a three-tiered standardization chain (quantile, Box-Cox, min-max) to mitigate outliers and skewness. Weights are obtained through equal-weight integration of Entropy Weighting Method (EWM), CRITIC, and PCA. Robustness is assessed via Kendall's tau, +/-20% weight perturbation, and 10,000 bootstrap iterations, with additional checks including +/-10% dimensional sensitivity and V2-V3 bump chart comparisons. Results show integrated weights: Economy 35-40%, Military 20-25%, Diplomacy about 20%, Socio-Technology about 15%. The regional landscape exhibits a one-strong, two-medium, three-stable, and multiple-weak pattern: Indonesia, Singapore, and Malaysia lead, while Thailand, the Philippines, and Vietnam form a mid-tier competitive band. V2 and V3 rankings are highly consistent (Kendall's tau = 0.818), though small mid-tier reorderings appear (Thailand and the Philippines rise, Vietnam falls), indicating that V3 is more sensitive to structural equilibrium. ASEAN-11 average sensitivity highlights military and socio-technological dimensions as having the largest marginal effects (+/-0.002). In conclusion, SAII v3 delivers algorithmic weighting and auditable reproducibility, reveals multidimensional drivers of influence in Southeast Asia, and provides actionable quantitative evidence for resource allocation and policy prioritization by regional governments and external partners.
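One of the three weighting schemes named in this abstract, the Entropy Weighting Method, can be sketched as follows. This is the textbook formulation rather than the authors' code; the function names and the toy indicator matrix are hypothetical, and the CRITIC and PCA weights that the report averages with EWM are not shown.

```python
import math


def min_max(column):
    """Rescale one indicator column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def entropy_weights(matrix):
    """Entropy weights for a matrix of non-negative indicators.

    Rows are units (for example countries), columns are indicators.
    At least two rows are required so that log(n) is non-zero.
    """
    n = len(matrix)
    diversities = []
    for column in zip(*matrix):
        total = sum(column)
        if total == 0:
            # An all-zero indicator carries no information.
            diversities.append(0.0)
            continue
        shares = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        # Indicators that vary more across units have lower entropy
        # and therefore receive a larger weight.
        diversities.append(max(0.0, 1.0 - entropy))
    spread = sum(diversities)
    if spread < 1e-12:
        # Every indicator is uniform: fall back to equal weights.
        return [1.0 / len(diversities)] * len(diversities)
    return [d / spread for d in diversities]


# Hypothetical toy data: 3 units, 2 indicators (not taken from the report).
toy = [[1.0, 1.0], [1.0, 3.0], [1.0, 5.0]]
weights = entropy_weights(toy)
```

In the toy matrix the first indicator is identical for every unit, so essentially all of the weight goes to the second one.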

en physics.soc-ph, cs.AI

arXiv Open Access 2024

Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal

Elodie Gauthier, Aminata Ndiaye, Abdoulaye Guissé

This work is part of the Kallaama project, whose objective is to produce and disseminate corpora in national languages for the development of speech technologies in the field of agriculture. Except for Wolof, which benefits from some language data for natural language processing, the national languages of Senegal are largely ignored by language technology providers. However, such technologies are key to the protection, promotion and teaching of these languages. Kallaama focuses on the three languages most widely spoken by Senegalese people: Wolof, Pulaar and Sereer. These languages are widely spoken by the population, with around 10 million native Senegalese speakers, not to mention those outside the country. However, they remain under-resourced in terms of machine-readable data that can be used for automatic processing and language technologies, all the more so in the agricultural sector. We release a transcribed speech dataset containing 125 hours of recordings, about agriculture, in each of the above-mentioned languages. These resources are specifically designed for Automatic Speech Recognition purposes, including traditional approaches. To build such technologies, we provide textual corpora in Wolof and Pulaar, and a pronunciation lexicon containing 49,132 entries from the Wolof dataset.

en cs.CL

arXiv Open Access 2023

Polynomial definability in constraint languages with few subpowers

Jakub Bulín, Michael Kompatscher

A first-order formula is called primitive positive (pp) if it only admits the use of existential quantifiers and conjunction. Pp-formulas are a central concept in (fixed-template) constraint satisfaction since CSP($Γ$) can be viewed as the problem of deciding the primitive positive theory of $Γ$, and pp-definability captures gadget reductions between CSPs. An important class of tractable constraint languages $Γ$ is characterized by having few subpowers, that is, the number of $n$-ary relations pp-definable from $Γ$ is bounded by $2^{p(n)}$ for some polynomial $p(n)$. In this paper we study a restriction of this property, stating that every pp-definable relation is definable by a pp-formula of polynomial length. We conjecture that the existence of such short definitions is actually equivalent to $Γ$ having few subpowers, and verify this conjecture for a large subclass that, in particular, includes all constraint languages on three-element domains. We furthermore discuss how our conjecture imposes an upper complexity bound of co-NP on the subpower membership problem of algebras with few subpowers.

en cs.LO, math.LO

DOAJ Open Access 2022

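SEMINAR SEBAGAI SARANA PENINGKATAN LITERASI DIGITAL BAGI MAHASISWA UNTUK INDONESIA BERKEMAJUAN [Seminars as a Means of Improving Digital Literacy among University Students for a Progressing Indonesia]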

Santi Sartika

ABSTRACT: University students tend to engage frequently in literacy through digital means, that is, digital literacy. The emergence of digital literacy adds to the list of activities through which literacy is developed. The Ministry of Communication and Information (Kominfo), in cooperation with LLPA and Universitas Ahmad Dahlan, held a National Seminar on Digital Literacy in order to anticipate the spread of hoax news. This paper uses a qualitative descriptive method with observation and documentation techniques. The author hopes that this paper will improve literacy, care in processing news items, and the ability to counter hoaxes.

Languages and literature of Eastern Asia, Africa, Oceania

DOAJ Open Access 2021

Introduction to the Project of the “East Asia” Thematic Group and its Peculiarities

Ágnes Birtalan

Introduction to the Project of the “East Asia” Thematic Group and its Peculiarities by Ágnes Birtalan, Head of the Research Group.

Chinese language and literature

DOAJ Open Access 2021

Designing the Holistic Evaluation in Teaching Reading/Tasmim al-Taqwim al-Syumuli Fi Ta’lim al-Muthala’ah

Rini Rini, Partomuan Harahap

The purpose of this article is to design a holistic evaluation for teaching reading. Assessment plays an important role in the education process. Good evaluation is essential for good education and good learning. Evaluating reading material can reveal the weaknesses and strengths of a reading instruction program. Students can also learn from the assessment their level of ability and how well they have absorbed the reading materials. The research method used is the library method. Assessment is an important part of teaching reading. So far, some designs of reading assessment are still not integrated and comprehensive in line with the theory of reading comprehension, which includes literal reading, interpretive reading, critical reading, and creative reading. As a result of this research, the reading assessment design is based on the teaching and learning process (the daily calendar), the so-called formative evaluation, and on an evaluation held at the end of the lecture in the form of midterm and final exams, the so-called final evaluation. The evaluation design comprises two types: the practice evaluation and the written evaluation.

Language and Literature, Languages and literature of Eastern Asia, Africa, Oceania

arXiv Open Access 2021

A Search for recurrent novae among Far Eastern guest stars

Susanne M Hoffmann, Nikolaus Vogt

According to recent theoretical studies, classical novae are expected to erupt every ~$10^5$ years, while the recurrence time scale of modern recurrent novae (N_r) stars ranges from 10 to ~100 years. To bridge this huge gap in our knowledge (three orders of magnitude in time scales), it appears attractive to consider historical data: In Far Eastern sources, we searched for brightening events at different epochs but similar positions that possibly refer to recurrent nova eruptions. Probing a sample of ~185 Asian observations from ~500 BCE to 1700 CE, we present a method to systematically filter possible events. The result is a few search fields with between 2 and 5 flare-ups and typical cadences between $10^2$ and $10^3$ years. For most of our recurrence candidates, we found possible counterparts among known cataclysmic variables in the corresponding search areas. This work is based on an interdisciplinary approach, combining methods from digital humanities and computational astrophysics when applying our previously developed methods in searches for classical novae among Far Eastern guest stars. A first and rather preliminary comparison of (possible) historical and (well known) modern recurrent novae reveals first tentative hints about some of their properties, stimulating further studies in this direction.

en astro-ph.SR, astro-ph.HE

DOAJ Open Access 2020

Apresentação [Introduction]

Iris Maria da Costa Amâncio, Terezinha Taborda Moreira

Introduction to Abril 25.

Language and Literature, African languages and literature

DOAJ Open Access 2020

Mogadishu as lost modern: In conversation with A Naked Needle

Ubah Cristina Ali Farah

African languages and literature

DOAJ Open Access 2019

Corinna R. Unger: Entwicklungspfade in Indien. Eine internationale Geschichte, 1947–1980

Julia Sophie Schmidt

History of Asia, Unlocalized maps (Asian studies only)

DOAJ Open Access 2019

Raffael Raddatz (2017): Patriotismusdiskurse im gegenwärtigen Japan. Identitätssuche im Spannungsfeld von Nation, Region und globalem Kapital zu Beginn des 21. Jahrhunderts. Beiträge zur Politischen Wissenschaft, Band 192. Berlin: Duncker & Humblot.

Ken'ichi Mishima

Review

Language and Literature, Japanese language and literature

arXiv Open Access 2019

Lab Hackathons to Overcome Laboratory Equipment Shortages in Africa: Opportunities and Challenges

Helena Webb, Jason R. C. Nurse, Louise Bezuidenhout et al.

Equipment shortages in Africa undermine Science, Technology, Engineering and Mathematics (STEM) Education. We have pioneered the LabHackathon (LabHack): a novel initiative that adapts the conventional hackathon and draws on insights from the Open Hardware movement and Responsible Research and Innovation (RRI). LabHacks are fun, educational events that challenge student participants to build frugal and reproducible pieces of laboratory equipment. Completed designs are then made available to others. LabHacks can therefore facilitate the open and sustainable design of laboratory equipment, in situ, in Africa. In this case study we describe the LabHackathon model, discuss its application in a pilot event held in Zimbabwe and outline the opportunities and challenges it presents.

en cs.CY, cs.HC

arXiv Open Access 2019

Annals of Library and Information Studies. A bibliometric analysis of the journal and a comparison with the top library and information studies journals in Asia and worldwide (2011–2017)

Juan Jose Prieto-Gutierrez, Francisco Segado-Boj

This paper presents a thorough bibliometric analysis of research published in Annals of Library and Information Studies (ALIS), an India-based journal, for the period 2011–2017. Specifically, it compares this journal's trends with those of other library and information science (LIS) journals from the same geographical area (India, and Asia as a whole) and with the 10 highest-rated LIS journals worldwide. The source of the data used was the multidisciplinary database Scopus. To perform this comparison, ALIS' production was analyzed in order to identify authorship patterns; for example, authors' countries of residence, co-authorship trends, and collaboration networks. Research topics were identified through keyword analysis, while performance was measured by examining the number of citations articles received. This study provides substantial information. The research lines detected through examining the keywords in ALIS articles were determined to be similar to those for the top LIS journals in both Asia and worldwide. Specifically, ALIS authors are focusing on metrics, bibliometrics, and social networking, which follows global trends. Notably, however, collaboration among Asia-based journals was found to be lower than that in the top-indexed journals in the LIS field. The results obtained present a roadmap for expanding the research in this field.

en cs.DL, cs.IR
