Results for "Languages and literature of Eastern Asia, Africa, Oceania"

Showing 20 of ~2,976,338 results · from DOAJ, CrossRef, arXiv

arXiv Open Access 2025
Multilingual State Space Models for Structured Question Answering in Indic Languages

Arpita Vats, Rahul Raja, Mrinal Mathur et al.

The diversity and complexity of Indic languages present unique challenges for natural language processing (NLP) tasks, particularly question answering (QA). To address these challenges, this paper explores the application of State Space Models (SSMs) to build efficient, contextually aware QA systems tailored for Indic languages. SSMs are particularly suited to this task because they can model both long- and short-term dependencies in sequential data, which equips them to handle the rich morphology, complex syntax, and contextual intricacies characteristic of Indian languages. We evaluated multiple SSM architectures across diverse datasets representing various Indic languages and conducted a comparative analysis of their performance. Our results demonstrate that these models effectively capture linguistic subtleties, leading to significant improvements in question interpretation, context alignment, and answer generation. This work represents the first application of SSMs to question answering in Indic languages, establishing a foundational benchmark for future research in this domain. We also propose enhancements to existing SSM frameworks that optimize their applicability to the low-resource and multilingual settings prevalent in Indic languages.

en cs.CL, cs.AI
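The abstract does not say which SSM architectures were evaluated. As a generic illustration of how an SSM can carry both long- and short-range context, here is a minimal sketch of a diagonal linear state space layer with zero-order-hold discretization (one common choice in diagonal SSMs). The function names and parameter values are hypothetical; this is not the authors' code.

```python
import math


def discretize(a, b, dt):
    """Zero-order-hold discretization of the scalar system dx/dt = a*x + b*u (a != 0)."""
    a_bar = math.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(inputs, a_diag, b, c, dt=1.0):
    """Run a diagonal linear SSM over a 1-D input sequence.

    Each state channel i evolves independently:
        x_i[t] = a_bar_i * x_i[t-1] + b_bar_i * u[t]
        y[t]   = sum_i c_i * x_i[t]
    """
    params = [discretize(a, b_i, dt) for a, b_i in zip(a_diag, b)]
    state = [0.0] * len(a_diag)
    outputs = []
    for u in inputs:
        state = [a_bar * x + b_bar * u for (a_bar, b_bar), x in zip(params, state)]
        outputs.append(sum(c_i * x for c_i, x in zip(c, state)))
    return outputs


# Impulse responses of two hypothetical channels: a = -0.1 decays slowly
# (long memory), a = -2.0 decays quickly (short memory).
impulse = [1.0] + [0.0] * 9
slow = ssm_scan(impulse, [-0.1], [1.0], [1.0])
fast = ssm_scan(impulse, [-2.0], [1.0], [1.0])
```

A layer mixing many channels with different decay rates can retain distant tokens in its slow channels while its fast channels react to local context; real SSM QA models add learned projections, nonlinearities, and (in some variants) input-dependent parameters on top of this recurrence.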
arXiv Open Access 2025
Impacts of Climate Change on Photovoltaic Potential in Africa

Eva Lu, Dongdong Wang

Africa holds the world's highest solar irradiance yet has <2% of global photovoltaic (PV) capacity, leaving 600 million people without electricity access. However, climate change impacts on its 10 TW potential remain understudied. Using four decades of ERA5 reanalysis data (1980-2020) at 0.25° resolution, we quantify the contributions of key climate factors to historical changes in African PV potential through multivariate decomposition. Continental PV potential increased by 3.2%, driven primarily by enhanced solar radiation (+1.2 °C, contributing -23%). East Africa gained >6% from radiation enhancement, while North Africa declined by 0.5% as extreme heat (+2 °C) overwhelmed radiation benefits. Critically, stability analysis using the coefficient of variation (CV) reveals that high-irradiance subtropical zones are highly variable (CV = 0.4), in contrast to stable equatorial regions (CV = 0.1), challenging the assumption that resource abundance ensures reliability. These findings reframe Africa's solar strategy: North Africa requires prioritizing heat-resilient technology over capacity maximization; subtropical zones demand grid-storage co-investment; and East Africa presents globally competitive opportunities for rapid, stable deployment. By resolving spatiotemporal heterogeneities and quantifying climate-driver contributions, our analysis provides an actionable framework for climate-resilient solar deployment, critical for Africa's energy transition and climate mitigation.

en physics.soc-ph, physics.ao-ph
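The coefficient of variation used in the stability analysis is the standard deviation of a series divided by its mean, so a CV of 0.4 means year-to-year spread equal to 40% of the average. A minimal sketch, assuming the population standard deviation (the abstract does not say which variant the study used); the input series is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV = population standard deviation / mean (dimensionless).

    Assumption: population (not sample) standard deviation.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(values) / mean


# Hypothetical annual PV-potential values for one grid cell (arbitrary units).
series = [2, 4, 4, 4, 5, 5, 7, 9]
print(coefficient_of_variation(series))  # 0.4
```

Because CV is dimensionless, it lets high- and low-irradiance regions be compared on relative variability rather than on absolute output.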
arXiv Open Access 2025
FUSE: A Ridge and Random Forest-Based Metric for Evaluating MT in Indigenous Languages

Rahul Raja, Arpita Vats

This paper presents the winning submission of the RaaVa team to the AmericasNLP 2025 Shared Task 3 on Automatic Evaluation Metrics for Machine Translation (MT) into Indigenous Languages of America, where our system ranked first overall based on average Pearson correlation with human annotations. We introduce Feature-Union Scorer (FUSE) for Evaluation; FUSE integrates Ridge regression and Gradient Boosting to model translation quality. In addition to FUSE, we explore five alternative approaches leveraging different combinations of linguistic similarity features and learning paradigms. FUSE Score highlights the effectiveness of combining lexical, phonetic, semantic, and fuzzy token similarity with learning-based modeling to improve MT evaluation for morphologically rich and low-resource languages. MT into Indigenous languages poses unique challenges due to polysynthesis, complex morphology, and non-standardized orthography. Conventional automatic metrics such as BLEU, TER, and ChrF often fail to capture deeper aspects such as semantic adequacy and fluency. Our proposed framework, formally referred to as FUSE, incorporates multilingual sentence embeddings and phonological encodings to better align with human evaluation. We train supervised models on human-annotated development sets and evaluate them on held-out test data. Results show that FUSE consistently achieves higher Pearson and Spearman correlations with human judgments, offering a robust and linguistically informed solution for MT evaluation in low-resource settings.

en cs.CL
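The abstract names the ingredients (similarity features fed to a learned regressor, judged by Pearson correlation with human scores) but not their exact definitions. The sketch below is a hypothetical illustration, not the RaaVa team's code: it computes two stand-in features with the standard library (character n-gram F1 as a lexical feature and `difflib.SequenceMatcher` token matching as a fuzzy feature), averages them in place of the trained Ridge/boosting model, and measures Pearson correlation against made-up human judgments.

```python
import math
from difflib import SequenceMatcher


def char_ngrams(text, n=3):
    """Set of character n-grams, padded with spaces so word edges count."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def char_ngram_f1(hyp, ref, n=3):
    """Lexical feature: F1 overlap of character n-gram sets."""
    h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def fuzzy_token_similarity(hyp, ref):
    """Fuzzy feature: mean best SequenceMatcher ratio of each reference token."""
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    best = [max(SequenceMatcher(None, rt, ht).ratio() for ht in hyp_tokens)
            for rt in ref_tokens]
    return sum(best) / len(best)


def placeholder_score(hyp, ref):
    """Hypothetical stand-in for the learned regressor: unweighted feature mean."""
    return (char_ngram_f1(hyp, ref) + fuzzy_token_similarity(hyp, ref)) / 2


def pearson(xs, ys):
    """Pearson correlation between metric scores and human judgments."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


ref = "the cat sat"
hyps = ["the cat sits", "a dog ran", "the cat sat"]
human = [0.8, 0.1, 1.0]  # hypothetical human judgments
metric = [placeholder_score(h, ref) for h in hyps]
print(round(pearson(metric, human), 3))
```

Character-level features are a natural fit for polysynthetic languages, where a single long word can differ from the reference by one affix; a supervised regressor would learn how much weight each feature deserves instead of averaging them.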
DOAJ Open Access 2024
Modarres-e Reḍavi’s Edition of Anvari’s divān: A Critical Assessment

Brotto, Giacomo

The aim of this paper is twofold: 1) to discuss Modarres-e Reḍavi’s edition of Anvari’s divān in order to show that this edition, although still very valuable, should be used cautiously: even for non-philological, literary studies, the manuscripts should be checked. These should include not only the newly discovered codices that the editor did not use, but also the manuscripts he did use, which must be double-checked; 2) to give a solid starting point to any scholar investigating Anvari’s divān from a philological perspective, by showing where Modarres-e Reḍavi’s edition is lacking and to what extent.

Languages and literature of Eastern Asia, Africa, Oceania
arXiv Open Access 2024
Filipino Benchmarks for Measuring Sexist and Homophobic Bias in Multilingual Language Models from Southeast Asia

Lance Calvin Lim Gamboa, Mark Lee

Bias studies on multilingual models confirm the presence of gender-related stereotypes in masked models processing languages with high NLP resources. We expand on this line of research by introducing Filipino CrowS-Pairs and Filipino WinoQueer: benchmarks that assess both sexist and anti-queer biases in pretrained language models (PLMs) handling texts in Filipino, a low-resource language from the Philippines. The benchmarks consist of 7,074 new challenge pairs resulting from our cultural adaptation of English bias evaluation datasets, a process that we document in detail to guide similar forthcoming efforts. We apply the Filipino benchmarks on masked and causal multilingual models, including those pretrained on Southeast Asian data, and find that they contain considerable amounts of bias. We also find that for multilingual models, the extent of bias learned for a particular language is influenced by how much pretraining data in that language a model was exposed to. Our benchmarks and insights can serve as a foundation for future work analyzing and mitigating bias in multilingual models.

en cs.CL
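In the English CrowS-Pairs benchmark, bias is typically reported as the share of challenge pairs in which a model assigns a higher (pseudo-)likelihood to the more stereotypical sentence, with 50% indicating no preference. The abstract does not spell out how the Filipino adaptations score each sentence, so this minimal sketch assumes per-sentence scores are already computed and shows only the aggregation; the function name and scores are hypothetical.

```python
def stereotype_preference_rate(pair_scores):
    """Percentage of pairs in which the model scores the more stereotypical
    sentence higher than its less stereotypical counterpart.

    pair_scores: list of (score_more_stereotypical, score_less_stereotypical),
    e.g. sentence log-likelihoods from a model. 50.0 means no net preference.
    """
    if not pair_scores:
        raise ValueError("need at least one pair")
    preferred = sum(1 for more, less in pair_scores if more > less)
    return 100.0 * preferred / len(pair_scores)


# Hypothetical log-likelihoods for four challenge pairs.
scores = [(-10.2, -12.5), (-8.1, -7.9), (-5.0, -6.3), (-3.3, -4.0)]
print(stereotype_preference_rate(scores))  # 75.0
```

Masked models usually need a pseudo-log-likelihood (masking one token at a time), while causal models can score a sentence directly from its token log-probabilities; the aggregation step is the same either way.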
arXiv Open Access 2024
EthioMT: Parallel Corpus for Low-resource Ethiopian Languages

Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh et al.

Recent research in natural language processing (NLP) has achieved impressive performance in tasks such as machine translation (MT), news classification, and question-answering in high-resource languages. However, the performance of MT leaves much to be desired for low-resource languages. This is due to the smaller size of available parallel corpora in these languages, if such corpora are available at all. NLP in Ethiopian languages suffers from the same issues due to the unavailability of publicly accessible datasets for NLP tasks, including MT. To help the research community and foster research for Ethiopian languages, we introduce EthioMT -- a new parallel corpus for 15 languages. We also create a new benchmark by collecting a dataset for better-researched languages in Ethiopia. We evaluate the newly collected corpus and the benchmark dataset for 23 Ethiopian languages using transformer and fine-tuning approaches.

en cs.CL
arXiv Open Access 2024
Building a Language-Learning Game for Brazilian Indigenous Languages: A Case of Study

Gustavo Polleti

In this paper, we discuss a first attempt to build a language-learning game for Brazilian Indigenous languages and the challenges around it. We present a design for the tool with gamification aspects. We then describe a process for automatically generating language exercises and questions from a dependency treebank and a lexical database for Tupian languages. We discuss the limitations of our prototype, highlighting ethical and practical implementation concerns. Finally, we conclude that new data-gathering processes should be established in partnership with Indigenous communities and oriented toward educational purposes.

en cs.CL
arXiv Open Access 2024
The State of Computer Vision Research in Africa

Abdul-Hakeem Omotayo, Ashery Mbilinyi, Lukman Ismaila et al.

Despite significant efforts to democratize artificial intelligence (AI), computer vision, a sub-field of AI, still lags in Africa. A significant factor in this is limited access to computing resources, datasets, and collaborations. As a result, Africa's contribution to top-tier publications in this field has been only 0.06% over the past decade. To improve the computer vision field and make it more accessible and inclusive, this study analyzes 63,000 Scopus-indexed computer vision publications from Africa. We use large language models to automatically parse their abstracts and to identify and categorize topics and datasets, which resulted in a list of more than 100 African datasets. Our objective is to provide a comprehensive taxonomy of dataset categories to facilitate better understanding and use of these resources. We also analyze collaboration trends of researchers within and outside the continent. Additionally, we conduct a large-scale questionnaire among African computer vision researchers to identify the structural barriers they believe require urgent attention. In conclusion, our study offers a comprehensive overview of the current state of computer vision research in Africa, with the aim of empowering marginalized communities to participate in the design and development of computer vision systems.

arXiv Open Access 2023
Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities

Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime et al.

This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to identify research gaps and disseminate the information to NLP researchers interested in Ethiopian languages and encourage future research in this domain.

en cs.CL
DOAJ Open Access 2022
Challenges Facing Community Management of Rural Water Supply: The Case of Ohangwena Region, Namibia

Nespect Butty Salom, Prudence Khumalo

This study investigated the critical success factors for the community management of rural water supplies in the Ohangwena Region, Namibia. Rural communities in Namibia receive water through the Community Based Management (CBM) strategy, which requires decentralized water governance, thereby enabling local communities to participate in the management of their water resources. In pursuance of this policy and philosophy, a large number of water point committees have been created nationally. However, at least half of the existing water points in rural Namibia are faulty or dysfunctional, and the majority of people still struggle to access clean water. The study examined key factors that positively affect the success of rural water supply management in Namibia, using the Ohangwena Region as a case study. Both qualitative and quantitative methods were used. The findings affirmed that governance, the leadership attributes of committee members, training and capacity building, the level of community involvement, and coordination and support are critical success factors for effective management of rural water supplies. Finally, a rural water management model was developed, which is expected to contribute to improved management of rural water provision in the study area.

History of Africa, African languages and literature
DOAJ Open Access 2022
The transformation of management, business culture, and work style in Japanese companies

S. V. Shaposhnikov, Yu. Sadoi

Sustainable economic development and productivity growth in Japanese companies can be achieved through digital transformation (DX). Implementing digital transformation has become even more important since the COVID-19 pandemic caused a major downturn in economic activity in the country. The transition to digital technology is exposing the traditional features of Japanese companies' work style, business culture, and management, which the digital transformation process is altering. This transition can be a catalyst for Japanese companies to change, or even abandon, work styles, business cultures, and management practices that today not only keep companies from being competitive but could also lead to their extinction. The changes under way have so far met resistance, but they may soon be accepted and adapted by Japanese business.

Japanese language and literature
DOAJ Open Access 2021
The Past as an Exponent of the Present in Modern Tamil Literature: Story-(re)-Telling and Telling History in Selected Works of Indira Parthasarathy

Jacek Woźniak

Indira Parthasarathy is the author of many works that touch upon historical issues but are in fact reflections on contemporary India. Although the narrative of some of them takes place in the past, they cannot be called historical literature. While the author is not really interested in describing the past per se, as is often the case with other contemporary Tamil writers, clear references to the past and history help him showcase contemporary issues, current problems, and life as it is here and now. The article briefly discusses two plays whose protagonists are historical figures; a novel based on a contemporary event that has become an integral part of the history of Tamil Nadu; and two other works that were written on the basis of the writer’s own life experience in Poland and are in a way related to the history of that country.

Indo-Iranian languages and literature, Languages and literature of Eastern Asia, Africa, Oceania
arXiv Open Access 2021
Differentiable Allophone Graphs for Language-Universal Speech Recognition

Brian Yan, Siddharth Dalmia, David R. Mortensen et al.

Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages. While speech annotations at the language-specific phoneme or surface levels are readily available, annotations at a universal phone level are relatively rare and difficult to produce. In this work, we present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings with learnable weights, represented using weighted finite-state transducers, which we call differentiable allophone graphs. By training multilingually, we build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language. These phone-based systems with learned allophone graphs can be used by linguists to document new languages, build phone-based lexicons that capture rich pronunciation variations, and re-evaluate the allophone mappings of seen languages. We demonstrate these benefits of our proposed framework with a system trained on 7 diverse languages.

en cs.CL, cs.SD
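The core idea of turning universal phone probabilities into language-specific phoneme probabilities through weighted, learnable phone-to-phoneme links can be illustrated without WFSTs. The sketch below is a simplified dense-dictionary stand-in, not the paper's implementation: it assumes each phone's weights are normalized (softmax) over the phonemes it may realize in a given language, and the function name, inventory, and numbers are hypothetical.

```python
import math


def phoneme_posteriors(phone_probs, logits, allowed):
    """Map universal phone probabilities to language-specific phoneme probabilities.

    phone_probs: dict phone -> P(phone) for one frame (universal phone level)
    logits:      dict (phoneme, phone) -> learnable weight (missing entries = 0.0)
    allowed:     dict phone -> list of phonemes that phone may realize in this language
    P(phoneme | phone) is a softmax over each phone's allowed phonemes, and
    P(phoneme) = sum over phones of P(phoneme | phone) * P(phone).
    """
    result = {}
    for phone, p_phone in phone_probs.items():
        candidates = allowed[phone]
        exps = [math.exp(logits.get((ph, phone), 0.0)) for ph in candidates]
        total = sum(exps)
        for ph, e in zip(candidates, exps):
            result[ph] = result.get(ph, 0.0) + (e / total) * p_phone
    return result


# Hypothetical English-like example: [p] and [pʰ] are both allophones of /p/.
allowed = {"p": ["/p/"], "pʰ": ["/p/"], "b": ["/b/"]}
frame = {"p": 0.5, "pʰ": 0.3, "b": 0.2}
print(phoneme_posteriors(frame, {}, allowed))
```

Here /p/ collects the mass of both [p] and [pʰ] (0.5 + 0.3) and /b/ keeps 0.2. Training on phonemic transcriptions would adjust the logits, and inspecting them afterwards is what makes the learned mapping interpretable.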
arXiv Open Access 2021
Blockchain for Genomics: A Systematic Literature Review

Mohammed Alghazwi, Fatih Turkmen, Joeri van der Velde et al.

Human genomic data carry unique information about an individual and offer unprecedented opportunities for healthcare. The clinical interpretations derived from large genomic datasets can greatly improve healthcare and pave the way for personalized medicine. Sharing genomic datasets, however, poses major challenges, because genomic data differ from traditional medical data: they indirectly reveal information about the data owner's descendants and relatives and remain informative even after the owner dies. Stringent data ownership and control measures are therefore required when dealing with genomic data. To provide secure and accountable infrastructure, blockchain technologies offer a promising alternative to traditional distributed systems, and research on blockchain-based infrastructures tailored to genomics is on the rise. However, there is no comprehensive literature review summarizing state-of-the-art applications of blockchain in genomics. In this paper, we systematically examine existing work, both commercial and academic, and discuss the major opportunities and challenges. Our study is driven by five research questions that we aim to answer in our review. We also present our projections of future research directions, which we hope will benefit researchers interested in the area.

DOAJ Open Access 2020
LARAS BAHASA IKLAN PADA MEDIA SOSIAL INSTAGRAM (The Register of Advertising Language on Instagram Social Media)

Adisti Putri Pramesti, Martutik Martutik

This study aims to describe (1) the forms, (2) the patterns, and (3) the functions of the advertising language register (laras bahasa) on the social media platform Instagram. It uses a qualitative approach with a grounded theory design. The data consist of the discourse accompanying images posted on Instagram and were collected using a documentation technique. Data analysis comprised data reduction (identification, coding, and classification), presentation of the data as narrative text, drawing conclusions, and checking the validity of the data. The analysis yielded the following results. First, the forms of the register found in online-shop advertisements include base forms, affixation, reduplication, compounding, abbreviations and acronyms, and specialized terms. Second, the patterns of the register used in these advertisements include word order and sentence structure. Third, the functions of the register in these advertisements include the expressive, directive, informational, interactional, and poetic functions. Keywords: laras bahasa, advertising, social media, Instagram

Languages and literature of Eastern Asia, Africa, Oceania
arXiv Open Access 2020
Multi-nucleon transfer in the interaction of 977 MeV and 1143 MeV $^{204}$Hg with $^{208}$Pb

V. V. Desai, A. Pica, W. Loveland et al.

A previous study of symmetric collisions of massive nuclei has shown that current models of multi-nucleon transfer (MNT) reactions do not adequately describe the transfer product yields. To gain further insight into this problem, we have measured the yields of MNT products in the interaction of 977 MeV (E/A = 4.79 MeV) and 1143 MeV (E/A = 5.60 MeV) $^{204}$Hg with $^{208}$Pb. We find that the yields of multi-nucleon transfer products are similar in these two reactions and are substantially lower than those observed in the reaction of 1257 MeV (E/A = 6.16 MeV) $^{204}$Hg + $^{198}$Pt. We compare our measurements with the predictions of the GRAZING-F, di-nuclear system (DNS), and improved quantum molecular dynamics (ImQMD) models. For the observed isotopes of the elements Au, Hg, Tl, Pb, and Bi, the measured MNT cross sections are orders of magnitude larger than the predicted values. Furthermore, the various models predict the formation of nuclides near the N = 126 shell, which are not observed.
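The E/A values quoted above are each beam energy divided by the $^{204}$Hg mass number, A = 204. A quick arithmetic check (the abstract does not say at which point in the target the energies are quoted; only the division is reproduced here):

```python
def energy_per_nucleon(beam_energy_mev, mass_number):
    """Projectile kinetic energy per nucleon, E/A, in MeV."""
    return beam_energy_mev / mass_number


HG_204_A = 204  # mass number of the 204Hg projectile
for energy in (977, 1143, 1257):
    print(f"{energy} MeV -> E/A = {energy_per_nucleon(energy, HG_204_A):.2f} MeV")
# 977 MeV -> E/A = 4.79 MeV
# 1143 MeV -> E/A = 5.60 MeV
# 1257 MeV -> E/A = 6.16 MeV
```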

arXiv Open Access 2020
Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages

Daniel Lundén, Johannes Borgström, David Broman

Probabilistic programming is an approach to reasoning under uncertainty by encoding inference problems as programs. To solve these inference problems, probabilistic programming languages (PPLs) employ different inference algorithms, such as sequential Monte Carlo (SMC), Markov chain Monte Carlo (MCMC), or variational methods. Existing research on such algorithms mainly concerns their implementation and efficiency, rather than the correctness of the algorithms themselves when applied in the context of expressive PPLs. To remedy this, we give a correctness proof for SMC methods in the context of an expressive PPL calculus, representative of popular PPLs such as WebPPL, Anglican, and Birch. Previous work has studied the correctness of MCMC using an operational semantics, and the correctness of SMC and MCMC in a denotational setting without term recursion. However, for SMC inference, one of the most commonly used algorithms in PPLs today, no formal correctness proof exists in an operational setting. In particular, an open question is whether the resample locations in a probabilistic program affect the correctness of SMC. We solve this fundamental problem and make four novel contributions: (i) we extend an untyped PPL lambda calculus and operational semantics to include explicit resample terms, expressing synchronization points in SMC inference; (ii) we prove, for the first time, that subject to mild restrictions, any placement of the explicit resample terms is valid for a generic form of SMC inference; (iii) as a result of (ii), our calculus benefits from classic results from the SMC literature: a law of large numbers and an unbiased estimate of the model evidence; and (iv) we formalize the bootstrap particle filter for the calculus and discuss how our results can be further extended to other SMC algorithms.
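As a hypothetical, non-PPL illustration of the bootstrap particle filter mentioned in contribution (iv), here is a minimal sketch for a 1-D Gaussian random-walk model; the model, names, and parameters are assumptions, not the paper's calculus. The resampling step at the end of each iteration plays the role of a resample point, and `log_evidence` is the logarithm of the standard SMC evidence estimate (the estimate itself is unbiased; its logarithm is not).

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def bootstrap_particle_filter(observations, n_particles=1000,
                              process_std=1.0, obs_std=1.0, seed=0):
    """Bootstrap PF for x_t = x_{t-1} + N(0, process_std^2), y_t = x_t + N(0, obs_std^2), x_0 = 0.

    Returns (filtered posterior means, log of the evidence estimate).
    """
    rng = random.Random(seed)
    particles = [0.0] * n_particles
    log_evidence = 0.0
    means = []
    for y in observations:
        # Propagate through the transition prior (the "bootstrap" proposal).
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Weight by the observation likelihood, shifted by the max for stability.
        log_w = [normal_logpdf(y, x, obs_std) for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Evidence increment: log of the average unnormalized weight.
        log_evidence += m + math.log(total / n_particles)
        means.append(sum(wi * x for wi, x in zip(w, particles)) / total)
        # Multinomial resampling: the synchronization point between particles.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, log_evidence


means, log_z = bootstrap_particle_filter([0.5, 1.2, 0.9], n_particles=2000)
```

For a single observation y, this model's exact evidence is N(y; 0, 2) and the exact posterior mean is y/2, which gives a convenient sanity check on the estimates.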

Page 7 of 148,817