Results: "Language acquisition"

DOAJ Open Access 2025

Reviving Spoken Tamil among South African Tamils / தென்னாப்பிரிக்க தமிழர்கள் இடையே பேச்சுத் தமிழுக்கு ஒரு மறுதொடக்கம்

Dr Kameshwaran Envernathan Govender / முனைவர் காமேஷ்வரன் என்வர்நாதன் கோவேந்தர், Prof Nalini Moodley / பேராசிரியர் நளினி முதலி

Traditions survive beyond language, but language can revive traditions. Tamil was once the spoken language of the South African Tamils who were brought to the country as labourers (1860–1911), but over the generations it has largely been lost owing to the dominance of English. Socio-political changes, historical disruptions and apartheid-era isolation from India also contributed to this loss. Tamil traditions, religious practices and festivals nevertheless continue to flourish as evidence of the Tamil cultural spirit. However, the inability to speak Tamil creates a sense of guilt and longing among many South African Tamils and motivates efforts to reconnect with their linguistic heritage. This paper explores how Tamil media, radio and social platforms act as modern tools that facilitate language renewal and cultural preservation among South African Tamils. The study examines the historical context, conducts interviews with individuals who are learning Tamil, and analyses current practices. It highlights the transformative role of technology in bridging the generation gap. Media such as Tamil television, radio programmes and social media platforms provide opportunities to use the language in a variety of situations. The study therefore shows how these media act as tools that aid language acquisition and foster a deeper sense of identification with Tamil heritage.

Language and Literature

DOAJ Open Access 2025

From Tradition to Transformation: Addressing Challenges and Trends in Arabic Syntax Mastery via Problem-Based Learning

Faida Masruroh, Abdul Basith

Understanding and mastering Arabic syntax contributes significantly to comprehension of the religious sciences, especially texts written in Arabic. The teaching method used can influence the enthusiasm, motivation and outcomes of student learning. The purpose of this research is to determine the effectiveness of the problem-based learning (PBL) method in improving understanding and mastery of Arabic syntax. The study used a quasi-experimental design with quantitative analysis to determine how effectively PBL supports grammar comprehension in Arabic language acquisition. The data were analyzed with an independent-samples t-test, in which a two-tailed significance value (Sig. (2-tailed)) below 0.05 indicates a significant result. The test yielded Sig. (2-tailed) = 0.007, which is below 0.05, indicating that the intervention produced a significant effect. Problem-based learning is therefore effective in enhancing knowledge and understanding of Arabic syntax.

Philology. Linguistics

DOAJ Open Access 2025

Sesotho Language Acquisition by Faculty of Education Students in South Africa: A Systematic Review

Nthabiseng B. Khoalenyane, Patrick Alpheous Nyathi, Precious Moyo

Higher education institutions are increasingly interested in teaching African languages, specifically as third, fourth or additional languages. Learning Sesotho poses a unique challenge to non-native speakers when it is introduced at the exit phase. This systematic review aims to identify the challenges students face while learning Sesotho at the exit stages of their education degrees and to explore how proficiency in Sesotho can benefit professional teaching practice in different regions of South Africa. To this end, a comprehensive literature search was conducted in Google Scholar, Scopus and JSTOR. As of 22 September 2024, a total of 73 articles had been identified in these databases. During the initial screening of titles and abstracts, 11 duplicates were excluded. Of the remaining 62 articles, 40 were excluded on the basis of relevance, and 22 were downloaded to the digital workspace. Prioritising African languages in education, particularly through the study of additional indigenous languages, can yield significant advantages. The study therefore examines the pros and cons of acquiring conversational proficiency in Sesotho, particularly in a university setting where isiZulu may be the predominant language. This exploration highlights the broader implications and benefits of introducing linguistic diversity into educational environments at the exit phase. To capture nuanced perspectives and experiences, the paper adopts a systematic literature review approach to gain comprehensive knowledge of the challenges, benefits and implications of learning Sesotho as an additional language in higher education. The findings highlight that student-teachers do not understand the need to learn an additional language and are therefore not motivated to acquire it.

Language and Literature, Social Sciences

arXiv Open Access 2025

EuroGEST: Investigating gender stereotypes in multilingual language models

Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou et al.

Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric. We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages. EuroGEST builds on an existing expert-informed benchmark covering 16 gender stereotypes, expanded in this work using translation tools, quality estimation metrics, and morphological heuristics. Human evaluations confirm that our data generation method results in high accuracy of both translations and gender labels across languages. We use EuroGEST to evaluate 24 multilingual language models from six model families, demonstrating that the strongest stereotypes in all models across all languages are that women are 'beautiful', 'empathetic' and 'neat' and men are 'leaders', 'strong, tough' and 'professional'. We also show that larger models encode gendered stereotypes more strongly and that instruction finetuning does not consistently reduce gendered stereotypes. Our work highlights the need for more multilingual studies of fairness in LLMs and offers scalable methods and resources to audit gender bias across languages.

en cs.CL

arXiv Open Access 2025

Benchmarking Vision Language Models on German Factual Data

René Peinl, Vincent Tischler

As with LLMs, the development of vision language models (VLMs) is driven mainly by English datasets and by models trained in English and Chinese, whereas support for other languages, even high-resource languages such as German, remains significantly weaker. In this work we present an analysis of open-weight VLMs on factual knowledge in German and English. We disentangle the image-related aspects from the textual ones by analyzing accuracy with jury-as-a-judge across both prompt languages and images from German and international contexts. We found that for celebrities and sights, VLMs struggle because they lack visual knowledge of German image content. For animals and plants, the tested models can often correctly identify the image content by its scientific name or English common name but fail in German. Cars and supermarket products were identified equally well in English and German images across both prompt languages.

en cs.CL

arXiv Open Access 2025

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Li Zhou, Lutong Yu, You Lyu et al.

Speech Language Models (SLMs) have made significant progress in spoken language understanding. Yet it remains unclear whether they can fully perceive non-lexical vocal cues alongside spoken words, and respond with empathy that aligns with both emotional and contextual factors. Existing benchmarks typically evaluate linguistic, acoustic, reasoning, or dialogue abilities in isolation, overlooking the integration of these skills that is crucial for human-like, emotionally intelligent conversation. We present EchoMind, the first interrelated, multi-level benchmark that simulates the cognitive process of empathetic dialogue through sequential, context-linked tasks: spoken-content understanding, vocal-cue perception, integrated reasoning, and response generation. All tasks share identical and semantically neutral scripts that are free of explicit emotional or contextual cues, and controlled variations in vocal style are used to test the effect of delivery independent of the transcript. EchoMind is grounded in an empathy-oriented framework spanning 3 coarse and 12 fine-grained dimensions, encompassing 39 vocal attributes, and evaluated using both objective and subjective metrics. Testing 12 advanced SLMs reveals that even state-of-the-art models struggle with highly expressive vocal cues, limiting empathetic response quality. Analyses of prompt strength, speech source, and ideal vocal cue recognition reveal persistent weaknesses in instruction-following, resilience to natural speech variability, and effective use of vocal cues for empathy. These results underscore the need for SLMs that integrate linguistic content with diverse vocal cues to achieve truly empathetic conversational ability.

en cs.CL

DOAJ Open Access 2024

Factors facilitating and hindering South Asian immigrant adults from engaging in exercise and physical activity – a qualitative systematic review

Nasimah Maricar, Behram Khan, Trixy David et al.

Background: Exercise and physical activity are key components of the management of patients with rheumatic musculoskeletal diseases (RMD), but people from South Asian communities engage in these activities less than their Caucasian counterparts. The aim of this qualitative systematic review was to determine the barriers to and facilitators of exercise and physical activity in South Asian communities who have migrated to and live in western countries, particularly among those who have RMD.
Methods: Qualitative studies published in English between 1999 and 2021 that evaluated barriers and/or facilitators to exercise or physical activity behaviour in South Asian adults who have migrated to and/or live in western countries were identified from Embase, MEDLINE, CINAHL, PsycINFO, Google Scholar and manual searches. The studies were appraised using the CASP checklist. Inductive thematic synthesis was used to identify common and global themes.
Results: A total of 32 studies that discussed barriers and facilitators of physical activity in South Asian communities who have migrated to and live in western countries were used for this review, but no studies were identified that focussed specifically on those with RMD. Following appraisal of the reporting of the studies, 30 studies were included in the pooling of the results. The facilitators and barriers to physical activity were broadly categorized into 'extrinsic' and 'intrinsic' factors. Extrinsic factors such as 'opportunity' included environmental factors such as weather and safety; socioeconomic factors such as education, language and literacy; and social, psychological and resource support. Intrinsic factors included cultural factors (such as life stages and family influence), beliefs and knowledge, which affected attitudes and skills.
Conclusions: This review has synthesised evidence on barriers and facilitators and identified potentially modifiable factors influencing engagement in physical activity and exercise, which could form the basis of evidence-based interventions to promote healthy behaviour change. Providing a safe, comfortable and culturally acceptable environment, together with culturally aligned cognitive strategies that help people acquire exercise-efficacy skills, could improve engagement.
Registration: The systematic review was registered on PROSPERO, registration no. 289235.

Public aspects of medicine

DOAJ Open Access 2024

Sustainable development goals and transversal competences through L2/3 virtual exchange

Oksana Polyakova, Leonarda Lovrović

This pilot study explores the effects of virtual exchange in higher education. Its core goal is to investigate potential new training environments for second- or third-language learning and to relate their use to competence enhancement and sustainability awareness. Using quantitative, qualitative and pre-/post-test methods, the researchers detected a positive correlation between the online didactic experiment and the increased capacities of international learners. The project also provided an opportunity to compare the progress of technology and humanities undergraduates from Spain and Croatia. Given the importance of intercultural cooperation among different nationalities, this interaction process offers a new communication channel linked to sustainability and competences in English as a Foreign Language. In addition, the study contributes fresh insights and experimental data, worth further consideration, to the debate about the role of online collaboration in foreign language course design.

Romanic languages, Education

DOAJ Open Access 2024

Zurück in die Zukunft? (Back to the Future?)

Jule Böhmer

This article deals with the question of how foreign language teaching, especially of Russian, will be organized and designed in schools in the future. Rapidly changing social conditions mean there is a great need to reform the school system in Germany, which has so far responded too little to the conditions of digitality and has not adapted its learning and examination culture to the circumstances of the 21st century. Foreign language teaching, with its current focus on the acquisition of communicative skills, is increasingly being called into question at its core by the rapid pace of technological progress. If applications based on artificial intelligence (AI) take over more communicative tasks in the future, and perform them more easily and better, the question arises whether intercultural communicative competence should play a greater role in foreign language acquisition. On this basis, suggestions are made as to how Russian lessons can be organized and designed in the future and what measures this requires.

Slavic languages. Baltic languages. Albanian languages

DOAJ Open Access 2024

Language mediation during consultations between deaf patients and health care providers at Chivhu Hospital, Zimbabwe

Tawanda Matende, Paul Svongoro, Bridget Phiri

This study examines the extent to which the Covid-19 pandemic exacerbated language barriers between the Deaf community and healthcare providers. Data were gathered through virtual interviews and focus group discussions (FGDs) with Deaf merchants and other Deaf people living in Chivhu, Zimbabwe. Observation and document analysis supplemented these two methods of data collection. The study used critical theory as its theoretical framework to better understand the communication difficulties between Deaf people and healthcare practitioners. According to the study's findings, Covid-19 had a significant impact on how Deaf people and healthcare workers interacted, which helped the Deaf population receive health services. In addition, the lack of qualified Sign Language interpreters to close the communication gap between Deaf patients and healthcare professionals made it more difficult for Deaf people to access healthcare facilities. Among other initiatives, the study suggests that during emergencies such as the Covid-19 pandemic, the Government of Zimbabwe, the health system and other stakeholders should provide accessible information and language mediators for the Deaf.

Language and Literature

DOAJ Open Access 2024

The interplay between French and English reading skills among Moroccan 9th-grade middle school students: a correlational study

Imad Hamdanat

This correlational study investigates the association between reading skills in French (L2) and English (L3) among 9th-grade middle school students in Morocco (n = 70). Building on Cummins' Interdependence Hypothesis, the study explores the extent to which L2 French reading proficiency is associated with L3 English reading abilities. Standardized cloze tests were administered to students in Sidi Kacem, Morocco, to assess reading comprehension in both languages. Data analysis using SPSS 21 employed descriptive statistics and Pearson's correlation coefficient to examine the relationship between L2 and L3 reading skills. The findings revealed a significant positive correlation (R = 0.69, p < .001) between L2 and L3 reading skills, particularly in sub-skills like skimming and scanning. This suggests that strong foundational reading abilities acquired in French (L2) can facilitate the development of reading skills in English (L3). The discussion emphasizes the importance of capitalizing on existing L2 competencies to enhance L3 learning. It proposes a pedagogical approach that acknowledges the interconnectedness of language abilities, fostering a cross-linguistic instructional perspective that leverages students' strengths across their linguistic repertoire. Overall, this study contributes empirical evidence to the field of language acquisition, offering valuable insights into effective language education practices in multilingual contexts like Morocco.

Special aspects of education, Language acquisition

arXiv Open Access 2024

Facilitating large language model Russian adaptation with Learned Embedding Propagation

Mikhail Tikhomirov, Daniil Chernyshev

Rapid advances in large language model (LLM) technologies have led to powerful open-source instruction-tuned LLMs whose text generation quality matches that of state-of-the-art counterparts such as GPT-4. While the emergence of such models accelerates the adoption of LLM technologies in sensitive-information environments, their authors do not disclose the training data needed to replicate the results, which makes the achievements model-exclusive. Because these open-source models are also multilingual, the benefit of training language-specific LLMs is reduced, since improved inference efficiency becomes the only guaranteed advantage of such a costly procedure. More cost-efficient options, such as vocabulary extension followed by continued pre-training, are also hindered by the lack of access to high-quality instruction-tuning data, which is the major factor behind the resulting LLM's task-solving capabilities. To address these limitations and cut the costs of the language adaptation pipeline, we propose Learned Embedding Propagation (LEP). Unlike existing approaches, our method requires less training data because it has minimal impact on existing LLM knowledge, which we reinforce using a novel ad-hoc embedding propagation procedure that makes it possible to skip the instruction-tuning step and instead implant the new language knowledge directly into any existing instruction-tuned variant. We evaluated four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B, showing that LEP is competitive with traditional instruction-tuning methods, achieving performance comparable to OpenChat 3.5 and LLaMa-3-8B-Instruct, with further improvements via self-calibration and continued tuning enhancing task-solving capabilities.

en cs.CL, cs.AI

arXiv Open Access 2024

A Critical Review of Causal Reasoning Benchmarks for Large Language Models

Linying Yang, Vik Shirvaikar, Oscar Clivio et al.

Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved by retrieving domain knowledge, which calls into question whether they achieve their purpose. In this review, we present a comprehensive overview of LLM benchmarks for causality. We highlight how recent benchmarks move towards a more thorough definition of causal reasoning by incorporating interventional or counterfactual reasoning. We derive a set of criteria that a useful benchmark or set of benchmarks should aim to satisfy. We hope this work will pave the way towards a general framework for the assessment of causal understanding in LLMs and the design of novel benchmarks.

en cs.LG, cs.CL

arXiv Open Access 2024

Quantifying Memorization and Detecting Training Data of Pre-trained Language Models using Japanese Newspaper

Shotaro Ishihara, Hiromu Takahashi

Leading pre-trained language models (PLMs) have been shown to pose a potential risk of memorizing and outputting their training data. While this concern has been discussed mainly for English, it is also practically important to focus on domain-specific PLMs. In this study, we pre-trained domain-specific GPT-2 models on a limited corpus of Japanese newspaper articles and evaluated their behavior. Experiments replicated in Japanese the empirical finding from previous English studies that PLM memorization is related to duplication in the training data, model size and prompt length. Furthermore, we attempted membership inference attacks, demonstrating that training data can be detected in Japanese as well, following the same trend as in English. The study warns that domain-specific PLMs, sometimes trained on valuable private data, can "copy and paste" on a large scale.

en cs.CL

arXiv Open Access 2024

JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models

Hitomi Yanaka, Namgi Han, Ryoma Kumon et al.

With the development of large language models (LLMs), social biases in these LLMs have become a pressing issue. Although there are various benchmarks for social biases across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated. In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ, and use it to analyze social biases in Japanese LLMs. The results show that current open Japanese LLMs with more parameters achieve higher accuracy on JBBQ but also exhibit higher bias scores. In addition, prompts with a warning about social biases and chain-of-thought prompting reduce the effect of biases in model outputs, but there is room for improvement in extracting the correct evidence from contexts in Japanese. Our dataset is available at https://github.com/ynklab/JBBQ_data.

en cs.CL

DOAJ Open Access 2023

Methodology of Three Foreign Languages Simultaneous Teaching (Based on the Analysis of Ya.R. Haidarov’s Author’s Course)

A.N. Utekhina

The article substantiates the relevance of acquiring several foreign languages for specialists in all fields of Russian science and technology. Linguodidactics has always recognized the connection between the development of cultural and linguistic education and the political, socio-economic and historical changes taking place in the world. Scientists have identified the target orientations and value-semantic orientation of modern linguodidactic concepts: cognitive-oriented, activity-oriented, personality-oriented and intellectual-developing. The article reveals the content of these methodological provisions, updated in accordance with the requirements of the Federal State Educational Standard, which develop cultural and linguistic potential at the present stage. In particular, it seems necessary to analyze the course proposed by Ya.R. Haidarov for the simultaneous teaching of three Romance languages: French, Italian and Spanish. In accordance with didactic laws, this linguistically well-constructed language material presupposes the definition of the basic principles of teaching: the principle of personality-oriented learning and communication between teachers and students, the principle of activating the reserve capabilities of students and teachers, and the principle of communicative situationality. The article describes the proposed technology for the simultaneous teaching of several foreign languages, implemented in stages through a set of methods enriched with practices developed and tested in dissertation research by scientists of our scientific school: preliminary phonetic development of new sounds and words, presentation of a new text, comprehension of the text by students, inclusion in speech practice, and reflection.
The implementation of this technology enables us to propose a scheme for developing a methodology for the simultaneous teaching of several foreign languages: justification of the relevance of the topic, definition of the methodological provisions (according to which a didactic scheme of the basics and the principles for implementing the content of teaching three foreign languages are determined), and description of a step-by-step technology. Communicative and informational readiness is not only an opportunity to engage in social and communicative interaction with representatives of other countries, but also a means of extracting the necessary foreign-language information as a resource for responding to modern challenges in a changing world.

Special aspects of education

arXiv Open Access 2023

Text classification dataset and analysis for Uzbek language

Elmurod Kuriyozov, Ulugbek Salaev, Sanatbek Matlatipov et al.

Text classification is an important task in Natural Language Processing (NLP), where the goal is to categorize text data into predefined classes. In this study, we analyse the dataset creation steps and evaluation techniques of a multi-label news categorisation task as part of text classification. We first present a newly obtained dataset for Uzbek text classification, which was collected from 10 different news and press websites and covers 15 categories of news, press and law texts. We also present a comprehensive evaluation of different models, ranging from traditional bag-of-words models to deep learning architectures, on this newly created dataset. Our experiments show that the Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) based models outperform the rule-based models. The best performance is achieved by the BERTbek model, a transformer-based BERT model trained on an Uzbek corpus. Our findings provide a good baseline for further research in Uzbek text classification.

en cs.CL

arXiv Open Access 2023

"Mistakes Help Us Grow": Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms

Kunal Handa, Margaret Clapper, Jessica Boyle et al.

Teachers' growth mindset supportive language (GMSL), rhetoric emphasizing that one's skills can be improved over time, has been shown to significantly reduce disparities in academic achievement and enhance students' learning outcomes. Although teachers espouse growth mindset principles, most find it difficult to adopt GMSL in their practice due to the lack of effective coaching in this area. We explore whether large language models (LLMs) can provide automated, personalized coaching to support teachers' use of GMSL. We establish an effective coaching tool to reframe unsupportive utterances to GMSL by developing (i) a parallel dataset containing GMSL-trained teacher reframings of unsupportive statements with an accompanying annotation guide, (ii) a GMSL prompt framework to revise teachers' unsupportive language, and (iii) an evaluation framework grounded in psychological theory for evaluating GMSL with the help of students and teachers. We conduct a large-scale evaluation involving 174 teachers and 1,006 students, finding that both teachers and students perceive GMSL-trained teacher and model reframings as more effective in fostering a growth mindset and promoting challenge-seeking behavior, among other benefits. We also find that model-generated reframings outperform those from the GMSL-trained teachers. These results show promise for harnessing LLMs to provide automated GMSL feedback for teachers and, more broadly, LLMs' potential for supporting students' learning in the classroom. Our findings also demonstrate the benefit of large-scale human evaluations when applying LLMs in educational domains.

en cs.CL

arXiv Open Access 2023

ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis

Javier de la Rosa, Álvaro Pérez Pozo, Salvador Ros et al.

The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated because scansion and rhyme systems exist only for individual languages, making comparative studies very challenging and time-consuming. In this work, we present ALBERTI, the first multilingual pre-trained large language model for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English and German. In both cases, ALBERTI outperforms multilingual BERT and other transformer-based models of similar sizes, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.

en cs.CL

DOAJ Open Access 2022

La variación traductológica en el título de una obra literaria (Translational variation in the title of a literary work)

Salud María Jarilla Bravo

The aim of this work is to analyse the difficulties in translating (Spanish–Italian) certain titles of literary works that are built on fixed language structures, from the theoretical perspective of variation as a translation resource. On today's market we find many titles of translated novels that differ considerably from the original title. We cannot ignore that in most cases editorial decisions come into play, and these depart considerably from the titles proposed by the translator. But what happens when the translator encounters a paremiological element and adds variants or rejects a variation? We will show how the Spanish literary tradition provides for the proverb as a stylistic device to be used as the title of a great many works, above all theatrical ones.

Special aspects of education, Philology. Linguistics
