Results for "Language acquisition"

Showing 20 of ~1,400,722 results · from DOAJ, arXiv, CrossRef

DOAJ Open Access 2026
The impact of language context on inter-brain synchrony in bilingual families

Efstratia Papoutselou, Nivetha Saravanan et al.

Background: Bilingualism is increasingly common in families worldwide, yet bilingual individuals remain underrepresented in developmental neuroscience research. In simultaneous bilingualism, children typically acquire two languages simultaneously from birth, while their parents tend to learn the societal language later in life. These differences in language acquisition may influence how parents and children communicate, particularly when interacting in a second language. Neural synchrony, the temporal alignment of brain activity between individuals, has emerged as a key mechanism underlying social connection, communication, and learning in early development. However, little is known about how language choice affects neural synchrony in bilingual parent–child interactions. Methods: This study used functional near-infrared spectroscopy (fNIRS) hyperscanning to simultaneously record brain activity from 15 bilingual mother–child dyads during naturalistic play. Each dyad completed three conditions: collaborative play in the mother's native language, collaborative play in English (the mother's second language), and independent play. Neural activity was recorded from the prefrontal cortex (PFC) and temporoparietal junction (TPJ), regions associated with social cognition, joint attention, and mentalising. Families took part in a naturalistic free play paradigm, allowing them to interact in a comfortable and ecologically valid manner. Results: Both native- and English-language play elicited significantly greater neural synchrony across the PFC and the TPJ than independent play, validating the use of naturalistic free play paradigms. No significant overall differences emerged between native and English play, indicating that bilingual dyads maintain inter-brain coupling across languages when both partners are proficient.
Exploratory analyses suggested a trend toward higher child-directed synchrony in English play and age-related trends in mother-directed synchrony; however, these effects did not reach statistical significance. Discussion: Our findings show that bilingualism does not compromise mother–child neural synchrony, supporting the inclusion of linguistically diverse families in developmental neuroscience. They underscore the value of naturalistic paradigms and highlight the need for future research on language proficiency, partner familiarity, and behavioral correlates of synchrony. This work highlights the importance of studying bilingual families in ecologically valid contexts to better understand how language use influences neural coupling in early development.

Consciousness. Cognition
arXiv Open Access 2026
Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks

Chen Shen, Wei Cheng, Jingyue Yang et al.

The proficiency of Large Language Models (LLMs) in coding tasks is often a reflection of their extensive pre-training corpora, which typically collapses when confronted with previously unfamiliar programming languages. Departing from data-intensive finetuning, we investigate the paradigm of Inference-time Language Acquisition (ILA), where an LLM masters an unfamiliar language through dynamic interaction with limited external resources. In this paper, we propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. By modeling essential human-like behaviors as a suite of tools, ILA-agent enables LLMs to incrementally explore, apply, and verify language knowledge through structured interactions with the official documentation and execution environment. To provide a rigorous evaluation in a low-resource setting, we construct Cangjie-bench, a multi-task benchmark based on the novel statically-typed language Cangjie. We instantiate ILA-agent for Cangjie and evaluate its performance across code generation, translation, and program repair tasks. Results using diverse LLMs demonstrate that ILA-agent significantly outperforms retrieval-augmented baselines. Further analysis of agent trajectories characterizes the emergent behavior patterns while highlighting persisting performance gaps.
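The explore/apply/verify cycle described above can be sketched as a minimal tool-use loop. Everything below is a hypothetical stand-in (the `DOCS` table and the `execute` and `draft` stubs are illustrative assumptions, not the actual ILA-agent API); it only shows the shape of documentation-grounded acquisition:

```python
# Illustrative sketch of an inference-time language-acquisition loop:
# the model drafts code, verifies it against an execution environment,
# and consults documentation snippets on failure. All names here are
# hypothetical; the real ILA-agent framework is far more elaborate.

DOCS = {  # stand-in for official language documentation
    "print": "Use `println(x)` to write a line to stdout.",
}

def execute(snippet: str) -> bool:
    """Stub execution environment: accepts only the documented form."""
    return snippet.startswith("println(")

def draft(task: str, hints: list) -> str:
    """Stub LLM: the first guess reuses a familiar language's idiom,
    then revises once documentation hints arrive."""
    if any("println" in h for h in hints):
        return f'println("{task}")'
    return f'print("{task}")'  # habit from pre-training corpora

def acquire_and_solve(task: str, max_rounds: int = 3) -> str:
    hints = []
    for _ in range(max_rounds):
        snippet = draft(task, hints)   # apply step
        if execute(snippet):           # verify step
            return snippet
        hints.append(DOCS["print"])    # explore step: consult the docs
    return ""

print(acquire_and_solve("hello"))  # → println("hello")
```

The point of the sketch is that knowledge enters only through interaction: the stub "model" succeeds on the second round because the failed execution triggered a documentation lookup, not because the language was in its training data.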

en cs.CL, cs.AI
DOAJ Open Access 2025
A Neuropsychological Intervention Study of Intellectual Activity in an Adolescent with Autism Spectrum Disorder

Lenia E. Meza-Salcido, Omar E. Torrado-Duarte, Yulia Solovieva

Background. Autism Spectrum Disorder (ASD) is characterised by difficulties in social communication and restrictive, repetitive patterns of behaviour. ASD represents a spectrum of functional manifestations, involving varying levels of severity in intellectual activity and social interaction. Adolescence involves various physical and cognitive changes, as well as increased social demands. The historical-cultural neuropsychology framework provides an effective theoretical and methodological approach to establish the level of functional development of brain mechanisms and allows for the description of the individual manifestation of the disorder. Objective. This study aims to describe the neuropsychological evaluation and intervention process from a historical-cultural perspective in a case of ASD during adolescence. Study Participants. A 12-year-old male student with delays in language acquisition and socialisation, as well as psychomotor challenges. These difficulties led to a diagnosis of Global Developmental Delay, ASD, and Attention Deficit Disorder by a child psychiatrist. Methods. A neuropsychological intervention programme was implemented based on the principles of the historical-cultural approach. The programme specifically focused on the development of the affective-emotional sphere, in particular socio-affective communication, and on improving the understanding of oral and written information. Results. Among the results obtained, there was an improvement in expressive and receptive language, with an increase in the use of gestures, facial expressions, and eye contact, fostering more effective communication. The adolescent also demonstrated a greater understanding of social and emotional situations, as well as progress in understanding abstract concepts and problem-solving. Conclusions.
The results obtained support the efficacy of neuropsychological interventions based on the historical-cultural perspective to improve the quality of life for individuals with ASD.

DOAJ Open Access 2025
Autobiographical Migration Narratives as Catalysts of Identity Resilience

Teodor STAN

This pilot study articulates a comprehensive framework for “research, action, and training” designed to enhance migrants’ resilience through interventions assisted by diaspora community organizations. Drawing from both social psychology and political science, this research synthesizes the existing literature on assisted resilience, placing particular emphasis on the creation of autobiographical narratives as tools for bolstering cultural identity and self-actualization during the migrant integration process. By employing autobiographical qualitative interviews framed within a family intergenerational dialogue, this investigation interrogates cultural identity transformation and resilience mechanisms, delineating protective factors that facilitate migrant integration, with a specific focus on the Romanian American diaspora in Minnesota. The discussions elucidate themes of cultural shock, the interplay between assimilation and integration, language acquisition as a vehicle for cultural retention, and the multifaceted nature of belonging within host societies. Participants' reflections on the complexities of acculturation underscore how familial dialogues can shape perceptions of belonging and construct identity narratives that serve immediate contextual needs. The findings advocate for community-based historiography projects that leverage narrative methodologies to foster resilience, combat social marginalization, and enhance civic engagement. This article emphasizes the critical importance of culturally sensitive mechanisms in promoting narrative construction to strengthen familial bonds and establish supportive diaspora networks in increasingly polarized host societies.

Political science
arXiv Open Access 2025
CausalVLBench: Benchmarking Visual Causal Reasoning in Large Vision-Language Models

Aneesh Komanduri, Karuna Bhaila, Xintao Wu

Large language models (LLMs) have shown remarkable ability in various language tasks, especially with their emergent in-context learning capability. Extending LLMs to incorporate visual inputs, large vision-language models (LVLMs) have shown impressive performance in tasks such as recognition and visual question answering (VQA). Despite increasing interest in the utility of LLMs in causal reasoning tasks such as causal discovery and counterfactual reasoning, there has been relatively little work showcasing the abilities of LVLMs on visual causal reasoning tasks. We take this opportunity to formally introduce a comprehensive causal reasoning benchmark for multi-modal in-context learning from LVLMs. Our CausalVLBench encompasses three representative tasks: causal structure inference, intervention target prediction, and counterfactual prediction. We evaluate the ability of state-of-the-art open-source LVLMs on our causal reasoning tasks across three causal representation learning datasets and demonstrate their fundamental strengths and weaknesses. We hope that our benchmark elucidates the drawbacks of existing vision-language models and motivates new directions and paradigms in improving the visual causal reasoning abilities of LVLMs.

en cs.LG, cs.AI
arXiv Open Access 2024
L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages

Aishwarya Mirashi, Srushti Sonavane, Purva Lingayat et al.

In this work, we introduce L3Cube-IndicNews, a multilingual text classification corpus aimed at curating a high-quality dataset for Indian regional languages, with a specific focus on news headlines and articles. We have centered our work on 10 prominent Indic languages, including Hindi, Bengali, Marathi, Telugu, Tamil, Gujarati, Kannada, Odia, Malayalam, and Punjabi. Each of these news datasets comprises 10 or more classes of news articles. L3Cube-IndicNews offers 3 distinct datasets tailored to handle different document lengths that are classified as: Short Headlines Classification (SHC) dataset containing the news headline and news category, Long Document Classification (LDC) dataset containing the whole news article and the news category, and Long Paragraph Classification (LPC) containing sub-articles of the news and the news category. We maintain consistent labeling across all 3 datasets for in-depth length-based analysis. We evaluate each of these Indic language datasets using 4 different models including monolingual BERT, multilingual Indic Sentence BERT (IndicSBERT), and IndicBERT. This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages. This also serves as an excellent resource for cross-lingual analysis owing to the high overlap of labels among languages. The datasets and models are shared publicly at https://github.com/l3cube-pune/indic-nlp

en cs.CL, cs.LG
arXiv Open Access 2024
Detecting Mode Collapse in Language Models via Narration

Sil Hamilton

No two authors write alike. Personal flourishes invoked in written narratives, from lexicon to rhetorical devices, imply a particular author--what literary theorists label the implied or virtual author; distinct from the real author or narrator of a text. Early large language models trained on unfiltered training sets drawn from a variety of discordant sources yielded incoherent personalities, problematic for conversational tasks but proving useful for sampling literature from multiple perspectives. Successes in alignment research in recent years have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show successive versions of GPT-3 suffer from increasing degrees of "mode collapse" whereby overfitting the model during alignment constrains it from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.

en cs.CL, cs.AI
arXiv Open Access 2024
Fotheidil: an Automatic Transcription System for the Irish Language

Liam Lonergan, Ibon Saratxaga, John Sloan et al.

This paper sets out the first web-based transcription system for the Irish language - Fotheidil, a system that utilises speech-related AI technologies as part of the ABAIR initiative. The system includes both off-the-shelf pre-trained voice activity detection and speaker diarisation models and models trained specifically for Irish automatic speech recognition and capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements for out-of-domain test sets and dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration involving sequence-to-sequence models is compared with the conventional approach using a classification model. Experimental results here also show substantial improvements in performance. The system will be made freely available for public use and represents an important resource for researchers and others who transcribe Irish language materials. Human-corrected transcriptions will be collected and included in the training dataset as the system is used, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.

en cs.CL, cs.SD
arXiv Open Access 2024
Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models

Sina Bagheri Nezhad, Ameeta Agrawal, Rhitabrat Pokharel

Multilingual language models (MLLMs) are crucial for handling text across various languages, yet they often show performance disparities due to differences in resource availability and linguistic characteristics. While the impact of pre-train data percentage and model size on performance is well-known, our study reveals additional critical factors that significantly influence MLLM effectiveness. Analyzing a wide range of features, including geographical, linguistic, and resource-related aspects, we focus on the SIB-200 dataset for classification and the Flores-200 dataset for machine translation, using regression models and SHAP values across 204 languages. Our findings identify token similarity and country similarity as pivotal factors, alongside pre-train data and model size, in enhancing model performance. Token similarity facilitates cross-lingual transfer, while country similarity highlights the importance of shared cultural and linguistic contexts. These insights offer valuable guidance for developing more equitable and effective multilingual language models, particularly for underrepresented languages.
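As background for the SHAP analysis mentioned above: for a linear model with independent features, the SHAP value of feature j on sample x reduces exactly to beta_j(x_j − E[x_j]). A minimal NumPy sketch on synthetic data (not the paper's SIB-200/Flores-200 setup; the three feature names in the comments are illustrative only):

```python
import numpy as np

# SHAP-style attribution for a *linear* model, where the Shapley value
# of feature j on sample x is beta_j * (x_j - E[x_j]). Synthetic data
# only; the actual study regresses performance on 204 languages'
# geographical, linguistic, and resource-related features.

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # e.g. pre-train data %, model size, token similarity
coef = np.array([2.0, 0.5, 1.5])     # ground-truth effect sizes
y = X @ coef + rng.normal(scale=0.1, size=500)

# Ordinary least-squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Per-sample SHAP values for a linear model with independent features
shap_values = beta * (X - X.mean(axis=0))

# Global importance: mean absolute SHAP value per feature
importance = np.abs(shap_values).mean(axis=0)
print(importance.round(2))  # the first feature should dominate
```

With correlated features or non-linear models this closed form no longer holds, which is why general SHAP explainers approximate the attribution instead.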

en cs.CL, cs.AI
arXiv Open Access 2024
Soft Language Prompts for Language Transfer

Ivan Vykopal, Simon Ostermann, Marián Šimko

Cross-lingual knowledge transfer, especially between high- and low-resource languages, remains challenging in natural language processing (NLP). This study offers insights for improving cross-lingual NLP applications through the combination of parameter-efficient fine-tuning methods. We systematically explore strategies for enhancing cross-lingual transfer through the incorporation of language-specific and task-specific adapters and soft prompts. We present a detailed investigation of various combinations of these methods, exploring their efficiency across 16 languages, focusing on 10 mid- and low-resource languages. We further present, to our knowledge, the first use of soft prompts for language transfer, a technique we call soft language prompts. Our findings demonstrate that, in contrast to the claims of previous work, a combination of language and task adapters does not always work best; instead, combining a soft language prompt with a task adapter outperforms most configurations in many cases.

arXiv Open Access 2024
Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

Wenbin An, Feng Tian, Jiahao Nie et al.

Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acquiring different kinds of knowledge in a coupled manner may confuse models and hinder them from retrieving precise knowledge. Furthermore, the ``forward-only'' answering process fails to explicitly capture the knowledge needs of LLMs, which can further hurt answering quality. To cope with the above limitations, we propose DKA: Disentangled Knowledge Acquisition from LLM feedback, a training-free framework that disentangles knowledge acquisition to avoid confusion and uses LLM's feedback to specify the required knowledge. Specifically, DKA requires LLMs to specify what knowledge they need to answer the question and decompose the original complex question into two simple sub-questions: Image-based sub-question and Knowledge-based sub-question. Then we use the two sub-questions to retrieve knowledge from the image and knowledge base, respectively. In this way, two knowledge acquisition models can focus on the content that corresponds to them and avoid disturbance of irrelevant elements in the original complex question, which can help to provide more precise knowledge and better align the knowledge needs of LLMs to yield correct answers. Experiments on benchmark datasets show that DKA significantly outperforms SOTA models. To facilitate future research, our data and code are available at \url{https://github.com/Lackel/DKA}.
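The disentangled flow described in the abstract (decompose, retrieve per source, then answer) can be sketched with stubs. Every function below is a hypothetical stand-in for an LLM prompt or a retriever, not the real DKA implementation from the linked repository:

```python
# Sketch of disentangled knowledge acquisition for KVQA.
# All functions are stubs standing in for an LLM and two retrievers;
# the real DKA framework prompts an actual LLM (see the paper's repo).

def decompose(question: str):
    """Stub for the LLM-feedback step: split a complex KVQA question
    into an image-based and a knowledge-based sub-question."""
    return (f"What is shown in the image relevant to: {question}",
            f"What background facts are needed for: {question}")

def retrieve_from_image(sub_q: str) -> str:
    return "image: a red double-decker bus"               # stub visual retriever

def retrieve_from_kb(sub_q: str) -> str:
    return "kb: double-decker buses are iconic in London" # stub KB retriever

def answer(question: str) -> str:
    img_q, kb_q = decompose(question)
    # Each retriever sees only the simple sub-question that matches its
    # source, avoiding the coupled-retrieval confusion described above.
    evidence = [retrieve_from_image(img_q), retrieve_from_kb(kb_q)]
    # A real system would now prompt the LLM with question + evidence;
    # here we just return the gathered evidence.
    return " | ".join(evidence)

print(answer("In which city was this photo likely taken?"))
```

The design point is the separation itself: because the knowledge needs are stated explicitly before retrieval, each source is queried with only the elements relevant to it.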

en cs.CV, cs.CL
arXiv Open Access 2024
Unmasking the Shadows of AI: Investigating Deceptive Capabilities in Large Language Models

Linge Guo

This research critically navigates the intricate landscape of AI deception, concentrating on deceptive behaviours of Large Language Models (LLMs). My objective is to elucidate this issue, examine the discourse surrounding it, and subsequently delve into its categorization and ramifications. The essay begins with an evaluation of the AI Safety Summit 2023 (ASS) and an introduction to LLMs, emphasising the multidimensional biases that underlie their deceptive behaviours. The literature review covers four categorised types of deception: Strategic Deception, Imitation, Sycophancy, and Unfaithful Reasoning, along with the social implications and risks they entail. Lastly, I take an evaluative stance on various aspects of navigating the persistent challenges of deceptive AI. This encompasses considerations of international collaborative governance, the reconfigured engagement of individuals with AI, proposals for practical adjustments, and specific elements of digital education.

en cs.CL, cs.AI
arXiv Open Access 2024
A Legal Framework for Natural Language Processing Model Training in Portugal

Rúben Almeida, Evelin Amorim

Recent advances in deep learning have promoted the advent of many computational systems capable of performing intelligent actions that, until then, were restricted to the human intellect. In the particular case of human languages, these advances allowed the introduction of applications like ChatGPT that are capable of generating coherent text without being explicitly programmed to do so. Instead, these models use large volumes of textual data to learn meaningful representations of human languages. Alongside these advances, concerns about copyright and data privacy infringements caused by these applications have emerged. Despite these concerns, the pace at which new natural language processing applications are developed has largely outpaced the introduction of new regulations. Today, communication barriers between legal experts and computer scientists motivate many unintentional legal infringements during the development of such applications. In this paper, a multidisciplinary team aims to bridge this communication gap and promote more compliant Portuguese NLP research by presenting a series of everyday NLP use cases, while highlighting the Portuguese legislation that may apply during their development.

en cs.CL, cs.ET
DOAJ Open Access 2023
Examining a technology-focused language teacher community on Facebook during a crisis situation

Yurika Ito

Due to the chaos and confusion caused by the sudden transition from face-to-face teaching to online and remote teaching in early 2020, numerous language teachers had no choice but to rely on online communities on social networking sites. The current study therefore examined how some language teachers were utilising online communities on Facebook during the COVID-19 pandemic. Employing a mixed-methods approach, data were mainly collected through: (1) an eight-month observation of a technology-focused language teacher community on Facebook to identify different types of posts generated by its members before and during the COVID-19 pandemic (n = 340); (2) a questionnaire to understand the community members’ backgrounds and experiences of being in the community (n = 51); (3) semi-structured interviews with some of the questionnaire participants (n = 13); and (4) a post-interview questionnaire (n = 12) to get a better understanding of their responses. A content analysis of online posts and community members’ responses suggests that language teacher communities on Facebook were supporting teachers professionally and emotionally during the stressful periods of the pandemic. The main findings are discussed in terms of the benefits and drawbacks of using online language teacher communities for professional purposes. The overall goal of the study is to offer much-needed answers on how pre-existing communities can be used to assist language teachers in times of crisis.

Special aspects of education, Language acquisition
DOAJ Open Access 2023
Factors affecting the quality and effectiveness of student teachers during their practicum experiences: the case of some selected colleges in Oromia, Ethiopia

Hika Negash Galana, Adinew Tadesse Degago, Alemayehu Getachew Tsegaye et al.

In this study, an attempt was made to investigate the constraints encountered by student teachers during their practicum experiences at some selected colleges in Oromia, Ethiopia. Adopting a convergent mixed research design, a questionnaire was distributed to student teachers, and semi-structured interviews were conducted with supervisors and mentors. The questionnaire data were analyzed using descriptive statistics, inferential statistics, and one-way ANOVA, while the interview results were analyzed using content analysis. The findings identified factors such as mentors’ lack of continuous follow-up and support, lack of interest in sharing experience, and lack of friendliness. In addition, follow-up and support were not continuously provided by supervisors, and there was no coordination between supervisors and mentors. Further, colleges assigned large numbers of candidates to one school and allotted many student teachers to one academic supervisor; the opportunity given for practice was inadequate, and the cooperating schools lacked necessary facilities. Hence, it can be concluded that mentors, supervisors, colleges, and cooperating schools all fell short in playing their roles in teaching practice. Therefore, based on the findings of the study and the conclusions drawn, mentors and supervisors of the practicum should provide continuous follow-up and immediate feedback to their student teachers. In addition, they should collaborate in evaluating their student teachers and equipping them with everything necessary. Colleges should build good rapport with cooperating schools, try to provide the necessary facilities, and strengthen the schools so that they can effectively produce qualified students. They should also work out how to balance the number of student teachers against the number of supervisors and schools. Finally, cooperating schools should learn from the observed limitations and go further in fulfilling their needs.

Special aspects of education, Language acquisition
arXiv Open Access 2023
Indian Language Summarization using Pretrained Sequence-to-Sequence Models

Ashok Urlana, Sahil Manoj Bhatt, Nirmal Surange et al.

The ILSUM shared task focuses on text summarization for two major Indian languages- Hindi and Gujarati, along with English. In this task, we experiment with various pretrained sequence-to-sequence models to find out the best model for each of the languages. We present a detailed overview of the models and our approaches in this paper. We secure the first rank across all three sub-tasks (English, Hindi and Gujarati). This paper also extensively analyzes the impact of k-fold cross-validation while experimenting with limited data size, and we also perform various experiments with a combination of the original and a filtered version of the data to determine the efficacy of the pretrained models.

en cs.CL
DOAJ Open Access 2022
Is The Natural Order of Morpheme Acquisition Being Appropriately Presented In English Language Teaching Course Books?

David D. Perrodin, Narumon Somboon

This study sought to determine the sequence of L2 morpheme presentation, as well as whether that sequence corresponds with the recognized natural order of morpheme acquisition, in English Language Teaching course books used with young adult learners at a public-sector vocational education institution in Thailand. Qualitative analysis was employed to scrutinize twelve beginner- and elementary-level ESL and EFL course books that have been used as the primary teaching material for over a decade by the general education department of the institute. This examination revealed that the morpheme presentation sequence within the selected ELT course books was not analogous with the conclusions in the supporting literature. The findings further indicated that the widely accepted viewpoint of natural-order morpheme acquisition was likewise not substantially reflected within the analyzed texts. Although earlier studies have found that an unnatural sequence of morpheme presentation in EFL course books may hamper communicative competence in English, further study is required to establish whether this is a contributing factor to the overall low English proficiency of adult L2 learners in Thailand.

Education (General), English language
DOAJ Open Access 2022
Review of Crosslinguistic influence and second language learning by Kevin McManus

Lixia Zhu, Jinting Cai

As a prevalent phenomenon in second language acquisition (SLA), crosslinguistic influence (CLI) has attracted long-standing attention, as reflected by the publication of several monographs (e.g., Cai, 2021; Jarvis & Pavlenko, 2008; Odlin, 1989; Ringbom, 2007), many edited volumes (e.g., Alonso, 2016; Gass & Selinker, 1983), and numerous research articles. In these books and papers, mounting evidence for CLI has been accumulated in various areas of language. In particular, CLI may occur between a first language (L1) and a second language (L2) in lexicon, grammar, phonology, discourse, and pragmatics, with its effects being both positive and negative. Besides, it has been shown that the occurrence of CLI is constrained by a variety of factors, such as linguistic and psycholinguistic factors and those related to learning environment and language use (Jarvis & Pavlenko, 2008). CLI has been addressed from diverse theoretical perspectives including universal grammar, functional linguistics, and psycholinguistics (see Cai, 2021 for a review).

Philology. Linguistics
arXiv Open Access 2022
VLP: A Survey on Vision-Language Pre-training

Feilong Chen, Duzhen Zhang, Minglun Han et al.

In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. Substantial works have shown they are beneficial for downstream uni-modal tasks and avoid training a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey focused on VLP. We hope that this survey can shed light on future research in the VLP field.

en cs.CV, cs.CL
arXiv Open Access 2022
UzbekStemmer: Development of a Rule-Based Stemming Algorithm for Uzbek Language

Maksud Sharipov, Ollabergan Yuldashov

In this paper we present a rule-based stemming algorithm for the Uzbek language. Uzbek is an agglutinative language, so many words are formed by adding suffixes, and the number of suffixes is large. For this reason, it is difficult to find the stem of a word. We propose a methodology for stemming Uzbek words using an affix-stripping approach that does not rely on a database of Uzbek base word forms. Word affixes are classified into fifteen classes, each designed as a finite state machine (FSM) according to morphological rules. We created fifteen FSMs and linked them together to create the Basic FSM. A lexicon of affixes in XML format was created, and a stemming application for Uzbek words was developed based on the FSMs.
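The affix-stripping idea can be illustrated with a toy sketch in the spirit of the chained-FSM design. The three tiny suffix classes below are illustrative placeholders, not the paper's fifteen affix classes, and the minimum-stem rule is an assumption of this sketch:

```python
# Toy illustration of rule-based affix stripping for an agglutinative
# language. The suffix classes are illustrative placeholders only --
# the actual algorithm models fifteen affix classes of Uzbek
# morphology as linked finite state machines.

# Each "class" is tried in order; within a class, longest match first.
AFFIX_CLASSES = [
    ["lar"],          # e.g. a plural-like class
    ["im", "ing"],    # e.g. a possessive-like class
    ["da", "ga"],     # e.g. a case-like class
]

def stem(word: str, min_stem: int = 2) -> str:
    """Strip at most one suffix per class, scanning classes from the
    outermost (last-attached) inward, and never shrinking the stem
    below min_stem characters."""
    for suffixes in reversed(AFFIX_CLASSES):
        for suf in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suf) and len(word) - len(suf) >= min_stem:
                word = word[: -len(suf)]
                break
    return word

print(stem("kitoblarda"))  # "kitob" + plural "lar" + locative "da" → kitob
```

Ordering the classes outermost-first mirrors how agglutinative suffixes stack, which is also why the full system chains per-class FSMs rather than using a flat suffix list.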

en cs.CL

Page 12 of 70,037