Results for "Language acquisition"

Showing 20 of ~5,491,455 results · from DOAJ, arXiv, Semantic Scholar, CrossRef

DOAJ Open Access 2025
LE COLLOCAZIONI N-ADJ DELL’ITALIANO ACCADEMICO PARLATO. UN’INDAGINE CORPUS-BASED

Diana Peppoloni

Academic spoken Italian N-ADJ collocations: a corpus-based study. This paper focuses on a lexical aspect of academic spoken Italian that is particularly relevant for foreign learners, namely collocations. Knowledge of collocations and the correct use of such phraseological units indicate an advanced level of linguistic-communicative competence and may signal the speaker's membership of a certain social or professional group, as in the case of academic communication. Collocations are a hurdle even for advanced learners because the combination of their elements is unpredictable, arbitrary and conventional. Despite the essential role of collocations in the acquisition process, they can be undervalued in the language classroom, often due to a lack of dedicated materials and resources, hence the need for this study. The study describes the constitutive parameters and data of the ASIC corpus (Academic Spoken Italian Corpus), together with the results of a method for automatically extracting spoken Italian academic N-ADJ collocations from the ASIC data. Automatic extraction based on statistical indices does not seem sufficient by itself to measure the degree of association that links the two elements of a word pair. A further a posteriori analysis is needed, based on the experience and expertise of native speakers accustomed to communicating in academic settings. Therefore, to identify the academic collocations actually attested, the data were further evaluated by a sample of native Italian speakers in a crowdsourcing experiment.

Language and Literature, Philology. Linguistics
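
The abstract above mentions extracting N-ADJ collocation candidates with statistical association indices but does not name them. As a minimal sketch, assuming pointwise mutual information (PMI) as the index and a hypothetical list of (noun, adjective) pairs already extracted from a corpus, candidate ranking could look like this:

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return PMI (in bits) for each distinct (noun, adjective) pair.

    `pairs` is a list of (noun, adjective) tuples, one per corpus occurrence.
    PMI is an assumed index here; the study does not state which one it used.
    """
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    adj_counts = Counter(adj for _, adj in pairs)
    total = len(pairs)

    scores = {}
    for (noun, adj), count in pair_counts.items():
        p_pair = count / total
        p_noun = noun_counts[noun] / total
        p_adj = adj_counts[adj] / total
        # PMI compares the observed co-occurrence with what independence predicts.
        scores[(noun, adj)] = math.log2(p_pair / (p_noun * p_adj))
    return scores


# Hypothetical toy data, not taken from the ASIC corpus.
toy_pairs = (
    [("anno", "accademico")] * 3
    + [("corso", "nuovo")] * 2
    + [("anno", "nuovo")]
)
for pair, score in sorted(pmi_scores(toy_pairs).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A high score only flags a candidate. As the abstract notes, statistical ranking alone is not sufficient, so judgments from native speakers would still be needed to confirm which pairs are genuine collocations.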
DOAJ Open Access 2025
Reading, watching and gaming: exploring the relationships between extramural English activities and academic L2 English reading comprehension in a Swedish university admissions test context

Teodora Neagu, Hanna Eklöf, Inga Laukaityte et al.

This study explores the type and frequency of extramural English (EE) (e.g. reading, speaking and gaming) and relationships between EE activities and performance on a standardised English reading comprehension (ERC) test in a sample of young adults, with a particular focus on potential sex differences and online gaming as an EE activity. Participants consist of 6,079 test-takers of the Swedish Scholastic Aptitude Test (SweSAT), a test used for admission to higher education, which has ERC as one of the subtests and the only SweSAT subtest that assesses English proficiency. The type and frequency of engagement in EE activities were assessed through a self-report questionnaire. Correlation, ANOVA and multiple linear regression analyses were conducted to explore how EE activities influence ERC performance, as well as differences between males and females. Overall, the results show positive correlations and significant relationships between EE activities and ERC ability, especially reading, watching English content and moderate-to-high frequency of gaming. Sex differences in EE exposure and test performance were observed. In summary, results from this large-scale study corroborate previous findings that English language acquisition also takes place outside the formal context and that leisure-time activities may enhance ERC ability. Implications are discussed.

Education (General)
DOAJ Open Access 2025
Sign language recognition based on dual-channel star-attention convolutional neural network

Jing Qin, Mengjiao Wang

To enhance effective communication between individuals with hearing impairments and those without, numerous researchers have developed a variety of sign language recognition technologies. However, in practical applications, sign language recognition devices must balance portability, energy consumption, cost, and user comfort, while vision-based sign language recognition must confront the challenge of model stability. Addressing these challenges, this study proposes an economical and stable dual-channel star-attention convolutional neural network (SACNN) deep learning network model based on computer vision technology. The model employs a star attention mechanism to enhance gesture features while concurrently diminishing background features, thereby achieving the acquisition of gesture features. Testing on the “ASL Finger Spelling” dataset demonstrated that the model achieved a high accuracy rate of 99.81%. Experimental results indicate that, compared to existing technologies, the SACNN network model proposed in this study exhibits superior generalization performance. You can find our source codes at https://github.com/wang123c/Sign-Language-Recognition.

Medicine, Science
arXiv Open Access 2025
From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts

Daniel Christoph, Max Ploner, Patrick Haller et al.

Sample efficiency is a crucial property of language models with practical implications for training efficiency. In real-world text, information follows a long-tailed distribution. Yet, we expect models to learn and recall frequent and infrequent facts. Sample-efficient models are better equipped to handle this challenge of learning and retaining rare information without requiring excessive exposure. This study analyzes multiple models of varying architectures and sizes, all trained on the same pre-training data. By annotating relational facts with their frequencies in the training corpus, we examine how model performance varies with fact frequency. Our findings show that most models perform similarly on high-frequency facts but differ notably on low-frequency facts. This analysis provides new insights into the relationship between model architecture, size, and factual learning efficiency.

en cs.CL, cs.LG
arXiv Open Access 2025
A Survey of AIOps in the Era of Large Language Models

Lingzhe Zhang, Tong Jia, Mengxi Jia et al.

As large language models (LLMs) grow increasingly sophisticated and pervasive, their application to various Artificial Intelligence for IT Operations (AIOps) tasks has garnered significant attention. However, a comprehensive understanding of the impact, potential, and limitations of LLMs in AIOps remains in its infancy. To address this gap, we conducted a detailed survey of LLM4AIOps, focusing on how LLMs can optimize processes and improve outcomes in this domain. We analyzed 183 research papers published between January 2020 and December 2024 to answer four key research questions (RQs). In RQ1, we examine the diverse failure data sources utilized, including advanced LLM-based processing techniques for legacy data and the incorporation of new data sources enabled by LLMs. RQ2 explores the evolution of AIOps tasks, highlighting the emergence of novel tasks and the publication trends across these tasks. RQ3 investigates the various LLM-based methods applied to address AIOps challenges. Finally, RQ4 reviews evaluation methodologies tailored to assess LLM-integrated AIOps approaches. Based on our findings, we discuss the state-of-the-art advancements and trends, identify gaps in existing research, and propose promising directions for future exploration.

en cs.SE, cs.CL
arXiv Open Access 2025
Hybrid Dialogue State Tracking for Persian Chatbots: A Language Model-Based Approach

Samin Mahdipour Aghabagher, Saeedeh Momtazi

Dialogue State Tracking (DST) is an essential element of conversational AI with the objective of deeply understanding the conversation context and leading it toward answering user requests. Due to high demands for open-domain and multi-turn chatbots, the traditional rule-based DST is not efficient enough, since it cannot provide the required adaptability and coherence for human-like experiences in complex conversations. This study proposes a hybrid DST model that utilizes rule-based methods along with language models, including BERT for slot filling and intent detection, XGBoost for intent validation, GPT for DST, and online agents for real-time answer generation. This model is uniquely designed to be evaluated on a comprehensive Persian multi-turn dialogue dataset and demonstrated significantly improved accuracy and coherence over existing methods in Persian-based chatbots. The results demonstrate how effectively a hybrid approach may improve DST capabilities, paving the way for conversational AI systems that are more customized, adaptable, and human-like.

en cs.CL, cs.AI
DOAJ Open Access 2024
Analysis of Phonological Acquisition and Children's Polite Speech in The Pre-Operational Period (3-4 Years)

Die Bhakti Wardoyo Putro, Sudartomo Macaryus, Desy Rufaidah et al.

Language acquisition differs from one toddler to another. It is influenced by various factors such as environment, age, stimulation, and the maturity of the speech apparatus. Toddler language acquisition can be seen in the pronunciation of the words used. Many children still cannot pronounce /s/ or /r/, although some toddlers have already mastered them. In addition, at this age, some toddlers can convey messages pragmatically. Therefore, this study aims to describe children's phonological acquisition and their politeness speech acts in the pre-operational period. This is descriptive qualitative research with a psychopragmatic approach. The data source is pre-operational children aged 3-4 years. The data in this study are words, phrases, clauses, and sentences. Data were collected using the observation method with a note-taking technique, supported by advanced techniques such as free listening. The study showed that language acquisition in pre-operational children takes the form of sound changes: /l/ to /n/ and /y/, and /s/ to /c/. The acquisition of pragmatics in children centers on the types of politeness speech acts and the strategies used. The benefit of this study is to give readers an understanding that the phonological mastery of children aged 3-4 years is varied and unique. In addition, children can reasonably convey speech acts.

Language and Literature
DOAJ Open Access 2024
The child’s “or” construction: it’s all about choice

Mira Ariel, Inbal Arnon, Nicole Katzir et al.

“Or” is associated, in Gricean approaches, with the readings Inclusive (“at least one, and possibly both, options are true”) and Exclusive (“exactly one option is true”). Empirical findings show adults favoring Exclusive readings; but for children, the literature yields puzzling results. Laboratory comprehension tasks suggest children favor Inclusive, but naturalistic evidence suggests children’s “or” productions are overwhelmingly Exclusive. We first identify problems with previous research. Methodologically, asking children to provide truth judgements (the dominant experimental task) is not a child-friendly task. And theoretically, Inclusive and Exclusive are not optimal categories for classifying “or” readings. To resolve the comprehension-production puzzle, we adopt Ariel and Mauri’s richer analytic classification of “or” constructions, where Inclusive and Exclusive as such are not speaker-intended readings, and there are several, rather than one, “single-option” (Exclusive) readings. We apply this framework in analyzing the Berman corpus of Hebrew child language; and in designing a new, more ecologically valid, experimental task. Study 1 shows that in child-directed-speech, one specific Exclusive “or” construction, Choice Immediate (e.g., ↗Chocolates? Or ↘jelly beans?), is (i) the single dominant “or” function addressed to children, (ii) the one “or” reading children consistently respond appropriately to, and (iii) virtually the only “or” construction children produce. In Study 2, we present young children with a task involving this familiar “or” construction. The children respond with adult-like mastery even in the absence of a supporting context. These empirical findings argue for a usage-based account of how children acquire “or”.

Communication. Mass media
DOAJ Open Access 2024
The impacts of blended corpus-based instruction on enhancing writing proficiency of Thai university students

Patcharin Satchayad, Sasima Charubusp

Collocational competence is gaining more attention in the field of second language acquisition as it indicates a higher level of target language proficiency. However, Thai university students have been reported as having a low degree of collocational competence, which affects productive second language writing (Boonyarattanasoonthorn et al., 2020). The current study aimed to examine the impacts of blended corpus-based instruction (BCBI) on enhancing Thai university students’ writing proficiency and to examine the students’ attitudes towards the implementation of BCBI. A mixed-methods one-group experimental study was conducted with 43 first-year English major students at an autonomous university in Thailand for 8 weeks. The data were gathered using a pre-test, a post-test, an attitude questionnaire, and a semi-structured interview. The results revealed significant differences between the scores before and after the implementation of BCBI at the 0.05 level. Although there was no drastic change in their collocational competence, the students were highly positive towards the instruction, in particular the use of corpus-based data to assist their writing as well as the awareness of gaining collocational competence. To effectively integrate corpus-based instruction into a writing pedagogy, tasks should be manageable and achievable by the students.

Education, English language
DOAJ Open Access 2024
Cognitive Strategies of Second Language Vocabulary Inferencing: An Eye Tracking Study

Anna Izmalkova, Irina Blinnikova

In contextual vocabulary acquisition, second language learners draw on both morphological and contextual cues to infer the meanings of novel words. The study investigated cognitive strategies and their eye movement correlates in a contextual vocabulary inferencing task. A sample of 30 healthy participants with normal or corrected-to-normal vision read an Intermediate-level text in English, aiming to define the meaning of 10 low-frequency words embedded in the text. Based on both unstructured and structured post-hoc reports, we attributed the inferencing process to a word-level, context-level, or discourse-level processing strategy. Eye movements were recorded on the target word and on the target word clause areas of interest. Inferencing success was associated with increased first fixation duration on the target word for both the word- and context-level processing strategies, and with increased dwell time on the target word clause for the context-level processing strategy. Our findings emphasize the role of both morphological and contextual processing in vocabulary inferencing tasks.

DOAJ Open Access 2024
Challenging but Rewarding Aspects of Telecollaboration: The Case of Virtual Israeli-Polish (VIP) Project

Sabina Nowak, Aleksandra Rążewska

Starting from the premise that implementing telecollaboration can be a challenging enterprise, it is assumed that such virtual exchanges bring positive outcomes. Despite a rich body of research on telecollaboration, few studies to date have explored group dynamics in the context of online exchange. The current research examined an Israeli-Polish (N = 100) telecollaborative intercultural experience. The participants were Israeli students from a BEd program in informal education and Polish students from a BA program in translation. The main purpose of the telecollaborative project was to develop the students' English linguistic skills and intercultural communicative competence (Byram, 1997). However, the present study aimed to investigate how the group dynamics influenced the telecollaboration process and the project outcomes. The students exchanged asynchronous video recordings (vlogs), took part in collaborative synchronous meetings on Zoom, and completed an online questionnaire. This paper outlines some of the challenges related to the international telecollaboration, including technological and institutional difficulties, context-specific demands, and some individual differences of the participants. Through “thematic analysis” (TA) (Braun & Clarke, 2012) the authors explored levels of “failed communication” (O’Dowd & Ritter, 2006) and how they affected the ongoing development of the collaboration, both technically and in terms of content and interaction skills, to uncover themes relating to learners’ and instructors’ challenges. The study concludes with pedagogical implications for more effective implementation of telecollaboration in higher education.

Theory and practice of education
arXiv Open Access 2024
Methods of Automatic Matrix Language Determination for Code-Switched Speech

Olga Iakovenko, Thomas Hain

Code-switching (CS) is the process of speakers interchanging between two or more languages which in the modern world becomes increasingly common. In order to better describe CS speech the Matrix Language Frame (MLF) theory introduces the concept of a Matrix Language, which is the language that provides the grammatical structure for a CS utterance. In this work the MLF theory was used to develop systems for Matrix Language Identity (MLID) determination. The MLID of English/Mandarin and English/Spanish CS text and speech was compared to acoustic language identity (LID), which is a typical way to identify a language in monolingual utterances. MLID predictors from audio show higher correlation with the textual principles than LID in all cases while also outperforming LID in an MLID recognition task based on F1 macro (60%) and correlation score (0.38). This novel approach has identified that non-English languages (Mandarin and Spanish) are preferred over the English language as the ML contrary to the monolingual choice of LID.
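
The abstract above reports results as macro-averaged F1. As a minimal sketch of how that metric is computed, assuming hypothetical per-utterance matrix-language labels such as "en" and "zh" (the label names and data are illustrative, not from the paper):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    return sum(f1_per_class) / len(f1_per_class)


# Hypothetical reference and predicted matrix-language labels.
reference = ["en", "en", "zh", "zh"]
predicted = ["en", "zh", "zh", "zh"]
print(round(macro_f1(reference, predicted), 4))
```

Because every class contributes equally to the average, macro F1 penalizes a system that does well only on the majority language, which is relevant when one language dominates the code-switched data.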

arXiv Open Access 2024
Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach

Ara Yeroyan, Nikolay Karpov

In recent years, automatic speech recognition (ASR) systems have significantly improved, especially in languages with a vast amount of transcribed speech data. However, ASR systems tend to perform poorly for low-resource languages, such as minority and regional languages. This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks, which typically feature a single transcript associated with hours-long audio recordings. The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments, whereas optimal ASR training requires segments ranging from 4 to 15 seconds. To address this, we propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training. Our approach simplifies data preparation for ASR systems in low-resource languages and demonstrates its application through a case study involving the Armenian language. Our method, which is "portable" to many low-resource languages, not only mitigates the issue of data scarcity but also enhances the performance of ASR models for underrepresented languages.

en cs.CL, cs.LG
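
The audiobook abstract above describes cutting long aligned audio into segments of 4 to 15 seconds. The following is a minimal sketch of the segmentation step only, assuming word-level timestamps are already available from some forced aligner. The input format, the function name `segment_words`, and the greedy rule are illustrative assumptions, not the authors' pipeline.

```python
def segment_words(words, min_len=4.0, max_len=15.0):
    """Greedily group aligned words into segments of roughly min_len..max_len seconds.

    `words` is a list of (text, start_sec, end_sec) tuples in time order.
    A segment is closed at sentence-final punctuation once it is at least
    min_len long, or forcibly before it would exceed max_len.
    The last segment may be shorter than min_len.
    """
    segments, current = [], []
    for text, start, end in words:
        # Force a cut if adding this word would push the segment past max_len.
        if current and end - current[0][1] > max_len:
            segments.append(current)
            current = []
        current.append((text, start, end))
        duration = end - current[0][1]
        if duration >= min_len and text.endswith((".", "?", "!")):
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    # Return (start, end, transcript) for each segment.
    return [(seg[0][1], seg[-1][2], " ".join(w for w, _, _ in seg)) for seg in segments]


# Hypothetical alignment: 25 one-second words, with a sentence end at word 5.
toy = [(f"w{i}." if i == 5 else f"w{i}", float(i), float(i + 1)) for i in range(25)]
for start, end, transcript in segment_words(toy):
    print(start, end, transcript[:20])
```

Preferring cuts at sentence boundaries keeps transcripts natural, while the forced cut guarantees an upper bound on segment length when punctuation is sparse.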
arXiv Open Access 2024
A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Languages Models and Explainable AI

Melusi Malinga, Isaac Lupanda, Mike Wa Nkongolo et al.

South Africa and the Democratic Republic of Congo (DRC) present a complex linguistic landscape with languages such as Zulu, Sepedi, Afrikaans, French, English, and Tshiluba (Ciluba), which creates unique challenges for AI-driven translation and sentiment analysis systems due to a lack of accurately labeled data. This study seeks to address these challenges by developing a multilingual lexicon designed for French and Tshiluba, now expanded to include translations in English, Afrikaans, Sepedi, and Zulu. The lexicon enhances cultural relevance in sentiment classification by integrating language-specific sentiment scores. A comprehensive testing corpus is created to support translation and sentiment analysis tasks, with machine learning models such as Random Forest, Support Vector Machine (SVM), Decision Trees, and Gaussian Naive Bayes (GNB) trained to predict sentiment across low resource languages (LRLs). Among them, the Random Forest model performed particularly well, capturing sentiment polarity and handling language-specific nuances effectively. Furthermore, Bidirectional Encoder Representations from Transformers (BERT), a Large Language Model (LLM), is applied to predict context-based sentiment with high accuracy, achieving 99% accuracy and 98% precision, outperforming other models. The BERT predictions were clarified using Explainable AI (XAI), improving transparency and fostering confidence in sentiment classification. Overall, findings demonstrate that the proposed lexicon and machine learning models significantly enhance translation and sentiment analysis for LRLs in South Africa and the DRC, laying a foundation for future AI models that support underrepresented languages, with applications across education, governance, and business in multilingual contexts.

en cs.CL, cs.AI
DOAJ Open Access 2023
Psycholinguistic Studies: Acquisition of Indonesian Phonology in Down Syndrome Children

Lara Desma Sinaga, Gustianingsih, T. Thyrhaya Zein

Language acquisition is the process that takes place in a child's brain as the child acquires a first language, or mother tongue. This study analyzes the acquisition of Indonesian phonology in children with Down syndrome. It was conducted with five children with Down syndrome aged 6-9 years. Blumstein's theory of speech changes is used to determine changes in speech sounds, Jakobson's theory to determine the acquisition of Indonesian phonology, and Jean Piaget's theory to describe the children's language development. The method used is proficient and record. All five respondents showed speech sound changes when pronouncing several words, namely sound deletion (omission), sound addition (addition), and sound replacement (substitution). They have acquired the Indonesian vowel sounds [i], [u], [e], [o] and [a], and can produce several vowel sounds at the beginning, middle or end of words. Some Indonesian consonants have been acquired, while others have not yet been acquired: [w], [f], [z], and [h]. Their phonological development is at the holophrastic stage. Their chronological age is 6 to 9 years, but their language acquisition is equivalent to that of a two-year-old.

Special aspects of education
arXiv Open Access 2023
On Bilingual Lexicon Induction with Large Language Models

Yaoyiran Li, Anna Korhonen, Ivan Vulić

Bilingual Lexicon Induction (BLI) is a core task in multilingual NLP that still, to a large extent, relies on calculating cross-lingual word representations. Inspired by the global paradigm shift in NLP towards Large Language Models (LLMs), we examine the potential of the latest generation of LLMs for the development of bilingual lexicons. We ask the following research question: Is it possible to prompt and fine-tune multilingual LLMs (mLLMs) for BLI, and how does this approach compare against and complement current BLI approaches? To this end, we systematically study 1) zero-shot prompting for unsupervised BLI and 2) few-shot in-context prompting with a set of seed translation pairs, both without any LLM fine-tuning, as well as 3) standard BLI-oriented fine-tuning of smaller LLMs. We experiment with 18 open-source text-to-text mLLMs of different sizes (from 0.3B to 13B parameters) on two standard BLI benchmarks covering a range of typologically diverse languages. Our work is the first to demonstrate strong BLI capabilities of text-to-text mLLMs. The results reveal that few-shot prompting with in-context examples from nearest neighbours achieves the best performance, establishing new state-of-the-art BLI scores for many language pairs. We also conduct a series of in-depth analyses and ablation studies, providing more insights on BLI with (m)LLMs, also along with their limitations.

en cs.CL, cs.AI
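
The BLI abstract above finds that few-shot prompting works best when the in-context examples are nearest neighbours of the query word. As a minimal sketch of that selection step, assuming toy word vectors and a hypothetical "source -> target" prompt template (neither is taken from the paper):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_prompt(source_word, seed_pairs, embeddings, k=2):
    """Pick the k seed pairs whose source word is closest to `source_word`
    and format them as in-context examples, most similar first."""
    query = embeddings[source_word]
    ranked = sorted(
        seed_pairs,
        key=lambda pair: cosine(query, embeddings[pair[0]]),
        reverse=True,
    )
    lines = [f"{src} -> {tgt}" for src, tgt in ranked[:k]]
    lines.append(f"{source_word} ->")  # the model is asked to complete this line
    return "\n".join(lines)


# Hypothetical 2-d embeddings and English-German seed dictionary.
toy_embeddings = {"cat": [1.0, 0.0], "kitten": [1.0, 0.05], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
toy_seeds = [("dog", "Hund"), ("car", "Auto"), ("kitten", "Kätzchen")]
print(build_prompt("cat", toy_seeds, toy_embeddings))
```

In practice the embeddings would come from a trained model and the template would be tuned per LLM; the sketch only illustrates how neighbour-based example selection differs from picking seed pairs at random.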
arXiv Open Access 2023
Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities

Aaron J. Snoswell, Lucinda Nelson, Hao Xue et al.

Generic 'toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings. We consider the challenge of measuring misogyny in natural language generation, and argue that generic 'toxicity' classifiers are inadequate for this task. We use data from two well-characterised 'Incel' communities on Reddit that differ primarily in their degrees of misogyny to construct a pair of training corpora which we use to fine-tune two language models. We show that an open source 'toxicity' classifier is unable to distinguish meaningfully between generations from these models. We contrast this with a misogyny-specific lexicon recently proposed by feminist subject-matter experts, demonstrating that, despite the limitations of simple lexicon-based approaches, this shows promise as a benchmark to evaluate language models for misogyny, and that it is sensitive enough to reveal the known differences in these Reddit communities. Our preliminary findings highlight the limitations of a generic approach to evaluating harms, and further emphasise the need for careful benchmark design and selection in natural language evaluation.

en cs.CL, cs.CY
arXiv Open Access 2023
Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis

Haoyu Zhang, Yu Wang, Guanghao Yin et al.

Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information from multiple sources (e.g., language, video, and audio), the potential sentiment-irrelevant and conflicting information across modalities may hinder the performance from being further improved. To alleviate this, we present Adaptive Language-guided Multimodal Transformer (ALMT), which incorporates an Adaptive Hyper-modality Learning (AHL) module to learn an irrelevance/conflict-suppressing representation from visual and audio features under the guidance of language features at different scales. With the obtained hyper-modality representation, the model can obtain a complementary and joint representation through multimodal fusion for effective MSA. In practice, ALMT achieves state-of-the-art performance on several popular datasets (e.g., MOSI, MOSEI and CH-SIMS) and an abundance of ablation demonstrates the validity and necessity of our irrelevance/conflict suppression mechanism.

en cs.AI, cs.CL
arXiv Open Access 2023
AutoHall: Automated Factuality Hallucination Dataset Generation for Large Language Models

Zouying Cao, Yifei Yang, XiaoJing Li et al.

Large language models (LLMs) have gained broad applications across various domains but still struggle with hallucinations. Currently, hallucinations occur frequently in the generation of factual content and pose a great challenge to trustworthy LLMs. However, hallucination detection is hindered by the laborious and expensive manual annotation of hallucinatory content. Meanwhile, as different LLMs exhibit distinct types and rates of hallucination, the collection of hallucination datasets is inherently model-specific, which also increases the cost. To address this issue, this paper proposes a method called AutoHall for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets. The empirical results reveal variations in hallucination proportions and types among different models. Moreover, we introduce a zero-resource and black-box hallucination detection method based on self-contradiction to recognize the hallucination in our constructed dataset, achieving superior detection performance compared to baselines. Further analysis on our dataset provides insight into factors that may contribute to LLM hallucinations. Our codes and datasets are publicly available at https://github.com/zouyingcao/AutoHall.

en cs.CL
