Results for "Language acquisition"

Showing 20 of ~5,483,184 results · from DOAJ, CrossRef, arXiv, Semantic Scholar

DOAJ Open Access 2026
Family language policy and translanguaging practices as support for language development in early childhood students with autism spectrum disorder in Indonesia

Annisa Rahmadani

This study investigates the intersection between family language policy (FLP) and translanguaging strategies in the language development of early childhood students with autism spectrum disorder (ASD) in Indonesia. By focusing on both home and school environments, the research explores how parents’ beliefs and practices influence family language policies and how their alignment with translanguaging strategies is employed in educational settings. Using a qualitative case study approach, data were gathered through interviews with parents, the classroom teacher, and observations of both family interactions and classroom practices. Findings suggest that parents’ language ideologies and daily practices significantly shape the children’s bilingual development and complement the translanguaging strategies used in the classroom. Furthermore, a strong alignment between FLP and school-based strategies appeared to be associated with better engagement, vocabulary acquisition, and social communication in ASD students. This research highlights the importance of collaborative approaches between families and educators in creating inclusive multilingual learning environments. The study contributes to the growing discourse on FLP and translanguaging by offering practical insights for educators, policymakers, and families supporting ASD learners in multilingual contexts.

DOAJ Open Access 2025
Assessing the adherence of large language models to clinical practice guidelines in Chinese medicine: a content analysis

Weilong Zhao, Honghao Lai, Bei Pan et al.

Objective: Whether large language models (LLMs) can effectively facilitate CM knowledge acquisition remains uncertain. This study aims to assess the adherence of LLMs to Clinical Practice Guidelines (CPGs) in CM. Methods: This cross-sectional study randomly selected ten CPGs in CM and constructed 150 questions across three categories: medication based on differential diagnosis (MDD), specific prescription consultation (SPC), and CM theory analysis (CTA). Eight LLMs (GPT-4o, Claude-3.5 Sonnet, Moonshot-v1, ChatGLM-4, DeepSeek-v3, DeepSeek-r1, Claude-4 Sonnet, and Claude-4 Sonnet Thinking) were evaluated using both English and Chinese queries. The main evaluation metrics included accuracy, readability, and use of safety disclaimers. Results: Overall, DeepSeek-v3 and DeepSeek-r1 demonstrated superior performance in both English (median 5.00, interquartile range (IQR) 4.00–5.00 vs. median 5.00, IQR 3.70–5.00) and Chinese (both median 5.00, IQR 4.30–5.00), significantly outperforming all other models. All models achieved significantly higher accuracy in Chinese versus English responses (all p < 0.05). Significant variations in accuracy were observed across question categories, with MDD and SPC questions presenting more challenges than CTA questions. English responses had lower readability (mean Flesch Reading Ease score 32.7) than Chinese responses. Moonshot-v1 provided the highest rate of safety disclaimers (98.7% English, 100% Chinese). Conclusion: LLMs showed varying degrees of potential for acquiring CM knowledge. The performance of DeepSeek-v3 and DeepSeek-r1 is satisfactory. Optimizing LLMs to become effective tools for disseminating CM information is an important direction for future development.
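For context, the Flesch Reading Ease score reported above (mean 32.7 for English responses, indicating difficult text) combines average sentence length and average syllables per word. A minimal sketch follows; the vowel-run syllable counter is a rough assumption — production readability tools use pronunciation dictionaries:

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    # Naive syllable estimate: count runs of vowels per word (approximation only)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

score = flesch_reading_ease("The cat sat on the mat. It was happy.")
```

Scores near 100 indicate very easy text; academic or clinical prose typically lands in the 30s, as reported for the English LLM responses here.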

Therapeutics. Pharmacology
arXiv Open Access 2025
Large Language Models and Arabic Content: A Review

Haneh Rhel, Dmitri Roussinov

Over the past three years, the rapid advancement of Large Language Models (LLMs) has had a profound impact on multiple areas of Artificial Intelligence (AI), particularly Natural Language Processing (NLP) across diverse languages, including Arabic. Although Arabic is one of the most widely spoken languages, spoken across 27 countries in the Arab world and used as a second language in some non-Arabic-speaking countries, there is still a scarcity of Arabic resources, datasets, and tools. Arabic NLP tasks face various challenges due to the complexities of the Arabic language, including its rich morphology, intricate structure, and diverse writing standards, among other factors. Researchers have been actively addressing these challenges, demonstrating that pre-trained LLMs trained on multilingual corpora achieve significant success in various Arabic NLP tasks. This study provides an overview of the use of LLMs for the Arabic language, highlighting early pre-trained Arabic language models across various NLP applications and their ability to handle diverse Arabic content tasks and dialects. It also reviews how techniques such as fine-tuning and prompt engineering can enhance the performance of these models. Additionally, the study summarizes common Arabic benchmarks and datasets and presents our observations on the persistent upward trend in the adoption of LLMs.

en cs.CL, cs.AI
DOAJ Open Access 2024
The Impact of Short-Form Content TikTok on English Language Learning Development Among Generation Z: A Case Study of Students at Institut Elkatarie

Bahya Alfitri

This study investigates the impact of TikTok’s short-form content on the development of English language skills among students at Institut Elkatarie. Employing a quantitative approach with a quasi-experimental design, the research compares two groups of students: an experimental group using TikTok as a supplementary learning tool and a control group following traditional classroom methods. Data were collected through pre- and post-tests, a TikTok usage questionnaire, and in-depth interviews. The findings reveal that the experimental group showed significant improvement in listening comprehension, pronunciation, and vocabulary acquisition, with a 22.9% increase in their post-test scores compared to only an 8.2% improvement in the control group. The study confirms that TikTok’s engaging, authentic content motivated students, aligning with constructivist learning theory and intrinsic motivation theory, which emphasize contextual and enjoyable learning experiences. The results suggest that TikTok can be a valuable tool in enhancing language learning, particularly in areas where traditional methods may fall short, such as informal language use and listening skills. This research contributes to the growing body of knowledge on digital platforms in education, offering insights into how social media can be effectively integrated into language learning practices.

English language, English literature
arXiv Open Access 2024
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion

Zekai Zhang, Yiduo Guo, Yaobo Liang et al.

The growing dependence on Large Language Models (LLMs) for finishing user instructions necessitates a comprehensive understanding of their robustness to complex task completion in real-world situations. To address this critical need, we propose the PowerPoint Task Completion Robustness benchmark (PPTC-R) to measure LLMs' robustness to the user PPT task instruction and software version. Specifically, we construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels. To assess the robustness of Language Models to software versions, we vary the number of provided APIs to simulate both the newest version and earlier version settings. Subsequently, we test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates these robustness settings, aiming to evaluate how deviations impact LLMs' API calls for task completion. We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark, particularly in the version update and the multilingual settings. However, we find that all LLMs lose their robustness when confronted with multiple challenges (e.g., multi-turn) simultaneously, leading to significant performance drops. We further analyze the robustness behavior and error reasons of LLMs in our benchmark, which provide valuable insights for researchers to understand the LLM's robustness in task completion and develop more robust LLMs and agents. We release the code and data at https://github.com/ZekaiGalaxy/PPTCR.

en cs.CL
arXiv Open Access 2024
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios

Samuel Ackerman, Ella Rabinovich, Eitan Farchi et al.

We evaluate the robustness of several large language models on multiple datasets. Robustness here refers to the relative insensitivity of the model's answers to meaning-preserving variants of their input. Benchmark datasets are constructed by introducing naturally-occurring, non-malicious perturbations, or by generating semantically equivalent paraphrases of input questions or statements. We further propose a novel metric for assessing a model's robustness, and demonstrate its benefits in the non-adversarial scenario by empirical evaluation of several models on the created datasets.
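One simple way to operationalize "insensitivity to meaning-preserving variants" — an illustrative baseline, not necessarily the metric this paper proposes — is to measure how often a model's answer stays constant across paraphrases of the same question:

```python
from collections import Counter

def consistency(answers: list[str]) -> float:
    """Fraction of paraphrase variants whose answer matches the majority answer."""
    if not answers:
        return 0.0
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / len(answers)

# Hypothetical answers from one model to five paraphrases of one question
answers = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
score = consistency(answers)
```

A perfectly robust model scores 1.0 per question; averaging over a benchmark gives a dataset-level robustness estimate under this (assumed) majority-agreement definition.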

en cs.CL, stat.AP
arXiv Open Access 2024
Training Data for Large Language Model

Yiming Ju, Huanhuan Ma

In 2022, with the release of ChatGPT, large-scale language models gained widespread attention. ChatGPT not only surpassed previous models in terms of parameters and the scale of its pretraining corpus but also achieved revolutionary performance improvements through fine-tuning on a vast amount of high-quality, human-annotated data. This progress has led enterprises and research institutions to recognize that building smarter and more powerful models relies on rich and high-quality datasets. Consequently, the construction and optimization of datasets have become a critical focus in the field of artificial intelligence. This paper summarizes the current state of pretraining and fine-tuning data for training large-scale language models, covering aspects such as data scale, collection methods, data types and characteristics, processing workflows, and provides an overview of available open-source datasets.

en cs.AI
DOAJ Open Access 2023
One size fits all? The role of task complexity in L2 production via the audio chat

Li Qian, Sarimah Shamsudin

The pervasive use of information and computer technology in second or foreign language learning has led researchers to explore the ideal tasks for technological environments to facilitate second language (L2) learning. This study contributes new knowledge to this area by examining the effects of task complexity, manipulated along the ±few elements variable in Robinson’s Cognition Hypothesis, on the L2 production of 42 lower-intermediate Chinese EFL (English as a Foreign Language) learners who completed two interactive tasks (simple versus complex) in dyads via the audio chat of the video-conferencing platform WeMeet in a laboratory setting. Participants were also instructed to rate the difficulty of the tasks by responding to a self-rating questionnaire immediately after completing each task. Their L2 output in the two tasks was recorded, transcribed, and coded in three dimensions: syntactic complexity, lexical complexity, and accuracy. SPSS 26 was used for statistical analyses. The results revealed that increasing task complexity induced significantly more lexically complex language. However, it did not result in significant changes in the syntactic complexity or accuracy of learners’ L2 output via audio chat. These results contradicted the predictions of the Cognition Hypothesis, suggesting its inapplicability in audio chat.

Special aspects of education, Language acquisition
DOAJ Open Access 2023
Covert Contrast in Acquiring Fricatives by Preschool Children: Evidence From Najdi Arabic

Eman Altoeriqi, Mohammad Aljutaily

Covert contrast is the statistically reliable distinction between target language phonemes produced in the process of language acquisition that is nevertheless not perceived by a native speaker of that language. This paper examines the acquisition of contrasts in four Najdi Arabic fricatives, /s/, /ʃ/, /θ/, and /ð/, and seeks to identify the most common substitutions in producing these sounds. Words were elicited from 25 preschool children (aged 3–5 years). The target words contained the studied fricative followed by either long /a:/ or short /a/ in the initial position. Praat software was used for acoustic analysis to extract the four acoustic cues: center of gravity, fricative noise duration, F1, and F2. Participants performed a word repetition task and a picture elicitation task. The results showed six cases of covert contrast, seven cases of no contrast, and 87 cases of accurate production (overt contrast). The results also revealed stopping, gliding, and affrication as substitutions in the manner of articulation, while replacement in place of articulation occurred in backing but most commonly in fronting (24 cases). The present study sought to determine implications for children’s linguistic performance, and, consequently, for language education and for planning treatment for children with speech disorders.
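Of the four acoustic cues this study extracted with Praat, the spectral center of gravity is the amplitude-weighted mean frequency of the spectrum — the cue that most reliably separates sibilants like /s/ and /ʃ/. A minimal sketch using a plain magnitude-weighted FFT (Praat also supports power weighting, which this sketch omits):

```python
import numpy as np

def spectral_center_of_gravity(signal: np.ndarray, sr: int) -> float:
    """Amplitude-weighted mean frequency of the magnitude spectrum (cf. Praat's Centre of Gravity)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

# Sanity check on a pure 5 kHz tone: the centre of gravity should sit at ~5000 Hz
sr = 16000
t = np.arange(sr) / sr
cog = spectral_center_of_gravity(np.sin(2 * np.pi * 5000 * t), sr)
```

For real fricative tokens, the signal would be a windowed stretch of the frication noise rather than a synthetic tone; /s/ typically shows a higher center of gravity than /ʃ/, which is what makes this cue useful for detecting covert contrast.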

History of scholarship and learning. The humanities, Social Sciences
arXiv Open Access 2023
WizardLM: Empowering large pre-trained language models to follow complex instructions

Can Xu, Qingfeng Sun, Kai Zheng et al.

Training large language models (LLMs) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Human evaluations on a complexity-balanced test bed and Vicuna's test set show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results of the high complexity part, we demonstrate that outputs from our WizardLM are preferred to outputs from OpenAI ChatGPT. In GPT-4 automatic evaluation, WizardLM achieves more than 90% capacity of ChatGPT on 17 out of 29 skills. Even though WizardLM still lags behind ChatGPT in some aspects, our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing LLMs. Our code and data are public at https://github.com/nlpxucan/WizardLM

en cs.CL, cs.AI
arXiv Open Access 2023
NeCo@ALQAC 2023: Legal Domain Knowledge Acquisition for Low-Resource Languages through Data Enrichment

Hai-Long Nguyen, Dieu-Quynh Nguyen, Hoang-Trung Nguyen et al.

In recent years, natural language processing has gained significant popularity in various sectors, including the legal domain. This paper presents NeCo Team's solutions to the Vietnamese text processing tasks provided in the Automated Legal Question Answering Competition 2023 (ALQAC 2023), focusing on legal domain knowledge acquisition for low-resource languages through data enrichment. Our methods for the legal document retrieval task employ a combination of similarity ranking and deep learning models, while for the second task, which requires extracting an answer from a relevant legal article in response to a question, we propose a range of adaptive techniques to handle different question types. Our approaches achieve outstanding results on both tasks of the competition, demonstrating the potential benefits and effectiveness of question answering systems in the legal field, particularly for low-resource languages.

en cs.CL, cs.AI
DOAJ Open Access 2022
Digitally fabricated pedagogical games to improve communication and integration among vulnerable children

Stefany Jardim da Silva, Ana Cristina Borba da Cunha, Ana Rachel Salgado et al.

The acquisition of native and additional languages is highly relevant to an individual's overall development, given the inseparability of language, culture, and society. Including playful activities and games in this process facilitates learning, as it allows the abstract to be connected to the concrete. Accordingly, digitally fabricated Portuguese-Spanish pedagogical games were produced and used to support socialization and communication among Venezuelan and Brazilian children sheltered by Aldeia Infantil SOS in Porto Alegre, Brazil. The games were produced by subtractive manufacturing, laser-cutting MDF (Medium Density Fiberboard) boards and decorating them with laser-engraved words and colored stickers. MDF boards were chosen for the artifacts because they are durable, readily available, inexpensive, and well suited to laser cutting and engraving. Opinion surveys, playful meetings with the children, and conversations with the foster-home guardians were organized to evaluate and monitor the project. Without revealing their reasons, the Venezuelan families chose not to participate in the activities proposed in this first stage. Among the Brazilian children, clear interest in the games was observed, along with evident improvement in their communication and socialization skills. After one year of use, the durability of the materials was considered satisfactory. Grounded in the idea of knowledge sharing, one of the pillars of the Maker movement, the portfolio of game design files and the "how to make it" protocols (cookbooks) are available digitally, so that anyone interested, from any location, can fabricate the games developed during the project.
Keywords: Language Acquisition; Playfulness; Laser Cutting Technology

Education, Special aspects of education
arXiv Open Access 2022
Quantum Natural Language Generation on Near-Term Devices

Amin Karamlou, Marcel Pfaffhauser, James Wootton

The emergence of noisy medium-scale quantum devices has led to proof-of-concept applications for quantum computing in various domains. Examples include Natural Language Processing (NLP) where sentence classification experiments have been carried out, as well as procedural generation, where tasks such as geopolitical map creation, and image manipulation have been performed. We explore applications at the intersection of these two areas by designing a hybrid quantum-classical algorithm for sentence generation. Our algorithm is based on the well-known simulated annealing technique for combinatorial optimisation. An implementation is provided and used to demonstrate successful sentence generation on both simulated and real quantum hardware. A variant of our algorithm can also be used for music generation. This paper aims to be self-contained, introducing all the necessary background on NLP and quantum computing along the way.
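The hybrid algorithm above is based on simulated annealing, a standard technique for combinatorial optimisation: candidate moves are always accepted when they improve the objective and accepted with probability exp(Δ/T) when they worsen it, with the temperature T gradually cooled. A generic sketch on a toy word-ordering objective follows — the scoring function and word list are illustrative assumptions, not the paper's quantum-evaluated objective:

```python
import math
import random

def simulated_annealing(state, neighbor, score, steps=2000, t0=1.0, cooling=0.995):
    """Generic simulated annealing: accept worse moves with probability exp(delta/T)."""
    random.seed(0)  # deterministic run for reproducibility
    t = t0
    best = state
    for _ in range(steps):
        cand = neighbor(state)
        delta = score(cand) - score(state)
        if delta > 0 or random.random() < math.exp(delta / t):
            state = cand
        if score(state) > score(best):
            best = state
        t *= cooling  # geometric cooling schedule
    return best

# Toy objective: reorder words to maximise agreement with a target order
target = ["quantum", "devices", "generate", "sentences"]

def score(seq):
    # Count adjacent pairs that appear in target order
    return sum(1 for a, b in zip(seq, seq[1:]) if target.index(a) < target.index(b))

def neighbor(seq):
    # Propose a random transposition of two positions
    s = seq[:]
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

start = ["sentences", "generate", "devices", "quantum"]
result = simulated_annealing(start, neighbor, score)
```

In the paper's setting, the scoring step is where the quantum device enters; the classical annealing loop itself is unchanged, which is what makes the algorithm hybrid.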

en quant-ph, cs.CL
arXiv Open Access 2022
Towards Pragmatic Production Strategies for Natural Language Generation Tasks

Mario Giulianelli

This position paper proposes a conceptual framework for the design of Natural Language Generation (NLG) systems that follow efficient and effective production strategies in order to achieve complex communicative goals. In this general framework, efficiency is characterised as the parsimonious regulation of production and comprehension costs while effectiveness is measured with respect to task-oriented and contextually grounded communicative goals. We provide concrete suggestions for the estimation of goals, costs, and utility via modern statistical methods, demonstrating applications of our framework to the classic pragmatic task of visually grounded referential games and to abstractive text summarisation, two popular generation tasks with real-world applications. In sum, we advocate for the development of NLG systems that learn to make pragmatic production decisions from experience, by reasoning about goals, costs, and utility in a human-like way.

en cs.CL, cs.AI
arXiv Open Access 2022
Construction of English Resume Corpus and Test with Pre-trained Language Models

Chengguang Gan, Tatsunori Mori

Information extraction (IE) has always been one of the essential tasks of NLP, and one of its most important application scenarios is the extraction of information from resumes. Structured text is obtained by classifying each part of a resume, making it convenient to store for later search and analysis; the structured resume data can also be used in AI resume-screening systems, significantly reducing labor costs for HR. This study transforms the information extraction task for resumes into a simple sentence classification task. Building on the English resume dataset produced by a prior study, the classification rules are improved to create a larger and more fine-grained resume classification dataset. This corpus is also used to test the performance of several current mainstream pre-trained language models (PLMs). Furthermore, to explore the relationship between the number of training samples and accuracy on the resume dataset, we performed comparison experiments with training sets of different sizes. The final experimental results show that improving the annotation rules and increasing the sample size of the dataset improves accuracy over the original resume dataset.

en cs.CL
arXiv Open Access 2022
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook

Baihan Lin

In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications including healthcare, finance, recommendation systems, robotics, and, last but not least, speech and natural language processing. While most speech and language applications of reinforcement learning algorithms are centered around improving the training of deep neural networks with its flexible optimization properties, there are still many grounds to explore to utilize the benefits of reinforcement learning, such as its reward-driven adaptability, state representations, temporal structures and generalizability. In this survey, we present an overview of recent advancements of reinforcement learning and bandits, and discuss how they can be effectively employed to solve speech and natural language processing problems with models that are adaptive, interactive and scalable.

en cs.AI, cs.CL
arXiv Open Access 2022
Effectiveness of French Language Models on Abstractive Dialogue Summarization Task

Yongxin Zhou, François Portet, Fabien Ringeval

Pre-trained language models have established the state-of-the-art on various natural language processing tasks, including dialogue summarization, which allows the reader to quickly access key information from long conversations in meetings, interviews or phone calls. However, such dialogues are still difficult to handle with current models because the spontaneity of the language involves expressions that are rarely present in the corpora used for pre-training the language models. Moreover, the vast majority of the work accomplished in this field has been focused on English. In this work, we present a study on the summarization of spontaneous oral dialogues in French using several language specific pre-trained models: BARThez, and BelGPT-2, as well as multilingual pre-trained models: mBART, mBARThez, and mT5. Experiments were performed on the DECODA (Call Center) dialogue corpus whose task is to generate abstractive synopses from call center conversations between a caller and one or several agents depending on the situation. Results show that the BARThez models offer the best performance far above the previous state-of-the-art on DECODA. We further discuss the limits of such pre-trained models and the challenges that must be addressed for summarizing spontaneous dialogues.

en cs.CL, cs.AI

Page 20 of 274,160