{"results":[{"id":"doaj_10.1186/s12940-025-01173-8","title":"Impact of early life exposure to heat and cold on linguistic development in two-year-old children: findings from the ELFE cohort study","authors":[{"name":"Guillaume Barbalat"},{"name":"Ariane Guilbert"},{"name":"Lucie Adelaïde"},{"name":"Marie-Aline Charles"},{"name":"Ian Hough"},{"name":"Ludivine Launay"},{"name":"Itai Kloog"},{"name":"Johanna Lepeule"}],"abstract":"Abstract Background A number of negative developmental outcomes in response to extreme temperature have been documented. Yet, to our knowledge, environmental research has left the question of the effect of temperature on human neurodevelopment largely unexplored. Here, we aimed to investigate the effect of ambient temperature on linguistic development at the age of 2 years-old. Methods We used data from the prospective national French birth cohort ELFE (N = 12,163) and highly-resolved exposure models with daily temporal resolution and 200 m to 1 km spatial resolution. We investigated the effect of weekly averages of overall, daytime and night-time temperature in the prenatal (first 30 weeks of gestation) and postnatal (91 weeks after birth) period on vocabulary production scores from the MacArthur-Bates Communicative Development Inventories (MB-CDI) at 2 years-old. Exposure-response and lag-response relationships were modeled with confounder-adjusted distributed lag non-linear models. Results Scores at the MB-CDI decreased by 3.2% (relative risk (RR) 0.968, 95% confidence interval (CI): 0.939–0.998) following exposure to severe night-time heat of 15.6 °C (95th percentile) vs. 8.3 °C (median) throughout gestational weeks 14 to 19. In the postnatal period, scores at the MB-CDI decreased by 14.8% (RR 0.852; 95% CI: [0.756–0.96]) for severe overall heat of 21.9 °C (95th percentile) vs. 11.5 °C (median) throughout weeks 1 to 28. Consistent results were found for daytime and night-time heat. We observed positive effects of overall and night-time heat in the first few weeks of pregnancy. Night-time cold in the pre-natal period also resulted in improved scores at the MB-CDI. Adjusting our models for air pollutants (PM2.5, PM10 and NO2) tended to confirm these observations. Finally, there were no significant differences in temperature effects between boys and girls. Conclusion In this large cohort study, we showed a negative impact of hot temperatures during pregnancy and after birth on language acquisition. Positive associations observed in the first few weeks of pregnancy are likely the results of methodological artifacts. Positive associations with night-time cold during the prenatal period are likely truly protective, as colder temperatures may encourage staying indoors at a comfortable temperature. Policymakers should consider neurodevelopment impairments as a deleterious effect of climate change.","source":"DOAJ","year":2025,"language":"","subjects":["Industrial medicine. Industrial hygiene","Public aspects of medicine"],"doi":"10.1186/s12940-025-01173-8","url":"https://doi.org/10.1186/s12940-025-01173-8","is_open_access":true,"published_at":"","score":69},{"id":"doaj_10.31261/TAPSLA.17440","title":"Uncovering Procrastination in Language Teaching: Self-Efficacy, Anxiety, and Situational Influences","authors":[{"name":"Dino Dumančić"}],"abstract":"\nThe study employed a mixed-methods approach to investigate the relationship between English language teachers’ teaching efficacy, emotional experiences, and situation and task-related procrastination. 
It aimed to explore both self-reported teaching self-efficacy beliefs and the factors influencing language teachers’ procrastination behaviors and emotions during task delay. A total of 305 Croatian EFL teachers participated in this study. Descriptive, correlation, and directed content analyses were carried out. According to the findings, the Croatian language teachers viewed themselves as highly effective in the classroom and also reported engaging in procrastination infrequently. When asked about language proficiency-related anxiety, they admitted having experienced it sporadically. Those confident in utilizing instructional strategies and implementing classroom management strategies procrastinated less and reported lower anxiety levels. Qualitative analysis revealed that demotivating or fatiguing tasks, especially administrative and testing-related ones, instigated procrastination, among other factors. When procrastinating, the teachers reported primarily unpleasant emotions, such as anxiety, nervousness, frustration, and guilt.\n","source":"DOAJ","year":2025,"language":"","subjects":["Theory and practice of education"],"doi":"10.31261/TAPSLA.17440","url":"https://journals.us.edu.pl/index.php/TAPSLA/article/view/17440","is_open_access":true,"published_at":"","score":69},{"id":"doaj_10.29140/vli.v14n1.2161","title":"A comparison of flow state markers experienced across AR, game-based, and analog deliberate vocabulary study activities","authors":[{"name":"Adam Dabrowski"},{"name":"Ayako Yokogawa"}],"abstract":"Flow is described as a state in which people become so involved or engrossed in an activity that nothing else seems to matter (Csikszentmihalyi, 2009). This state of consciousness seems to occur when a person is involved in a task and seemingly unable to stop. Flow states are marked by (a) a perceived balance of skills and challenge, (b) opportunities for intense concentration, (c) clear task goals, (d) feedback that one is succeeding at the task, (e) a sense of control, (f) a lack of self-consciousness, and (g) the perception that time passes more quickly (Egbert, 2003). The Japanese Flow State Scale (JFSS) is an instrument created specifically to measure flow states experienced during deliberate vocabulary study and is a working component of the first author's Doctor of Philosophy research project, which focuses on the deliberate study of vocabulary with augmented reality (AR) and physical word cards. Analyses with mixed effects models of JFSS responses from 179 L1 Japanese participants indicated statistically significant differences in flow-state markers across four deliberate vocabulary study activities (AR, word card study, Quizlet Live, and intensive reading).\n","source":"DOAJ","year":2025,"language":"","subjects":["Language acquisition"],"doi":"10.29140/vli.v14n1.2161","url":"https://www.castledown.com/journals/vli/article/view/2161","is_open_access":true,"published_at":"","score":69},{"id":"arxiv_2503.21676","title":"How do language models learn facts? Dynamics, curricula and hallucinations","authors":[{"name":"Nicolas Zucchet"},{"name":"Jörg Bornschein"},{"name":"Stephanie Chan"},{"name":"Andrew Lampinen"},{"name":"Razvan Pascanu"},{"name":"Soham De"}],"abstract":"Large language models accumulate vast knowledge during pre-training, yet the dynamics governing this acquisition remain poorly understood. 
This work investigates the learning dynamics of language models on a synthetic factual recall task, uncovering three key findings: First, language models learn in three phases, exhibiting a performance plateau before acquiring precise factual knowledge. Mechanistically, this plateau coincides with the formation of attention-based circuits that support recall. Second, the training data distribution significantly impacts learning dynamics, as imbalanced distributions lead to shorter plateaus. Finally, hallucinations emerge simultaneously with knowledge, and integrating new knowledge into the model through fine-tuning is challenging, as doing so quickly corrupts the model's existing parametric memories. Our results emphasize the importance of data distribution in knowledge acquisition and suggest novel data scheduling strategies to accelerate neural network training.","source":"arXiv","year":2025,"language":"en","subjects":["cs.CL","cs.LG"],"url":"https://arxiv.org/abs/2503.21676","pdf_url":"https://arxiv.org/pdf/2503.21676","is_open_access":true,"published_at":"2025-03-27T16:43:45Z","score":69},{"id":"arxiv_2512.08480","title":"Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models","authors":[{"name":"Ju-Young Kim"},{"name":"Ji-Hong Park"},{"name":"Se-Yeon Lee"},{"name":"Sujin Park"},{"name":"Gun-Woo Kim"}],"abstract":"Recent incidents in certain online games and communities, where anonymity is guaranteed, show that unchecked inappropriate remarks frequently escalate into verbal abuse and even criminal behavior, raising significant social concerns. Consequently, there is a growing need for research on techniques that can detect inappropriate utterances within conversational texts to help build a safer communication environment. Although large-scale language models trained on Korean corpora and chain-of-thought reasoning have recently gained attention, research applying these approaches to inappropriate utterance detection remains limited. In this study, we propose a soft inductive bias approach that explicitly defines reasoning perspectives to guide the inference process, thereby promoting rational decision-making and preventing errors that may arise during reasoning. We fine-tune a Korean large language model using the proposed method and conduct both quantitative performance comparisons and qualitative evaluations across different training strategies. Experimental results show that the Kanana-1.5 model achieves an average accuracy of 87.0046%, improving by approximately 3.89 percent over standard supervised learning. These findings indicate that the proposed method goes beyond simple knowledge imitation by large language models and enables more precise and consistent judgments through constrained reasoning perspectives, demonstrating its effectiveness for inappropriate utterance detection.","source":"arXiv","year":2025,"language":"en","subjects":["cs.CL"],"url":"https://arxiv.org/abs/2512.08480","pdf_url":"https://arxiv.org/pdf/2512.08480","is_open_access":true,"published_at":"2025-12-09T10:55:33Z","score":69},{"id":"arxiv_2409.02228","title":"Unforgettable Generalization in Language Models","authors":[{"name":"Eric Zhang"},{"name":"Leshem Choshen"},{"name":"Jacob Andreas"}],"abstract":"When language models (LMs) are trained to forget (or \"unlearn\") a skill, how precisely does their behavior change? We study the behavior of transformer LMs in which tasks have been forgotten via fine-tuning on randomized labels. 
Such LMs learn to generate near-random predictions for individual examples in the \"training\" set used for forgetting. Across tasks, however, LMs exhibit extreme variability in whether LM predictions change on examples outside the training set. In some tasks (like entailment classification), forgetting generalizes robustly, and causes models to produce uninformative predictions on new task instances; in other tasks (like physical commonsense reasoning and scientific question answering) forgetting affects only the training examples, and models continue to perform the \"forgotten\" task accurately even for examples very similar to those that appeared in the training set. Dataset difficulty is not predictive of whether a behavior can be forgotten; instead, generalization in forgetting is (weakly) predicted by the confidence of LMs' initial task predictions and the variability of LM representations of training data, with low confidence and low variability both associated with greater generalization. Perhaps most surprisingly, random-label forgetting appears to be somewhat insensitive to the contents of the training set: for example, models trained on science questions with random labels continue to answer other science questions accurately, but begin to produce random labels on entailment classification tasks. Finally, we show that even generalizable forgetting is shallow: linear probes trained on LMs' representations can still perform tasks reliably after forgetting. Our results highlight the difficulty and unpredictability of performing targeted skill removal from models via fine-tuning.","source":"arXiv","year":2024,"language":"en","subjects":["cs.LG","cs.CL"],"url":"https://arxiv.org/abs/2409.02228","pdf_url":"https://arxiv.org/pdf/2409.02228","is_open_access":true,"published_at":"2024-09-03T18:55:54Z","score":68},{"id":"arxiv_2403.17811","title":"Are Compressed Language Models Less Subgroup Robust?","authors":[{"name":"Leonidas Gee"},{"name":"Andrea Zugarini"},{"name":"Novi Quadrianto"}],"abstract":"To reduce the inference cost of large language models, model compression is increasingly used to create smaller scalable models. However, little is known about their robustness to minority subgroups defined by the labels and attributes of a dataset. In this paper, we investigate the effects of 18 different compression methods and settings on the subgroup robustness of BERT language models. We show that worst-group performance does not depend on model size alone, but also on the compression method used. Additionally, we find that model compression does not always worsen the performance on minority subgroups. Altogether, our analysis serves to further research into the subgroup robustness of model compression.","source":"arXiv","year":2024,"language":"en","subjects":["cs.LG","cs.CL"],"doi":"10.18653/v1/2023.emnlp-main.983","url":"https://arxiv.org/abs/2403.17811","pdf_url":"https://arxiv.org/pdf/2403.17811","is_open_access":true,"published_at":"2024-03-26T15:50:37Z","score":68},{"id":"arxiv_2402.15010","title":"How Important Is Tokenization in French Medical Masked Language Models?","authors":[{"name":"Yanis Labrak"},{"name":"Adrien Bazoge"},{"name":"Beatrice Daille"},{"name":"Mickael Rouvier"},{"name":"Richard Dufour"}],"abstract":"Subword tokenization has become the prevailing standard in the field of natural language processing (NLP) over recent years, primarily due to the widespread utilization of pre-trained language models. 
This shift began with Byte-Pair Encoding (BPE) and was later followed by the adoption of SentencePiece and WordPiece. While subword tokenization consistently outperforms character and word-level tokenization, the precise factors contributing to its success remain unclear. Key aspects such as the optimal segmentation granularity for diverse tasks and languages, the influence of data sources on tokenizers, and the role of morphological information in Indo-European languages remain insufficiently explored. This is particularly pertinent for biomedical terminology, characterized by specific rules governing morpheme combinations. Despite the agglutinative nature of biomedical terminology, existing language models do not explicitly incorporate this knowledge, leading to inconsistent tokenization strategies for common terms. In this paper, we seek to delve into the complexities of subword tokenization in the French biomedical domain across a variety of NLP tasks and pinpoint areas where further enhancements can be made. We analyze classical tokenization algorithms, including BPE and SentencePiece, and introduce an original tokenization strategy that integrates morpheme-enriched word segmentation into existing tokenization methods.","source":"arXiv","year":2024,"language":"en","subjects":["cs.CL","cs.AI","cs.LG"],"url":"https://arxiv.org/abs/2402.15010","pdf_url":"https://arxiv.org/pdf/2402.15010","is_open_access":true,"published_at":"2024-02-22T23:11:08Z","score":68},{"id":"arxiv_2403.13369","title":"Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting","authors":[{"name":"Phillip Richter-Pechanski"},{"name":"Philipp Wiesenbach"},{"name":"Dominic M. Schwab"},{"name":"Christina Kiriakou"},{"name":"Nicolas Geis"},{"name":"Christoph Dieterich"},{"name":"Anette Frank"}],"abstract":"Automatic extraction of medical information from clinical documents poses several challenges: high costs of required clinical expertise, limited interpretability of model predictions, restricted computational resources and privacy regulations. Recent advances in domain-adaptation and prompting methods showed promising results with minimal training data using lightweight masked language models, which are suited for well-established interpretability methods. We are the first to present a systematic evaluation of these methods in a low-resource setting, by performing multi-class section classification on German doctor's letters. We conduct extensive class-wise evaluations supported by Shapley values, to validate the quality of our small training data set and to ensure the interpretability of model predictions. We demonstrate that a lightweight, domain-adapted pretrained model, prompted with just 20 shots, outperforms a traditional classification model by 30.5% in accuracy. 
Our results serve as a process-oriented guideline for clinical information extraction projects working with low-resource languages.","source":"arXiv","year":2024,"language":"en","subjects":["cs.CL","cs.AI","cs.LG"],"doi":"10.1017/nlp.2024.52","url":"https://arxiv.org/abs/2403.13369","pdf_url":"https://arxiv.org/pdf/2403.13369","is_open_access":true,"published_at":"2024-03-20T08:01:33Z","score":68},{"id":"arxiv_2408.13040","title":"SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks","authors":[{"name":"Kai-Wei Chang"},{"name":"Haibin Wu"},{"name":"Yu-Kai Wang"},{"name":"Yuan-Kuei Wu"},{"name":"Hua Shen"},{"name":"Wei-Cheng Tseng"},{"name":"Iu-thing Kang"},{"name":"Shang-Wen Li"},{"name":"Hung-yi Lee"}],"abstract":"Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address various downstream tasks in a unified manner. This significantly reduces the need for human labor in designing task-specific models. These advantages become even more evident as the number of tasks served by the LM scales up. Motivated by the strengths of prompting, we are the first to explore the potential of prompting speech LMs in the domain of speech processing. Recently, there has been a growing interest in converting speech into discrete units for language modeling. Our pioneering research demonstrates that these quantized speech units are highly versatile within our unified prompting framework. Not only can they serve as class labels, but they also contain rich phonetic information that can be re-synthesized back into speech signals for speech generation tasks. Specifically, we reformulate speech processing tasks into speech-to-unit generation tasks. As a result, we can seamlessly integrate tasks such as speech classification, sequence generation, and speech generation within a single, unified prompting framework. The experimental results show that the prompting method can achieve competitive performance compared to the strong fine-tuning method based on self-supervised learning models with a similar number of trainable parameters. The prompting method also shows promising results in the few-shot setting. Moreover, as more advanced speech LMs come onto the stage, the proposed prompting framework holds great potential.","source":"arXiv","year":2024,"language":"en","subjects":["eess.AS","cs.AI","cs.CL","cs.LG"],"doi":"10.1109/TASLP.2024.3436618","url":"https://arxiv.org/abs/2408.13040","pdf_url":"https://arxiv.org/pdf/2408.13040","is_open_access":true,"published_at":"2024-08-23T13:00:10Z","score":68},{"id":"doaj_10.36088/fondatia.v7i1.2896","title":"The Influence of Online Games on the Language Acquisition of Elementary School Children","authors":[{"name":"Muhammad Luqman Asy'ary"},{"name":"Setia Rini"},{"name":"Erna Risfaula Kusumawati"}],"abstract":"Information technology is developing very quickly, and many benefits have been obtained from this development. One of them is online game entertainment. Online games can have an impact on the language acquisition of elementary school children. 
This study aims to describe language acquisition in grade 4 elementary school children aged 9-11 through a semantic lens, namely the study of the meanings of verbs, adjectives, and nouns. The data come from the studied subjects, and the research uses a qualitative approach. Data collection was carried out through observation, note-taking, interviews, and questionnaires. The results show that online games can affect the language acquisition of children aged 9-11 years at SDIT Ibnu Mas'ud, Ambarawa District, as marked by an increasingly wide range of words absorbed through online games, such as verbs (victory, booyah, defeat, login, kill, mabar, drift, survival, by one, ngemoti, push), adjectives (perfect, hockey, noob, idiot, bot, asu, op, good at) and nouns (skin, turret, inventory, tournament), which they apply in everyday language","source":"DOAJ","year":2023,"language":"","subjects":["Education"],"doi":"10.36088/fondatia.v7i1.2896","url":"https://ejournal.stitpn.ac.id/index.php/fondatia/article/view/2896","is_open_access":true,"published_at":"","score":67},{"id":"arxiv_2307.14850","title":"Turkish Native Language Identification V2","authors":[{"name":"Ahmet Yavuz Uluslu"},{"name":"Gerold Schneider"}],"abstract":"This paper presents the first application of Native Language Identification (NLI) for the Turkish language. NLI is the task of automatically identifying an individual's native language (L1) based on their writing or speech in a non-native language (L2). While most NLI research has focused on L2 English, our study extends this scope to L2 Turkish by analyzing a corpus of texts written by native speakers of Albanian, Arabic and Persian. We leverage a cleaned version of the Turkish Learner Corpus and demonstrate the effectiveness of syntactic features, comparing a structural Part-of-Speech n-gram model to a hybrid model that retains function words. Our models achieve promising results, and we analyze the most predictive features to reveal L1-specific transfer effects. We make our data and code publicly available for further study.","source":"arXiv","year":2023,"language":"en","subjects":["cs.CL"],"url":"https://arxiv.org/abs/2307.14850","pdf_url":"https://arxiv.org/pdf/2307.14850","is_open_access":true,"published_at":"2023-07-27T13:28:31Z","score":67},{"id":"arxiv_2306.06371","title":"A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text","authors":[{"name":"Jessica López Espejel"},{"name":"Mahaman Sanoussi Yahaya Alassan"},{"name":"El Mehdi Chouham"},{"name":"Walid Dahhane"},{"name":"El Hassane Ettifouri"}],"abstract":"Java code generation consists of automatically generating Java code from natural language text. This NLP task helps increase programmers' productivity by providing them with immediate solutions to the simplest and most repetitive tasks. Code generation is a challenging task because of strict syntactic rules and the need for a deep understanding of the semantics of the programming language. Many works have tried to tackle this task using either RNN-based or Transformer-based models. The latter have achieved remarkable advances in the domain and can be divided into three groups: (1) encoder-only models, (2) decoder-only models, and (3) encoder-decoder models. In this paper, we provide a comprehensive review of the evolution and progress of deep learning models in the Java code generation task. 
We focus on the most important methods and present their merits and limitations, as well as the objective functions used by the community. In addition, we provide a detailed description of datasets and evaluation metrics used in the literature. Finally, we discuss the results of different models on the CONCODE dataset, then propose some future directions.","source":"arXiv","year":2023,"language":"en","subjects":["cs.CL"],"doi":"10.1016/j.nlp.2023.100013","url":"https://arxiv.org/abs/2306.06371","pdf_url":"https://arxiv.org/pdf/2306.06371","is_open_access":true,"published_at":"2023-06-10T07:27:51Z","score":67},{"id":"arxiv_2310.14025","title":"Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation","authors":[{"name":"Anastasia Kritharoula"},{"name":"Maria Lymperaiou"},{"name":"Giorgos Stamou"}],"abstract":"Visual Word Sense Disambiguation (VWSD) is a novel challenging task with the goal of retrieving an image among a set of candidates, which better represents the meaning of an ambiguous word within a given context. In this paper, we make a substantial step towards unveiling this interesting task by applying a varying set of approaches. Since VWSD is primarily a text-image retrieval task, we explore the latest transformer-based methods for multimodal retrieval. Additionally, we utilize Large Language Models (LLMs) as knowledge bases to enhance the given phrases and resolve ambiguity related to the target word. We also study VWSD as a unimodal problem by converting to text-to-text and image-to-image retrieval, as well as question-answering (QA), to fully explore the capabilities of relevant models. To tap into the implicit knowledge of LLMs, we experiment with Chain-of-Thought (CoT) prompting to guide explainable answer generation. On top of all this, we train a learn-to-rank (LTR) model to combine our different modules, achieving competitive ranking results. Extensive experiments on VWSD provide valuable insights to effectively drive future directions.","source":"arXiv","year":2023,"language":"en","subjects":["cs.CL"],"doi":"10.18653/v1/2023.emnlp-main.807","url":"https://arxiv.org/abs/2310.14025","pdf_url":"https://arxiv.org/pdf/2310.14025","is_open_access":true,"published_at":"2023-10-21T14:35:42Z","score":67},{"id":"arxiv_2306.02920","title":"Second Language Acquisition of Neural Language Models","authors":[{"name":"Miyu Oba"},{"name":"Tatsuki Kuribayashi"},{"name":"Hiroki Ouchi"},{"name":"Taro Watanabe"}],"abstract":"With the success of neural language models (LMs), their language acquisition has gained much attention. This work sheds light on the second language (L2) acquisition of LMs, whereas previous work has typically explored their first language (L1) acquisition. Specifically, we trained bilingual LMs with a scenario similar to human L2 acquisition and analyzed their cross-lingual transfer from linguistic perspectives. Our exploratory experiments demonstrated that the L1 pretraining accelerated their linguistic generalization in L2, and language transfer configurations (e.g., the L1 choice, and presence of parallel texts) substantially affected their generalizations. 
These results clarify in which particular aspects their L2 acquisition is (non-)human-like.","source":"arXiv","year":2023,"language":"en","subjects":["cs.CL"],"url":"https://arxiv.org/abs/2306.02920","pdf_url":"https://arxiv.org/pdf/2306.02920","is_open_access":true,"published_at":"2023-06-05T14:32:41Z","score":67},{"id":"crossref_10.1075/lia.00014.eng","title":"SLA2","authors":[{"name":"Krister Schönström"},{"name":"Chloë Marshall"}],"abstract":"","source":"CrossRef","year":2022,"language":"en","subjects":null,"doi":"10.1075/lia.00014.eng","url":"https://doi.org/10.1075/lia.00014.eng","is_open_access":true,"citations":3,"published_at":"","score":66.09},{"id":"doaj_The+Riches+of+Hands-on+Subtitling+in+the+Foreign+Language+Classroom","title":"The Riches of Hands-on Subtitling in the Foreign Language Classroom","authors":[{"name":"Chengcheng Wang"},{"name":"Jorge Díaz Cintas"}],"abstract":"\nUsing subtitles and subtitling as a means of diversifying foreign language teaching and learning has become increasingly popular in recent decades, particularly across Europe, where the European Commission has promoted, among other initiatives, the development of projects like ClipFlair, a web-based subtitling platform for foreign language learning (FLL). As part of this boost of research on FLL through subtitling, this empirical study was conducted in mainland China, where the role of subtitling in the foreign language classroom has not been widely recognised by scholars. Carried out with seventeen higher-education Chinese L1 students, the experiment studied the effects of performing subtitling activities on English L2 vocabulary acquisition and discovered that doing subtitling tasks from L2 to L1 can result in a significantly better performance in vocabulary acquisition than doing intralingual subtitling activities (L2 to L2) or doing non-subtitling activities.\n","source":"DOAJ","year":2022,"language":"","subjects":["Language. Linguistic theory. Comparative grammar"],"url":"https://ojsspdc.ulpgc.es/ojs/index.php/LFE/article/view/1450","is_open_access":true,"published_at":"","score":66},{"id":"doaj_10.3389/fcomm.2022.900399","title":"Learning a second language via print: On the logical necessity of a fluent first language","authors":[{"name":"Catherine L. Caldwell-Harris"},{"name":"Robert J. Hoffmeister"}],"abstract":"How Deaf children should be taught to read has long been debated. Severely or profoundly Deaf children, who face challenges in acquiring language from its spoken forms, must learn to read a language they do not speak. We refer to this as learning a language via print. How children can learn language via print is not a topic regularly studied by educators, psychologists, or language acquisition theorists. Nonetheless, Deaf children can do this. We discuss how Deaf children can learn a written language via print by mapping print words and phrases to sign language sequences. However, established, time-tested curricula for using a signed language to teach the print forms of spoken languages do not exist. We describe general principles for approaching this task, how it differs from acquiring a spoken language naturalistically, and empirical evidence that Deaf children's knowledge of a signed language facilitates and advances learning a printed language.","source":"DOAJ","year":2022,"language":"","subjects":["Communication. 
Mass media"],"doi":"10.3389/fcomm.2022.900399","url":"https://www.frontiersin.org/articles/10.3389/fcomm.2022.900399/full","is_open_access":true,"published_at":"","score":66},{"id":"doaj_10.3389/feduc.2022.967117","title":"Tracing writing progression in English for academic purposes: A data-driven possibility in the post-COVID era in Hong Kong","authors":[{"name":"Dennis Foung"},{"name":"Julia Chen"}],"abstract":"It is rare to use “big data” in writing progression studies in the field of second language acquisition around the globe. The difficulty of recruiting participants for longitudinal studies often results in sample sizes that are too small for quantitative analysis. Due to the global pandemic, students began to face more academic and emotional challenges, and it became more important to track the progression of their writing across courses. This study utilizes big data in a study of over 4,500 students who took a basic English for Academic Purposes (EAP) course followed by an advanced one at a university in Hong Kong. The findings suggest that analytics studies can provide a range of insights into course design and strategic planning, including how students’ language use and citation skills improve. They can also allow researchers to study the progression of students based on the level of achievement and the time elapsed between the two EAP courses. Further, studies using mega-sized datasets will be more generalizable than previous studies with smaller sample sizes. These results indicate that data-driven analytics can be a helpful approach to writing progression studies, especially in the post-COVID era.","source":"DOAJ","year":2022,"language":"","subjects":["Education (General)"],"doi":"10.3389/feduc.2022.967117","url":"https://www.frontiersin.org/articles/10.3389/feduc.2022.967117/full","is_open_access":true,"published_at":"","score":66},{"id":"arxiv_2212.03812","title":"An Overview of Indian Spoken Language Recognition from Machine Learning Perspective","authors":[{"name":"Spandan Dey"},{"name":"Md Sahidullah"},{"name":"Goutam Saha"}],"abstract":"Automatic spoken language identification (LID) is a very important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in the multilingual scenario. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies in every section of Indian society. The field of Indian LID has started gaining momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there are not many attempts to analytically review them collectively. In this work, we have conducted one of the very first attempts to present a comprehensive review of the Indian spoken language recognition research field. In-depth analysis has been presented to emphasize the unique challenges of low-resource and mutual influences for developing LID systems in the Indian contexts. 
Several essential aspects of Indian LID research are discussed, including a detailed description of the available speech corpora; the major research contributions, ranging from earlier attempts based on statistical modeling to recent approaches based on different neural network architectures; and future research trends. This review will help any active researcher or research enthusiast from related fields assess the state of present Indian LID research.","source":"arXiv","year":2022,"language":"en","subjects":["cs.CL","cs.SD","eess.AS"],"doi":"10.1145/3523179","url":"https://arxiv.org/abs/2212.03812","pdf_url":"https://arxiv.org/pdf/2212.03812","is_open_access":true,"published_at":"2022-11-30T11:03:51Z","score":66}],"total":1400396,"page":1,"page_size":20,"sources":["DOAJ","arXiv","CrossRef"],"query":"Language acquisition"}
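
The payload above is a paged search response: a results array of bibliographic records (id, title, authors[].name, abstract, source, year, optional doi/url/pdf_url, is_open_access, score) plus paging metadata (total, page, page_size, sources, query). Below is a minimal sketch of how a client might parse it, assuming the payload is valid JSON (the line breaks above are display wrapping). The field names are taken from the payload itself; the SearchResult and parse_response names are illustrative, not part of any published client library.

from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    # Fields observed on every record in the payload above.
    id: str
    title: str
    source: str        # "DOAJ", "arXiv", or "CrossRef" in this response
    score: float
    # Fields that are missing or empty on some records, read defensively.
    year: int | None = None
    doi: str | None = None
    url: str | None = None
    authors: list[str] = field(default_factory=list)


def parse_response(raw: str) -> tuple[list[SearchResult], int]:
    """Parse the search response into typed records.

    Returns the current page of results and the reported total hit count
    (here, "total":1400396 against a 20-item page).
    """
    payload = json.loads(raw)
    results = [
        SearchResult(
            id=r["id"],
            title=r["title"],
            source=r["source"],
            score=r["score"],
            year=r.get("year"),
            doi=r.get("doi"),          # absent on some DOAJ records
            url=r.get("url"),
            authors=[a["name"] for a in r.get("authors") or []],
        )
        for r in payload["results"]
    ]
    return results, payload["total"]


# Illustrative usage: rank the current page by relevance score.
# results, total = parse_response(raw_json_text)
# for r in sorted(results, key=lambda r: r.score, reverse=True):
#     print(f"{r.score:6.2f}  [{r.source}] {r.title}")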