Results for "English language"

Showing 20 of ~6,565,859 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

arXiv Open Access 2026
Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models

Jinghan Cao, Yu Ma, Xinjin Li et al.

Large Language Models achieve remarkable performance but incur substantial computational costs unsuitable for resource-constrained deployments. This paper presents the first comprehensive task-specific efficiency analysis comparing 16 language models across five diverse NLP tasks. We introduce the Performance-Efficiency Ratio (PER), a novel metric integrating accuracy, throughput, memory, and latency through geometric mean normalization. Our systematic evaluation reveals that small models (0.5--3B parameters) achieve superior PER scores across all given tasks. These findings establish quantitative foundations for deploying small models in production environments prioritizing inference efficiency over marginal accuracy gains.
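The Performance-Efficiency Ratio described above is a geometric mean of normalized factors. A minimal sketch in Python, assuming reference-relative normalization and treating memory and latency as costs to invert (the paper's exact normalization scheme is not reproduced here, and all numbers below are illustrative):

```python
from math import prod

def geometric_mean(values):
    """Geometric mean of positive values."""
    return prod(values) ** (1.0 / len(values))

def per_score(accuracy, throughput, memory_gb, latency_ms,
              ref_throughput, ref_memory_gb, ref_latency_ms):
    """Hypothetical PER: geometric mean of normalized factors.
    Memory and latency are cost metrics, so the reference/actual
    ratio is used -- lower resource use yields a higher score."""
    factors = [
        accuracy,                      # assumed already in (0, 1]
        throughput / ref_throughput,   # higher is better
        ref_memory_gb / memory_gb,     # lower is better
        ref_latency_ms / latency_ms,   # lower is better
    ]
    return geometric_mean(factors)

# A small model: slightly lower accuracy, much cheaper to run.
small = per_score(0.82, 300, 1.0, 25,
                  ref_throughput=100, ref_memory_gb=16.0, ref_latency_ms=200)
# A large model: top accuracy, heavy footprint (the reference point).
large = per_score(0.90, 100, 16.0, 200,
                  ref_throughput=100, ref_memory_gb=16.0, ref_latency_ms=200)
```

Under this kind of normalization, a modest accuracy gap is easily outweighed by order-of-magnitude efficiency gains, which matches the paper's headline finding.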

en cs.CL, cs.LG
arXiv Open Access 2026
FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation

Hung Nguyen Huy, Mo El-Haj, Dawn Knight et al.

FreeTxt-Vi is a free and open-source, web-based toolkit for creating and analysing bilingual Vietnamese-English text collections. Positioned at the intersection of corpus linguistics and natural language processing (NLP), it enables users to build, explore, and interpret free-text data without requiring programming expertise. The system combines corpus analysis features such as concordancing, keyword analysis, word-relation exploration, and interactive visualisation with transformer-based NLP components for sentiment analysis and summarisation. A key contribution of this work is the design of a unified bilingual NLP pipeline that integrates a hybrid VnCoreNLP and Byte Pair Encoding (BPE) segmentation strategy, a fine-tuned TabularisAI sentiment classifier, and a fine-tuned Qwen2.5 model for abstractive summarisation. Unlike existing text analysis platforms, FreeTxt-Vi is evaluated as a set of language processing components. We conduct a three-part evaluation covering segmentation, sentiment analysis, and summarisation, and show that our approach achieves competitive or superior performance compared to widely used baselines in both Vietnamese and English. By reducing technical barriers to multilingual text analysis, FreeTxt-Vi supports reproducible research and promotes the development of language resources for Vietnamese, a widely spoken but underrepresented language in NLP. The toolkit is applicable to domains including education, digital humanities, cultural heritage, and the social sciences, where qualitative text data are common but often difficult to process at scale.

en cs.CL
CrossRef Open Access 2025
Students' perceptions of Photomindset media in English language teaching

Fadhilah Ghaisani, Sulastri Manurung, Safnidar Siahaan

The demand for innovative and attractive teaching resources in the digital age is crucial for enhancing students' language proficiency and learning engagement. This study examines students' perceptions of Photomindset as a medium in English teaching and learning. It employed a mixed-methods approach, combining a close-ended questionnaire and interviews. The study involved 32 students at a junior high school in Batam, Indonesia, and utilized the Technology Acceptance Model (TAM) to assess their perceptions. Three students with the highest TAM scores were selected for interviews to gain deeper insights into their perceptions. Data were analyzed using descriptive statistics to describe the tendency of students' perceptions of Photomindset, and the relationship between perceived usefulness and perceived ease of use was measured by Pearson correlation. Thematic analysis was employed to identify recurring themes and patterns from the interviews. The results showed that students have a positive perception of Photomindset, with mean scores of 116 for Perceived Usefulness and 115 for Perceived Ease of Use. However, there is no correlation between perceived usefulness and perceived ease of use. The findings suggest that Photomindset is an effective, beneficial, and user-friendly tool for enhancing English learning.

DOAJ Open Access 2025
Presence and Absence in Margaret Atwood’s Dearly

Pilar Sánchez-Calle

In Morning in the Burned House (1995), Margaret Atwood includes a sequence of elegiac poems mourning the process of her father’s illness and death. Her subsequent collection, The Door (2007), while not explicitly elegiac, explores topics such as memory, aging, death, loss, and decay. These subjects are often central to both traditional and contemporary elegies. Other poems in this volume deal with writing and poetry, examining their capacity to offer consolation in the face of death, a key aspect of elegy. Drawing on critical studies of elegy in contemporary English-language poetry and on the role of elegy in Atwood’s poetry, this essay analyses the elegiac dimension of Dearly (2020), Atwood’s most recent poetry collection. Many of these poems are dedicated to her partner Graeme Gibson, who was diagnosed with vascular dementia in 2017 and passed away in 2019. Through close readings and formal analysis, I aim to demonstrate how these elegiac poems articulate a psychic landscape of mourning where separation after death is rejected and an alternative space for reunion with the deceased is created. Atwood moves beyond simple lamentation, exploring the liminal space between life and death, presence and absence.

Philology. Linguistics, Literature (General)
DOAJ Open Access 2025
Global and population-specific association of MTHFR polymorphisms with preterm birth risk: a consolidated analysis of 44 studies

Maryam Vafapour, Hanieh Talebi, Mahsa Danaei et al.

Background: This study investigates the relationship between polymorphisms in the MTHFR gene and the risk of preterm birth (PTB).

Methods: A comprehensive literature review was conducted using databases such as PubMed, Web of Science, and CNKI, with the search finalized on January 1, 2025. The review specifically targeted studies published prior to this date, utilizing relevant keywords and MeSH terms associated with PTB and genetic factors. Inclusion criteria encompassed original case-control, longitudinal, or cohort studies, with no limitations on language or publication date. Associations were quantified using odds ratios (ORs) and 95% confidence intervals (CIs) via Comprehensive Meta-Analysis software.

Results: The analysis included 44 case-control studies comprising 7,384 cases and 51,449 controls, extracted from 28 publications in both English and Chinese. Among these studies, 29 focused on the MTHFR C677T polymorphism, while 15 examined the MTHFR A1298C variant. Pooled results demonstrated a significant association between the MTHFR C677T polymorphism and PTB under five genetic models: allele (C vs. T; OR = 1.303, 95% CI 1.151–1.475, p ≤ 0.001), homozygote (CC vs. AA; OR = 1.494, 95% CI 1.212–1.842, p ≤ 0.001), heterozygote (CT vs. AA; OR = 1.303, 95% CI 1.119–1.516, p = 0.001), dominant (CC + CT vs. AA; OR = 1.341, 95% CI 1.161–1.548, p ≤ 0.001), and recessive (CC vs. CT + AA; OR = 1.340, 95% CI 1.119–1.604, p = 0.001). Subgroup analyses indicated significant associations in Asian populations, particularly in studies conducted in China and India, while no significant correlations were found in Caucasian populations, including those from Austria. Moreover, the MTHFR A1298C polymorphism did not demonstrate a significant relationship with PTB risk across the studied ethnicities.

Conclusions: The findings indicate a significant association between the MTHFR C677T polymorphism and PTB risk, particularly in Asian and Indian populations, while no significant associations were identified in Caucasian groups. Conversely, the MTHFR A1298C polymorphism appeared to have a negligible impact on PTB risk, underscoring the importance of considering population-specific factors in understanding the genetic epidemiology of PTB.
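The pooled odds ratios and confidence intervals reported above rest on standard 2×2-table machinery. A minimal sketch of a single-study OR with a log-scale (Woolf) 95% CI, using purely illustrative counts rather than any study's actual data (the paper itself pools across studies with Comprehensive Meta-Analysis software):

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI for a 2x2 table:
    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    The CI is computed on the log-odds scale (Woolf method)."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = exp(log(or_) - z * se)
    hi = exp(log(or_) + z * se)
    return or_, lo, hi

# Illustrative counts only (not taken from the meta-analysis):
or_, lo, hi = odds_ratio_ci(60, 40, 100, 120)
```

A CI that excludes 1.0 on this scale is what the abstract's "significant association" statements correspond to for each genetic model.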

Gynecology and obstetrics
arXiv Open Access 2025
CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning

Masato Kikuchi, Masatsugu Ono, Toshioki Soga et al.

Although WordNet is a valuable resource because of its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this issue, we developed a version of WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our approach, we constructed a large-scale corpus containing both sense and CEFR-level information from the annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on this corpus perform comparably to those fine-tuned on gold-standard annotations. Furthermore, by combining this corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81. This result provides indirect evidence that the transferred labels are largely consistent with the gold-standard levels. The annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.

en cs.CL
arXiv Open Access 2025
Classifying German Language Proficiency Levels Using Large Language Models

Elias-Leander Ahlers, Witold Brunsmann, Malte Schilling

Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts into different proficiency levels according to the Common European Framework of Reference for Languages (CEFR). To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that utilizes the internal neural state of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.

en cs.CL, cs.AI
S2 Open Access 2019
Offensive Language and Hate Speech Detection for Danish

Gudbjartur Ingi Sigurbergsson, Leon Derczynski

The presence of offensive language on social media platforms, and the implications this poses, is becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most research has focused on solving the problem for English, even though the problem is multilingual. We construct a Danish dataset, DKhate, containing user-generated comments from various social media platforms, which is, to our knowledge, the first of its kind annotated for various types and targets of offensive language. We develop four automatic classification systems, each designed to work for both English and Danish. In the detection of offensive language in English, the best-performing system achieves a macro-averaged F1-score of 0.74, and the best-performing system for Danish achieves a macro-averaged F1-score of 0.70. In the detection of whether or not an offensive post is targeted, the best-performing system for English achieves a macro-averaged F1-score of 0.62, while the best-performing system for Danish achieves a macro-averaged F1-score of 0.73. Finally, in the detection of the target type in a targeted offensive post, the best-performing system for English achieves a macro-averaged F1-score of 0.56, and the best-performing system for Danish achieves a macro-averaged F1-score of 0.63. Our work for both English and Danish captures the types and targets of offensive language, and presents automatic methods for detecting different kinds of offensive language, such as hate speech and cyberbullying.
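The macro-averaged F1 scores quoted above weight every class equally regardless of how frequent it is, which matters for imbalanced offensive-language data. A minimal sketch of the computation, with invented labels purely for illustration:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal
    weight, so minority classes count as much as majority ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for lbl in labels:
        tp = sum(t == lbl and p == lbl for t, p in zip(y_true, y_pred))
        fp = sum(t != lbl and p == lbl for t, p in zip(y_true, y_pred))
        fn = sum(t == lbl and p != lbl for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical offensive ("OFF") / non-offensive ("NOT") predictions:
y_true = ["OFF", "OFF", "NOT", "NOT", "NOT", "OFF"]
y_pred = ["OFF", "NOT", "NOT", "NOT", "OFF", "OFF"]
score = macro_f1(y_true, y_pred)
```

This is equivalent to scikit-learn's `f1_score(..., average="macro")`; the hand-rolled version just makes the equal per-class weighting explicit.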

184 citations en Computer Science
DOAJ Open Access 2024
Healthcare delivery to patients from culturally and linguistically diverse backgrounds in emergency care: a scoping review protocol

Ya-Ling Huang, Sarah Thorning, Chun-Chih Lin et al.

Background: Worldwide, the culturally and linguistically diverse (CALD) population is increasing and is predicted to reach 405 million by 2050. The delivery of emergency care for the CALD population can be complex due to cultural, social, and language factors. The extent to which cultural, social, and contextual factors influence care delivery to patients from CALD backgrounds throughout their emergency care journey is unclear. Using a systematic approach, this review aims to map the existing evidence regarding emergency healthcare delivery for patients from CALD backgrounds and uses a social-ecological framework to provide a broader perspective on cultural, social, and contextual influences on emergency care delivery.

Methods: The Joanna Briggs Institute (JBI) scoping review methodology will be used to guide this review. The population is patients from CALD backgrounds who received care and emergency care clinicians who provided direct care. The concept is healthcare delivery to patients from CALD backgrounds. The context is emergency care. This review will include quantitative, qualitative, and mixed-methods studies published in English from January 1, 2012, onwards. Searches will be conducted in the databases CINAHL (EBSCO), MEDLINE (Ovid), Embase (Elsevier), SocINDEX (EBSCO), and Scopus (Elsevier), together with a web search of Google Scholar. A PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram will be used to present the search decision process. All included articles will be appraised using the Mixed Methods Appraisal Tool (MMAT). Data will be presented in tabular form and accompanied by a narrative synthesis of the literature.

Discussion: Despite the increased use of emergency care services by patients from CALD backgrounds, there has been no comprehensive review of healthcare delivery to patients from CALD backgrounds in the emergency care context (ED and prehospital settings) that includes consideration of cultural, social, and contextual influences. The results of this scoping review may be used to inform future research and strategies that aim to enhance care delivery and experiences for people from CALD backgrounds who require emergency care.

Systematic review registration: This scoping review has been registered in the Open Science Framework: https://doi.org/10.17605/OSF.IO/HTMKQ

arXiv Open Access 2024
Why do objects have many names? A study on word informativeness in language use and lexical systems

Eleonora Gualdoni, Gemma Boleda

Human lexicons contain many different words that speakers can use to refer to the same object, e.g., "purple" or "magenta" for the same shade of color. On the one hand, studies on language use have explored how speakers adapt their referring expressions to successfully communicate in context, without focusing on properties of the lexical system. On the other hand, studies in language evolution have discussed how competing pressures for informativeness and simplicity shape lexical systems, without tackling in-context communication. We aim at bridging the gap between these traditions, and explore why a soft mapping between referents and words is a good solution for communication, by taking into account both in-context communication and the structure of the lexicon. We propose a simple measure of informativeness for words and lexical systems, grounded in a visual space, and analyze color naming data for English and Mandarin Chinese. We conclude that optimal lexical systems are those where multiple words can apply to the same referent, conveying different amounts of information. Such systems allow speakers to maximize communication accuracy and minimize the amount of information they convey when communicating about referents in contexts.

en cs.CL
arXiv Open Access 2024
Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong et al.

Malaysian English is a low-resource creole language that carries elements of Malay, Chinese, and Tamil in addition to Standard English. Named Entity Recognition (NER) models underperform when capturing entities from Malaysian English text due to its distinctive morphosyntactic adaptations, semantic features, and code-switching (mixing English and Malay). Considering these gaps, we introduce MENmBERT and MENBERT, pre-trained language models with contextual understanding, specifically tailored for Malaysian English. We have fine-tuned MENmBERT and MENBERT using manually annotated entities and relations from the Malaysian English News Article (MEN) Dataset. This fine-tuning process allows the PLMs to learn representations that capture the nuances of Malaysian English relevant to NER and RE tasks. MENmBERT achieved a 1.52% and 26.27% improvement on NER and RE tasks, respectively, compared to the bert-base-multilingual-cased model. Although overall NER performance does not improve significantly, our further analysis shows a significant improvement when evaluated by the 12 entity labels. These findings suggest that pre-training language models on language-specific and geographically focused corpora can be a promising approach for improving NER performance in low-resource settings. The dataset and code published in this paper provide valuable resources for NLP research focusing on Malaysian English.

en cs.CL
arXiv Open Access 2024
On the Applicability of Zero-Shot Cross-Lingual Transfer Learning for Sentiment Classification in Distant Language Pairs

Andre Rusli, Makoto Shishido

This research explores the applicability of cross-lingual transfer learning from English to Japanese and Indonesian using the XLM-R pre-trained model. The results are compared with several previous works, either by models using a similar zero-shot approach or a fully-supervised approach, to provide an overview of the zero-shot transfer learning approach's capability using XLM-R in comparison with existing models. Our models achieve the best result in one Japanese dataset and comparable results in other datasets in Japanese and Indonesian languages without being trained using the target language. Furthermore, the results suggest that it is possible to train a multi-lingual model, instead of one model for each language, and achieve promising results.

en cs.CL, cs.AI
arXiv Open Access 2024
A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Language Models and Explainable AI

Melusi Malinga, Isaac Lupanda, Mike Wa Nkongolo et al.

South Africa and the Democratic Republic of Congo (DRC) present a complex linguistic landscape with languages such as Zulu, Sepedi, Afrikaans, French, English, and Tshiluba (Ciluba), which creates unique challenges for AI-driven translation and sentiment analysis systems due to a lack of accurately labeled data. This study seeks to address these challenges by developing a multilingual lexicon designed for French and Tshiluba, now expanded to include translations in English, Afrikaans, Sepedi, and Zulu. The lexicon enhances cultural relevance in sentiment classification by integrating language-specific sentiment scores. A comprehensive testing corpus is created to support translation and sentiment analysis tasks, with machine learning models such as Random Forest, Support Vector Machine (SVM), Decision Trees, and Gaussian Naive Bayes (GNB) trained to predict sentiment across low resource languages (LRLs). Among them, the Random Forest model performed particularly well, capturing sentiment polarity and handling language-specific nuances effectively. Furthermore, Bidirectional Encoder Representations from Transformers (BERT), a Large Language Model (LLM), is applied to predict context-based sentiment with high accuracy, achieving 99% accuracy and 98% precision, outperforming other models. The BERT predictions were clarified using Explainable AI (XAI), improving transparency and fostering confidence in sentiment classification. Overall, findings demonstrate that the proposed lexicon and machine learning models significantly enhance translation and sentiment analysis for LRLs in South Africa and the DRC, laying a foundation for future AI models that support underrepresented languages, with applications across education, governance, and business in multilingual contexts.

en cs.CL, cs.AI
arXiv Open Access 2024
Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality

Yiming Ai, Zhiwei He, Ziyin Zhang et al.

In this study, we delve into the validity of conventional personality questionnaires in capturing the human-like personality traits of Large Language Models (LLMs). Our objective is to assess the congruence between the personality traits LLMs claim to possess and their demonstrated tendencies in real-world scenarios. By conducting an extensive examination of LLM outputs against observed human response patterns, we aim to understand the disjunction between self-knowledge and action in LLMs.

en cs.CL, cs.CY
S2 Open Access 2019
Global Englishes for Language Teaching

H. Rose, Nicola Galloway

The spread of English as a global language has resulted in the emergence of a number of related fields of research within applied linguistics, including English as an International Language, English as a Lingua Franca, and World Englishes. Here, Heath Rose and Nicola Galloway consolidate this work by exploring how the global spread of English has impacted TESOL, uniting similar movements in second language acquisition, such as translanguaging and the multilingual turn. They build on a number of concrete proposals for change and innovation in English language teaching practice, whilst offering a detailed examination of how to incorporate a Global Englishes perspective into the multiple faces of TESOL, putting research-informed practice at the forefront. Global Englishes for Language Teaching is a ground-breaking attempt to unite discussions on the pedagogical implications of the global spread of English into a single text for researchers and practicing teachers.

160 citations en Sociology

Page 19 of 328293