Results for "English literature"

Showing 20 of ~9543813 results · from arXiv, DOAJ, Semantic Scholar, CrossRef

arXiv Open Access 2026
A Dataset for Probing Translationese Preferences in English-to-Swedish Translation

Jenny Kunz, Anja Jarochenko, Marcel Bollmann

Translations often carry traces of the source language, a phenomenon known as translationese. We introduce the first freely available English-to-Swedish dataset contrasting translationese sentences with idiomatic alternatives, designed to probe intrinsic preferences of language models. It includes error tags and descriptions of the problems in the original translations. In experiments evaluating smaller Swedish and multilingual LLMs with our dataset, we find that they often favor the translationese phrasing. Human alternatives are chosen more often when the English source sentence is omitted, indicating that exposure to the source biases models toward literal translations, although even without context models often prefer the translationese variant. Our dataset and findings provide a resource and benchmark for developing models that produce more natural, idiomatic output in non-English languages.

en cs.CL
arXiv Open Access 2025
Verifiable Natural Language to Linear Temporal Logic Translation: A Benchmark Dataset and Evaluation Suite

William H English, Chase Walker, Dominic Simon et al.

Empirical evaluation of state-of-the-art natural-language (NL) to temporal-logic (TL) translation systems reveals near-perfect performance on existing benchmarks. However, current studies measure only the accuracy of the translation of NL logic into formal TL, ignoring a system's capacity to ground atomic propositions into new scenarios or environments. This is a critical feature, necessary for the verification of resulting formulas in a concrete state space. Consequently, most NL-to-TL translation frameworks propose their own bespoke dataset in which the correct grounding is known a priori, inflating performance metrics and neglecting the need for extensible, domain-general systems. In this paper, we introduce the Verifiable Linear Temporal Logic Benchmark (VLTL-Bench), a unifying benchmark that measures verification and verifiability of automated NL-to-LTL translation. The dataset consists of four unique state spaces and thousands of diverse natural language specifications and corresponding formal specifications in temporal logic. Moreover, the benchmark contains sample traces to validate the temporal logic expressions. While the benchmark directly supports end-to-end evaluation, we observe that many frameworks decompose the process into i) lifting, ii) grounding, iii) translation, and iv) verification. The benchmark provides ground truths after each of these steps to enable researchers to improve and evaluate different substeps of the overall problem. To encourage methodologically sound advances in verifiable NL-to-LTL translation approaches, we release VLTL-Bench here: https://www.kaggle.com/datasets/dubascudes/vltl bench.

en eess.SY, cs.CL
arXiv Open Access 2025
Keyword Extraction, and Aspect Classification in Sinhala, English, and Code-Mixed Content

F. A. Rizvi, T. Navojith, A. M. N. H. Adhikari et al.

Brand reputation in the banking sector is maintained through insightful analysis of customer opinion on code-mixed and multilingual content. Conventional NLP models misclassify or ignore code-mixed text, particularly when it involves low-resource languages, as in Sinhala-English, and fail to capture domain-specific knowledge. This study introduces a hybrid NLP method to improve keyword extraction, content filtering, and aspect-based classification of banking content. Keyword extraction in English is performed with a hybrid approach comprising a fine-tuned SpaCy NER model, FinBERT-based KeyBERT embeddings, YAKE, and EmbedRank, which results in a combined accuracy of 91.2%. Code-mixed and Sinhala keywords are extracted using a fine-tuned XLM-RoBERTa model integrated with a domain-specific Sinhala financial vocabulary, which results in an accuracy of 87.4%. To ensure data quality, irrelevant comment filtering was performed using several models, with the BERT-base-uncased model achieving 85.2% for English and XLM-RoBERTa 88.1% for Sinhala, which was better than GPT-4o, SVM, and keyword-based filtering. Aspect classification followed the same pattern, with the BERT-base-uncased model achieving 87.4% for English and XLM-RoBERTa 85.9% for Sinhala, both exceeding GPT-4 and keyword-based approaches. These findings confirm that fine-tuned transformer models outperform traditional methods in multilingual financial text analysis. The present framework offers an accurate and scalable solution for brand reputation monitoring in code-mixed and low-resource banking environments.

en cs.CL, cs.AI
arXiv Open Access 2025
Unsupervised Classification of English Words Based on Phonological Information: Discovery of Germanic and Latinate Clusters

Takashi Morita, Timothy J. O'Donnell

Cross-linguistically, native words and loanwords follow different phonological rules. In English, for example, words of Germanic and Latinate origin exhibit different stress patterns, and a certain syntactic structure, double-object datives, is predominantly associated with Germanic verbs rather than Latinate verbs. From the perspective of language acquisition, however, such etymology-based generalizations raise learnability concerns, since the historical origins of words are presumably inaccessible information for general language learners. In this study, we present computational evidence indicating that the Germanic-Latinate distinction in the English lexicon is learnable from the phonotactic information of individual words. Specifically, we performed an unsupervised clustering on corpus-extracted words, and the resulting word clusters largely aligned with the etymological distinction. The model-discovered clusters also recovered various linguistic generalizations documented in the previous literature regarding the corresponding etymological classes. Moreover, our model also uncovered previously unrecognized features of the quasi-etymological clusters. Taken together with prior results from Japanese, our findings indicate that the proposed method provides a general, cross-linguistic approach to discovering etymological structure from phonotactic cues in the lexicon.

en cs.CL
arXiv Open Access 2025
HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English

Ahmed Nasser, Marwan Mohamed, Alaa Sherif et al.

Egyptian hieroglyphs, the ancient Egyptian writing system, are composed entirely of drawings. Translating these glyphs into English poses various challenges, including the fact that a single glyph can have multiple meanings. Deep learning translation applications are evolving rapidly, producing remarkable results that significantly impact our lives. In this research, we propose a method for the automatic recognition and translation of ancient Egyptian hieroglyphs from images to English. This study utilized two datasets for classification and translation: the Morris Franken dataset and the EgyptianTranslation dataset. Our approach is divided into three stages: segmentation (using Contour and Detectron2), mapping symbols to Gardiner codes, and translation (using the CNN model). The model achieved a BLEU score of 42.2, a significant result compared to previous research.

en cs.CV, cs.LG
arXiv Open Access 2025
Automatic Proficiency Assessment in L2 English Learners

Armita Mohammadi, Alessandro Lameiras Koerich, Laureano Moro-Velazquez et al.

Second language (L2) proficiency in English is usually evaluated perceptually by English teachers or expert evaluators, with inherent intra- and inter-rater variability. This paper explores deep learning techniques for comprehensive L2 proficiency assessment, addressing both the speech signal and its corresponding transcription. We analyze spoken proficiency classification using diverse architectures, including 2D CNN, frequency-based CNN, ResNet, and a pretrained wav2vec 2.0 model. Additionally, we examine text-based proficiency assessment by fine-tuning a BERT language model within resource constraints. Finally, we tackle the complex task of spontaneous dialogue assessment, managing long-form audio and speaker interactions through separate applications of wav2vec 2.0 and BERT models. Results from experiments on EFCamDat and ANGLISH datasets and a private dataset highlight the potential of deep learning, especially the pretrained wav2vec 2.0 model, for robust automated L2 proficiency evaluation.

en cs.CL, cs.SD
DOAJ Open Access 2025
An overview of the treatment interventions and assessment of fear-avoidance for chronic musculoskeletal pain in adults: A scoping review protocol.

Sam Tan, Anju Jaggi, Alex Tasker et al.

Introduction: The Fear-Avoidance (FA) model aims to explain how an acute pain experience can develop into a persistent state. The FA model considers five core components: kinesiophobia, pain-related fear, catastrophisation, victimisation, and interpersonal social environment. Amongst these, kinesiophobia tends to dominate the literature on chronic musculoskeletal pain. As a result, current reviews have not considered the other core components of the FA model when exploring its interventions. Moreover, several synonyms of the term kinesiophobia are not reflected in their search strategies. Given these gaps and the preference for particular study designs and outcome measures, this scoping review aims to provide and characterise an overview of treatment interventions that consider all study designs, relevant outcome measures, FA components, and FA component synonyms. Methods and analysis: Eligible studies will be in English or with an available English translation from 1970 onwards. Databases to be searched include Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, Embase, The Allied and Complementary Database (AMED), PEDro, Web of Science, and grey literature. We will include studies involving participants ≥18 years old with chronic musculoskeletal pain, and interventions targeting FA and/or its components. Three review authors will independently screen papers using preestablished eligibility criteria and conduct assessments of risk of bias, with a fourth independent researcher employed to resolve disagreements where found. Qualitative synthesis techniques will be used to characterise the interventions. Patient and Public Involvement (PPI) has been utilised to develop this protocol and will be conducted following completion of the systematic review to discuss and reflect on the findings. Ethics and dissemination: This systematic review does not require ethical approval as existing data will be used and the PPI to be conducted is an involvement activity rather than study data. The results will be disseminated through a peer-reviewed journal and via national and international conferences. Open Science Framework registration number: This protocol is registered on Open Science Framework: https://doi.org/10.17605/OSF.IO/NR37A.

Medicine, Science
arXiv Open Access 2024
Breaking the Programming Language Barrier: Multilingual Prompting to Empower Non-Native English Learners

James Prather, Brent N. Reeves, Paul Denny et al.

Non-native English speakers (NNES) face multiple barriers to learning programming. These barriers can be obvious, such as the fact that programming language syntax and instruction are often in English, or more subtle, such as being afraid to ask for help in a classroom full of native English speakers. However, these barriers are frustrating because many NNES students know more about programming than they can articulate in English. Advances in generative AI (GenAI) have the potential to break down these barriers because state of the art models can support interactions in multiple languages. Moreover, recent work has shown that GenAI can be highly accurate at code generation and explanation. In this paper, we provide the first exploration of NNES students prompting in their native languages (Arabic, Chinese, and Portuguese) to generate code to solve programming problems. Our results show that students are able to successfully use their native language to solve programming problems, but not without some difficulty specifying programming terminology and concepts. We discuss the challenges they faced, the implications for practice in the short term, and how this might transform computing education globally in the long term.

en cs.CY, cs.AI
arXiv Open Access 2024
Thresholds for post-selected quantum error correction from statistical mechanics

Lucas H. English, Dominic J. Williamson, Stephen D. Bartlett

We identify regimes where post-selection can be used scalably in quantum error correction (QEC) to improve performance. We use statistical mechanical models to analytically quantify the performance and thresholds of post-selected QEC, with a focus on the surface code. Based on the non-equilibrium magnetization of these models, we identify a simple heuristic technique for post-selection that does not require a decoder. Along with performance gains, this heuristic allows us to derive analytic expressions for post-selected conditional logical thresholds and abort thresholds of surface codes. We find that such post-selected QEC is characterised by four distinct thermodynamic phases, and detail the implications of this phase space for practical, scalable quantum computation.

en quant-ph
arXiv Open Access 2024
How BERT Speaks Shakespearean English? Evaluating Historical Bias in Contextual Language Models

Miriam Cuscito, Alfio Ferrara, Martin Ruskov

In this paper, we explore the idea of analysing the historical bias of contextual language models based on BERT by measuring their adequacy with respect to Early Modern (EME) and Modern (ME) English. In our preliminary experiments, we perform fill-in-the-blank tests with 60 masked sentences (20 EME-specific, 20 ME-specific and 20 generic) and three different models (i.e., BERT Base, MacBERTh, English HLM). We then rate the model predictions according to a 5-point bipolar scale between the two language varieties and derive a weighted score to measure the adequacy of each model to EME and ME varieties of English.
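The abstract does not say how the weighted score is computed, so the following is only a sketch of one plausible reading. It assumes the 5-point bipolar scale is coded from -2 (clearly Early Modern English) to +2 (clearly Modern English) and that the score is the count-weighted average of those scale points. The coding, the name `weighted_score`, and the sample ratings are all hypothetical.

```python
from collections import Counter

# Hypothetical coding of a 5-point bipolar scale: -2 = clearly Early Modern
# English, +2 = clearly Modern English. This coding and the averaging scheme
# are assumptions for illustration, not the paper's definition.
SCALE = (-2, -1, 0, 1, 2)


def weighted_score(ratings):
    """Return the count-weighted average of the scale points in `ratings`."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    unknown = set(counts) - set(SCALE)
    if unknown:
        raise ValueError(f"ratings outside the scale: {sorted(unknown)}")
    total = sum(counts.values())
    # Each scale point contributes in proportion to how often it was assigned.
    return sum(point * counts[point] for point in SCALE) / total


# Made-up ratings for one model's predictions on six masked sentences.
example_ratings = [-2, -1, 0, 2, 2, 1]
print(weighted_score(example_ratings))
```

Under this reading, a score near -2 would suggest predictions that fit Early Modern English, a score near +2 would suggest Modern English, and a score near 0 would suggest no clear lean either way.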

en cs.CL, cs.CY
arXiv Open Access 2024
Since the Scientific Literature Is Multilingual, Our Models Should Be Too

Abteen Ebrahimi, Kenneth Church

English has long been assumed the lingua franca of scientific research, and this notion is reflected in the natural language processing (NLP) research involving scientific document representation. In this position piece, we quantitatively show that the literature is largely multilingual and argue that current models and benchmarks should reflect this linguistic diversity. We provide evidence that text-based models fail to create meaningful representations for non-English papers and highlight the negative user-facing impacts of using English-only models non-discriminately across a multilingual domain. We end with suggestions for the NLP community on how to improve performance on non-English documents.

en cs.CL
arXiv Open Access 2024
Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?

Chengzhi Zhong, Fei Cheng, Qianying Liu et al.

In this study, we investigate whether non-English-centric LLMs, despite their strong performance, 'think' in their respective dominant language: more precisely, 'think' refers to how the representations of intermediate layers, when un-embedded into the vocabulary space, exhibit higher probabilities for certain dominant languages during generation. We term such languages as internal latent languages. We examine the latent language of three typical categories of models for Japanese processing: Llama2, an English-centric model; Swallow, an English-centric model with continued pre-training in Japanese; and LLM-jp, a model pre-trained on balanced English and Japanese corpora. Our empirical findings reveal that, unlike Llama2 which relies exclusively on English as the internal latent language, Japanese-specific Swallow and LLM-jp employ both Japanese and English, exhibiting dual internal latent languages. For any given target language, the model preferentially activates the latent language most closely related to it. In addition, we explore how intermediate layers respond to questions involving cultural conflicts between latent internal and target output languages. We further explore how the language identity shifts across layers while keeping consistent semantic meaning reflected in the intermediate layer representations. This study deepens the understanding of non-English-centric large language models, highlighting the intricate dynamics of language representation within their intermediate layers.

en cs.CL, cs.AI
DOAJ Open Access 2024
Beyond Rest: Exploring the Sleep-Exercise Connection

J. Kim, T. Kainth, E. Garrels et al.

Introduction: The bidirectional relationship between the effects of sleep and exercise is often underappreciated. We aim to explore the bidirectional relationship of sleep and exercise. We further discuss the prominence of poor sleep in both the athletic and general population and seek to understand the underlying mechanisms of interdependencies between the two. The goal is to illuminate practical implications to improve both areas and optimize physical and mental health. Objectives: (1) To explore the bidirectional relationship between sleep and exercise. (2) To understand how exercise can counterbalance the adverse metabolic consequences of sleep deprivation. Methods: We conducted a systematic literature review from PubMed, Scopus, and PsycINFO using the search terms: “(exercise) and (sleep),” “(exercise performance) and (sleep),” “(sleep quality) and (exercise).” We included original studies in English conducted on age groups 18 years and older. Results: Data from 31 studies show that a significant number of athletes experience poor sleep quality and daytime sleepiness. 68.5% of Qatar Stars League soccer players and 61% of collegiate athletes in NCAA institutions report daytime fatigue several times a week. The most common causes include overtraining, hectic travel schedules, and sleeping in unfamiliar settings. Studies confirm athletes often sleep less before intense training or competitions. Sleep deficiency may lead to reduced muscular strength and endurance, mood changes, increased perceived effort, impaired cognitive processing, and diminished motor skills. Athletes averaging less than 8 hours of sleep nightly were 1.7 times more prone to injuries. Physiologically, sleep loss alters ventilation, plasma lactate concentration, hormone secretion, and inflammatory responses, and hinders muscle glycogen restoration. Extended sleep restriction decreases testosterone levels, which influence muscle mass, energy, bone strength, and more. Conversely, exercise may counter adverse metabolic impacts of sleep deprivation. High-intensity interval exercise (HIIE) has been shown to nullify negative metabolic effects of sleep deprivation, suggesting exercise’s protective potential. Conclusions: Sleep and exercise are fundamental to maintaining physical, mental, emotional, and spiritual health. The bidirectional, interdependent relationship can be best utilized by providers to optimize overall well-being. The critical impact of adequate sleep, particularly among athletes, is frequently underestimated. Poor sleep can detrimentally affect performance, amplify injury risks, and disrupt physiological functions, yet contemporary lifestyles often downplay its significance. It is important for healthcare professionals to emphasize a balanced approach to optimize these vital aspects. Continued research can offer strategies that benefit athletes and the broader populace, aiming to uplift daily life functionality. Disclosure of Interest: None Declared

S2 Open Access 2015
Titanium-Nitride Coating of Orthopaedic Implants: A Review of the Literature

R. V. van Hove, I. Sierevelt, B. V. van Royen et al.

Surfaces of medical implants can be enhanced with the favorable properties of titanium-nitride (TiN). In a review of English medical literature, the effects of TiN-coating on orthopaedic implant material in preclinical studies were identified and the influence of these effects on the clinical outcome of TiN-coated orthopaedic implants was explored. The TiN-coating has a positive effect on the biocompatibility and tribological properties of implant surfaces; however, there are several reports of third body wear due to delamination, increased ultrahigh molecular weight polyethylene wear, and cohesive failure of the TiN-coating. This might be due to the coating process. The TiN-coating process should be optimized and standardized for titanium alloy articulating surfaces. The clinical benefit of TiN-coating of CoCrMo knee implant surfaces should be further investigated.

245 citations en Materials Science, Medicine
arXiv Open Access 2022
Think-Aloud Verbalizations for Identifying User Experience Problems: Effects of Language Proficiency with Chinese Non-Native English Speakers

Mingming Fan, Lingyun Zhu

Subtle patterns in users' think-aloud (TA) verbalizations (i.e., utterances) are shown to be telltale signs of user experience (UX) problems and used to build artificial intelligence (AI) models or AI-assisted tools to help UX evaluators identify UX problems automatically or semi-automatically. Despite the potential of such verbalization patterns, they were uncovered with native English speakers. As most people who speak English are non-native speakers, it is important to investigate whether similar patterns exist in non-native English speakers' TA verbalizations. As a first step to answer this question, we conducted think-aloud usability testing with Chinese non-native English speakers and native English speakers using three common TA protocols. We compared their verbalizations and UX problems that they encountered to understand the effects of language and TA protocols. Our findings show that both language groups had similar amounts and proportions of verbalization categories, encountered similar problems, and had similar verbalization patterns that indicate UX problems. Furthermore, TA protocols did not significantly affect the correlations between verbalizations and problems. Based on the findings, we present three design implications for UX practitioners and the design of AI-assisted analysis tools.

DOAJ Open Access 2022
Safety and efficacy of acupuncture for varicocele-induced male infertility: a systematic review protocol

Jing Ding, Miaomiao Sun, Sijia Wang et al.

Introduction: Varicocele (VC) is a common clinical disease in andrology. Among a number of ways for VC treatment, surgery is the most common one, but the measurable benefit of surgical repair was slight. A growing exploration of complementary therapies has been conducted in clinical research on acupuncture for VC, but there is no relevant systematic review and meta-analysis to assess the efficacy and safety of acupuncture for VC. Methods and analysis: All relevant publications published from database inception through August 2022 will be searched in three English-language databases (Embase, CENTRAL, MEDLINE) and four Chinese-language databases (China National Knowledge Infrastructure, China Science and Technology Journal Database, Chinese Biomedical Literature Database and Wanfang Data). Randomised controlled trials in English and Chinese concerned with acupuncture for patients with VC will be included. The input clinical data will be processed by the Review Manager software (RevMan). The literature will be appraised with the Cochrane Collaboration risk of bias tool. The Grading of Recommendations Assessment, Development and Evaluation system (GRADE system) will be used to evaluate the quality of evidence. Ethics and dissemination: This study is a secondary study based on clinical studies so it does not relate to any individual patient information or infringe the rights of participants. Hence no ethical approval is required. The results will be reported in peer-reviewed journals or disseminated at relevant conferences. PROSPERO registration number: CRD42022316005.

Page 20 of 477191