Results for "English literature"

Showing 20 of ~9553382 results · from arXiv, DOAJ, CrossRef, Semantic Scholar

DOAJ Open Access 2026
Three Decades of Use of the Minimum Basic Data Set in Infectious Disease Research in Spain: A Scoping Review with an Evidence-Mapping Approach

Beatriz Rodríguez-Alonso, Hugo Almeida, Montserrat Alonso-Sardón et al.

Nationwide hospital discharge databases are increasingly used in infectious disease research, yet their methodological strengths and limitations are rarely synthesised. In Spain, the Minimum Basic Data Set (Conjunto Mínimo Básico de Datos, CMBD) was implemented in 1987 and provides near-universal coverage of acute-care hospitalisations and has been widely applied in infectious disease epidemiology. However, its overall contribution and intrinsic constraints have not been comprehensively mapped. Given the breadth of infections, study designs, populations and outcome definitions in CMBD-based research, effect-size synthesis was not feasible; therefore, we conducted a scoping review with an evidence-mapping approach. We aimed to synthesise the scope, applications and methodological limitations of CMBD-based infectious disease research since its implementation. We conducted a scoping review following JBI guidance and reported according to PRISMA-ScR. PubMed, Embase, Web of Science and Scopus were searched from inception to 25 November 2024 for peer-reviewed journal articles in English or Spanish using CMBD data to investigate infectious diseases in Spain (no restrictions were applied by study design; grey literature was excluded). Screening, data charting and synthesis were completed during 2025. Four reviewers independently screened records and charted data. Studies were classified by infectious disease focus, syndromic category, study design and geographical scope. A total of 359 studies published between 1996 and 2024 were included, mostly retrospective observational analyses. Infectious diseases were the primary focus in 225 studies (62.7%), most commonly respiratory, gastrointestinal/liver and vaccine-preventable infections. Subnational analyses were concentrated in a limited number of regions. Over 80% of reported limitations reflected intrinsic CMBD features. 
Over three decades, the CMBD has become a cornerstone of hospital-based infectious disease research in Spain, enabling robust national analyses. However, limitations in clinical detail, microbiological confirmation and coding consistency constrain aetiological specificity and causal inference, highlighting the need for data validation and linkage with complementary sources.

arXiv Open Access 2025
Beyond English: The Impact of Prompt Translation Strategies across Languages and Tasks in Multilingual LLMs

Itai Mondshine, Tzuf Paz-Argaman, Reut Tsarfaty

Despite advances in the multilingual capabilities of Large Language Models (LLMs) across diverse tasks, English remains the dominant language for LLM research and development. As a result, pre-translation, i.e., translating the task prompt into English before inference, has become a widespread practice when working with other languages. Selective pre-translation, a more surgical approach, focuses on translating specific prompt components. However, its current use is sporadic and lacks a systematic research foundation. Consequently, the optimal pre-translation strategy for various multilingual settings and tasks remains unclear. In this work, we aim to uncover the optimal setup for pre-translation by systematically assessing its use. Specifically, we view the prompt as a modular entity, composed of four functional parts: instruction, context, examples, and output, each of which could be translated or not. We evaluate pre-translation strategies across 35 languages covering both low and high-resource languages, on various tasks including Question Answering (QA), Natural Language Inference (NLI), Named Entity Recognition (NER), and Abstractive Summarization. Our experiments show the impact of factors such as similarity to English, translation quality and the size of pre-trained data on the model performance with pre-translation. We suggest practical guidelines for choosing optimal strategies in various multilingual settings.

en cs.CL
arXiv Open Access 2025
AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

Gabrial Zencha Ashungafac, Mardhiyah Sanni, Busayo Awobade et al.

Recent advances in speech-enabled AI, including Google's NotebookLM and OpenAI's speech-to-speech API, are driving widespread interest in voice interfaces globally. Despite this momentum, there exists no publicly available application-specific model evaluation that caters to Africa's linguistic diversity. We present AfriSpeech-MultiBench, the first domain-specific evaluation suite for over 100 African English accents across 10+ countries and seven application domains: Finance, Legal, Medical, General dialogue, Call Center, Named Entities and Hallucination Robustness. We benchmark a diverse range of open, closed, unimodal ASR and multimodal LLM-based speech recognition systems using both spontaneous and non-spontaneous speech conversation drawn from various open African accented English speech datasets. Our empirical analysis reveals systematic variation: open-source ASR models excel in spontaneous speech contexts but degrade on noisy, non-native dialogue; multimodal LLMs are more accent-robust yet struggle with domain-specific named entities; proprietary models deliver high accuracy on clean speech but vary significantly by country and domain. Models fine-tuned on African English achieve competitive accuracy with lower latency, a practical advantage for deployment. However, hallucinations remain a major problem for most SOTA models. By releasing this comprehensive benchmark, we empower practitioners and researchers to select voice technologies suited to African use-cases, fostering inclusive voice applications for underserved communities.

en cs.CL
DOAJ Open Access 2025
Integrating Gender Sensitization and Gender-Neutral Language of English by Employing CLIL Approach to Enhance Critical Thinking

Ravi Prakash Jalli, Spoorthi Boda

This study aims to examine how the CLIL approach can be employed to integrate gender sensitization and gender-neutral English language use in order to enhance critical thinking. CLIL stands for Content and Language Integrated Learning, an approach which integrates content and language, placing equal focus on both. The study uses a cross-sectional pre-experimental research design, following a one-group model: pre-test, intervention and post-test. Convenience sampling is employed to select the participants: 105 second-year engineering students attending Gender Sensitization MNC at an NIT. The results of the study were analysed using both inductive and descriptive analyses along with inferential statistics. They indicate that CLIL is an effective approach to integrating gender sensitization and gender-neutral English language use to enhance critical thinking.

Language and Literature, English literature
arXiv Open Access 2024
Self-supervised Speech Representations Still Struggle with African American Vernacular English

Kalvin Chang, Yi-Hui Chou, Jiatong Shi et al.

Underperformance of ASR systems for speakers of African American Vernacular English (AAVE) and other marginalized language varieties is a well-documented phenomenon, and one that reinforces the stigmatization of these varieties. We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American English (MAE). We evaluate four SSL models (wav2vec 2.0, HuBERT, WavLM, and XLS-R) on zero-shot Automatic Speech Recognition (ASR) for these two varieties and find that these models perpetuate the bias in performance against AAVE. Additionally, the models have higher word error rates on utterances with more phonological and morphosyntactic features of AAVE. Despite the success of SSL speech models in improving ASR for low resource varieties, SSL pre-training alone may not bridge the gap between AAVE and MAE. Our code is publicly available at https://github.com/cmu-llab/s3m-aave.

en cs.CL, cs.SD
arXiv Open Access 2024
A holographic mobile-based application for practicing pronunciation of basic English vocabulary for Spanish speaking children

R. Cerezo, V. Calderon, C. Romero

This paper describes a holographic mobile-based application designed to help Spanish-speaking children practice the pronunciation of basic English vocabulary words. The mastery of vocabulary is a fundamental step when learning a language but is often perceived as boring. Producing the correct pronunciation is frequently regarded as the most difficult and complex skill for new learners of English. In order to address these problems, this research takes advantage of the power of multi-channel stimuli (sound, image and interaction) in a mobile-based hologram application in order to motivate students and improve their experience of practicing. We adapted the prize-winning HolograFX game and developed a new mobile application to help practice English pronunciation. A 3D holographic robot that acts as a virtual teacher interacts via voice with the children. To test the tool, we carried out an experiment with 70 Spanish pre-school children divided into three classes: a control group using traditional methods such as images in books and on the blackboard, and two experimental groups using our drills and practice software. One experimental group used the mobile application without the holographic game and the other experimental group used the application with the holographic game. We performed pre-test and post-test performance assessments, a satisfaction survey and emotion analysis. The results are very promising. They show that the use of the holographic mobile-based application had a significant impact on the children's motivation. It also improved their performance compared to traditional methods used in the classroom.

arXiv Open Access 2024
ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus

Injy Hamed, Fadhl Eryani, David Palfreyman et al.

We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus. The corpus comprises twelve hours of Zoom meetings involving multiple speakers role-playing a work situation where Students brainstorm ideas for a certain topic and then discuss it with an Interlocutor. The meetings cover different topics and are divided into phases with different language setups. The corpus presents a challenging set for automatic speech recognition (ASR), including two languages (Arabic and English) with Arabic spoken in multiple variants (Modern Standard Arabic, Gulf Arabic, and Egyptian Arabic) and English used with various accents. Adding to the complexity of the corpus, there is also code-switching between these languages and dialects. As part of our work, we take inspiration from established sets of transcription guidelines to present a set of guidelines handling issues of conversational speech, code-switching and orthography of both languages. We further enrich the corpus with two layers of annotations; (1) dialectness level annotation for the portion of the corpus where mixing occurs between different variants of Arabic, and (2) automatic morphological annotations, including tokenization, lemmatization, and part-of-speech tagging.

en cs.CL
arXiv Open Access 2024
Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching

Seoyeon Kim, Huiseo Kim, Chanjun Park et al.

Recent large language models (LLMs) demonstrate multilingual abilities, yet they are English-centric due to the dominance of English in training corpora. The limited resources available for low-resource languages remain a crucial challenge. Code-switching (CS), a phenomenon where multilingual speakers alternate between languages in a discourse, can convey subtle cultural and linguistic nuances that can otherwise be lost in translation, and can elicit language-specific knowledge in human communications. In light of this, we investigate whether code-switching can activate, or identify and leverage, knowledge for reasoning when LLMs solve low-resource language tasks. To facilitate the research, we first present EnKoQA, a synthetic English-Korean CS question-answering dataset. We provide a comprehensive analysis of a variety of multilingual LLMs by subdividing the activation process into knowledge identification and knowledge leveraging. Our results demonstrate that, compared to English text, CS can faithfully activate knowledge inside LLMs, especially in language-specific domains, suggesting the potential of code-switching for low-resource language tasks.

en cs.CL
DOAJ Open Access 2024
“The Aftermath of Historic Chaos”: Trumpism as a Totalist Worldview in the Testimonies of Participants in the January 6 Attacks

Murariu Mihai, Bercuci Loredana

This article explores how the January 6th attack on the Capitol building is represented in the discourse of the participants in the insurrection. We analyze a self-compiled corpus of depositions and interviews with individuals involved in the January 6, 2021, assault, which we have named TruJan. The TruJan corpus was compiled from the GovInfo website. Linguistic Inquiry and Word Count (LIWC-22) was used for the analysis to extract the most frequent topics discussed by the interviewees during the hearings, along with sentences that showcase the attitudes of the speakers regarding the most salient issues. The second part of the article offers a close reading of the depositions and interviews with movement leaders or founders: Ali Alexander (Stop the Steal), Henry “Enrique” Tarrio (Proud Boys), and Stewart Rhodes (Oath Keepers). We argue that Trumpism’s latent totalist potential was made manifest in the January 6 attacks due to the political setbacks Trump faced when losing the 2020 elections, which was perceived as a state of crisis.

History (General) and history of Europe, English literature
arXiv Open Access 2023
Exploring the Potential of Machine Translation for Generating Named Entity Datasets: A Case Study between Persian and English

Amir Sartipi, Afsaneh Fatemi

This study focuses on the generation of Persian named entity datasets through the application of machine translation to English datasets. The generated datasets were evaluated by experimenting with one monolingual and one multilingual transformer model. Notably, the CoNLL 2003 dataset achieved the highest F1 score of 85.11%. In contrast, the WNUT 2017 dataset yielded the lowest F1 score of 40.02%. The results of this study highlight the potential of machine translation in creating high-quality named entity recognition datasets for low-resource languages like Persian. The study compares the performance of these generated datasets with English named entity recognition systems and provides insights into the effectiveness of machine translation for this task. Additionally, this approach could be used to augment data in low-resource languages or to create noisy data that makes named entity systems more robust.

en cs.CL, cs.AI
DOAJ Open Access 2023
Investigasi Desain Pengajaran untuk Penalaran Matematis

Novita Fatmiyati, Freddy Prasetyo, Dela Ambarwati et al.

Abstract: Learning mathematics requires students to have various abilities and skills, including mathematical reasoning ability. This study aims to identify matters related to teaching design in mathematics education with respect to students' mathematical reasoning abilities. The method used was a Systematic Literature Review (SLR). The literature used as research subjects was collected by searching the Scopus database with the following inclusion criteria: (1) written in English, (2) published in the last ten years (2013-2022), and (3) indexed by Scopus. A total of 24 articles were identified that meet the inclusion criteria, and these were analyzed based on the teaching design categorization by van der Akker as modified by Lithner. The results show that there are learning designs that affect various mathematical reasoning abilities, namely the Zone of Proximal Development (ZPD), Realistic Mathematics Education (RME), and Theory of Didactical Situation (TDS), with various claims that interventions such as learning models, learning instructions and teaching media or materials effectively improve students' mathematical reasoning abilities.
Abstract (translated from Indonesian): Learning mathematics demands various abilities and skills from students, one of which is mathematical reasoning ability. This study aims to identify mathematics teaching designs related to students' mathematical reasoning abilities. The method used was a Systematic Literature Review (SLR), searching the Scopus database with the inclusion criteria of (1) written in English, (2) published in the last ten years (2013-2022), and (3) indexed by Scopus. A total of 24 articles were identified as meeting these criteria and were then analyzed based on the teaching design categorization by van der Akker as modified by Lithner. The results show that there are teaching designs that influence various mathematical reasoning abilities, namely the Zone of Proximal Development (ZPD), Realistic Mathematics Education (RME), and Theory of Didactical Situation (TDS), with various claims, including interventions in learning models, learning instructions, and teaching media or materials, that effectively improve students' mathematical reasoning abilities.

Education, Education (General)
DOAJ Open Access 2023
Revisiting the role of breadth and depth of vocabulary knowledge in reading comprehension

Animut Tadele Dagnaw

The purpose of this study was to investigate the role of breadth and depth of vocabulary knowledge in reading comprehension at Debre Markos University. A quantitative approach was taken to gather and analyze the data. Out of 235 students learning at the college, 61 were sampled randomly. To investigate their knowledge of vocabulary breadth, the Vocabulary Levels Test (VLT) was employed. The Depth of Vocabulary Knowledge (DVK) test was utilized to investigate the depth of vocabulary knowledge. The reading section of the Test of English as a Foreign Language (TOEFL) was used to determine the reading comprehension performance of the students. Pearson product-moment correlation was used to examine the relationship between vocabulary knowledge (breadth and depth) and reading comprehension. In addition, to find out which aspect of vocabulary knowledge best explains reading comprehension, Standard Multiple Regression was employed. The data were analyzed using SPSS (version 21). The findings suggest that there was a significant strong positive relationship between knowledge of vocabulary breadth and reading comprehension (r = .73, n = 61, p = .000 < 0.05). In addition, the results reveal that there was a significant strong positive relationship between knowledge of vocabulary depth and reading comprehension (r = .60, n = 61, p = .000 < 0.05). The findings also show that vocabulary breadth and depth together were able to predict respondents’ reading comprehension. However, vocabulary breadth (Beta = .58) had greater unique explanatory power than knowledge of vocabulary depth (Beta = .315).

Education (General)
arXiv Open Access 2022
Auto-Select Reading Passages in English Assessment Tests?

Bruce W. Lee, Jason H. Lee

We show a method to auto-select reading passages in English assessment tests and share some key insights that can be helpful in related fields. Specifically, we prove that finding a similar passage (to a passage that already appeared in the test) can give a suitable passage for test development. In the process, we create a simple database-tagger-filter algorithm and perform a human evaluation. However, (1) the textual features that we analyzed lack coverage, and (2) we fail to find meaningful correlations between each feature and suitability score. Lastly, we describe future developments to improve automated reading passage selection.

en cs.CL, cs.IR
arXiv Open Access 2022
Controlling Extra-Textual Attributes about Dialogue Participants -- A Case Study of English-to-Polish Neural Machine Translation

Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-textual information is unavailable. We investigate this challenge in the English-to-Polish language direction. We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario. The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance. We additionally contribute a novel attribute-annotated dataset of Polish TV dialogue and a morphological analysis script used to evaluate attribute control in models.

en cs.CL, cs.AI
arXiv Open Access 2022
ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

Injy Hamed, Nizar Habash, Slim Abdennadher et al.

We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline systems for machine translation and speech translation tasks. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective and can be used to train and evaluate NLP systems.

en cs.CL
DOAJ Open Access 2022
How does the English4IT Platform Provide English Materials for Computer Science Learners?

Puspa Fortuna Zulfa

During the COVID-19 pandemic, the learning process shifted from face-to-face to online learning. English4IT is one of the platforms that can support the success of the online learning process. This study aims to investigate and describe the use of the English4IT platform to assist the English learning process for computer science learners. The platform contains materials suitable for computer science learners’ needs, including vocabulary, four main English skills, and practical activities, either synchronous or asynchronous. This descriptive qualitative study gathered the data from the contents of the English4IT platform. The data were then analyzed qualitatively by applying the platform and comparing the findings with previous studies and existing theories. This study’s novelty lies in its originality since a deep investigation of this platform has not been conducted by any previous researchers. The findings indicated that the use of this platform could effectively assist the students in learning English and motivate them to join. Various materials provided adequate teaching materials and references for computer science learners. The activities could enhance the students’ progress in each skill, and the re-do feature made it easier for learners to correct their mistakes. This study implies that the English4IT platform is recommended as an alternative way to teach English to computer science students. Before using this platform, it is recommended to recognize the strengths and weaknesses so that the English teachers can use it effectively.

English language, Special aspects of education
DOAJ Open Access 2022
Look what we’ve got for you!

Viola Voß, Göran Hamrin

Librarians put a lot of time and thought into the question "what to buy for the library?" to meet users’ needs as best as possible. But what happens once a book or a database or a journal has made it onto the (virtual) shelf? How do users learn about new acquisitions or other interesting holdings? In this article, we take a tour through collection-marketing activities by academic libraries, highlighting some interesting examples and collecting ideas for reuse. Method: We scouted the internet presence of all IATUL member libraries, considering their websites and, if available, their web 2.0 / social media activities. We added findings from literature and from chance encounters. Results: Our data collection provides an overview of collection-marketing activities in academic libraries all over the world. We discuss the different types of activities, present some examples that we consider interesting, and share insights into recent activities at one of the authors’ libraries. Limitations: We can only analyse the activities of those libraries that use a working language we understand, which rules out some libraries that don’t have, e.g., a version of their website in English, German, French, or a Scandinavian language. Moreover, we only consider the perspective and activities of the libraries – but not the perspective and expectations of their users. Investigating whether they have informational needs about collections that are not yet met by libraries would be an interesting complement to our study.

Bibliography. Library science. Information resources
arXiv Open Access 2021
A Case Study on the Independence of Speech Emotion Recognition in Bangla and English Languages using Language-Independent Prosodic Features

Fardin Saad, Hasan Mahmud, Mohammad Ridwan Kabir et al.

A language-agnostic approach to recognizing emotions from speech remains an incomplete and challenging task. In this paper, we performed a step-by-step comparative analysis of Speech Emotion Recognition (SER) using Bangla and English languages to assess whether distinguishing emotions from speech is independent of language. Six emotions were categorized for this study: happy, angry, neutral, sad, disgust, and fear. We employed three Emotional Speech Sets (ESS), of which the first two were developed by native Bengali speakers in Bangla and English languages separately. The third was a subset of the Toronto Emotional Speech Set (TESS), which was developed by native English speakers from Canada. We carefully selected language-independent prosodic features, adopted a Support Vector Machine (SVM) model, and conducted three experiments to test our proposition. In the first experiment, we measured the performance of the three speech sets individually, followed by the second experiment, where different ESS pairs were integrated to analyze the impact on SER. Finally, we measured the recognition rate by training and testing the model with different speech sets in the third experiment. Although this study reveals that SER in Bangla and English languages is mostly language-independent, some disparities were observed while recognizing emotional states like disgust and fear in these two languages. Moreover, our investigations revealed that non-native speakers convey emotions through speech, much like expressing themselves in their native tongue.

en cs.CL, cs.HC
