Results for "English language"

S2 Open Access 2023

Baichuan 2: Open Large-scale Language Models

Ai Ming Yang, Bin Xiao, Bingning Wang et al.

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.

962 citations en Computer Science

S2 Open Access 2020

AraBERT: Transformer-based Model for Arabic Language Understanding

Wissam Antoun, Fady Baly, Hazem M. Hajj

The Arabic language is a morphologically rich language with relatively few resources and a less explored syntax compared to English. Given these limitations, Arabic Natural Language Processing (NLP) tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA) have proven to be very challenging to tackle. Recently, with the surge of transformer-based models, language-specific BERT-based models have proven to be very efficient at language understanding, provided they are pre-trained on a very large corpus. Such models were able to set new standards and achieve state-of-the-art results for most NLP tasks. In this paper, we pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language. The performance of AraBERT is compared to multilingual BERT from Google and other state-of-the-art approaches. The results showed that the newly developed AraBERT achieved state-of-the-art performance on most tested Arabic NLP tasks. The pretrained AraBERT models are publicly available at https://github.com/aub-mind/araBERT, in the hope of encouraging research and applications for Arabic NLP.

1295 citations en Computer Science

S2 Open Access 2020

Beyond English-Centric Multilingual Machine Translation

Angela Fan, Shruti Bhosale, Holger Schwenk et al.

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.

1032 citations en Computer Science

S2 Open Access 2014

English with an accent: language, ideology, and discrimination in the United States

A. Weber

1092 citations en Sociology

S2 Open Access 2012

THE EFFECT OF ENGLISH-LANGUAGE RESTRICTION ON SYSTEMATIC REVIEW-BASED META-ANALYSES: A SYSTEMATIC REVIEW OF EMPIRICAL STUDIES

A. Morrison, Julie Polisena, D. Husereau et al.

970 citations en Medicine, Psychology

S2 Open Access 1999

Collaborative Action Research for English Language Teachers

A. Burns

1215 citations en Computer Science

S2 Open Access 1990

Genre Analysis: English in Academic and Research Settings

J. Swales

7248 citations en Sociology, Computer Science

S2 Open Access 2002

The Cambridge Grammar of the English Language

R. Huddleston, G. Pullum

1216 citations en History, Sociology

S2 Open Access 2023

The manifold costs of being a non-native English speaker in science

Tatsuya Amano, Valeria Ramírez-Castañeda, V. Berdejo-Espinola et al.

The use of English as the common language of science represents a major impediment to maximising the contribution of non-native English speakers to science. Yet few studies have quantified the consequences of language barriers on the career development of researchers who are non-native English speakers. By surveying 908 researchers in environmental sciences, this study estimates and compares the amount of effort required to conduct scientific activities in English between researchers from different countries and, thus, different linguistic and economic backgrounds. Our survey demonstrates that non-native English speakers, especially early in their careers, spend more effort than native English speakers in conducting scientific activities, from reading and writing papers and preparing presentations in English, to disseminating research in multiple languages. Language barriers can also cause them not to attend, or give oral presentations at, international conferences conducted in English. We urge scientific communities to recognise and tackle these disadvantages to release the untapped potential of non-native English speakers in science. This study also proposes potential solutions that can be implemented today by individuals, institutions, journals, funders, and conferences. Please see the Supporting information files (S2–S6 Text) for Alternative Language Abstracts and Figs 5 and 6.

279 citations en Medicine

S2 Open Access 2018

Clinical Natural Language Processing in languages other than English: opportunities and challenges

Aurélie Névéol, H. Dalianis, G. Savova et al.

Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

289 citations en Computer Science, Medicine

CrossRef Open Access 2026

The Changing English Language

en


DOAJ Open Access 2026

Evaluating the readability, understandability, quality, and popularity of online materials about epilepsy in children

Fatma Sargin, Mehmet Alçı, Büşra Kaygusuz Aydemir et al.

Background: Childhood epilepsy is one of the most common neurological disorders worldwide, significantly affecting cognitive, emotional, and social development. As caregivers often seek medical guidance online, the readability, understandability, and quality of internet-based patient education materials (IPEMs) are crucial for health literacy and decision-making. This study evaluates the readability, understandability, quality, and popularity of online pediatric epilepsy materials in relation to established health communication standards. Methods: A Google search using the term ‘epilepsy in children’ (July 20, 2025) identified 84 eligible English-language websites. These were classified as (I) academic departments/societies, (II) clinics/hospitals, and (III) miscellaneous healthcare platforms. Readability was measured by seven validated indices and summarized as the Average Reading Level Consensus (ARLC), understandability by the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), quality by Journal of the American Medical Association (JAMA) benchmarks, and popularity via Similarweb. Results: The mean ARLC was 11.13 ± 2.10, exceeding the recommended sixth-grade level, with no readability differences across website types (p = 0.167). PEMAT scores were high (median 82.3%) and varied by source (p = 0.022), favoring academic sites. Only 34.52% met high-quality standards (JAMA ≥ 3), with Group I superior (p = 0.002). Readability showed no significant correlation with understandability (r = −0.151, p = 0.170) or quality (r = −0.154, p = 0.161). Most websites had moderate popularity. Conclusions: Although many pediatric epilepsy websites are understandable, most surpass recommended readability levels and lack key quality indicators. Healthcare professionals should guide families to reliable, accessible resources and promote user-centered digital health communication.

Pediatrics


S2 Open Access 2014

A Course in English Language Teaching

Alan Maley

369 citations en Psychology

DOAJ Open Access 2025

La citoyenneté des Anglaises, 1850-1914. À la conquête de l’opinion publique

Myriam Boussahba

Following the publication of Mary Wollstonecraft’s Vindication of the Rights of Woman (1792), the women’s suffrage campaign was forged around the slogan “on the same terms as men”. The suffrage, though restricted at the time to men, was gradually extended to include some workers (1867), farm workers who were heads of households (1885) and finally all men, with the advent of universal male suffrage in 1918. In 1884, those wives who now had control over their own bodies, joined single women in demanding representation and the right to vote at local and national level. Citing their irrefutable status as citizens in their own right, British women opposed and denounced the clear injustice of biological arguments used to justify political inequality. In calling for social and political reform based on the equality of the sexes, such women asserted both their status as political subjects and their place in history. In so doing, they called on the state to provide financial assistance to poorer pregnant women and to take action in the struggle against wage inequality and the doctrine of “separate spheres”. Women’s history in the 1970s, and gender history in the 1980s, precipitated the emergence of new approaches in the vast majority of academic fields. Gender inequalities – linked to themes such as masculinities, consent or sexual violence – are thus constitutive of history.

History of Great Britain, English literature


arXiv Open Access 2025

GinSign: Grounding Natural Language Into System Signatures for Temporal Logic Translation

William English, Chase Walker, Dominic Simon et al.

Natural language (NL) to temporal logic (TL) translation enables engineers to specify, verify, and enforce system behaviors without manually crafting formal specifications, an essential capability for building trustworthy autonomous systems. While existing NL-to-TL translation frameworks have demonstrated encouraging initial results, these systems either explicitly assume access to accurate atom grounding or suffer from low grounded translation accuracy. In this paper, we propose a framework for Grounding Natural Language Into System Signatures for Temporal Logic translation called GinSign. The framework introduces a grounding model that learns the abstract task of mapping NL spans onto a given system signature: given a lifted NL specification and a system signature $\mathcal{S}$, the classifier must assign each lifted atomic proposition to an element of the set of signature-defined atoms $\mathcal{P}$. We decompose the grounding task hierarchically: first predicting predicate labels, then selecting the appropriately typed constant arguments. Decomposing this task from a free-form generation problem into a structured classification problem permits the use of smaller masked language models and eliminates the reliance on expensive LLMs. Experiments across multiple domains show that frameworks which omit grounding tend to produce syntactically correct lifted LTL that is semantically nonequivalent to grounded target expressions, whereas our framework supports downstream model checking and achieves grounded logical-equivalence scores of $95.5\%$, a $1.4\times$ improvement over SOTA.

en cs.CL, cs.AI


arXiv Open Access 2025

WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions

Zining Wang, Yuxuan Zhang, Dongwook Yoon et al.

With more than 11 times as many pageviews as the next largest edition, English Wikipedia dominates global knowledge access relative to other language editions. Readers are prone to assuming English Wikipedia as a superset of all language editions, leading many to prefer it even when their primary language is not English. Other language editions, however, comprise complementary facts rooted in their respective cultures and media environments, which are marginalized in English Wikipedia. While Wikipedia's user interface enables switching between language editions through its Interlanguage Link (ILL) system, it does not reveal to readers that other language editions contain valuable, complementary information. We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia. In a mixed-methods study (n=21), WikiGap significantly improved fact-finding accuracy, reduced task time, and received a 32-point higher usability score relative to Wikipedia's current ILL-based navigation system. Participants reported increased awareness of the availability of complementary information in non-English editions and reconsidered the completeness of English Wikipedia. WikiGap thus paves the way for improved epistemic equity across language editions.

en cs.HC, cs.CL


arXiv Open Access 2025

Can Grammarly and ChatGPT accelerate language change? AI-powered technologies and their impact on the English language: wordiness vs. conciseness

Karolina Rudnicka

The proliferation of NLP-powered language technologies, AI-based natural language generation models, and English as a mainstream means of communication among both native and non-native speakers make the output of AI-powered tools especially intriguing to linguists. This paper investigates how Grammarly and ChatGPT affect the English language regarding wordiness vs. conciseness. A case study focusing on the purpose subordinator "in order to" is presented to illustrate the way in which Grammarly and ChatGPT recommend shorter grammatical structures instead of longer and more elaborate ones. Although the analysed sentences were produced by native speakers, are perfectly correct, and were extracted from a language corpus of contemporary English, both Grammarly and ChatGPT suggest more conciseness and less verbosity, even for relatively short sentences. The present article argues that technologies such as Grammarly not only mirror language change but also have the potential to facilitate or accelerate it.

en cs.CL, cs.CY


S2 Open Access 2019

Learning English from YouTubers: English L2 learners’ self-regulated language learning on YouTube

Hung-chün Wang, Cheryl Wei-yu Chen

Focusing on a growing English-learning trend in Taiwan, this study investigated EFL university students’ self-regulated language learning on YouTube outside of the classroom. Twenty university students who had substantial experience of watching YouTubers’ English-teaching videos were invited for an individual interview to bring to light their perceptions of this self-directed learning approach. Their responses were analyzed to provide insights into learners’ attitudes toward this technology-enhanced learning strategy and its impact on their learning of English. Results show that the most highlighted purposes for learning English on YouTube were to explore more learning resources, to seek the attraction of learning English, and to explore cultural knowledge. After viewing the videos on YouTube, the students were more likely to press like and share the videos with their friends. Moreover, learning English on YouTube was considered to be more flexible, more interesting, and more interactive than formal learning in the classroom; nevertheless, this informal learning approach was also deemed less effective for students who wanted to improve their English or prepare for English exams. Based on the results, this study concludes by highlighting the pedagogical implications of this research and proposing the complementary use of YouTubers’ English-teaching videos to classroom learning.

186 citations en Psychology

S2 Open Access 2019

The Effect of Classroom Emotions, Attitudes Toward English, and Teacher Behavior on Willingness to Communicate Among English Foreign Language Learners

Jean-Marc Dewaele

Willingness to communicate (WTC) in a foreign language is linked to a range of interacting learner-internal and learner-external variables. The present study identified the predictors of WTC of 210 foreign language learners of English from Spain. Multiple regression analyses revealed that the strongest (negative) predictor of WTC was foreign language classroom anxiety, while foreign language enjoyment and frequency of foreign language use by the teacher were positive predictors.

174 citations en Psychology

DOAJ Open Access 2024

EXPLORING Z GENERATION ATTITUDES TOWARD VARIETIES OF ENGLISH(ES)

Laela Rohadatul Aisy, Ribut Wahyudi

English, one of the most dominant languages, has undergone transformations and divergences that have created a range of varieties in different parts of the world. English has more than 160 acknowledged accent variations across the globe. Each variety, from standard English to a distinctive local form, reflects its unique culture and history. This study investigates Generation Z's attitudes towards varieties of English(es) in their experiences of communicating in English as a foreign language. The research adopted a qualitative approach, drawing on Saraceni's (2010) Space, Culture, Ideology and Psychology (SCIP) model to understand varieties of English(es). Four English literature students, aged 21-22 and in their 7th semester at an Islamic university (under the Ministry of Religious Affairs) in East Java, Indonesia, were selected as respondents. The results revealed that American English remains the benchmark in most participants' preferences. A number of competing and interconnected factors, such as habits, motivations, and practices, together with family, social, educational, and environmental influences, shape their preferences among varieties of English(es). The participants' attitudes towards the varieties of Englishes ranged from positive, through contradictory (both positive and negative), to negative.

Language and Literature, Philology. Linguistics

