Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.
Arabic is a morphologically rich language with relatively few resources and a less explored syntax compared to English. Given these limitations, Arabic Natural Language Processing (NLP) tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA) have proven to be very challenging to tackle. Recently, with the surge of transformer-based models, language-specific BERT-based models have proven to be very efficient at language understanding, provided they are pre-trained on a very large corpus. Such models were able to set new standards and achieve state-of-the-art results for most NLP tasks. In this paper, we pre-trained BERT specifically for the Arabic language in the pursuit of achieving the same success that BERT did for the English language. The performance of AraBERT is compared to multilingual BERT from Google and other state-of-the-art approaches. The results showed that the newly developed AraBERT achieved state-of-the-art performance on most tested Arabic NLP tasks. The pretrained AraBERT models are publicly available at https://github.com/aub-mind/araBERT, in the hope of encouraging research and applications for Arabic NLP.
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric, training only on data that was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively with the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.
Tatsuya Amano, Valeria Ramírez-Castañeda, V. Berdejo-Espinola
et al.
The use of English as the common language of science represents a major impediment to maximising the contribution of non-native English speakers to science. Yet few studies have quantified the consequences of language barriers on the career development of researchers who are non-native English speakers. By surveying 908 researchers in environmental sciences, this study estimates and compares the amount of effort required to conduct scientific activities in English between researchers from different countries and, thus, different linguistic and economic backgrounds. Our survey demonstrates that non-native English speakers, especially early in their careers, spend more effort than native English speakers in conducting scientific activities, from reading and writing papers and preparing presentations in English, to disseminating research in multiple languages. Language barriers can also cause them not to attend, or give oral presentations at, international conferences conducted in English. We urge scientific communities to recognise and tackle these disadvantages to release the untapped potential of non-native English speakers in science. This study also proposes potential solutions that can be implemented today by individuals, institutions, journals, funders, and conferences. Please see the Supporting information files (S2–S6 Text) for Alternative Language Abstracts and Figs 5 and 6.
Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Fatma Sargin, Mehmet Alçı, Büşra Kaygusuz Aydemir
et al.
Background: Childhood epilepsy is one of the most common neurological disorders worldwide, significantly affecting cognitive, emotional, and social development. As caregivers often seek medical guidance online, the readability, understandability, and quality of internet-based patient education materials (IPEMs) are crucial for health literacy and decision-making. This study evaluates the readability, understandability, quality, and popularity of online pediatric epilepsy materials in relation to established health communication standards. Methods: A Google search using the term ‘epilepsy in children’ (July 20, 2025) identified 84 eligible English-language websites. These were classified as (I) academic departments/societies, (II) clinics/hospitals, and (III) miscellaneous healthcare platforms. Readability was measured by seven validated indices and summarized as the Average Reading Level Consensus (ARLC), understandability by Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), quality by Journal of the American Medical Association (JAMA) benchmarks, and popularity via Similarweb. Results: The mean ARLC was 11.13 ± 2.10, exceeding the recommended sixth-grade level, with no readability differences across website types (p = 0.167). PEMAT scores were high (median 82.3%) and varied by source (p = 0.022), favoring academic sites. Only 34.52% met high-quality standards (JAMA ≥ 3), with Group I superior (p = 0.002). Readability showed no significant correlation with understandability (r = −0.151, p = 0.170) or quality (r = −0.154, p = 0.161). Most websites had moderate popularity. Conclusions: Although many pediatric epilepsy websites are understandable, most surpass recommended readability levels and lack key quality indicators. Healthcare professionals should guide families to reliable, accessible resources and promote user-centered digital health communication.
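To make the readability comparison concrete, the following is a minimal sketch of how a grade-level index and a consensus score might be computed. The abstract does not name its seven indices; the Flesch-Kincaid grade level is used here only as one widely known example, and treating the consensus as a plain arithmetic mean of grade levels is an assumption. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
from statistics import mean

RECOMMENDED_GRADE = 6.0  # sixth-grade level cited in the abstract


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts (one example index)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def consensus_level(grade_levels: list[float]) -> float:
    """Assumed consensus: arithmetic mean of the individual grade levels."""
    return mean(grade_levels)


def exceeds_recommended(level: float, limit: float = RECOMMENDED_GRADE) -> bool:
    """True when a text reads above the recommended grade level."""
    return level > limit


# Hypothetical page: 100 words, 5 sentences, 150 syllables.
fk = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
# Hypothetical grade levels from several indices for the same page.
level = consensus_level([fk, 11.0, 12.5])
print(round(fk, 2), round(level, 2), exceeds_recommended(level))
```

Taking counts as inputs keeps the sketch independent of any particular syllable-counting heuristic, which is where readability tools tend to differ.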
Following the publication of Mary Wollstonecraft’s Vindication of the Rights of Woman (1792), the women’s suffrage campaign was forged around the slogan “on the same terms as men”. The suffrage, though restricted at the time to men, was gradually extended to include some workers (1867), farm workers who were heads of households (1885) and finally all men, with the advent of universal male suffrage in 1918. In 1884, those wives who now had control over their own bodies joined single women in demanding representation and the right to vote at the local and national levels. Citing their irrefutable status as citizens in their own right, British women opposed and denounced the clear injustice of biological arguments used to justify political inequality. In calling for social and political reform based on the equality of the sexes, such women asserted both their status as political subjects and their place in history. In so doing, they called on the state to provide financial assistance to poorer pregnant women and to take action in the struggle against wage inequality and the doctrine of “separate spheres”. Women’s history in the 1970s, and gender history in the 1980s, precipitated the emergence of new approaches in the vast majority of academic fields. Gender inequalities – linked to themes such as masculinities, consent or sexual violence – are thus constitutive of history.
William English, Chase Walker, Dominic Simon
et al.
Natural language (NL) to temporal logic (TL) translation enables engineers to specify, verify, and enforce system behaviors without manually crafting formal specifications, an essential capability for building trustworthy autonomous systems. While existing NL-to-TL translation frameworks have demonstrated encouraging initial results, these systems either explicitly assume access to accurate atom grounding or suffer from low grounded translation accuracy. In this paper, we propose a framework for Grounding Natural Language Into System Signatures for Temporal Logic translation called GinSign. The framework introduces a grounding model that learns the abstract task of mapping NL spans onto a given system signature: given a lifted NL specification and a system signature $\mathcal{S}$, the classifier must assign each lifted atomic proposition to an element of the set of signature-defined atoms $\mathcal{P}$. We decompose the grounding task hierarchically: first predicting predicate labels, then selecting the appropriately typed constant arguments. Recasting this task from a free-form generation problem as a structured classification problem permits the use of smaller masked language models and eliminates the reliance on expensive LLMs. Experiments across multiple domains show that frameworks that omit grounding tend to produce syntactically correct lifted LTL that is semantically nonequivalent to grounded target expressions, whereas our framework supports downstream model checking and achieves grounded logical-equivalence scores of $95.5\%$, a $1.4\times$ improvement over SOTA.
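The two-stage decomposition described above (predicate first, then typed constants) can be illustrated with a small sketch. This is not the GinSign implementation: the `Signature` class, the `ground_atom` function, and the example signature are hypothetical, and a toy token-overlap score stands in for the learned masked-language-model classifier the abstract refers to.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Signature:
    """Hypothetical system signature: typed predicates and typed constants."""
    predicates: dict[str, tuple[str, ...]]  # predicate name -> argument types
    constants: dict[str, str]               # constant name -> type


def overlap_score(span: str, label: str) -> int:
    """Toy stand-in for a learned classifier: number of shared tokens."""
    span_tokens = set(span.lower().replace("_", " ").split())
    label_tokens = set(label.lower().replace("_", " ").split())
    return len(span_tokens & label_tokens)


def ground_atom(span: str, sig: Signature) -> tuple[str, tuple[str, ...]]:
    """Map an NL span to a signature atom in two classification stages."""
    # Stage 1: choose the predicate label with the highest score.
    predicate = max(sig.predicates, key=lambda p: overlap_score(span, p))
    # Stage 2: for each argument slot, choose only among constants whose
    # type matches the slot, so the result is well typed by construction.
    args = []
    for arg_type in sig.predicates[predicate]:
        candidates = [c for c, t in sig.constants.items() if t == arg_type]
        if not candidates:
            raise ValueError(f"no constant of type {arg_type!r} in signature")
        args.append(max(candidates, key=lambda c: overlap_score(span, c)))
    return predicate, tuple(args)


# Hypothetical signature and lifted atomic proposition.
example_sig = Signature(
    predicates={"at": ("robot", "room"), "holding": ("robot", "item")},
    constants={"drone_1": "robot", "kitchen": "room", "lab": "room", "cup": "item"},
)
print(ground_atom("drone 1 is at the kitchen", example_sig))
```

Restricting stage 2 to constants of the required type is what turns grounding into a choice over a finite, signature-defined label set rather than free-form generation.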
With more than 11 times as many pageviews as the next largest edition, English Wikipedia dominates global knowledge access relative to other language editions. Readers are prone to assuming that English Wikipedia is a superset of all language editions, leading many to prefer it even when their primary language is not English. Other language editions, however, comprise complementary facts rooted in their respective cultures and media environments, which are marginalized in English Wikipedia. While Wikipedia's user interface enables switching between language editions through its Interlanguage Link (ILL) system, it does not reveal to readers that other language editions contain valuable, complementary information. We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia. In a mixed-methods study (n=21), WikiGap significantly improved fact-finding accuracy, reduced task time, and received a 32-point higher usability score relative to Wikipedia's current ILL-based navigation system. Participants reported increased awareness of the availability of complementary information in non-English editions and reconsidered the completeness of English Wikipedia. WikiGap thus paves the way for improved epistemic equity across language editions.
The proliferation of NLP-powered language technologies, AI-based natural language generation models, and English as a mainstream means of communication among both native and non-native speakers make the output of AI-powered tools especially intriguing to linguists. This paper investigates how Grammarly and ChatGPT affect the English language regarding wordiness vs. conciseness. A case study focusing on the purpose subordinator "in order to" is presented to illustrate the way in which Grammarly and ChatGPT recommend shorter grammatical structures instead of longer and more elaborate ones. Although the analysed sentences were produced by native speakers, are perfectly correct, and were extracted from a language corpus of contemporary English, both Grammarly and ChatGPT suggest more conciseness and less verbosity, even for relatively short sentences. The present article argues that technologies such as Grammarly not only mirror language change but also have the potential to facilitate or accelerate it.
Focusing on a growing English-learning trend in Taiwan, this study investigated EFL university students’ self-regulated language learning on YouTube outside of the classroom. Twenty university students who had substantial experience watching YouTubers’ English-teaching videos were invited to individual interviews to bring to light their perceptions of this self-directed learning approach. Their responses were analyzed to provide insights into learners’ attitudes toward this technology-enhanced learning strategy and its impact on their learning of English. Results show that the most highlighted purposes for learning English on YouTube were to explore more learning resources, to seek the attraction of learning English, and to explore cultural knowledge. After viewing the videos on YouTube, the students were more likely to press “like” and share the videos with their friends. Moreover, learning English on YouTube was considered to be more flexible, more interesting, and more interactive than formal learning in the classroom; nevertheless, this informal learning approach was also deemed less effective for students who wanted to improve their English or prepare for English exams. Based on the results, this study concludes by highlighting the pedagogical implications of this research and proposing the use of YouTubers’ English-teaching videos as a complement to classroom learning.
Willingness to communicate (WTC) in a foreign language is linked to a range of interacting learner-internal and learner-external variables. The present study identified the predictors of WTC of 210 foreign language learners of English from Spain. Multiple regression analyses revealed that the strongest (negative) predictor of WTC was foreign language classroom anxiety, while foreign language enjoyment and frequency of foreign language use by the teacher were positive predictors.
English, one of the most dominant languages, has undergone transformations and divergences that have produced many varieties in different parts of the world. English has more than 160 acknowledged accent variations across the globe. Each variety, from standard English to a distinctive local form, reflects its own culture and history. This study aims to investigate Generation Z's attitudes towards varieties of English in light of their experiences communicating in English as a foreign language. This research adopted a qualitative approach, drawing on Saraceni's (2010) Space, Culture, Ideology and Psychology (SCIP) model to understand varieties of English(es). Four English literature students, aged 21-22 and in their 7th semester at an Islamic university (under the Ministry of Religious Affairs) in East Java, Indonesia, were selected as respondents. The results revealed that American English remains the benchmark in most participants' preferences. A number of competing and interconnected factors, such as habits, motivations, and practices, together with family, social, educational, and environmental influences, shape their preferences among varieties of English(es). The participants' attitudes towards the varieties of Englishes ranged from positive, through contradictory (both positive and negative), to negative.