Results for "English language"

Showing 20 of ~6566030 results · from CrossRef, DOAJ, Semantic Scholar

S2 Open Access 2018
Discourse Analysis

Margarethe Olbertz-Siitonen

The informalization and conversationalization of public discourse in the latter part of the twentieth century is a complex phenomenon that has been well documented in English-speaking societies. This article starts from Fairclough’s (1995) premises to analyse how this phenomenon is taking place in Romance languages with a similar technological development in comparison to the discourse practices of a country where the penetration of the Internet in the everyday life of its

4126 citations en
S2 Open Access 2016
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson, M. Schuster, Quoc V. Le et al.

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT systems using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and also show some interesting examples when mixing languages.

2203 citations en Computer Science
S2 Open Access 2015
Improving Neural Machine Translation Models with Monolingual Data

Rico Sennrich, B. Haddow, Alexandra Birch

Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while only using parallel data for training. Target-side monolingual data plays an important role in boosting fluency for phrase-based statistical machine translation, and we investigate the use of monolingual data for NMT. In contrast to previous work, which combines NMT models with separately trained language models, we note that encoder-decoder NMT architectures already have the capacity to learn the same information as a language model, and we explore strategies to train with monolingual data without changing the neural network architecture. By pairing monolingual training data with an automatic back-translation, we can treat it as additional parallel training data, and we obtain substantial improvements on the WMT 15 task English<->German (+2.8-3.7 BLEU), and for the low-resourced IWSLT 14 task Turkish->English (+2.1-3.4 BLEU), obtaining new state-of-the-art results. We also show that fine-tuning on in-domain monolingual and parallel data gives substantial improvements for the IWSLT 15 task English->German.

2911 citations en Computer Science
S2 Open Access 2013
Exploiting Similarities among Languages for Machine Translation

Tomas Mikolov, Quoc V. Le, I. Sutskever

Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data. It uses distributed representations of words and learns a linear mapping between vector spaces of languages. Despite its simplicity, our method is surprisingly effective: we can achieve almost 90% precision@5 for translation of words between English and Spanish. This method makes few assumptions about the languages, so it can be used to extend and refine dictionaries and translation tables for any language pair.

1638 citations en Computer Science
S2 Open Access 2023
Crosslingual Generalization through Multitask Finetuning

Niklas Muennighoff, Thomas Wang, Lintang Sutawika et al.

Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused on English data and models. We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0. We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages that appear only in the pretraining corpus. Finetuning on multilingual tasks with English prompts further improves performance on English and non-English tasks leading to various state-of-the-art zero-shot results. We also investigate finetuning on multilingual tasks with prompts that have been machine-translated from English to match the language of each dataset. We find training on these machine-translated prompts leads to better performance on human-written prompts in the respective languages. Surprisingly, we find models are capable of zero-shot generalization to tasks in languages they have never intentionally seen. We conjecture that the models are learning higher-level capabilities that are both task- and language-agnostic. In addition, we introduce xP3, a composite of supervised datasets in 46 languages with English and machine-translated prompts. Our code, datasets and models are freely available at https://github.com/bigscience-workshop/xmtf.

576 citations en Computer Science
S2 Open Access 2010
Quantitative Analysis of Culture Using Millions of Digitized Books

Erez Lieberman Aiden, Jean-Baptiste Michel

Linguistic and cultural changes are revealed through the analyses of words appearing in books. We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.

2830 citations en History, Art
S2 Open Access 2003
Language Proficiency and Labour Market Performance of Immigrants in the UK

C. Dustmann, F. Fabbri

This paper uses two recent UK surveys to investigate the determinants of language proficiency and the effect of language on earnings and employment probabilities of non-white immigrants. We address the problem of endogenous choice of language acquisition and measurement error in language variables. Our results show that language acquisition, employment probabilities, as well as earnings differ widely across non-white immigrants, according to their ethnic origin. Language proficiency has a positive effect on employment probabilities, and lack of English fluency leads to earning losses.

871 citations en Political Science, Economics
DOAJ Open Access 2025
Multimodal AI and Large Language Models for Orthopantomography Radiology Report Generation and Q&A

Chirath Dasanayaka, Kanishka Dandeniya, Maheshi B. Dissanayake et al.

Access to high-quality dental healthcare remains a challenge in many countries due to limited resources, lack of trained professionals, and time-consuming report generation tasks. An intelligent clinical decision support system (ICDSS), which can make informed decisions based on past data, is an innovative solution to address these shortcomings while improving continuous patient support in dental healthcare. This study proposes a viable solution with the aid of multimodal artificial intelligence (AI) and large language models (LLMs), focusing on their application for generating orthopantomography radiology reports and answering questions in the dental domain. This work also discusses efficient adaptation methods of LLMs for specific language and application domains. The proposed system primarily consists of a BLIP-2-based caption generator tuned on DPT images followed by a Llama 3 8B-based LLM for radiology report generation. The performance of the entire system is evaluated in two ways. The diagnostic performance of the system achieved an overall accuracy of 81.3%, with specific detection rates of 87.9% for dental caries, 89.7% for impacted teeth, 88% for bone loss, and 81.8% for periapical lesions. Subjective evaluation of AI-generated radiology reports by certified dental professionals demonstrates an overall accuracy score of 7.5 out of 10. In addition, the proposed solution includes a question-answering platform in the native Sinhala language, alongside the English language, designed to function as a chatbot for dental-related queries. We hope that this platform will eventually bridge the gap between dental services and patients, created due to a lack of human resources. Overall, our proposed solution creates new opportunities for LLMs in healthcare by introducing a robust end-to-end system for the automated generation of dental radiology reports and enhancing patient interaction and awareness.

Technology, Applied mathematics. Quantitative methods
DOAJ Open Access 2025
Exploring out-of-class contexts in EFL learning: Romanian high school students' perspectives and practices

Elena Meștereagă, Daniel Dejica

This study explores the role of out-of-class contexts (OOCCs) in English as a Foreign Language (EFL) learning among Romanian high school students, emphasizing how informal environments complement formal instruction. Data from 125 students across four public schools reveals active engagement with English through media consumption, gaming, and social media, enhancing vocabulary, listening comprehension, and speaking confidence. Informal settings provide low-anxiety opportunities for meaningful communication and experimentation. Motivation plays a pivotal role, with students linking English proficiency to future opportunities. Personal learning ecologies, such as online reading and conversations with others, foster development within supportive spaces. The study advocates integrating OOCCs into formal instruction to bridge theoretical and practical knowledge, creating a holistic learning environment aligned with learners’ interests and real-world needs.

Language and Literature
DOAJ Open Access 2025
Unveiling Dark Web Identity Patterns: A Network-Based Analysis of Identification Types and Communication Channels in Illicit Activities

Luis de-Marcos, Adrián Domínguez-Díaz, Javier Junquera-Sánchez et al.

The Dark Web, a hidden segment of the internet, has become a hub for illicit activities, facilitated by various forms of digital identification (IDs) such as email addresses, Telegram accounts, and cryptocurrency wallets. This study conducts a comprehensive analysis of the Dark Web’s identification and communication patterns, focusing on the roles of different ID types and their associated activities. Using a dataset of Dark Web documents, we construct and analyze a bipartite network to model the relationships between IDs and web documents, employing graph-theoretical metrics such as degree centrality, closeness centrality, betweenness centrality, and k-core decomposition, while analyzing subnetworks formed by ID type. Our findings reveal that Telegram forms the backbone of the network, serving as the primary communication tool for hacking-related activities, particularly within Russian-speaking communities. In contrast, email plays a more decentralized role, facilitating finance–crypto and other activities but with a high level of fragmentation and English as the predominant language. XMR (Monero) wallets emerge as a key component in financial transactions, forming a cohesive subnetwork focused on cryptocurrency-related activities. The analysis also highlights the modular and hierarchical nature of the Dark Web, with distinct clusters for hacking, finance–crypto, and drugs–narcotics, often operating independently but with some cross-topic interactions. This study provides a foundation for understanding the Dark Web’s structure and dynamics, offering insights that can inform strategies for monitoring and mitigating its risks.

Information technology

Page 29 of 328302