Results for "English literature"

Showing 20 of ~9,553,290 results · from CrossRef, arXiv, DOAJ, Semantic Scholar

arXiv Open Access 2026
English translation of Sophie Kowalevski's "On the problem of the rotation of a rigid body about a fixed point"

Sophie Kowalevski

This is an English translation and digitisation of Sophie Kowalevski's (also known as Sofya Kovalevskaya) paper on what is now known as the Kovalevskaya Top. The original paper was written in French and published in Vol. 12 of Acta Mathematica in 1889 with the title "Sur le problème de la rotation d'un corps solide autour d'un point fixe".

en math.HO, math.DS
arXiv Open Access 2026
Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs

Vedant Pandya

Knowledge-grounded dialogue systems aim to generate informative, contextually relevant responses by conditioning on external knowledge sources. However, most existing approaches focus exclusively on English, lack explicit citation mechanisms for verifying factual claims, and offer limited transparency into model decision-making. We present XKD-Dial, a progressive four-stage training pipeline for explainable, knowledge-grounded dialogue generation in a bilingual (English-Hindi) setting, comprising: (1) multilingual adaptation, (2) English dialogue SFT with citation grounding, (3) bilingual dialogue SFT, and (4) GRPO alignment with citation-aware rewards. We evaluate six models spanning encoder-decoder (250M-3B) and decoder-only (1B-7B) architectures at every pipeline stage. Our key contributions are: (i) three post-hoc explainability analyses - cross-attention alignment, Integrated Gradients attribution, and occlusion-based causal grounding - applied systematically across the training trajectory to reveal how citation behaviour is learned, not only whether it is learned; (ii) citation-grounded SFT reduces hallucination to 0.0% for encoder-decoder models from Stage 2 onward; (iii) the progressive pipeline prevents catastrophic forgetting while improving Hindi capabilities; (iv) smaller models match larger models on English after SFT; and (v) GRPO provides marginal improvement over well-designed SFT for structured citation tasks. We evaluate across six automatic metrics (BLEU, ROUGE, BERTScore, FactScore, Citation-F1, and hallucination rate).

en cs.CL, cs.AI
arXiv Open Access 2026
Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike

Miriam Winkler, Verena Blaschke, Barbara Plank

Indirectness is a common feature of daily communication, yet is underexplored in NLP research for both low-resource as well as high-resource languages. Indirect Question Answering (IQA) aims at classifying the polarity of indirect answers. In this paper, we present two multilingual corpora for IQA of varying quality that both cover English, Standard German and Bavarian, a German dialect without standard orthography: InQA+, a small high-quality evaluation dataset with hand-annotated labels, and GenIQA, a larger training dataset, that contains artificial data generated by GPT-4o-mini. We find that IQA is a pragmatically hard task that comes with various challenges, based on several experiment variations with multilingual transformer models (mBERT, XLM-R and mDeBERTa). We suggest and employ recommendations to tackle these challenges. Our results reveal low performance, even for English, and severe overfitting. We analyse various factors that influence these results, including label ambiguity, label set and dataset size. We find that the IQA performance is poor in high- (English, German) and low-resource languages (Bavarian) and that it is beneficial to have a large amount of training data. Further, GPT-4o-mini does not possess enough pragmatic understanding to generate high-quality IQA data in any of our tested languages.

en cs.CL
S2 Open Access 2020
Global Englishes and language teaching: A review of pedagogical research

H. Rose, Jim McKinley, Nicola Galloway

The rise of English as a global language has led scholars to call for a paradigm shift in the field of English language teaching (ELT) to match the new sociolinguistic landscape of the twenty-first century. In recent years a considerable amount of classroom-based research and language teacher education (LTE) research has emerged to investigate these proposals in practice. This paper outlines key proposals for change in language teaching from the related fields of World Englishes (WE), English as a lingua franca (ELF), English as an international language (EIL), and Global Englishes, and critically reviews the growing body of pedagogical research conducted within these domains. Adopting the methodology of a systematic review, 58 empirical articles published between 2010 and 2020 were shortlisted, of which 38 were given an in-depth critical review and contextualized within a wider body of literature. Synthesis of classroom research suggests a current lack of longitudinal designs, an underuse of direct measures to explore the effects of classroom interventions, and under-representation of contexts outside of university language classrooms. Synthesis of teacher education research suggests future studies need to adopt more robust methodological designs which measure the effects of Global Englishes content on teacher beliefs and pedagogical practices both before and throughout the programme, and after teachers return to the classroom.

171 citations en Sociology
arXiv Open Access 2025
Beyond English: Unveiling Multilingual Bias in LLM Copyright Compliance

Yupeng Chen, Xiaoyu Zhang, Yixian Huang et al.

Large Language Models (LLMs) have raised significant concerns regarding the fair use of copyright-protected content. While prior studies have examined the extent to which LLMs reproduce copyrighted materials, they have predominantly focused on English, neglecting multilingual dimensions of copyright protection. In this work, we investigate multilingual biases in LLM copyright protection by addressing two key questions: (1) Do LLMs exhibit bias in protecting copyrighted works across languages? (2) Is it easier to elicit copyrighted content using prompts in specific languages? To explore these questions, we construct a dataset of popular song lyrics in English, French, Chinese, and Korean and systematically probe seven LLMs using prompts in these languages. Our findings reveal significant imbalances in LLMs' handling of copyrighted content, both in terms of the language of the copyrighted material and the language of the prompt. These results highlight the need for further research and development of more robust, language-agnostic copyright protection mechanisms to ensure fair and consistent protection across languages.

en cs.CY, cs.CL
arXiv Open Access 2025
ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations

Rania Al-Sabbagh

ArzEn-MultiGenre is a parallel dataset of Egyptian Arabic song lyrics, novels, and TV show subtitles that are manually translated and aligned with their English counterparts. The dataset contains 25,557 segment pairs that can be used to benchmark new machine translation models, fine-tune large language models in few-shot settings, and adapt commercial machine translation applications such as Google Translate. Additionally, the dataset is a valuable resource for research in various disciplines, including translation studies, cross-linguistic analysis, and lexical semantics. The dataset can also serve pedagogical purposes by training translation students and aid professional translators as a translation memory. The contributions are twofold: first, the dataset features textual genres not found in existing parallel Egyptian Arabic and English datasets, and second, it is a gold-standard dataset that has been translated and aligned by human experts.

arXiv Open Access 2025
Code-Mixed Telugu-English Hate Speech Detection

Santhosh Kakarla, Gautama Shastry Bulusu Venkata

Hate speech detection in low-resource languages like Telugu is a growing challenge in NLP. This study investigates transformer-based models, including TeluguHateBERT, HateBERT, DeBERTa, MuRIL, IndicBERT, RoBERTa, and Hindi-Abusive-MuRIL, for classifying hate speech in Telugu. We fine-tune these models using Low-Rank Adaptation (LoRA) to optimize efficiency and performance. Additionally, we explore a multilingual approach by translating Telugu text into English using Google Translate to assess its impact on classification accuracy. Our experiments reveal that most models show improved performance after translation, with DeBERTa and Hindi-Abusive-MuRIL achieving higher accuracy and F1 scores compared to training directly on Telugu text. Notably, Hindi-Abusive-MuRIL outperforms all other models in both the original Telugu dataset and the translated dataset, demonstrating its robustness across different linguistic settings. This suggests that translation enables models to leverage richer linguistic features available in English, leading to improved classification performance. The results indicate that multilingual processing can be an effective approach for hate speech detection in low-resource languages. These findings demonstrate that transformer models, when fine-tuned appropriately, can significantly improve hate speech detection in Telugu, paving the way for more robust multilingual NLP applications.

en cs.CL
DOAJ Open Access 2025
Comparison of efficacy and safety of different types of electrical stimulation for shoulder subluxation after acute stroke: protocol for a systematic review and network meta-analysis of randomised controlled trials

Linlin Zhang, Qiang Chen, Linlin Li et al.

Introduction: Glenohumeral subluxation (GHS) is a common rehabilitation challenge in the hemiplegic upper limb following stroke, potentially leading to shoulder pain, secondary brachial plexus injury and various other complications. While electrical stimulation therapies, such as electromyography biofeedback, electroacupuncture and neuromuscular electrical stimulation, have shown promise in managing GHS, some controversy remains. Although clinical trials and meta-analyses have confirmed the efficacy of these therapies, healthcare professionals have yet to reach a consensus on which specific therapy is most effective for reducing shoulder subluxation (SS), alleviating pain and improving quality of life. This study will perform a network meta-analysis to compare the relative efficacy of different electrical stimulation therapies for treating GHS in patients following acute stroke. Methods and analysis: We will systematically search the following databases: PubMed, MEDLINE, Embase, Cochrane Library, Web of Science, Chinese biomedical literature database (SinoMed), Wanfang databases (Wanfang), VIP Journal Integration Platform (VIP) and China National Knowledge Infrastructure (CNKI). Our search will cover the period from the inception of each database until 14 April 2025, and will be restricted to studies published in Chinese or English. The primary outcomes of interest will be the degree of improvement in SS, improvements in quality of life and reductions in pain. We will conduct pairwise meta-analyses using RevMan V.5.3 (The Cochrane Collaboration, Copenhagen, Denmark), and network meta-analyses using ADDIS V.1.16.6 (Drugis, Groningen, The Netherlands) and Stata V.16.0 (StataCorp, College Station, Texas, USA) to compare the relative efficacy of different electrical stimulation therapies. Screening, data extraction, risk of bias assessment and evaluation of the certainty of evidence will all be performed independently by two reviewers to ensure accuracy and reliability. The risk of bias within individual studies will be assessed using the Cochrane Risk of Bias 2 (ROB 2) tool, and the certainty of evidence will be evaluated using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) and Confidence in Network Meta-Analysis (CINeMA) frameworks to ensure transparency and methodological rigour. Ethics and dissemination: Ethical approval is not required for this study. The findings will be submitted to a peer-reviewed journal or conference. PROSPERO registration number: CRD42024541228.

arXiv Open Access 2024
The role of inhibitory control in garden-path sentence processing: A Chinese-English bilingual perspective

Xiaohui Rao, Haoze Li, Xiaofang Lin et al.

In reading garden-path sentences, people must resolve competing interpretations, though initial misinterpretations can linger despite reanalysis. This study examines the role of inhibitory control (IC) in managing these misinterpretations among Chinese-English bilinguals. Using self-paced reading tasks, we investigated how IC influences recovery from garden-path sentences in Chinese (L1) and its interaction with language proficiency during English (L2) processing. Results indicate that IC does not affect garden-path recovery in Chinese, suggesting reliance on semantic context may reduce the need for IC. In contrast, findings for English L2 learners reveal a complex relationship between language proficiency and IC: Participants with low L2 proficiency but high IC showed lingering misinterpretations, while those with high proficiency exhibited none. These results support and extend the Model of Cognitive Control (Ness et al., 2023). Moreover, our comparison of three Stroop task versions identifies L1 colour-word Stroop task as the preferred measure of IC in bilingual research.

en cs.CL
arXiv Open Access 2024
Detection of Non-recorded Word Senses in English and Swedish

Jonathan Lautenschlager, Emma Sköldberg, Simon Hengchen et al.

This study addresses the task of Unknown Sense Detection in English and Swedish. The primary objective of this task is to determine whether the meaning of a particular word usage is documented in a dictionary or not. For this purpose, sense entries are compared with word usages from modern and historical corpora using a pre-trained Word-in-Context embedder that allows us to model this task in a few-shot scenario. Additionally, we use human annotations on the target corpora to adapt hyperparameters and evaluate our models using 5-fold cross-validation. Compared to a random sample from a corpus, our model is able to considerably increase the detected number of word usages with non-recorded senses.

en cs.CL
arXiv Open Access 2024
Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech

Shannon Wotherspoon, William Hartmann, Matthew Snover

This paper introduces a set of English translations for a 123-hour subset of the CallHome Mandarin Chinese data and the HKUST Mandarin Telephone Speech data for the task of speech translation. Paired source-language speech and target-language text is essential for training end-to-end speech translation systems and can provide substantial performance improvements for cascaded systems as well, relative to training on more widely available text data sets. We demonstrate that fine-tuning a general-purpose translation model to our Mandarin-English conversational telephone speech training set improves target-domain BLEU by more than 8 points, highlighting the importance of matched training data.

en eess.AS, cs.CL
DOAJ Open Access 2024
Childhood trauma, PTSD/CPTSD and chronic pain: A systematic review.

Maria Karimov-Zwienenberg, Wilfried Symphor, William Peraud et al.

Background: Despite the growing body of literature on posttraumatic stress disorder (PTSD) and chronic pain comorbidity, studies taking into account the role of childhood exposure to traumatic and adverse events remain scarce. Additionally, it has been well established that survivors of childhood trauma may develop more complex reactions that extend beyond those observed in PTSD, typically categorized as complex trauma or CPTSD. Given the recent introduction of CPTSD within diagnostic nomenclature, the aim of the present study is to describe associations between childhood trauma in relation to PTSD/CPTSD and pain outcomes in adults with chronic pain. Methods: Following PRISMA guidelines, a systematic review was performed using the databases PubMed, PsycINFO, Psychology and Behavioral Sciences Collection, and Web of Science. Articles in English or French that reported on childhood trauma, PTSD/CPTSD and pain outcomes in individuals with chronic pain were included. Titles and abstracts were screened by two authors independently and full texts were subsequently evaluated and assessed on methodological quality using JBI checklist tools. Study design and sample characteristics, childhood trauma, PTSD/CPTSD, pain outcomes as well as authors' recommendations for scientific research and clinical practice were extracted for analyses. Results: Of the initial 295 search records, 13 studies were included in this review. Only four studies explicitly assessed links between trauma factors and pain symptoms in individuals with chronic pain. Findings highlight the long-term and complex impact of cumulative childhood maltreatment (e.g., abuse and neglect) on both PTSD/CPTSD and chronic pain outcomes in adulthood. Conclusion: This review contributes to current conceptual models of PTSD and chronic pain comorbidity, while adding to the role of childhood trauma and CPTSD. The need for clinical and translational pain research is emphasized to further support specialized PTSD/CPTSD treatment as well as trauma-informed pain management in routine care.

Medicine, Science
arXiv Open Access 2023
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Xuan-Phi Nguyen, Sharifah Mahani Aljunied, Shafiq Joty et al.

Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, where unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performances among under-represented languages fall behind due to pre-training data imbalance. To elicit LLMs' ability onto low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars to perform tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated from our method helps it perform competitively with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. When evaluated on zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also favored by GPT-4.

en cs.CL, cs.AI
DOAJ Open Access 2023
Patient-Reported Outcomes of Kinematic vs Mechanical Alignment in Total Knee Arthroplasty: A Systematic Review and Meta-analysis of Randomized Controlled Trials

Adithya Shekhar, MD, Danton Dungy, MD, Susan L. Stewart, PhD et al.

Background: Total knee arthroplasty (TKA) is an effective treatment method for severe osteoarthritis of the knee. Poor alignment of a knee replacement has been associated with suboptimal clinical results. Traditionally, mechanical alignment (MA) has been considered the gold standard. In light of reports of decreased satisfaction with TKA, a new technique called kinematic alignment (KA) has been developed. The purpose of this study is to (1) review the results of KA and MA for TKA in randomized controlled trials based on the Western Ontario and McMaster Universities Arthritis Index score, the Oxford Knee Score, and the Knee Society Scores, (2) perform a meta-analysis of the randomized controlled trials with baseline and follow-up values of these parameters, and (3) discuss other shortcomings of this literature from the perspective of study design and execution. Methods: Two independent reviewers performed a systematic review of the English literature using the Embase, Scopus, and PubMed databases searching for randomized controlled trials of MA vs KA in TKA. Of the initial 481 published reports, 6 studies were included in the final review for meta-analysis. The individual studies were then analyzed to evaluate for risks of bias and inconsistencies of methodology. Results: A majority of studies demonstrated low risk of bias. All studies had fundamental technical issues by utilizing different techniques to achieve KA vs MA. There was no significant difference between KA and MA in these studies. Conclusions: There is no significant difference in any outcomes measured between KA and MA in TKA. Both statistical and methodological factors diminish the value of these conclusions.

Orthopedic surgery
DOAJ Open Access 2023
Use of Patient-Reported Outcome Measures in Lower Extremity Research

Yongni Zhang, Yaning Zang, Jiayi Ren et al.

Background: A large number of patient-reported outcome measures (PROMs) have been developed for specific lower extremity orthopaedic pathologies. However, a consensus as to which PROMs are recommended for use in evaluating treatment outcomes for patients with hip, knee, ankle and/or foot pathology based on the strength of their psychometric properties is lacking. Objective: To identify PROMs that are recommended in systematic reviews (SRs) for those with orthopaedic hip, knee, foot, and ankle pathologies or surgeries and identify if these PROMs are used in the literature. Study design: Umbrella Review. Methods: PubMed, Embase, Medline, Cochrane, CINAHL, SPORTDiscus and Scopus were searched for SRs through May 2022. A second search was done to count the use of PROMs in seven representative journals from January 2011 through May 2022. SRs that recommended the use of PROMs based on their psychometric properties were included in the first search. SRs or PROMs not available in English were excluded. The second search included clinical research articles that utilized a PROM. Case reports, reviews, and basic science articles were excluded. Results: Nineteen SRs recommended 20 PROMs for 15 lower extremity orthopaedic pathologies or surgeries. These results identified consistency between recommended PROMs and utilization in clinical research for only two of the 15 lower extremity pathologies or surgeries. This included the use of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Copenhagen Hip and Groin Outcome Score (HAGOS) to assess outcomes for those with knee osteoarthritis and groin pain, respectively. Conclusion: A discrepancy was found between the PROMs that were recommended by SRs and those used to assess clinical outcomes in published research. The results of this study will help to produce more uniformity with the use of PROMs that have the most appropriate psychometric properties when reporting treatment outcomes for those with extremity pathologies. Level of evidence: 3a

Sports medicine
DOAJ Open Access 2023
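Validação de um Questionário de Levantamento de Uso de Línguas para Usuários de Inglês como L2 Imersos em Contexto Brasileiro [Validation of a Language Use Survey Questionnaire for Users of English as an L2 Immersed in the Brazilian Context]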

Marcus Guilherme Valadares, Ricardo Augusto de Souza, Juliane Venturelli Silva Lima

The multiplicity of L2 acquisition experiences and contexts challenges bilingualism research, since individual differences complicate comparability across studies (Treffers-Daller; Korybsky, 2015). In this context, the construct of language dominance becomes an important variable for understanding this phenomenon in different domains, and the development of effective tools for measuring it becomes a fundamental part of this process (Gertken et al., 2014). In this article, we present the results of the validation of the Language Use Survey Questionnaire for users of English as an L2 immersed in the Brazilian context, in correlation with an explicit marker of English learning that measures vocabulary size. The concept of language dominance, as defined by Heredia (1997), and the Complementarity Principle (CP) of Grosjean (1998, 2016) were used to operationalize the construct in terms of frequency and specific domains of use, serving as the basis for the design of the questionnaire. Proficiency, in turn, was measured with an English vocabulary size test, the Vocabulary Size Test (Nation; Beglar, 2007). Using data from 784 participants, Pearson correlations were calculated among the 11 questionnaire items on English and Portuguese usage practices, and between those items and the score obtained on the Vocabulary Size Test. The statistical analyses show significant correlations among the English usage practice items and between them and the VST score, and indicate that respondents with greater involvement in activities in English tend to score better on the proficiency test. These correlations are interpreted as a key aspect of the external validity of the construct operationalized by the questionnaire, allowing us to present it to the scientific community as an effective instrument for detecting and measuring shifts in L1 dominance in favor of L2 use among users of English as an L2.

Language and Literature, English literature
DOAJ Open Access 2023
Food insecurity among Asian Americans: A scoping review protocol

Suji Ro, Nhat-Ha Pham, Victoria N. Huynh et al.

Introduction: Food insecurity is prevalent in the U.S. and is associated with deleterious health, behavioral, and social consequences. Food insecurity is currently addressed largely through public and private food assistance programs (e.g., the Supplemental Nutrition Assistance Program, and food pantries). A body of research has explored racial and ethnic disparities and differences in food insecurity and coping strategies. However, limited literature has explored these experiences among Asian Americans and Asian origin groups in the United States. Objective: The aim of this review is to establish what is known about the experience of food insecurity and nutrition program participation in the Asian American population and among Asian origin groups and to suggest further research and policy action to better address food insecurity in this population. Methods: Our review is guided by the methodological framework proposed by Arksey and O’Malley and refined and outlined by Levac and colleagues and the Joanna Briggs Institute. We will search key terms related to food insecurity and Asian Americans in Medline (Ovid), the Cochrane Library (Wiley), CINAHL Plus with Full Text (Ebsco), PsycINFO (Ebsco), and Scopus (Elsevier). An article will be included if it was published in the English language; is a peer reviewed research manuscript and reports primary research findings from analyses; and describes food insecurity or strategies to cope with food insecurity among individuals of Asian origins living in the U.S. An article will be excluded if it is a book, conference proceedings, or grey literature (e.g., thesis or dissertation); is a commentary, editorial, or opinion piece without primary research data; contains only research conducted outside of the U.S.; includes Asians in the sample but does not provide separate data on food insecurity or strategies to cope with food insecurity among Asians; and describes only dietary changes or patterns but not food insecurity. Two or more reviewers will participate in the study screening and selection process. We will record information from the final articles chosen to be included in the review in a data table template and will also prepare a summary narrative with key findings. Expected outputs: Results will be disseminated through peer-reviewed publications and conference presentations. The findings from this review will be of interest to researchers and practitioners and inform further research and policy to better address food insecurity among this population.

Medicine, Science
DOAJ Open Access 2023
What is a ‘rare’ language in translation? The experience of distance reading

Svetlana Yu. Bochaver, Ekaterina V. Tereshko

This article examines the perception of ‘rare’ and ‘common’ languages through literary translations. The study is based on the materials from De Bezige Bij Publishing House in the Netherlands, comparing the periods of 2010–2013 and 2020–2023. A significant increase in the role of translators is reflected in the rise of translation share in the publishing house. There is an observed growth in the number of source languages for translation, with a decrease in the proportion of English. Translations from French, Italian, German, Scandinavian languages, Portuguese, and Japanese have emerged. A comparison with the Polyandria Russian Publishing House during the period of 2020–2023 reveals common and distinct source languages. Both publishers translate literature from Danish, Finnish, and French to a similar extent. The Russian publishing house represents Norwegian and Japanese to a greater extent, while the Dutch publishing house releases more translations from German, Swedish, Turkish, and Italian. The Russian publisher also includes Icelandic, Albanian, Korean, and Croatian, while the Dutch publisher includes Hebrew, Romanian, and Portuguese. Both publishers encompass a total of 20 source languages, which is a small number compared to the global linguistic diversity. Comparing the volumes of source languages also indicates differences in preferences. Central European languages are chosen in the Netherlands, while Norwegian and Icelandic are favored in Russia. These differences may be influenced by the cost of rights to works, editorial preferences, and translator availability. The analysis results indicate that neither typological similarity between the source language and the target language, nor association with a specific language group, influences the preference for translating books from a particular language. This highlights the importance of sociocultural factors.

Philology. Linguistics
arXiv Open Access 2022
PETCI: A Parallel English Translation Dataset of Chinese Idioms

Kenan Tang

Idioms are an important language phenomenon in Chinese, but idiom translation is notoriously hard. Current machine translation models perform poorly on idiom translation, while idioms are sparse in many translation datasets. We present PETCI, a parallel English translation dataset of Chinese idioms, aiming to improve idiom translation by both human and machine. The dataset is built by leveraging human and machine effort. Baseline generation models show unsatisfactory abilities to improve translation, but structure-aware classification models show good performance on distinguishing good translations. Furthermore, the size of PETCI can be easily increased without expertise. Overall, PETCI can be helpful to language learners and machine translation systems.

en cs.CL, cs.LG
