Results for "Translating and interpreting"
Showing 20 of ~137,168 results · from CrossRef, DOAJ, arXiv, Semantic Scholar
Maria Kunilovskaya, Christina Pollkläsener
This paper introduces an updated and combined version of the bidirectional English-German EPIC-UdS (spoken) and EuroParl-UdS (written) corpora containing original European Parliament speeches as well as their translations and interpretations. The new version corrects metadata and text errors identified through previous use, refines the content, updates linguistic annotations, and adds new layers, including word alignment and word-level surprisal indices. The combined resource is designed to support research using information-theoretic approaches to language variation, particularly studies comparing written and spoken modes and examining disfluencies in speech, as well as traditional translationese studies, including parallel (source vs. target) and comparable (original vs. translated) analyses. The paper outlines the updates introduced in this release, summarises previous results based on the corpus, and presents a new illustrative study. The study validates the integrity of the rebuilt spoken data and evaluates probabilistic measures derived from base and fine-tuned GPT-2 and machine translation models on the task of predicting filler particles in interpreting.
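For reference, the word-level surprisal index mentioned above is the standard information-theoretic quantity: the negative log-probability a language model assigns to a word given its preceding context. A minimal sketch of the definition (the probabilities below are hypothetical illustrations, not values from the paper's models):

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 of the model's conditional word probability."""
    return -math.log2(prob)

# Hypothetical conditional probabilities a language model might assign
# to the next word; rarer words carry higher surprisal.
probs = {"the": 0.5, "uh": 0.01}
bits = {word: surprisal(p) for word, p in probs.items()}
```

A highly predictable word (probability 0.5) carries 1 bit of surprisal, while an unexpected filler (probability 0.01) carries about 6.6 bits, which is why such indices are useful for studying disfluencies.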
Avishkar Saha, Oscar Alejandro Mendez Maldonado, Chris Russell et al.
We approach instantaneous mapping, converting images to a top-down view of the world, as a translation problem. We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird's-eye-view (BEV) of the world, in a single end-to-end network. We assume a 1–1 correspondence between a vertical scanline in the image and rays passing through the camera location in an overhead map. This lets us formulate map generation from an image as a set of sequence-to-sequence translations. Posing the problem as translation allows the network to use the context of the image when interpreting the role of each pixel. This constrained formulation, based upon a strong physical grounding of the problem, leads to a restricted transformer network that is convolutional in the horizontal direction only. This structure allows efficient use of data during training and obtains state-of-the-art results for instantaneous mapping on three large-scale datasets, including a 15% and 30% relative gain against existing best performing methods on the nuScenes and Argoverse datasets, respectively.
Sergi Alvarez-Vidal
The rise of the Internet and the evolution of language technologies, in particular machine translation systems and large language models, have radically transformed the conditions of access to knowledge, text production, and the global circulation of information. These innovations have facilitated multilingual communication, but they have also reinforced dynamics of linguistic concentration that can negatively affect the visibility, prestige, and functionality of minority languages, creating what has been called the "digital language divide". This phenomenon puts the sustainability of these languages in the digital ecosystem at risk by limiting their presence on technology platforms and restricting their capacity to adapt to new communicative environments. The article critically analyses the impact of this divide on linguistic diversity and examines the role of digital language activism as a form of collective action aimed at counteracting these asymmetries. Within this framework, the case of Catalan is studied through two emblematic initiatives: the Viquipèdia and Softcatalà, a pioneering non-profit organisation devoted to developing Catalan language technologies. Both initiatives illustrate how the sustained commitment of language communities can translate into digital infrastructures, open linguistic resources, and meaningful online spaces of use, beyond government policies.
Haotian Tan, Hiroki Ouchi, Sakriani Sakti
How can simultaneous speech translation (SimulST) systems make human-interpreter-like read/write decisions? Current state-of-the-art systems formulate SimulST as a multi-turn dialogue task, requiring specialized interleaved training data and relying on computationally expensive large language model (LLM) inference for decision-making. In this paper, we propose SimulSense, a novel framework for SimulST that mimics human interpreters by continuously reading input speech and triggering write decisions to produce translation when a new sense unit is perceived. Experiments against two state-of-the-art baseline systems demonstrate that our proposed method achieves a superior quality-latency tradeoff and substantially improved real-time efficiency, with decision-making up to 9.6x faster than the baselines.
Yunlong Liang, Fandong Meng, Jiaan Wang et al.
The challenge of slang translation lies in capturing context-dependent semantic extensions, as slang terms often convey meanings beyond their literal interpretation. While slang detection, explanation, and translation have been studied as isolated tasks in the era of large language models (LLMs), their intrinsic interdependence remains underexplored. The main reason is the lack of a benchmark in which two of the tasks can serve as prerequisites for the third, which can facilitate idiomatic translation. In this paper, we introduce the interpretative slang translation task (named SlangDIT) consisting of three sub-tasks: slang detection, cross-lingual slang explanation, and slang translation within the current context, aiming to generate more accurate translations with the help of slang detection and slang explanation. To this end, we construct a SlangDIT dataset containing over 25k English-Chinese sentence pairs. Each source sentence mentions at least one slang term and is labeled with a corresponding cross-lingual slang explanation. Based on the benchmark, we propose a deep thinking model, named SlangOWL. It first identifies whether the sentence contains a slang term, then judges whether the slang is polysemous and analyses its possible meanings. Further, SlangOWL provides the best explanation of the slang term for the current context. Finally, based on this full chain of reasoning, SlangOWL offers a suitable translation. Our experiments on LLMs (e.g., Qwen2.5 and LLama-3.1) show that our deep thinking approach indeed enhances the performance of LLMs, with the proposed SlangOWL significantly surpassing both vanilla models and supervised fine-tuned models without thinking.
Ling Liao, Eva Aagaard
Current research efforts largely focus on employing at most one interpretable method to elucidate machine learning (ML) model performance. However, significant barriers remain in translating these interpretability techniques into actionable insights for clinicians, notably due to complexities such as variability across clinical settings and the Rashomon effect. In this study, we developed and rigorously evaluated two ML models along with interpretation mechanisms, utilizing data from 131,051 ICU admissions across 208 hospitals in the United States, sourced from the eICU Collaborative Research Database. We examined two datasets: one with imputed missing values (130,810 patients, 5.58% ICU mortality) and another excluding patients with missing data (5,661 patients, 23.65% ICU mortality). The random forest (RF) model demonstrated an AUROC of 0.912 with the first dataset and 0.839 with the second dataset, while the XGBoost model achieved an AUROC of 0.924 with the first dataset and 0.834 with the second dataset. Consistently identified predictors of ICU mortality across datasets, cross-validation folds, models, and explanation mechanisms included lactate levels, arterial pH, body temperature, and others. By aligning with routinely collected clinical variables, this study aims to enhance ML model interpretability for clinical use, promote greater understanding and adoption among clinicians, and ultimately contribute to improved patient outcomes.
Matthias Sperber, Maureen de Seyssel, Jiajun Bao et al.
Current speech translation systems, while having achieved impressive accuracies, are rather static in their behavior and do not adapt to real-world situations in ways human interpreters do. In order to improve their practical usefulness and enable interpreting-like experiences, a precise understanding of the nature of human interpreting is crucial. To this end, we discuss human interpreting literature from the perspective of the machine translation field, while considering both operational and qualitative aspects. We identify implications for the development of speech translation systems and argue that there is great potential to adopt many human interpreting principles using recent modeling techniques. We hope that our findings provide inspiration for closing the perceived usability gap, and can motivate progress toward true machine interpreting.
Ilaria Tipà
Sofia Malamatidou
Riccardo Moratto
Sofia Malamatidou
Sofia Malamatidou
Dario Matteo Sparanero
Riccardo Moratto
Han Wang
Stefano Perrella, Lorenzo Proietti, Pere-Lluís Huguet Cabot et al.
Machine Translation (MT) evaluation metrics assess translation quality automatically. Recently, researchers have employed MT metrics for various new use cases, such as data filtering and translation re-ranking. However, most MT metrics return assessments as scalar scores that are difficult to interpret, posing a challenge to making informed design choices. Moreover, MT metrics' capabilities have historically been evaluated using correlation with human judgment, which, despite its efficacy, falls short of providing intuitive insights into metric performance, especially in terms of new metric use cases. To address these issues, we introduce an interpretable evaluation framework for MT metrics. Within this framework, we evaluate metrics in two scenarios that serve as proxies for the data filtering and translation re-ranking use cases. Furthermore, by measuring the performance of MT metrics using Precision, Recall, and F-score, we offer clearer insights into their capabilities than correlation with human judgments. Finally, we raise concerns regarding the reliability of manually curated data following the Direct Assessments+Scalar Quality Metrics (DA+SQM) guidelines, reporting a notably low agreement with Multidimensional Quality Metrics (MQM) annotations.
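The Precision, Recall, and F-score measures the framework above relies on are the standard classification quantities. A minimal sketch, assuming counts of true positives, false positives, and false negatives (the example counts are illustrative, not from the paper):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# E.g., a metric that correctly flags 8 bad translations, wrongly flags 2
# good ones, and misses 2 bad ones:
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
```

Unlike a correlation coefficient, these quantities directly answer use-case questions such as "of the translations the metric rejected, how many deserved rejection?", which is the intuition behind the proposed framework.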
Kosuke Doi, Yuka Ko, Mana Makinae et al.
This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI). Word order differences are one of the biggest challenges in SI, especially for language pairs with significant structural differences like English and Japanese. We analyzed the characteristics of chunk-wise monotonic translation (CMT) sentences using the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation Dataset and identified some grammatical structures that make monotonic translation difficult in English-Japanese SI. We further investigated the features of CMT sentences by evaluating the output of existing speech translation (ST) and simultaneous speech translation (simulST) models on this dataset as well as on existing test sets. The results indicate that the existing SI-based test set may underestimate model performance. They also suggest that using CMT sentences as references gives higher scores to simulST models than to ST models, and that evaluating simulST models on an offline-based test set underestimates their performance.
Pedro Alejandro Dal Bianco, Oscar Agustín Stanchi, Facundo Manuel Quiroga et al.
This paper presents the first comprehensive interpretability analysis of a Transformer-based Sign Language Translation (SLT) model, focusing on the translation from video-based Greek Sign Language to glosses and text. Leveraging the Greek Sign Language Dataset, we examine the attention mechanisms within the model to understand how it processes and aligns visual input with sequential glosses. Our analysis reveals that the model pays attention to clusters of frames rather than individual ones, with a diagonal alignment pattern emerging between poses and glosses, which becomes less distinct as the number of glosses increases. We also explore the relative contributions of cross-attention and self-attention at each decoding step, finding that the model initially relies on video frames but shifts its focus to previously predicted tokens as the translation progresses. This work contributes to a deeper understanding of SLT models, paving the way for the development of more transparent and reliable translation systems essential for real-world applications.
Page 2 of 6859