We construct novel families of exact immersed and embedded Lagrangian translating solitons and special Lagrangian submanifolds in $\mathbb{C}^m$ that are invariant under the action of various admissible compact subgroups $G \leq \text{SU}(m-1)$ with cohomogeneity-two. These examples are obtained via an Ansatz generalising a construction of Castro-Lerma in $\mathbb{C}^2$. We give explicit examples of admissible group actions, including a full classification for $G$ simple. We also describe novel Lagrangian translators symmetric with respect to non-compact subgroups of the affine special unitary group $\text{SU}(m)\ltimes \mathbb{C}^m$, including cohomogeneity-one examples.
Modernizing legacy enterprise systems often involves translating PL/I programs into modern languages such as Java. This task becomes significantly more complex when PL/I macro procedures are involved. The PL/I macro procedures are considered string-manipulating programs that generate PL/I code, and they make automated translation more complex. Recently, large language models (LLMs) have been explored for automated code translation. However, LLM-based code translation struggles to translate the PL/I macro procedures to Java programs that reproduce the behavior of the plain PL/I code generated by the original PL/I macro procedures. This paper proposes a novel method called templatization, which uses symbolic execution to generate code templates (code with named placeholders) as an intermediate representation. In this approach, symbolic values are treated as parts of macro-generated code. By symbolically executing macro procedures and generating code templates, our approach facilitates LLMs to generate readable and maintainable Java code. Our preliminary experiment on ten PL/I macro procedures shows that the LLM-based translation through templatization successfully generates Java programs that reproduce the behavior of the macro-generated PL/I programs.
Can we improve machine translation (MT) with LLMs by rewriting their inputs automatically? Users commonly rely on the intuition that well-written text is easier to translate when using off-the-shelf MT systems. LLMs can rewrite text in many ways but in the context of MT, these capabilities have been primarily exploited to rewrite outputs via post-editing. We present an empirical study of 21 input rewriting methods with 3 open-weight LLMs for translating from English into 6 target languages. We show that text simplification is the most effective MT-agnostic rewrite strategy and that it can be improved further when using quality estimation to assess translatability. Human evaluation further confirms that simplified rewrites and their MT outputs both largely preserve the original meaning of the source and MT. These results suggest LLM-assisted input rewriting as a promising direction for improving translations.
Business Process Model and Notation (BPMN) is a widely used standard for modelling business processes. While automated planning has been proposed as a method for simulating and reasoning about BPMN workflows, most implementations remain incomplete or limited in scope. This project builds upon prior theoretical work to develop a functional pipeline that translates BPMN 2.0 diagrams into PDDL representations suitable for planning. The system supports core BPMN constructs, including tasks, events, sequence flows, and gateways, with initial support for parallel and inclusive gateway behaviour. Using a non-deterministic planner, we demonstrate how to generate and evaluate valid execution traces. Our implementation aims to bridge the gap between theory and practical tooling, providing a foundation for further exploration of translating business processes into well-defined plans.
Idioms are defined as a group of words with a figurative meaning not deducible from their individual components. Although modern machine translation systems have made remarkable progress, translating idioms remains a major challenge, especially for speech-to-text systems, where research on this topic is notably sparse. In this paper, we systematically evaluate idiom translation as compared to conventional news translation in both text-to-text machine translation (MT) and speech-to-text translation (SLT) systems across two language pairs (German to English, Russian to English). We compare state-of-the-art end-to-end SLT systems (SeamlessM4T SLT-to-text, Whisper Large v3) with MT systems (SeamlessM4T SLT-to-text, No Language Left Behind), Large Language Models (DeepSeek, LLaMA) and cascaded alternatives. Our results reveal that SLT systems experience a pronounced performance drop on idiomatic data, often reverting to literal translations even in higher layers, whereas MT systems and Large Language Models demonstrate better handling of idioms. These findings underscore the need for idiom-specific strategies and improved internal representations in SLT architectures.
Data management applications are growing and require more attention, especially in the "big data" era. Thus, supporting such applications with novel and efficient algorithms that achieve higher performance is critical. Array database management systems are one way to support these applications by dealing with data represented in n-dimensional data structures. For instance, software like SciDB and RasDaMan can be powerful tools to achieve the required performance on large-scale problems with multidimensional data. Like their relational counterparts, these management systems support specific array query languages as the user interface. As a popular programming model, MapReduce allows large-scale data analysis, facilitates query processing, and is used as a DB engine. Nevertheless, one major obstacle is the low productivity of developing MapReduce applications. Unlike high-level declarative languages such as SQL, MapReduce jobs are written in a low-level descriptive language, often requiring massive programming efforts and complicated debugging processes. This work presents a system that supports translating array queries expressed in the Array Query Language (AQL) in SciDB into MapReduce jobs. We focus on translating some unique structural aggregations, including circular, grid, hierarchical, and sliding aggregations. Unlike traditional aggregations in relational DBs, these structural aggregations are designed explicitly for array manipulation. Thus, our work can be considered an array-view counterpart of existing SQL to MapReduce translators like HiveQL and YSmart. Our translator supports structural aggregations over arrays to meet various array manipulations. The translator can also help user-defined aggregation functions with minimal user effort. We show that our translator can generate optimized MapReduce code, which performs better than the short handwritten code by up to 10.84x.
The essay examines briefly come significant works published between the 1950s and the first decade of the 21st century in order to show the endurance over time of a representation of the city of Naples inspired by a spatial typology that reduces verticality to a horizontal tangle and transforms historical conflict into the emergence of natural drives.
Geography. Anthropology. Recreation, Language. Linguistic theory. Comparative grammar
Manuel Lardelli, Giuseppe Attanasio, Anne Lauscher
The translation of gender-neutral person-referring terms (e.g., the students) is often non-trivial. Translating from English into German poses an interesting case -- in German, person-referring nouns are usually gender-specific, and if the gender of the referent(s) is unknown or diverse, the generic masculine (die Studenten (m.)) is commonly used. This solution, however, reduces the visibility of other genders, such as women and non-binary people. To counteract gender discrimination, a societal movement towards using gender-fair language exists (e.g., by adopting neosystems). However, gender-fair German is currently barely supported in machine translation (MT), requiring post-editing or manual translations. We address this research gap by studying gender-fair language in English-to-German MT. Concretely, we enrich a community-created gender-fair language dictionary and sample multi-sentence test instances from encyclopedic text and parliamentary speeches. Using these novel resources, we conduct the first benchmark study involving two commercial systems and six neural MT models for translating words in isolation and natural contexts across two domains. Our findings show that most systems produce mainly masculine forms and rarely gender-neutral variants, highlighting the need for future research. We release code and data at https://github.com/g8a9/building-bridges-gender-fair-german-mt.
The task of accurate and efficient language translation is an extremely important information processing task. Machine learning enabled and automated translation that is accurate and fast is often a large topic of interest in the machine learning and data science communities. In this study, we examine using local Generative Pretrained Transformer (GPT) models to perform automated zero shot black-box, sentence wise, multi-natural-language translation into English text. We benchmark 16 different open-source GPT models, with no custom fine-tuning, from the Huggingface LLM repository for translating 50 different non-English languages into English using translated TED Talk transcripts as the reference dataset. These GPT model inference calls are performed strictly locally, on single A100 Nvidia GPUs. Benchmark metrics that are reported are language translation accuracy, using BLEU, GLEU, METEOR, and chrF text overlap measures, and wall-clock time for each sentence translation. The best overall performing GPT model for translating into English text for the BLEU metric is ReMM-v2-L2-13B with a mean score across all tested languages of $0.152$, for the GLEU metric is ReMM-v2-L2-13B with a mean score across all tested languages of $0.256$, for the chrF metric is Llama2-chat-AYT-13B with a mean score across all tested languages of $0.448$, and for the METEOR metric is ReMM-v2-L2-13B with a mean score across all tested languages of $0.438$.
Abstract This article examines translation strategies applied in the Russian version of Mariama Bâ’s francophone novel Une si longue lettre (USLL). The novel, which is often described as feminist, was translated into Russian during the period of late socialism, which was characterized by gradual societal changes involving the liberalization of the social order. Drawing on Spivak’s theorization of the subaltern and feminist translation, this article explores how the francophone African novel was translated into Russian and how specifically Soviet feminist discourses are reflected in the translation. Ultimately, this article argues that, by employing feminist translation strategies, the subaltern women characters in USLL were represented as less dependent on patriarchal structures and ‘inserted’ into the target culture as hegemonic subjects.
The field of unsupervised machine translation has seen significant advancement from the marriage of the Transformer and the back-translation algorithm. The Transformer is a powerful generative model, and back-translation leverages Transformer's high-quality translations for iterative self-improvement. However, the Transformer is encumbered by the run-time of autoregressive inference during back-translation, and back-translation is limited by a lack of synthetic data efficiency. We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT). QBT re-purposes the encoder as a generative model, and uses encoder-generated sequences to train the decoder in conjunction with the original autoregressive back-translation step, improving data throughput and utilization. Experiments on various WMT benchmarks demonstrate that a relatively small number of refining steps of QBT improve current unsupervised machine translation models, and that QBT dramatically outperforms standard back-translation only method in terms of training efficiency for comparable translation qualities.
Shreyas Vaidya, Arvind Kumar Sharma, Prajwal Gatti
et al.
In this work, we study the task of ``visually'' translating scene text from a source language (e.g., Hindi) to a target language (e.g., English). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the source scene text, such as font, size, and background. There are several challenges associated with this task, such as translation with limited context, deciding between translation and transliteration, accommodating varying text lengths within fixed spatial boundaries, and preserving the font and background styles of the source scene text in the target language. To address this problem, we make the following contributions: (i) We study visual translation as a standalone problem for the first time in the literature. (ii) We present a cascaded framework for visual translation that combines state-of-the-art modules for scene text recognition, machine translation, and scene text synthesis as a baseline for the task. (iii) We propose a set of task-specific design enhancements to design a variant of the baseline to obtain performance improvements. (iv) Currently, the existing related literature lacks any comprehensive performance evaluation for this novel task. To fill this gap, we introduce several automatic and user-assisted evaluation metrics designed explicitly for evaluating visual translation. Further, we evaluate presented baselines for translating scene text between Hindi and English. Our experiments demonstrate that although we can effectively perform visual translation over a large collection of scene text images, the presented baseline only partially addresses challenges posed by visual translation tasks. We firmly believe that this new task and the limitations of existing models, as reported in this paper, should encourage further research in visual translation.
FROM A JAPANESE PERSPECTIVE: THE WESTERN INFLUENCES IN THE WORKS OF ŌGAI
The article focuses on how Mori Ōgai (1862-1922), a Japanese writer, translator and literary critic, incorporated references to European literature and culture in his novellas: Maihime (The Dancing Girl, 1890), Mōsō (Delusion, 1911) and Hanako (1910). The hermeneutic approach sheds light on the Japanese author’s vivid interest in foreign languages, cultural symbols, philosophy and arts which contributes to the intricate image of foreign influences in his oeuvre and invite his readers and translators in Europe to revisit their own cultural tradition.
Recent research has shown that L2 translation —translation from the first language (L1) into the second language (L2)— is a regular practice in many countries where languages of limited diffusion are used. But is L2 translation an unusual task in major-language cultures? Is there an L2 translation market for these languages? This paper aims to report on a survey-based study that explores current L2 translation practice in Spain: a survey of over 200 L1-Spanish translators was conducted to gather information about their professional practice and their attitudes regarding directionality. Data collection method used consisted of a web-based questionnaire partially based on Roiss (2001) and administered directly on an Internet site. Multi-modal methods of survey advertising and recruitment of respondents were used. Preliminary findings show that over 75% of translators, to a greater or lesser extent, engage in L2 translation, and that almost 20% do it more than half of their time, with English being by far the most widely translated L2. Additionally, responses suggest that more than half of the translators support L2 translation education in university settings.
Despite technical and scientific texts constituting a considerable proportion of translated materials worldwide, the translation of such texts, particularly academic texts in medical science, remains understudied. Some translators may be unfamiliar with specialist medical and pharmaceutical terminology or lack knowledge about translation conventions. This article examines the bi-directional translation of pharmaceutical research articles between English and Chinese, providing insights to translators experiencing these problems. We analyse the genre’s features in both languages and locate similarities and differences via recourse to parallel texts. The findings and our approach can be generalised to the translation of texts of this kind, addressing certain translation problems in an evidence-based way.
Sebastian T. Vincent, Loïc Barrault, Carolina Scarton
Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-textual information is unavailable. We investigate this challenge in the English-to-Polish language direction. We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario. The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance. We additionally contribute a novel attribute-annotated dataset of Polish TV dialogue and a morphological analysis script used to evaluate attribute control in models.
In this work, we consider translating tock-CSP into Timed Automata for UPPAAL to facilitate using UPPAAL in reasoning about temporal specifications of tock-CSP models. The process algebra tock-CSP provides textual notations for modelling discrete-time behaviours, with the support of tools for automatic verification. Similarly, automatic verification of Timed Automata (TA) with a graphical notation is supported by the UPPAAL real-time verification toolbox \uppaal. The two modelling approaches, TA and tock-CSP, differ in both modelling and verification approaches, temporal logic and refinement, respectively, as well as their provided facilities for automatic verification. For instance, liveness requirements are difficult to specify with the constructs of tock-CSP, but they are easy to specify and verify in UPPAAL. To take advantage of temporal logic, we translate tock-CSP into TA for \uppaal; we have developed a translation technique and its supporting tool. We provide rules for translating tock-CSP into a network of small TAs for capturing the compositional structure of tock-CSP that is not available in TA. For validation, we start with an experimental approach based on finite approximations to trace sets. Then, we explore mathematical proof to establish the correctness of the rules for covering infinite traces.