Results for "Translating and interpreting"

Showing 20 of ~137,522 results · from DOAJ, arXiv, Semantic Scholar, CrossRef

S2 Open Access 2018
Long short-term memory for machine remaining life prediction

Jianjing Zhang, Peng Wang, Ruqiang Yan et al.

Reliable tracking of performance degradation in dynamical systems such as manufacturing machines or aircraft engines and, consequently, prediction of the remaining useful life (RUL) are among the major challenges in realizing smart manufacturing. Traditional machine learning algorithms are often constrained in adapting to the complex and non-linear characteristics of manufacturing systems and processes. With the rapid development of modern computational hardware, Deep Learning has emerged as a promising computational technique for dynamical system prediction due to its enhanced capability to characterize system complexity, overcoming the shortcomings of those traditional methods. In this paper, a new approach based on the Long Short-Term Memory (LSTM) network, an architecture that is specialized in discovering the underlying patterns embedded in time series, is proposed to track the system degradation and consequently, to predict the RUL. The objectives of this paper are: 1) translating the raw sensor data into an interpretable health index with the aim of better describing the system health condition; and 2) tracking the historical system degradation for accurate prediction of its future health condition. Evaluation using NASA's C-MAPSS dataset verifies the effectiveness of the proposed method. Compared with other machine learning techniques, LSTM proves more powerful and accurate in revealing degradation patterns, enabled by its inherently time-dependent structure.
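The time-dependent structure credited for the LSTM's advantage can be made concrete by stepping a single LSTM cell over a sequence. This NumPy sketch is generic and illustrative (toy dimensions, random weights, a random "sensor" sequence), not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: gates are computed from the current input x_t and the
    previous hidden state h_prev, so each output depends on the whole history."""
    z = W @ x_t + U @ h_prev + b          # stacked pre-activations, shape (4*H,)
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])                   # forget gate
    i = sigmoid(z[H:2 * H])               # input gate
    o = sigmoid(z[2 * H:3 * H])           # output gate
    g = np.tanh(z[3 * H:4 * H])           # candidate cell state
    c = f * c_prev + i * g                # cell state carries long-term memory
    h = o * np.tanh(c)                    # hidden state / output
    return h, c

# Run a toy sequence of "sensor readings" through the cell.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for t in range(10):                       # 10 time steps
    x_t = rng.normal(size=n_in)
    h, c = lstm_step(x_t, h, c, W, U, b)
# h now summarizes the full sequence and could feed a regression head
# that maps the hidden state to a health index or RUL estimate.
```

The recurrence `c = f * c_prev + i * g` is what lets information persist across many time steps, which is the property the paper exploits for degradation tracking.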

393 citations en Computer Science
S2 Open Access 2016
Restricting retrotransposons: a review

J. Goodier

Retrotransposons have generated about 40% of the human genome. This review examines the strategies the cell has evolved to coexist with these genomic “parasites”, focussing on the non-long terminal repeat retrotransposons of humans and mice. Some of the restriction factors for retrotransposition, including the APOBECs, MOV10, RNASEL, SAMHD1, TREX1, and ZAP, also limit replication of retroviruses, including HIV, and are part of the intrinsic immune system of the cell. Many of these proteins act in the cytoplasm to degrade retroelement RNA or inhibit its translation. Some factors act in the nucleus and involve DNA repair enzymes or epigenetic processes of DNA methylation and histone modification. RISC and piRNA pathway proteins protect the germline. Retrotransposon control is relaxed in some cell types, such as neurons in the brain, stem cells, and in certain types of disease and cancer, with implications for human health and disease. This review also considers potential pitfalls in interpreting retrotransposon-related data, as well as issues to consider for future research.

375 citations en Biology, Medicine
arXiv Open Access 2026
Dynamical Stability of Translating Solitons to Mean Curvature Flow in Hyperbolic Space

Ronaldo F. de Lima, Álvaro K. Ramos

We develop the theory of translating solitons for the Mean Curvature Flow (MCF) in hyperbolic space of dimension $n+1\ge 3$. More specifically, we establish that horospheres are dynamically stable as radial graphical solutions to MCF. To that end, we construct rotationally invariant translators analogous to the winglike solitons introduced by Clutterbuck, Schnürer and Schulze, which serve as barriers in an argument based on White's avoidance principle and the strong maximum principle for parabolic PDEs.

en math.DG
arXiv Open Access 2025
Adversarial Agent Collaboration for C to Rust Translation

Tianyu Li, Ruishi Li, Bo Wang et al.

Translating C to memory-safe languages, like Rust, prevents critical memory safety vulnerabilities that are prevalent in legacy C software. Existing approaches for C to safe Rust translation, including LLM-assisted ones, do not generalize on larger (> 500 LoC) C codebases because they depend on complex program analyses that frequently break. In this work, we present ACToR (Adversarial C To Rust translator), a simple LLM agent-based approach. Inspired by GANs, ACToR pits a generator agent against a discriminator agent, which collaborate to iteratively generate a Rust translation. On each iteration, the generator agent synthesizes and refines a Rust translation to pass an existing suite of tests, and then the discriminator agent finds new failing tests. We demonstrate that ACToR translates all of the 63 real-world command-line utilities considered in our benchmarks, which have an average size of 473 lines of code, and it achieves over 90% test pass rate with zero human intervention during translation. To our knowledge, it is the first work to show evidence that an agent-centric approach can reliably and automatically convert standalone command-line C programs at this scale. Furthermore, ACToR improves translation correctness by up to 25.1% compared to baseline, non-adversarial approaches.

en cs.SE, cs.AI
arXiv Open Access 2025
Evaluating the Impact of Verbal Multiword Expressions on Machine Translation

Linfeng Liu, Saptarshi Ghosh, Tianyu Jiang

Verbal multiword expressions (VMWEs) present significant challenges for natural language processing due to their complex and often non-compositional nature. While machine translation models have seen significant improvement with the advent of language models in recent years, accurately translating these complex linguistic structures remains an open problem. In this study, we analyze the impact of three VMWE categories -- verbal idioms, verb-particle constructions, and light verb constructions -- on machine translation quality from English to multiple languages. Using both established multiword expression datasets and sentences containing these language phenomena extracted from machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality. We also propose an LLM-based paraphrasing approach that replaces these expressions with their literal counterparts, demonstrating significant improvement in translation quality for verbal idioms and verb-particle constructions.

en cs.CL
arXiv Open Access 2024
Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Midia Yousefi, Yao Qian, Junkun Chen et al.

End-to-end speech translation (ST), which translates source language speech directly into target language text, has garnered significant attention in recent years. Many ST applications require strict length control to ensure that the translation duration matches the length of the source audio, including both speech and pause segments. Previous methods often controlled the number of words or characters generated by the Machine Translation model to approximate the source sentence's length without considering the isochrony of pauses and speech segments, as duration can vary between languages. To address this, we present improvements to the duration alignment component of our sequence-to-sequence ST model. Our method controls translation length by predicting the duration of speech and pauses in conjunction with the translation process. This is achieved by providing timing information to the decoder, ensuring it tracks the remaining duration for speech and pauses while generating the translation. The evaluation on the Zh-En test set of CoVoST 2 demonstrates that the proposed Isochrony-Controlled ST achieves 0.92 speech overlap and 8.9 BLEU, only a 1.4 BLEU drop compared to the ST baseline.

en cs.CL, eess.AS
arXiv Open Access 2024
Seed-to-Seed: Image Translation in Diffusion Seed Space

Or Greenberg, Eran Kishon, Dani Lischinski

We introduce Seed-to-Seed Translation (StS), a novel approach for Image-to-Image Translation using diffusion models (DMs), aimed at translations that require close adherence to the structure of the source image. In contrast to existing methods that modify images during the diffusion sampling process, we leverage the semantic information encoded within the space of inverted seeds of a pretrained DM, dubbed the seed-space. We demonstrate that inverted seeds can be used for discriminative tasks, and can also be manipulated to achieve desired transformations in an unpaired image-to-image translation setting. Our method involves training an sts-GAN, an unpaired translation model between source and target seeds, based on CycleGAN. The final translated images are obtained by initiating the DM's sampling process from the translated seeds. A ControlNet is used to ensure the structural preservation of the input image. We demonstrate the effectiveness of our approach for the task of translating automotive scenes, showcasing superior performance compared to existing GAN-based and diffusion-based methods, as well as for several other unpaired image translation tasks. Our approach offers a fresh perspective on leveraging the semantic information encoded within the seed-space of pretrained DMs for effective image editing and manipulation.

en cs.CV
arXiv Open Access 2024
Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation

Eugene Yang, Dawn Lawrie, James Mayfield et al.

Prior work on English monolingual retrieval has shown that a cross-encoder trained using a large number of relevance judgments for query-document pairs can be used as a teacher to train more efficient, but similarly effective, dual-encoder student models. Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR), where queries and documents are in different languages, is challenging due to the lack of a sufficiently large training collection when the query and document languages differ. The state of the art for CLIR thus relies on translating queries, documents, or both from the large English MS MARCO training set, an approach called Translate-Train. This paper proposes an alternative, Translate-Distill, in which knowledge distillation from either a monolingual cross-encoder or a CLIR cross-encoder is used to train a dual-encoder CLIR student model. This richer design space enables the teacher model to perform inference in an optimized setting, while training the student model directly for CLIR. Trained models and artifacts are publicly available on Huggingface.
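The Translate-Distill training signal can be sketched at toy scale: a teacher score per query-document pair, and a dual-encoder student whose dot-product scores are regressed toward those scores. Everything here (random embeddings, a plain MSE objective, a single gradient step on the query side only) is an illustrative assumption, not the paper's actual setup:

```python
import numpy as np

# Toy setup: the teacher (a cross-encoder) has scored n_pairs query-document
# pairs; the student (a dual-encoder) represents each query and document as a
# vector and scores a pair by the dot product of its two embeddings.
rng = np.random.default_rng(1)
n_pairs, dim = 64, 8
q_emb = rng.normal(size=(n_pairs, dim))   # student query embeddings
d_emb = rng.normal(size=(n_pairs, dim))   # student document embeddings
teacher = rng.normal(size=n_pairs)        # teacher relevance scores

def mse_loss(q, d, t):
    """Distillation objective: mean squared error between the student's
    dot-product scores and the teacher's scores."""
    student = np.sum(q * d, axis=1)
    return float(np.mean((student - t) ** 2))

# One gradient step on the query embeddings (documents held fixed here
# purely to keep the sketch short).
lr = 0.01
student_scores = np.sum(q_emb * d_emb, axis=1)
grad_q = 2 * (student_scores - teacher)[:, None] * d_emb / n_pairs
before = mse_loss(q_emb, d_emb, teacher)
q_emb -= lr * grad_q
after = mse_loss(q_emb, d_emb, teacher)
assert after < before   # the step reduces the distillation loss
```

The point of the dual-encoder student is that, once trained, documents can be embedded offline and queries scored with a single dot product, which is what makes it cheaper than the cross-encoder teacher at retrieval time.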

en cs.IR, cs.CL
arXiv Open Access 2024
Investigating Markers and Drivers of Gender Bias in Machine Translations

Peter J Barclay, Ashkan Sami

Implicit gender bias in Large Language Models (LLMs) is a well-documented problem, and implications of gender introduced into automatic translations can perpetuate real-world biases. However, some LLMs use heuristics or post-processing to mask such bias, making investigation difficult. Here, we examine bias in LLMs via back-translation, using the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks used in a previous study. Each statement starts with 'she', and is translated first into a 'genderless' intermediate language then back into English; we then examine pronoun-choice in the back-translated texts. We expand prior research in the following ways: (1) by comparing results across five intermediate languages, namely Finnish, Indonesian, Estonian, Turkish and Hungarian; (2) by proposing a novel metric for assessing the variation in gender implied in the repeated translations, avoiding the over-interpretation of individual pronouns, apparent in earlier work; (3) by investigating sentence features that drive bias; (4) and by comparing results from three time-lapsed datasets to establish the reproducibility of the approach. We found that some languages display similar patterns of pronoun use, falling into three loose groups, but that patterns vary between groups; this underlines the need to work with multiple languages. We also identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations. Moreover, we see a good level of replicability in the results, and establish that our variation metric proves robust despite an obvious change in the behaviour of the DeepL translation API during the course of the study. These results show that the back-translation method can provide further insights into bias in language models.
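The abstract does not spell out the variation metric, but one plausible, purely hypothetical form is the normalized entropy of the pronoun distribution observed across repeated back-translations of the same statement:

```python
import math
from collections import Counter

def pronoun_variation(pronouns):
    """Hypothetical variation score (not the paper's actual metric):
    normalized Shannon entropy of the pronoun distribution across repeated
    back-translations. 0.0 = the model always picks the same pronoun;
    1.0 = the observed pronouns are maximally mixed."""
    counts = Counter(pronouns)
    if len(counts) <= 1:
        return 0.0                       # fully consistent output
    n = len(pronouns)
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(counts))

# 10 back-translations of the same 'she ...' statement:
print(pronoun_variation(["he"] * 10))               # 0.0: fully consistent
print(pronoun_variation(["he"] * 5 + ["she"] * 5))  # 1.0: maximal variation
```

A distribution-level score like this avoids over-interpreting any single pronoun choice, which is the concern the abstract raises about earlier work.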

en cs.CL, cs.CY
arXiv Open Access 2024
Improving LLM Abilities in Idiomatic Translation

Sundesh Donthi, Maximilian Spencer, Om Patel et al.

For large language models (LLMs) like NLLB and GPT, translating idioms remains a challenge. Our goal is to enhance translation fidelity by improving LLM processing of idiomatic language while preserving the original linguistic style. This has a significant social impact, as it preserves cultural nuances and ensures translated texts retain their intent and emotional resonance, fostering better cross-cultural communication. Previous work has utilized knowledge bases like IdiomKB by providing the LLM with the meaning of an idiom to use in translation. Although this method yielded better results than a direct translation, it is still limited in its ability to preserve idiomatic writing style across languages. In this research, we expand upon the knowledge base to find corresponding idioms in the target language. Our research performs translations using two methods: The first method employs the SentenceTransformers model to generate cosine similarity scores between the meanings of the original and target language idioms, selecting the best idiom (Cosine Similarity method). The second method uses an LLM to find a corresponding idiom in the target language for use in the translation (LLM-generated idiom method). As a baseline, we performed a direct translation without providing additional information. Human evaluations on English -> Chinese and Chinese -> English translations show the Cosine Similarity Lookup method outperformed the others in all GPT-4o translations. To further build upon IdiomKB, we developed a low-resource Urdu dataset containing Urdu idioms and their translations. Despite dataset limitations, the Cosine Similarity Lookup method shows promise, potentially overcoming language barriers and enabling the exploration of diverse literary works in Chinese and Urdu. (LoResLM @ COLING preprint)
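The Cosine Similarity method amounts to a nearest-neighbor lookup over meaning glosses. The paper embeds glosses with SentenceTransformers; this sketch substitutes a toy bag-of-words embedding, and the idiom entries are invented, to stay self-contained:

```python
import numpy as np

def bow_vector(text, vocab):
    """Toy bag-of-words embedding; stands in for the SentenceTransformers
    sentence embedding used in the paper."""
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def best_idiom(source_meaning, candidates):
    """Pick the target-language idiom whose meaning gloss has the highest
    cosine similarity to the source idiom's meaning gloss."""
    vocab = sorted(set(source_meaning.lower().split())
                   | {w for _, m in candidates for w in m.lower().split()})
    q = bow_vector(source_meaning, vocab)

    def cos(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return 0.0 if denom == 0 else float(a @ b) / denom

    return max(candidates, key=lambda c: cos(q, bow_vector(c[1], vocab)))

# Hypothetical knowledge-base entries: (idiom, meaning gloss)
candidates = [
    ("rain cats and dogs", "rain very heavily"),
    ("break the ice", "ease social tension at a first meeting"),
]
idiom, _ = best_idiom("to rain extremely heavily", candidates)
print(idiom)  # "rain cats and dogs"
```

The selected idiom, rather than a literal paraphrase, is then handed to the translator, which is how the approach tries to keep the idiomatic style of the source.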

en cs.CL, cs.AI
arXiv Open Access 2024
CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units

Yeeun Kang

Multilingual code-switching research is often hindered by the scarcity and linguistic bias of available datasets. To expand language representation, we synthesize code-switching data by replacing intonation units detected through PSST, a speech segmentation model fine-tuned from OpenAI's Whisper, using a speech-to-text translation dataset, CoVoST 2. With our dataset, CoVoSwitch, spanning 13 languages, we evaluate the code-switching translation performance of two multilingual translation models, M2M-100 418M and NLLB-200 600M. We reveal that the inclusion of code-switching units results in higher translation performance than monolingual settings and that models are better at code-switching translation into English than non-English. Further, low-resource languages gain the most from integration of code-switched units when translating into English but much less when translating into non-English. Translations into low-resource languages also perform worse than even raw code-switched inputs. We find that systems excel at copying English tokens but struggle with non-English tokens, that the off-target problem in monolingual settings is also relevant in code-switching settings, and that models hallucinate in code-switching translation by introducing words absent in both of the original source sentences. CoVoSwitch and code are available at https://github.com/sophiayk20/covoswitch.

en cs.CL, cs.AI
arXiv Open Access 2024
ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing

Suhyeon Yoo, Khai N. Truong, Young-Ho Kim

d/Deaf and hearing song-signers have become prevalent across video-sharing platforms, but translating songs into sign language remains cumbersome and inaccessible. Our formative study revealed the challenges song-signers face, including semantic, syntactic, expressive, and rhythmic considerations in translations. We present ELMI, an accessible song-signing tool that assists in translating lyrics into sign language. ELMI enables users to edit glosses line-by-line, with real-time synced lyric and music video snippets. Users can also chat with a large language model-driven AI to discuss meaning, glossing, emoting, and timing. Through an exploratory study with 13 song-signers, we examined how ELMI facilitates their workflows and how song-signers leverage and receive an LLM-driven chat for translation. Participants successfully adopted ELMI for song-signing, with active discussions throughout. They also reported improved confidence and independence in their translations, finding ELMI encouraging, constructive, and informative. We discuss research and design implications for accessible and culturally sensitive song-signing translation tools.

en cs.HC, cs.AI
S2 Open Access 2023
Large Language Model Displays Emergent Ability to Interpret Novel Literary Metaphors

Nicholas Ichien, Dušan Stamenković, K. Holyoak

Despite the exceptional performance of large language models (LLMs) on a wide range of tasks involving natural language processing and reasoning, there has been sharp disagreement as to whether their abilities extend to more creative human abilities. A core example is the interpretation of novel metaphors. Here we assessed the ability of GPT-4, a state-of-the-art large language model, to provide natural-language interpretations of a recent AI benchmark (Fig-QA dataset), novel literary metaphors drawn from Serbian poetry and translated into English, and entire novel English poems. GPT-4 outperformed previous AI models on the Fig-QA dataset. For metaphors drawn from Serbian poetry, human judges, blind to the fact that an AI model was involved, rated metaphor interpretations generated by GPT-4 as superior to those provided by a group of college students. In interpreting reversed metaphors, GPT-4, as well as humans, exhibited signs of sensitivity to the Gricean cooperative principle. In addition, for several novel English poems GPT-4 produced interpretations that were rated as excellent or good by a human literary critic. These results indicate that LLMs such as GPT-4 have acquired an emergent ability to interpret literary metaphors, including those embedded in novel poems.

23 citations en Computer Science

Page 18 of 6877