Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification
Anthony Lamelas
Large language models (LLMs) have become extremely popular due to their strong performance on a variety of tasks, such as text generation and rewriting, but their size and computational cost make them difficult to access, deploy, and secure in many settings. This paper investigates whether small, decoder-only language models (SLMs) can provide an efficient alternative for grammar correction and text simplification. The experiments test SLMs out of the box, fine-tuned, and run sequentially on the JFLEG and ASSET datasets using established metrics. The results show that while SLMs may learn certain behaviors well, their performance remains below strong baselines and current LLMs. SLMs also struggle to retain meaning and to avoid hallucinations. These findings suggest that, despite their efficiency advantages, current SLMs are not yet competitive with modern LLMs for rewriting, and that further advances in training are required before SLMs can close the performance gap with today's LLMs.
Evaluation of NMT-Assisted Grammar Transfer for a Multi-Language Configurable Data-to-Text System
Andreas Madsack, Johanna Heininger, Adela Schneider
et al.
One approach for multilingual data-to-text generation is to translate grammatical configurations upfront from the source language into each target language. These configurations are then used by a surface realizer and in document planning stages to generate output. In this paper, we describe a rule-based NLG implementation of this approach where the configuration is translated by Neural Machine Translation (NMT) combined with a one-time human review, and introduce a cross-language grammar dependency model to create a multilingual NLG system that generates text from the source data, scaling the generation phase without a human in the loop. Additionally, we introduce a method for human post-editing evaluation on the automatically translated text. Our evaluation on the SportSett:Basketball dataset shows that our NLG system performs well, underlining its grammatical correctness in translation tasks.
Sexual imagery and male-ego in Lawrence's “Tortoise Shout” and Ted Hughes' “Thought Fox”
Shaymaa Alsalihi, Tavgah Ghulam Saeed
Sexual imagery has been a significant phenomenon in modern literature, as writers tend to enrich their writing with social taboos and erotic content. This paper examines the poetry of D. H. Lawrence and Ted Hughes to show how these writers used sexual images to depict their male ego and their manhood, concentrating on Freud's theory of psychosexual development and the use of phallic symbols. By examining Lawrence's “Tortoise Shout” and Hughes's “Thought Fox”, the reader can notice the sexual images that reside behind their stanzas. Accordingly, this paper tries to uncover the way both writers present these images.
English literature, Language. Linguistic theory. Comparative grammar
Position Paper: Generalized grammar rules and structure-based generalization beyond classical equivariance for lexical tasks and transduction
Mircea Petrache, Shubhendu Trivedi
Compositional generalization is one of the main properties that differentiate lexical learning in humans from state-of-the-art neural networks. We propose a general framework for building models that can generalize compositionally using the concept of Generalized Grammar Rules (GGRs), a class of symmetry-based compositional constraints for transduction tasks, which we view as a transduction analogue of equivariance constraints in physics-inspired tasks. Besides formalizing generalized notions of symmetry for language transduction, our framework is general enough to contain many existing works as special cases. We present ideas on how GGRs might be implemented, and in the process draw connections to reinforcement learning and other areas of research.
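GGRs are described as a transduction analogue of equivariance constraints. For reference, the classical constraint they generalize can be checked directly. A transducer f is equivariant under a permutation g of primitive words if f(g·x) = g·f(x), meaning that swapping input words and then translating gives the same result as translating and then swapping the corresponding output tokens. The sketch below tests this property for a SCAN-like command language. The vocabulary, `interpret`, and the permutation are hypothetical and invented for illustration. The sketch does not implement GGRs themselves.

```python
"""Check classical equivariance, f(g.x) == g.f(x), for a toy transducer.

Hypothetical example: the command language, `interpret`, and the word
permutation are illustrative assumptions, not taken from the paper.
"""

# Primitive verbs and the output action each one denotes (assumed vocabulary).
PRIMITIVES = {"walk": "WALK", "jump": "JUMP", "run": "RUN"}
REPEATS = {"twice": 2, "thrice": 3}


def interpret(command: str) -> list[str]:
    """Map commands such as 'jump twice and walk' to action sequences."""
    actions: list[str] = []
    for clause in command.split(" and "):
        words = clause.split()
        verb = PRIMITIVES[words[0]]
        count = REPEATS[words[1]] if len(words) > 1 else 1
        actions.extend([verb] * count)
    return actions


def permute_input(command: str, swap: dict[str, str]) -> str:
    """Apply a permutation of primitive words to the input command."""
    return " ".join(swap.get(w, w) for w in command.split())


def permute_output(actions: list[str], swap: dict[str, str]) -> list[str]:
    """Apply the corresponding permutation of action tokens to the output."""
    return [swap.get(a, a) for a in actions]


def is_equivariant(command: str, word_swap: dict[str, str]) -> bool:
    """Compare f(g.x) with g.f(x) for one input x and one permutation g."""
    action_swap = {PRIMITIVES[a]: PRIMITIVES[b] for a, b in word_swap.items()}
    lhs = interpret(permute_input(command, word_swap))
    rhs = permute_output(interpret(command), action_swap)
    return lhs == rhs


swap = {"jump": "walk", "walk": "jump"}
print(interpret("jump twice and walk"))
print(is_equivariant("jump twice and walk", swap))
```

Compositional rules such as "verb twice" treat every primitive alike, so this interpreter passes the check. A model that memorized a special case for one verb would fail it for some inputs.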
Domain-Specific Shorthand for Generation Based on Context-Free Grammar
Andriy Kanyuka, Elias Mahfoud
The generation of structured data in formats such as JSON, YAML and XML is a critical task in Generative AI (GenAI) applications. These formats, while widely used, contain many redundant constructs that lead to inflated token usage. This inefficiency is particularly evident when employing large language models (LLMs) like GPT-4, where generating extensive structured data incurs increased latency and operational costs. We introduce a domain-specific shorthand (DSS) format, underpinned by a context-free grammar (CFG), and demonstrate its usage to reduce the number of tokens required for structured data generation. The method involves creating a shorthand notation that captures essential elements of the output schema with fewer tokens, ensuring it can be unambiguously converted to and from its verbose form. It employs a CFG to facilitate efficient shorthand generation by the LLM, and to create parsers to translate the shorthand back into standard structured formats. The application of our approach to data visualization with LLMs demonstrates a significant (3x to 5x) reduction in generated tokens, leading to significantly lower latency and cost. This paper outlines the development of the DSS and the accompanying CFG, and the implications of this approach for GenAI applications, presenting a scalable solution to the token inefficiency problem in structured data generation.
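The abstract does not give the actual DSS syntax, so what follows is a hypothetical illustration of the idea. A shorthand is defined by a tiny grammar. A parser expands it to verbose JSON, and an inverse function compresses it back, so conversion is unambiguous in both directions. The one-letter keys, the `|` and `=` delimiters, and the chart fields are all assumptions. Character counts are only a rough proxy for token counts, which depend on the tokenizer. Using the CFG to constrain the LLM's decoding is not shown.

```python
"""Toy domain-specific shorthand for a chart spec (hypothetical, not the paper's).

Assumed grammar:
    spec  ::= TYPE ( "|" field )*
    field ::= KEY "=" VALUE
where TYPE, KEY and VALUE are non-empty and contain neither "|" nor "=".
"""
import json

# One-letter shorthand keys mapped to verbose JSON keys (assumed schema).
KEYS = {"x": "x_field", "y": "y_field", "t": "title", "c": "color"}
INV_KEYS = {v: k for k, v in KEYS.items()}


def parse_shorthand(text: str) -> dict:
    """Parse shorthand according to the grammar above and expand it."""
    chart_type, *fields = text.split("|")
    if not chart_type or "=" in chart_type:
        raise ValueError("spec must start with a chart type")
    spec = {"chart_type": chart_type}
    for item in fields:
        key, sep, value = item.partition("=")
        if not sep or key not in KEYS or not value or "=" in value:
            raise ValueError(f"malformed field: {item!r}")
        spec[KEYS[key]] = value
    return spec


def to_shorthand(spec: dict) -> str:
    """Compress a verbose spec back to shorthand (inverse of parse_shorthand)."""
    parts = [spec["chart_type"]]
    for long_key, value in spec.items():
        if long_key == "chart_type":
            continue
        if "|" in value or "=" in value:
            raise ValueError(f"value not expressible in shorthand: {value!r}")
        parts.append(f"{INV_KEYS[long_key]}={value}")
    return "|".join(parts)


short = "bar|x=month|y=sales|t=Monthly sales"
verbose = parse_shorthand(short)
print(json.dumps(verbose))
print(len(short), "chars vs", len(json.dumps(verbose)), "chars")
```

Because every construct in the grammar maps to exactly one verbose field, the round trip `to_shorthand(parse_shorthand(s))` returns `s` unchanged. That property is what lets the model emit only the short form.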
Examining language attitudes and use: A survey of Indonesian university students’ loyalty to their ethnic languages
Dingding Haerudin, Ruswan Dallyono, Usep Kuswari
et al.
Currently, a large number of ethnic languages worldwide are losing their vitality and popularity due to globalization and the influence of dominant languages such as English and Indonesian. Such linguistic decline is both unsettling and disheartening because it means the loss not only of communication tools but also of identities and values. Against this backdrop, this study investigates Indonesian university students’ attitudes toward their ethnic languages and explores the factors that influence students’ fluency in them. A qualitative method was used, and the data were obtained through questionnaires distributed via Google Forms to 78 university students from 10 universities across Indonesia. The participants were purposively selected from 18 different ethnic groups, including Ambonese, Balinese, and Sundanese. The findings indicate that several factors affect the participants’ fluency in their languages: domestic use of their ethnic languages and parental encouragement turned out to affect fluency positively, whereas cross-ethnic marriages and relocation to different environments affected it negatively. Therefore, this study recommends two strategies to preserve ethnic languages: (1) teaching programs for ethnic languages, in which schools offer classes to support students from ethnic-language-deprived backgrounds, and (2) local government policies that encourage and protect the use of ethnic languages among younger generations.
Special aspects of education, Language. Linguistic theory. Comparative grammar
Strictly Locally Testable and Resources Restricted Control Languages in Tree-Controlled Grammars
Bianca Truthe
Tree-controlled grammars are context-free grammars in which the derivation process is controlled so that every word on a level of the derivation tree must belong to a certain control language. We investigate the generative capacity of such tree-controlled grammars when the control languages are special regular sets, especially strictly locally testable languages or languages restricted by the resources needed for their generation (number of nonterminal symbols or production rules) or acceptance (number of states). Furthermore, we study the set-theoretic inclusion relations among these subregular language families themselves.
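The two notions in this abstract can be sketched together. First, one common formulation of a strictly k-locally testable language accepts a word exactly when every length-k factor of the word, padded with a boundary marker, belongs to a fixed set of allowed factors. Second, a tree-controlled grammar reads the derivation tree level by level and checks each level word against the control language. The grammar, the allowed factor set, and the example trees below are hypothetical. The sketch also assumes the usual convention that the last level, the generated terminal word, is exempt from the check.

```python
"""Strictly 2-locally testable control language applied to derivation-tree levels.

Illustrative assumptions: the grammar, the allowed factors, the single '#'
boundary marker, and the exemption of the last level.
"""
from dataclasses import dataclass, field

BOUNDARY = "#"


def k_factors(word: str, k: int) -> set[str]:
    """Length-k substrings of #word#; if #word# is shorter than k, the padded word itself."""
    padded = BOUNDARY + word + BOUNDARY
    return {padded[i:i + k] for i in range(max(1, len(padded) - k + 1))}


def in_slt(word: str, allowed: set[str], k: int) -> bool:
    """Accept the word iff every k-factor of #word# is an allowed factor."""
    return k_factors(word, k) <= allowed


@dataclass
class Node:
    label: str  # one symbol per node (terminal or nonterminal)
    children: list["Node"] = field(default_factory=list)


def level_words(root: Node) -> list[str]:
    """Read the tree level by level, concatenating labels left to right."""
    levels, frontier = [], [root]
    while frontier:
        levels.append("".join(n.label for n in frontier))
        frontier = [c for n in frontier for c in n.children]
    return levels


def tree_controlled(root: Node, allowed: set[str], k: int) -> bool:
    """Check every level except the last (terminal) one against the control language."""
    return all(in_slt(w, allowed, k) for w in level_words(root)[:-1])


# Derivation trees for S -> AB, A -> a, B -> b and for S -> BA, B -> b, A -> a.
good = Node("S", [Node("A", [Node("a")]), Node("B", [Node("b")])])
bad = Node("S", [Node("B", [Node("b")]), Node("A", [Node("a")])])

# These 2-factors define a control language containing exactly "S" and "AB".
ALLOWED = {"#S", "S#", "#A", "AB", "B#"}

print(level_words(good))
print(tree_controlled(good, ALLOWED, 2), tree_controlled(bad, ALLOWED, 2))
```

The second tree is rejected because its middle level `BA` contains the factor `#B`, which is not allowed. Restricting the control language to such local factor conditions is what makes it a subregular family.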
Review: "La interpretación en el ámbito de los videojuegos. Fundamentos teóricos y prácticos" (Interpreting in the Field of Video Games: Theoretical and Practical Foundations)
Olaya Martínez Sánchez
N/A
Translating and interpreting
Interpreting of Children and for Children in the Current Migration Situation in Poland
Małgorzata Tryuk
The paper describes the current state of public service interpreting and translation in Poland since the beginning of the war in Ukraine and the influx of a vast number of Ukrainian refugees into Poland. In particular, it focuses on the role played by volunteer interpreters and translators who offer linguistic and cultural assistance to Ukrainian children in medical and psychotherapeutic contexts. It also deals with the activities of non-governmental organizations that offer linguistic and translation support to immigrants in Poland.
Translating and interpreting
The Contribution of the Francophone Historical Text to the Identity Formation of the Algerian Learner: An Investigation in the Third Year of Secondary School
Hicham Belmokhtar
This article examines the role of the historical text in transmitting Algerian identity to third-year secondary school students (3ème AS) during the teaching of French as a foreign language (FLE). The French language occupies a significant place in all aspects of the country, including education, and its teaching in the third year of secondary school has a considerable influence on learners' behavior and personality. The textbook used at this level is an essential tool in the FLE teaching and learning process, providing a didactic, pedagogical, and formative approach. Observation of the third-year syllabus reveals that the construction of citizens' identity is closely linked to the pedagogical content of the school curriculum, as reflected in the textbook and in the various teaching methodologies currently in use. The historical text in the first project of the third year plays a crucial role in this identity construction by addressing themes such as history, culture, and linguistic diversity. Through a questionnaire and a writing activity, this article analyzes how the historical text helps transmit the components of national identity to third-year learners in FLE classes.
Philology. Linguistics, Language. Linguistic theory. Comparative grammar
Criticism of the Translation of the Mantiq Al-Tair by Badi’ Mohammad Jomeh Based on Garces Theory
Arezu Pooryazdanpanah Kermani
The Garces model is a significant theory in the field of linguistics that focuses on the qualitative examination of translated literary works to assess translation quality. It evaluates translations according to two criteria, acceptability and adequacy, and assesses positive and negative features at four levels. The first level analyzes components associated with vocabulary and the transmission of meaning through language; the second analyzes syntactic and morphological components; and the third and fourth evaluate discourse and textual style. Mantiq Al-Tair by Farīd ud-Dīn ʿAṭṭār of Nishapur is a highly esteemed work of Persian mystical poetry that has been translated into other languages. Among these translations, the one by Dr. Badi’ Mohammad Jomeh, produced at Ain Shams University, is currently the most thorough. Mohammad Jomeh’s translation of Attar's Mantiq Al-Tair has also been praised for its merit in conveying the logic and invaluable ideas of the author to the Arab world, and it is particularly notable for its precision, conciseness, eloquence, and literary qualities. For these reasons, it is examined and evaluated here using the Garces model and its four levels. The findings suggest that the translator has faithfully followed the source language. The translation also shows greater acceptability and adequacy at the lexical and syntactic-morphological levels than at the other two levels; the translator's proficiency is particularly evident at these two levels, where he has employed more constructive strategies. However, negative strategies feature prominently at the two discourse levels: functional and stylistic-semantic.
This might be attributed to the choice of content for translation, specifically mystical systems of thought. Despite the incorporation of culturally similar elements, the merit of Badi’ Mohammad Jomeh’s translation can be attributed to lexical expansion, compensation, and changes in syntax and structure.
Keywords: Translation Criticism, Literary translation, Mantiq Al-Tair, ʿAṭṭār of Nishapur, Badi’ Mohammad Jomeh, Carmen Garces.
Introduction
Mantiq Al-Tair is a highly significant oriental text that has been translated into various languages. Dr. Badi’ Mohammad Jomeh, an esteemed professor specializing in oriental studies at Ain Shams University, has produced a full translation of this work into Arabic. Evaluating the translation with various translation principles and approaches is vital to ensuring its accuracy, and translation criticism serves as a crucial connection between translation theory and translation practice. Garces's theory is a significant target-oriented theory that consists of four levels. Because it is comprehensive, it can serve as an effective model for assessing the acceptability and adequacy of translations, particularly of literary works. This essay aims to assess the Arabic translation of Mantiq Al-Tair by Badi’ Mohammad Jomeh using Garces' methodology, employing an analytical-descriptive approach.
Literature Review
Some significant research has been conducted in the field of translation criticism, specifically regarding the translation of Mantiq Al-Tair and the application of the Garces model.
Notable articles include "Mantiq Al-Tair of Attar in Lebanon (criticism on the research and translation of the Arabic Mantiq Al-Tair)" (1383) by Nik Manesh, "Criticism and review of the Persian translation of the novel Qalb al-Lil with the title Del Shab based on the model of Garces" (1396) by Ali Sayadani et al., "Lexual criticism of the translation of Sheikh Abdulhaq Mohadath Dehlavi from Fatuh al-Ghayb based on the semantic level - Garces Lexicon" (1400) by Bidkhoni and AghHosseini, "Hermeneutic view of the French translation of some mystical words of Al-Mantiq Al-Tair based on the opinions of Umberto Eco" (1400) by Moghaddam and Akrami Fard, and "Study in Translation Al-Arabiya for the Mantiq Al-Tair" (2006) by Nadi Hassoun.
It is evident that the Arabic translation of Mantiq Al-Tair has not been systematically and critically evaluated using translation criticism theories. Furthermore, studies based on Garces theory have been restricted to translations of novels and fictional works; despite the theory's potential for assessing translations of poetic texts, no research has so far used it to critique and evaluate such translations. It is therefore important to establish the acceptability and adequacy of the translation under study by listing its favorable and unfavorable characteristics.
Research Methodology
Garces theory is a comprehensive model for assessing literary translations. It goes beyond the translation of words and phrases and takes four different levels into account. The Garces model is widely regarded as a prominent model for evaluating the translation of literary texts, and its increasing adoption by scholars in recent years demonstrates its favorable standing among translation critics. The model rests on the notion of equivalence between the source and target texts: according to Garces, the source text and the translation should aim for maximum equivalence on all four levels.
The four levels are semantic-lexical, syntactic-morphological, discourse-role, and stylistic-intentional.
Conclusion
Mantiq Al-Tair is a significant spiritual poem in Persian poetry and literature that has been translated into other languages. This essay critically evaluates Badi’ Mohammad Jomeh’s Arabic translation of Mantiq Al-Tair, using the Garces model as a framework for analysis. Evaluating this translation at the four levels of the Garces model shows that it is oriented toward the source language. Of the four levels, the translation conforms most closely at the semantic-lexical level. The subcategories that appear at this level include assimilation, lexical expansion, lexical account, general and specific, definition and explanation, cultural equivalent, and syntactic expansion. At this level, the translator has employed constructive methods, with the exception of one instance (lexical explanation). Of these subcategories, lexical expansion and lexical account are the most frequent in this translation.
At the syntactic-morphological level, the translation shows changes in syntax or structure, changes in viewpoint, compensation, implication, and omission. Among these, changes of syntax or structure are particularly prominent. Apart from implication and omission, which are negative strategies at this level, the translator has employed positive techniques. All subcategories at the two discourse levels, functional and stylistic-semantic, are negative strategies; the most common are translator errors and changes to the function of rhetorical devices.
Translating and interpreting
Making first order linear logic a generating grammar
Sergey Slavnov
It is known that different categorial grammars have a surface representation in a fragment of first-order multiplicative linear logic (MLL1). We show that the fragment of interest is equivalent to the recently introduced extended tensor type calculus (ETTC). ETTC is a calculus of specific typed terms that represent tuples of strings, more precisely bipartite graphs decorated with strings. Types are derived from linear logic formulas, and rules correspond to concrete operations on these string-labeled graphs, so that they can be conveniently visualized. This equips the fragment of MLL1 relevant for language modeling not only with an alternative syntax and an intuitive geometric representation but also with an intrinsic deductive system, which it previously lacked. In this work we consider a non-trivial, notationally enriched variant of the previously introduced ETTC, which allows more concise and transparent computations. We present both a cut-free sequent calculus and a natural deduction formalism.
Book review: Empirical Studies of Translation and Interpreting: The Post-Structuralist Approach
Tianyuan Zhao, Lin Shen
Translating and interpreting, Social sciences (General)
Natural Answer Generation: From Factoid Answer to Full-length Answer using Grammar Correction
Manas Jain, Sriparna Saha, Pushpak Bhattacharyya
et al.
Question answering systems today typically use template-based language generation. Although adequate for domain-specific tasks, such systems are too restrictive and predefined for domain-independent systems. This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer (short spans such as named entities) as input. Our system uses constituency and dependency parse trees of questions. A transformer-based grammatical error correction model, GECToR (2020), is used as a post-processing step to improve fluency. We compare our system with (i) a Modified Pointer Generator (SOTA) and (ii) fine-tuned DialoGPT for factoid questions. We also test our approach on existential (yes-no) questions, with better results. Our model generates more accurate and fluent answers than the state-of-the-art (SOTA) approaches. Evaluated on the NewsQA and SQuAD datasets, it improves the ROUGE-1 score by 0.4 and 0.9 percentage points, respectively. Inference time is also reduced by 85% compared to the SOTA. The improved datasets used for our evaluation will be released as part of the research contribution.
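To make the reported gains concrete, ROUGE-1 measures unigram overlap between a generated answer and a reference answer. The abstract does not state which ROUGE-1 variant (recall or F1) is reported, so the minimal sketch below computes all three quantities. It assumes lowercased whitespace tokenization, a single reference, and no stemming. Official ROUGE implementations differ in these details, so this sketch will not reproduce published numbers exactly.

```python
"""Minimal ROUGE-1 (unigram overlap) between one candidate and one reference.

Simplifying assumptions: lowercased whitespace tokenization, no stemming,
single reference. Official implementations may score slightly differently.
"""
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped counts of shared unigrams
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical full-length answer versus a reference answer.
print(rouge_1("the capital of france is paris", "paris is the capital of france"))
print(rouge_1("the cat", "the dog"))
```

ROUGE-1 ignores word order, so the two answers in the first call score identically. This is one reason fluency is often judged separately, for example after the GEC post-processing step.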
The Role of the Right Hemisphere in Processing Phonetic Variability Between Talkers
Sahil Luthra
Neurobiological models of speech perception posit that both left and right posterior temporal brain regions are involved in the early auditory analysis of speech sounds. However, frank deficits in speech perception are not readily observed in individuals with right hemisphere damage. Instead, damage to the right hemisphere is often associated with impairments in vocal identity processing. Herein lies an apparent paradox: The mapping between acoustics and speech sound categories can vary substantially across talkers, so why might right hemisphere damage selectively impair vocal identity processing without obvious effects on speech perception? In this review, I attempt to clarify the role of the right hemisphere in speech perception through a careful consideration of its role in processing vocal identity. I review evidence showing that right posterior superior temporal, right anterior superior temporal, and right inferior / middle frontal regions all play distinct roles in vocal identity processing. In considering the implications of these findings for neurobiological accounts of speech perception, I argue that the recruitment of right posterior superior temporal cortex during speech perception may specifically reflect the process of conditioning phonetic identity on talker information. I suggest that the relative lack of involvement of other right hemisphere regions in speech perception may be because speech perception does not necessarily place a high burden on talker processing systems, and I argue that the extant literature hints at potential subclinical impairments in the speech perception abilities of individuals with right hemisphere damage.
Language. Linguistic theory. Comparative grammar, Neurophysiology and neuropsychology
The Malandro in Cinema: An Update of the Malandro Figure in Karim Aïnouz's Madame Satã
Iago Porfírio, Márcia Gomes Marques
This paper discusses the representation of the malandro in different movements of Brazilian cinema in order to examine how this representation relates to the marginalization and resistance of characters with Black bodies. We draw on the contributions of Michel de Certeau (2007) and of Giorgio Agamben (1993) on "whatever being" to identify the situations that compel the malandro's forms of life as a means of survival in social spaces. Having identified this social type in films from different periods, along with elements of the malandro in everyday life and as a symptom of the coming community, we analyze Madame Satã (2002) as an update of the character in contemporary Brazil.
Considerations and challenges in longitudinal studies of lexical features in L2 writing
Minkyung Kim
Exploring the longitudinal development of second language (L2) lexical use has been an important topic in L2 vocabulary research. One approach to examining longitudinal changes in L2 lexical use is to track changes over time in the lexical features of learner production, such as L2 writing. To further facilitate this approach, this article discusses considerations and challenges in conducting longitudinal studies of lexical features in L2 writing. The article first summarizes relevant previous studies and then outlines the promise of longitudinal studies of lexical features in L2 writing. It then presents considerations and challenges for such studies in terms of data collection and analysis and the choice of lexical measures. More research on longitudinal changes in the lexical features of L2 learner production seems warranted; such research will ultimately deepen our understanding of L2 lexical development and help design better vocabulary interventions in L2 classrooms.
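The choice of lexical measure matters in longitudinal designs partly because some indices depend on text length. Raw type-token ratio (TTR) tends to fall as a text gets longer, which can be confounded with development if learners write longer texts over time. The sketch below contrasts TTR with a moving-average TTR (MATTR), a length-robust alternative. Neither measure is prescribed by the article. The whitespace tokenizer, the window size, and the toy "essay" are assumptions for illustration.

```python
"""Type-token ratio (TTR) versus moving-average TTR (MATTR) on a toy text.

Illustrative only: the tokenizer, window size, and sample text are assumptions.
"""


def tokens(text: str) -> list[str]:
    """Lowercased whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def ttr(toks: list[str]) -> float:
    """Number of distinct word types divided by number of tokens."""
    return len(set(toks)) / len(toks)


def mattr(toks: list[str], window: int = 50) -> float:
    """Mean TTR over every contiguous window of `window` tokens."""
    if len(toks) <= window:
        return ttr(toks)
    scores = [ttr(toks[i:i + window]) for i in range(len(toks) - window + 1)]
    return sum(scores) / len(scores)


# The same 13-token sentence repeated: lexical variety per stretch of text is constant.
essay = "the cat sat on the mat and the dog sat on the rug " * 4
toks = tokens(essay)
print(ttr(toks[:13]), ttr(toks))  # raw TTR drops as the text grows
print(mattr(toks, window=13))     # MATTR stays at the single-sentence value
```

A learner whose texts simply grow longer would appear to lose lexical diversity under raw TTR but not under MATTR. The example illustrates the kind of measurement artifact a longitudinal design must guard against.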
A Precedence-Driven Approach for Concurrent Model Synchronization Scenarios using Triple Graph Grammars
Lars Fritsche, Jens Kosiol, Adrian Möller
et al.
Concurrent model synchronization is the task of restoring consistency between two correlated models after they have been changed concurrently and independently. To determine whether such concurrent model changes conflict with each other and to resolve these conflicts taking domain- or user-specific preferences into account is highly challenging. In this paper, we present a framework for concurrent model synchronization algorithms based on Triple Graph Grammars (TGGs). TGGs specify the consistency of correlated models using grammar rules; these rules can be used to derive different consistency restoration operations. Using TGGs, we infer a causal dependency relation for model elements that enables us to detect conflicts non-invasively. Different kinds of conflicts are detected first and resolved by the subsequent conflict resolution process. Users configure the overall synchronization process by orchestrating the application of consistency restoration fragments according to several conflict resolution strategies to achieve individual synchronization goals. As proof of concept, we have implemented this framework in the model transformation tool eMoflon. Our initial evaluation shows that the runtime of our presented approach scales with the size of model changes and conflicts, rather than model size.
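In the abstract, a causal dependency relation is derived from the TGG rules and used to detect conflicts before they are resolved. The sketch below illustrates only the general idea of dependency-based conflict detection, using a deliberately simplified, hypothetical model. Model elements are strings, and `depends_on` maps each element to the elements it needs. A "delete-use" conflict is reported when one side deletes an element that something created by the other side depends on, directly or transitively. Detection inspects the change sets without applying them. This is not the TGG-based algorithm implemented in eMoflon, and the preference-driven resolution step is omitted.

```python
"""Toy delete-use conflict detection between two concurrent change sets.

Hypothetical simplification: elements are strings, `depends_on` lists what
each element needs, and change sets are {"create": set, "delete": set}.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    deleted: str        # element removed by one side
    used_by: str        # element created by the other side that needs it
    deleting_side: str  # "A" or "B"


def required(element: str, depends_on: dict[str, set[str]]) -> set[str]:
    """All elements that `element` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = [element]
    while stack:
        for dep in depends_on.get(stack.pop(), set()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def detect_conflicts(changes_a: dict[str, set[str]], changes_b: dict[str, set[str]],
                     depends_on: dict[str, set[str]]) -> list[Conflict]:
    """Report delete-use conflicts without applying either change set."""
    conflicts = []
    for side, mine, theirs in (("A", changes_a, changes_b), ("B", changes_b, changes_a)):
        for created in sorted(theirs["create"]):
            for deleted in sorted(mine["delete"] & required(created, depends_on)):
                conflicts.append(Conflict(deleted, created, side))
    return conflicts


depends_on = {"Attr:age": {"Class:Person"}, "Class:Person": {"Package:model"}}
changes_a = {"create": set(), "delete": {"Class:Person"}}  # side A deletes the class
changes_b = {"create": {"Attr:age"}, "delete": set()}      # side B adds an attribute to it

for c in detect_conflicts(changes_a, changes_b, depends_on):
    print(c)
```

Because the check walks only the dependency closure of the changed elements, its cost grows with the size of the changes rather than the size of the models. This mirrors, in a very rough way, the scaling behavior the abstract reports.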
Parcellate and Segment: On the Non-Trivial Syntactic Positions of Word Forms in Russian
Fedor Pankov
The parcellate and the segment are non-trivial syntactic positions of word forms that take part in expressing subjective meanings. Both positions are defined relative to the base part of the utterance: the parcellate usually follows it, whereas the segment precedes it. Parcellation and the segmented syntactic construction (the nominative of theme) are important grammatical mechanisms that allow the speaker to express a given thought adequately in accordance with their communicative goal.
Language. Linguistic theory. Comparative grammar
The use of footnotes in the Malay translation of A Thousand Splendid Suns
Haslina Haroon
Footnotes are paratextual elements which appear at the bottom of a page in a text. In translated literary texts, translators may employ footnotes to assist readers in their understanding of the translation. Despite the use of footnotes in literary translation, there are few studies which have looked into their use in translations in Malaysia. This paper thus aims to explore the use of footnotes in a literary translation from English into Malay. More specifically, the aim is to determine the types of words which are footnoted in the Malay translation of the English-language novel, A Thousand Splendid Suns, and to determine the information provided in the footnotes. The paper also aims to determine the function served by the footnotes in the translation. To carry out this study, the footnotes in the Malay translation are collected and their content examined. The analysis reveals that the footnotes are generally linked to culture-bound words which are transferred unchanged from the source text to the translation. In terms of their content, the footnotes provide mainly dictionary-like definitions of the foreign words in the Malay translation. As such, the footnotes serve a purely informative function. This paper argues that footnotes must be used judiciously in a translation. Informative footnotes can play an important role in enhancing the readers’ understanding of the text and in bringing the text closer to the readers; measures, however, must be taken to ensure the accuracy of the content of the footnotes if they are to benefit the readers.
Translating and interpreting