Abstract Small-scale sentiment classification often suffers from data scarcity, which limits model generalization. This study evaluates and compares the effectiveness of three data augmentation strategies: Easy Data Augmentation (EDA), back-translation, and contextual token substitution (nlpaug-style), with both traditional machine learning classifiers (Logistic Regression, Random Forest) and transformer-based models (BERT). We perform a comprehensive empirical comparison on low-resource sentiment datasets, both by synthesizing the results of recent studies and by running targeted head-to-head experiments. Our findings indicate that all three augmentation methods improve performance. Contextual augmentation yields the most consistent gains for BERT models, while EDA and back-translation provide greater benefits for traditional classifiers. These insights can guide the selection of data augmentation techniques according to model type and dataset size, filling a critical gap in research on data augmentation for sentiment classification on small datasets.
Computational linguistics. Natural language processing, Electronic computers. Computer science
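EDA, the first strategy compared above, applies four inexpensive word-level operations to a training sentence: synonym replacement, random insertion, random swap and random deletion. The Python sketch below illustrates those four operations and is not the study's code. It assumes whitespace tokenization and uses a small hypothetical synonym table (`SYNONYMS`) in place of the WordNet lookup used by the original EDA method. The function names and the parameters `alpha` and `n_aug` are illustrative. Back-translation and contextual substitution are not shown because they need a translation model or a masked language model.

```python
import random

# Hypothetical toy synonym table; the original EDA method draws synonyms from WordNet.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "boring": ["dull", "tedious"],
}

def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    new = list(words)
    candidates = [i for i, w in enumerate(new) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        new[i] = rng.choice(SYNONYMS[new[i]])
    return new

def random_insertion(words, n, rng):
    """n times: insert a synonym of a random known word at a random position."""
    new = list(words)
    for _ in range(n):
        candidates = [w for w in new if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        new.insert(rng.randint(0, len(new)), synonym)
    return new

def random_swap(words, n, rng):
    """n times: swap the words at two distinct random positions."""
    new = list(words)
    if len(new) < 2:
        return new
    for _ in range(n):
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new

def random_deletion(words, p, rng):
    """Drop each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def eda(sentence, alpha=0.1, n_aug=4, seed=0):
    """Return n_aug augmented variants, cycling through the four EDA operations."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of words each operation changes
    ops = [
        lambda w: synonym_replacement(w, n, rng),
        lambda w: random_insertion(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_deletion(w, alpha, rng),
    ]
    return [" ".join(ops[k % 4](words)) for k in range(n_aug)]

for variant in eda("the movie was good but a bit boring", seed=1):
    print(variant)
```

Each variant keeps the sentiment label of its source sentence. In this way a small labelled set can be expanded several-fold before either a traditional classifier or BERT is trained on it.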
Using critical research methods based on document analysis, this study investigated current debates on Language in Education Policy (LiEP) in Africa. Many such debates are under way across the continent, but the available literature is thin. Three multilingual African countries, the Federal Republic of Nigeria, the Republic of Congo, and the Islamic Republic of Mauritania, were selected for the study. The study found that colonial languages are dominant in the sampled countries, where they serve as the main media of instruction in schools and as the languages of assessment. Although only three countries were selected, the debates are no different in the rest of Africa. The findings of this research are generalisable to the situation across the entire continent and are thus critical in influencing future LiEP on the continent. It is imperative to note that the use of colonial languages in education should not come at the expense of African languages.
Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
This article examines how intertextual inclusions are marked in literary discourse, with the aim of revealing the semantic and functional potential of implicit and explicit markers of intertextuality in works by twentieth-century American and English writers. The analysis of intertextuality in literary discourse made it possible to identify intertextual markers and the means by which they are represented, to systematize intertextual inclusions, and to establish their discursive and functional potential. The research material consisted of the novels "The Crying of Lot 49" by Thomas Pynchon and "A History of the World in 10½ Chapters" by Julian Barnes. The methodology combined the following methods:
- close reading, to analyse the structure of the novels;
- textological analysis of the selected works;
- linguostylistic and intertextual analysis, to identify and describe different types of intertextuality and intertextual links;
- intermedial analysis, to establish semantic links between intertextual elements belonging to related art forms;
- the semantic-stylistic method, to describe linguistic and stylistic resources and their semantics;
- deconstruction, to identify indirect intertextual links between texts.

Criteria were developed for identifying intertextual markers and classifying them as direct or indirect. The semantic and functional features of intertextual markers in the analysed corpus are outlined. The main result is the author's taxonomy of direct and indirect markers of intertextuality, which will make it possible to assess how the level of readers' competence affects their understanding of literary texts.
Discourse analysis, Computational linguistics. Natural language processing
Abstract This study explores the application of artificial intelligence (AI) in improving the trial efficiency of criminal cases. Using a dataset consisting of 500 criminal case records, including minor, ordinary, and complex cases, we applied machine learning (ML) and natural language processing (NLP) techniques to predict trial outcomes, reduce processing time, and improve judgment accuracy. The ML models, such as decision tree regression and support vector machines (SVM), were trained on historical case data to predict trial time and verdict accuracy. NLP was used to automate document generation and extract key legal information from trial records. Results showed that AI-assisted trials reduced average trial time by 40% and reduced error rates by 55% compared to traditional methods. The findings indicate that AI can significantly enhance judicial efficiency, but challenges related to AI implementation, scalability, and bias mitigation remain. Future research should focus on testing AI systems in diverse judicial contexts to address these issues.
Computational linguistics. Natural language processing, Electronic computers. Computer science
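The abstract mentions decision tree regression for predicting trial time but does not describe the model. As a minimal, hypothetical illustration of that technique, the Python sketch below fits a depth-1 regression tree (a "stump"), which chooses the single feature threshold that minimizes squared error. The features are case complexity (coded 0 = minor, 1 = ordinary, 2 = complex) and number of defendants. These features, the trial durations and the function names are invented toy values for illustration, not the study's data or code.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (the impurity a regression tree minimizes)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(X, y):
    """Fit a depth-1 regression tree: pick the (feature, threshold) split whose
    two leaves have the smallest total squared error; each leaf predicts its mean."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must leave data on both sides
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t, mean(left), mean(right))
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]  # (feature, threshold, left_mean, right_mean)

def predict(stump, row):
    """Return the leaf mean for the side of the threshold that row falls on."""
    f, t, left_mean, right_mean = stump
    return left_mean if row[f] <= t else right_mean

# Invented toy data (NOT from the study): [complexity, defendants] -> trial days
X = [[0, 1], [0, 1], [1, 1], [1, 2], [2, 2], [2, 3]]
y = [2, 3, 10, 12, 30, 34]
stump = fit_stump(X, y)
print(stump, predict(stump, [2, 1]))
```

On this toy data the best split is on complexity at threshold 1, which separates complex cases from minor and ordinary ones. A full regression tree would apply the same split search recursively to each leaf.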
Abstract: The transition from the colonial era to independence has had both positive and harmful repercussions on African societies. In Xala, Sembène Ousmane depicts the after-effects of this transformation on a Senegalese community in the postcolonial period. The research problem is the dilemma faced by the indigenous population after their liberation from the colonial yoke. This research aims to carry out a psychoanalytic study of the individual and collective unconscious of the novel's characters in order to determine how the adulteration of culture and traditional life by modernity affects them. To this end, we use psychoanalytic theory to analyse the physical behaviour and the psychological, even psychic, life of the characters. Natives who are aware of the value of their culture and tradition take advantage of it to exploit the vulnerable, acquire material goods and obtain financial gain. This study offers a psychoanalytic analysis of Xala drawing on Freudian concepts such as the conflict between the id, the ego and the superego, as well as the notions of desire, symbolic castration and unconscious guilt. By examining the characters' behaviour and relationships from this angle, we bring to light the deep psychological struggles that underlie the social and individual dynamics described by Sembène.
Keywords: psychoanalytic, virility, impotence, tradition, modernity
Arts in general, Computational linguistics. Natural language processing
Using a survey, the article presents and summarizes theoretical questions of language policy. It also examines students' language consciousness and their attitudes towards the Ukrainian language both before and after the start of the current Russo–Ukrainian war. Students from four universities took part in the survey: Kyiv National Linguistic University, Ivan Franko National University of L'viv, V. N. Karazin Kharkiv National University, and Yuriy Fedkovych National University of Chernivtsi.
Computational linguistics. Natural language processing, Semantics
William Merrill, Yoav Goldberg, Roy Schwartz et al.
Abstract Language models trained on billions of tokens have recently led to unprecedented results on many NLP tasks. This success raises the question of whether, in principle, a system can ever “understand” raw text without access to some form of grounding. We formally investigate the abilities of ungrounded systems to acquire meaning. Our analysis focuses on the role of “assertions”: textual contexts that provide indirect clues about the underlying semantics. We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence. We find that assertions enable semantic emulation of languages that satisfy a strong notion of semantic transparency. However, for classes of languages where the same expression can take different values in different contexts, we show that emulation can become uncomputable. Finally, we discuss differences between our formal model and natural language, exploring how our results generalize to a modal setting and other semantic relations. Together, our results suggest that assertions in code or language do not provide sufficient signal to fully emulate semantic representations. We formalize ways in which ungrounded language models appear to be fundamentally limited in their ability to “understand”.
Computational linguistics. Natural language processing
Abstract Relation extraction between entity pairs is an increasingly critical area of natural language processing. Recently, the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model has performed excellently on text classification and sequence labelling tasks. Here, we incorporate into the pre-trained language model high-level syntactic features that capture the dependency between each word and the target entities. Our model also uses the intermediate layers of BERT to acquire different levels of semantic information and designs multi-granularity features for final relation classification. Our model offers a substantial improvement over published methods for relation extraction on widely used datasets.
Computational linguistics. Natural language processing, Computer software
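The abstract does not say exactly how the dependency-based features are computed. One common way to encode the dependency between each word and the target entities is the distance from each token to each entity, measured in arcs of the dependency tree. The Python sketch below computes such distances with a breadth-first search over a tree given as a list of head indices. The input format, the function names and the hand-made parse are assumptions for illustration only. The sketch does not reproduce the authors' method and does not cover the BERT intermediate-layer features.

```python
from collections import deque

def dependency_distances(heads, entity_index):
    """Number of dependency arcs from every token to the token at entity_index.

    heads[i] is the 0-based index of token i's syntactic head, or -1 for the root.
    Arcs are treated as undirected, so a path may climb to a head and descend again."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = [None] * n  # stays None only if the parse is disconnected
    dist[entity_index] = 0
    queue = deque([entity_index])
    while queue:  # breadth-first search gives shortest path lengths
        node = queue.popleft()
        for neighbour in adj[node]:
            if dist[neighbour] is None:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def entity_distance_features(heads, e1, e2):
    """Hypothetical per-token feature: (distance to entity 1, distance to entity 2)."""
    return list(zip(dependency_distances(heads, e1), dependency_distances(heads, e2)))

# Hand-made (assumed) parse of "The company acquired the startup":
# The->company, company->acquired, acquired = root, the->startup, startup->acquired
heads = [1, 2, -1, 4, 2]
print(entity_distance_features(heads, 1, 4))
# → [(1, 3), (0, 2), (1, 1), (3, 1), (2, 0)]
```

In a neural model, integer features like these could be mapped to learned embeddings and combined with the contextual token vectors. That design choice is an assumption here, not a detail given in the abstract.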
German sentences with man and Italian sentences with si impersonale or si passivante are often presented as equivalent in contrastive grammars. However, this functional equivalence proves problematic when Italian students use man to refer to their own role as authors, as in: “Darauf wird man aber im folgenden Kapitel eingehen”. Evidently, man cannot refer to the speaker role, whereas in the same context the Italian si is entirely appropriate. Starting from this interference error, the paper examines the possible range of reference of the two pronouns. It turns out that the most common reading of man and si in both languages is the generic one, which can be paraphrased as “everyone”. Systematic divergences, on the other hand, occur in the particular reading, i.e. when the pronoun refers to single unspecified subjects. German man characterizes the subject as anonymous and never includes listeners or speakers (e.g. Gestern hat man bei uns eingebrochen; man ≈ ‘jemand’, ‘somebody’). Italian si, by contrast, can or must be read, depending on the verb class (transitive, unergative, unaccusative, etc.), as speaker-exclusive (Mi si è raccontato che ...; si ≈ ‘qualcuno’, ‘someone’) or as speaker-inclusive (Ieri si è andati al ristorante; si ≈ ‘noi’, ‘we’). The speaker-inclusive reading also occurs when si is used in academic texts as a substitute for the established form of speaker (author) reference, namely the 1st person plural (noi, ‘we’). In addition to man and si, other forms of indeterminate subjects are examined: the non-anaphoric uses of German “sie (pl.)” (Sie haben schon wieder die Preise erhöht.) and of the Italian 3rd person plural null subject (Ti hanno cercato.), as well as the so-called impersonal passive in German (Es wird gemurmelt.).
Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
Many studies of eTandem and language learning draw on learners in Western institutions of higher education. However, there is a lack of research on telecollaboration for language development between learners in the East and the West. Against this backdrop, a small-scale, six-week Chinese-English eTandem project focusing on learners’ language learning processes and experiences was undertaken between nine Chinese university students learning English in China and nine British university students learning Chinese in the UK. Multiple datasets were collected from learners’ diaries, synchronous Skype communication recordings, email exchanges, interviews and a post-project survey. This paper reports the main types of language error made by the Chinese L2 learners of English and the error-correction strategies used by their eTandem partners, who were competent L1 English speakers. It also reports how the Chinese participants responded to the corrections. A thorough analysis of the research data identified three types of linguistic error in the Chinese L2 learners’ written tasks: grammatical, lexical and idiomatic. Another finding was that, although explicit written correction was the most commonly used strategy in email exchanges, learners preferred explanations with examples. Previous work has established several gains of eTandem learning, such as authentic communication, forging friendships and promoting intercultural awareness. Our data indicate a further major benefit: positive responses to competent L1 partners’ error corrections. For future eTandem projects, our study highlights the importance of training participants before the project, both in error-correction strategies with examples and in how to respond to partner feedback.
Computational linguistics. Natural language processing
Observing exchanges between caregiver and patient offers privileged access to how the technolectal material mobilized in their discursive practices is organized. Analysis of the collected data reveals differing strategies for handling technolectal units. This divergence between ways of speaking fosters misunderstanding and incomprehension. In this respect, it becomes clear that mastery of technolects is a sure guarantee of a fruitful exchange between caregiver and patient. This article aims to show the most representative features of how our informants (the patients and the doctors) speak. From this representation, the reader will infer the obstacles inherent in transferring information within the doctor/patient relationship.
Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
While nation-branding campaigns have become a popular means for governments to try to improve their country’s standing on international indexes such as the Anholt-GfK Roper Nation Brands Index (NBI), the generally static rankings on such indexes suggest that national brands cannot simply be shaped by clever marketing campaigns. Instead, national brands rest on deeply rooted perceptions of a country’s character and identity, which often have much in common with popular stereotypes about the country. This article analyzes how several advertising campaigns in Germany and Denmark, sponsored by both governmental entities and private corporations, explicitly engage with and manipulate positive national stereotypes in order to shape public narratives about what their countries have to offer the world.
Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar
This thematic issue brings together contributions that present the current state of research in German-Polish and Polish-German phraseology and paremiology. The idea for the issue arose from the great interest that contrastive studies in phraseology and paremiology enjoy. The achievements and research findings of German studies in Poland are therefore impossible to overlook, especially since major contributions have been made there in phraseography, phraseodidactics and the contrastive study of semantic fields in phraseology.
Computational linguistics. Natural language processing, Language. Linguistic theory. Comparative grammar