The article presents a comparative study of the sanctions policies of Japan and the Republic of Korea towards Russia after February 2022 and assesses the impact of their sanctions on the development of Russian-Japanese and Russian-South Korean cooperation in the Russian Far East. Based on a study of the economic ties of Russia, Japan, and South Korea (in the areas of trade, investment, finance, tourism, and transport) and their interaction in the educational and cultural-humanitarian spheres under sanctions restrictions, the authors conclude that the sanctions policy of Japan and South Korea towards Russia has a common basis due to their belonging to the “collective West,” and that the anti-Russian measures they take are aimed at weakening the industrial and technological potential of the Russian Federation. At the same time, like most of their Western partners, Japan and South Korea are not ready to impose sanctions that could cause significant damage to their own economic and strategic interests. There are important differences in the sanctions approaches of Japan and South Korea: Japan pursues a much tougher policy towards Russia, not only limiting exports to Russia but also banning imports of a number of goods from Russia. South Korea is much more willing to maintain ties with Russia and its Far Eastern territories despite the unfavorable political situation, which is expressed, in particular, in the ongoing official contacts between Primorsky Krai and Vladivostok and a number of provinces and municipalities of the Republic of Korea. The authors suggest that ties between the Russian Far East and South Korea can be quickly restored once the situation around Ukraine is resolved, while the prospects for restoring relations with Japan look much less certain.
Antoine Dussolle, Andrea Cardena Díaz, Shota Sato
et al.
Instruction following is a core capability of modern large language models (LLMs), making the evaluation of this capability essential to understanding these models. The Instruction Following Evaluation (IFEval) benchmark from the literature does this using objective criteria, offering a measure of LLM performance without subjective AI or human judgement. However, it only includes English instructions, limiting its ability to assess LLMs in other languages. We propose the Multilingual Instruction Following Evaluation (M-IFEval) benchmark, expanding the evaluation to French, Japanese, and Spanish, with both general and language-specific instructions. Applying this benchmark to 8 state-of-the-art LLMs, we find that benchmark performance across languages and instruction types can vary widely, underscoring the importance of a multilingual benchmark for evaluating LLMs in a diverse cultural context.
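The objective, programmatically verifiable criteria that IFEval-style benchmarks rely on can be illustrated with a minimal sketch. The checker names and rules below are illustrative assumptions, not the benchmark's actual implementation:

```python
# Minimal sketch of objective instruction checks in the IFEval style.
# These checkers are illustrative examples, not M-IFEval's actual rules.

def check_min_words(response: str, n: int) -> bool:
    """Verify the response contains at least n words."""
    return len(response.split()) >= n

def check_no_commas(response: str) -> bool:
    """Verify the response avoids commas entirely."""
    return "," not in response

def check_ends_with(response: str, suffix: str) -> bool:
    """Verify the response ends with a required phrase."""
    return response.strip().endswith(suffix)

def evaluate(response: str, checks) -> float:
    """Fraction of objective checks the response satisfies."""
    results = [check(response) for check in checks]
    return sum(results) / len(results)
```

Because each criterion is a deterministic predicate over the response text, no human or model judge is needed; scoring reduces to counting satisfied checks.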
The article analyzes the official Japanese narrative about the Northern Territories, widespread in Japanese society, as a key factor in the formation of a negative image of Russia in Japan. Of particular importance from the point of view of the emotional effect on public consciousness are the theses that the Southern Kurils are the “ancestral territory of Japan,” that the USSR committed aggressive and unfair actions against Japan during World War II which modern Russia has not corrected, and that the Japanese natives of the Southern Kurils experience enormous moral suffering, not having the opportunity to freely visit the graves of their ancestors. The article examines the organizational structure of the state, public, and socio-political organizations designed to carry out the public policy of popularizing this narrative, and shows how it is reflected in school textbooks, museums, and memorial complexes. The author focuses on the Movement for the Return of the Northern Territories and the events held within its framework, including the annual “Northern Territories Day” held on February 7. It is concluded that, despite all the efforts of the government, Japanese public opinion in reality turns out to be relatively poorly informed about the problem of the Northern Territories. At the same time, as generations change, interest in this problem is gradually decreasing, especially among young people. There is a growing realization of the futility of maintaining a hard line in the government’s approach to solving it. The humanitarian aspect of the problem, related to visits to graves by former islanders and members of their families, causes the greatest public outcry, but even this aspect, as public opinion polls show, has a limited effect.
Self-compassion, defined as compassion directed toward oneself in difficult situations, has been widely studied; however, the specific cognitive and behavioral patterns associated with it remain poorly understood. Participants (780 Japanese individuals; mean age = 43.0, SD = 10.6, range = 19–75) responded to 12 free-text prompts asking them to describe their typical thoughts and behaviors in three difficult situations (suffering, recognizing personal shortcomings, and experiencing failure). Employing structural topic modeling (a natural language processing technique), we used participants’ scores on the Self-Compassion Scale (SCS) and the Compassionate Engagement and Action Scales (CEAS) as metadata to quantify the associations between self-compassion and each topic. The results revealed that higher self-compassion was linked to topics reflecting problem-solving orientation, balanced optimism, and flexible responses. Conversely, lower self-compassion was associated with self-criticism, upward social comparison, envy, and depressive inaction. These patterns varied by context: for example, among individuals with high self-compassion, balanced optimism predominated in contexts of suffering and failure, while flexible responses emerged when participants recognized personal shortcomings. Furthermore, the unique variance of the positive SCS items was associated with adaptive cognitive processes such as balanced optimism and flexible responses, whereas the unique variance of the CEAS was associated with problem-solving-oriented behavioral processes. This study advances the literature by offering context-sensitive, nuanced insights into self-compassion and demonstrates that a data-driven approach to large-scale free-text data can uncover nuanced processes that conventional rating scales may not capture.
In October 2025, the cabinet of Sanae Takaichi came to power in Japan. On October 4, she was elected President of the Liberal Democratic Party and became the first female prime minister in the history of Japan. On October 30, 2025, MGIMO University hosted a seminar on the political events of the past month: the internal party struggle during the election of the party president, the collapse of the ruling coalition due to the departure of the Komeito Party, as well as the formation of a new “minority coalition” with the participation of the Japan Innovation Party. The seminar participants analyzed these events in detail, considered the domestic political problems facing the Takaichi cabinet and gave their own vision of the prospects for the development of the situation, both in the socio-economic sphere and in the field of foreign policy. This publication contains a transcript of this seminar.
We argue that human language learning proceeds in a manner that differs in nature from current approaches to training LLMs, predicting a difference in learning biases. We then present evidence from German plural formation by LLMs that confirms our hypothesis: even very powerful implementations produce results that miss aspects of the logic inherent to language that humans handle without difficulty. We conclude that attention to the different structures of human language and artificial neural networks is a likely avenue for improving LLM performance.
John Pavlopoulos, Juli Bakagianni, Kanella Pouli
et al.
Natural Language Processing (NLP) for lesser-resourced languages faces persistent challenges, including limited datasets, inherited biases from high-resource languages, and the need for domain-specific solutions. This study addresses these gaps for Modern Greek through three key contributions. First, we evaluate the performance of open-source (Llama-70b) and closed-source (GPT-4o mini) large language models (LLMs) on seven core NLP tasks for which datasets are available, revealing task-specific strengths, weaknesses, and parity in their performance. Second, we expand the scope of Greek NLP by reframing Authorship Attribution as a tool to assess potential data usage by LLMs in pre-training, with high 0-shot accuracy suggesting ethical implications for data provenance. Third, we showcase a legal NLP case study, where a Summarize, Translate, and Embed (STE) methodology outperforms the traditional TF-IDF approach for clustering long legal texts. Together, these contributions provide a roadmap to advance NLP in lesser-resourced languages, bridging gaps in model evaluation, task innovation, and real-world impact.
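The TF-IDF baseline that the STE methodology is compared against can be sketched in a few lines of pure Python. This is a simplified illustration of the standard weighting scheme; the study's actual implementation and preprocessing are not specified here:

```python
import math
from collections import Counter

# Simplified TF-IDF vectorizer and cosine similarity, illustrating the
# traditional baseline that the STE (Summarize, Translate, Embed)
# approach is compared against for clustering long legal texts.

def tfidf_vectors(docs):
    """Return one sparse term->weight dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Pairwise similarities computed this way feed directly into any standard clustering algorithm; the STE pipeline replaces these surface-level word counts with dense embeddings of summarized, translated texts.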
We discuss the Japanese and universally Japanese properties for valuation rings and Prüfer domains. These properties, regarding finiteness of integral closure, have been studied extensively for Noetherian rings, but very rarely, if ever, for non-Noetherian rings. Among other results, we show that for valuation rings and Prüfer domains, the Japanese and universally Japanese properties are equivalent. This result can be seen as a counterpart to Nagata's classical result for Noetherian rings. It also tells us that many non-Noetherian rings, including all absolutely integrally closed valuation rings and Prüfer domains, are universally Japanese.
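For orientation, the two properties discussed in this abstract are standardly stated as follows (these are the usual textbook definitions, not quoted from the work above):

```latex
% Standard definitions, for orientation (not quoted from the work above):
A domain $R$ with fraction field $K$ is \textbf{Japanese} (N-2) if, for
every finite field extension $L/K$, the integral closure of $R$ in $L$ is
a finitely generated $R$-module. A ring $R$ is \textbf{universally
Japanese} if every finitely generated $R$-algebra that is an integral
domain is Japanese.
```

In the Noetherian setting, universally Japanese rings are Nagata rings; the abstract's equivalence result shows that for valuation rings and Prüfer domains the a priori stronger universal property comes for free.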
In this work, we propose JETHICS, a Japanese dataset for evaluating the ethics understanding of AI models. JETHICS contains 78K examples and is built following the construction methods of the existing English ETHICS dataset. It includes four categories based on normative theories and concepts from ethics and political philosophy, and one representing commonsense morality. Our evaluation experiments on non-proprietary large language models (LLMs) and on GPT-4o reveal that even GPT-4o achieves only an average score of about 0.7, while the best-performing Japanese LLM attains around 0.5, indicating considerable room for improvement in current LLMs.
In this study, we evaluated the performance of a state-of-the-art sequence tagging grammar error detection and correction model (SeqTagger) using Japanese university students' writing samples. Using the automatic annotation toolkit ERRANT, we first evaluated SeqTagger's performance on error correction with human expert correction as the benchmark. A human-annotated approach was then adopted to evaluate SeqTagger's performance in error detection using a subset of the writing dataset. Results indicated a precision of 63.66% and a recall of 20.19% for error correction in the full dataset. For the subset, after manual exclusion of irrelevant errors such as semantic and mechanical ones, the model showed an adjusted precision of 97.98% and an adjusted recall of 42.98% for error detection, indicating the model's high accuracy but also its conservativeness. Thematic analysis of errors undetected by the model revealed that determiners and articles, especially the latter, were predominant. In terms of context-independent errors, the model occasionally overlooked basic ones and struggled with overly erroneous or complex structures. Meanwhile, context-dependent errors, notably those related to tense and noun number, as well as those possibly influenced by the students' first language (L1), remained particularly challenging.
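The precision and recall figures above follow the standard definitions over matched edits; a minimal sketch of that computation (an illustrative formulation, not ERRANT's actual code, and the tuple representation of an edit is an assumption):

```python
# Precision and recall over corrective edits, as used in grammatical
# error correction evaluation. Illustrative sketch, not ERRANT's code.

def precision_recall(system_edits, gold_edits):
    """Compare system-proposed edits against gold-standard edits.

    Each edit is represented here as a hashable tuple, e.g.
    (start, end, replacement).
    """
    system, gold = set(system_edits), set(gold_edits)
    true_positives = len(system & gold)
    # Precision: fraction of proposed edits that match the gold standard.
    precision = true_positives / len(system) if system else 0.0
    # Recall: fraction of gold edits the system actually found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

A high-precision, low-recall profile like the one reported (97.98% vs 42.98%) corresponds to a system that proposes few edits but is almost always right when it does, i.e. a conservative model.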
Large language models have exhibited significant enhancements in performance across various tasks. However, the complexity of their evaluation increases as these models generate more fluent and coherent content. Current multilingual benchmarks often use translated English versions, which may carry Western cultural biases and thus fail to accurately assess other languages and cultures. To address this research gap, we introduce KULTURE Bench, an evaluation framework specifically designed for Korean culture that features datasets of cultural news, idioms, and poetry. It is designed to assess language models' cultural comprehension and reasoning capabilities at the word, sentence, and paragraph levels. Using KULTURE Bench, we assessed the capabilities of models trained with different language corpora and analyzed the results comprehensively. The results show that there is still significant room for improvement in the models' understanding of texts related to the deeper aspects of Korean culture.
Natural language processing (NLP) tasks in English and general domains are widely available and are often used to evaluate pre-trained language models. In contrast, fewer tasks are available for languages other than English and for the financial domain; tasks in Japanese for the financial domain are particularly limited. We develop two large datasets using data published by a Japanese central government agency. The datasets provide three Japanese financial NLP tasks: 3- and 12-class classifications for categorizing sentences, along with a 5-class classification task for sentiment analysis. Our datasets are designed to be comprehensive and up to date, leveraging an automatic update framework that ensures the latest task datasets are always publicly available.
Humans are efficient language learners and inherently social creatures. Our language development is largely shaped by our social interactions, for example, demonstrations and feedback from caregivers. Contrary to human language learning, recent advancements in large language models have primarily adopted a non-interactive training paradigm, refining pre-trained models through feedback only afterward. In this work, we explore how corrective feedback from interactions influences neural language acquisition from scratch through systematically controlled experiments, assessing whether it contributes to word learning efficiency in language models. We introduce a trial-and-demonstration (TnD) learning framework that incorporates three distinct components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages. Our experiments reveal that the TnD approach accelerates word acquisition for student models of equal and smaller numbers of parameters, and we highlight the significance of both trials and demonstrations. We further show that the teacher's choices of words influence students' word-specific learning efficiency, and that a practice-makes-perfect effect is evident in a strong correlation between the frequency of words in trials and their respective learning curves. Our findings suggest that interactive language learning, with teacher demonstrations and active trials, can facilitate efficient word learning in language models.
Language mismatch affects speech anti-spoofing systems, yet investigations and quantification of these effects remain limited. Existing anti-spoofing datasets are mainly in English, and the high cost of acquiring multilingual datasets hinders training language-independent models. We initiate this work by evaluating top-performing speech anti-spoofing systems that are trained on English data but tested on other languages, observing notable performance declines. We propose an innovative approach, Accent-based data expansion via TTS (ACCENT), which introduces diverse linguistic knowledge to monolingually trained models, improving their cross-lingual capabilities. We conduct experiments on a large-scale dataset consisting of over 3 million samples, including 1.8 million training samples and nearly 1.2 million testing samples across 12 languages. The language mismatch effects are preliminarily quantified and reduced by over 15% through the proposed ACCENT. This easily implementable method shows promise for multilingual and low-resource language scenarios.
This study explores fine-tuning multilingual ASR (Automatic Speech Recognition) models, specifically OpenAI's Whisper-Tiny, to improve performance in Japanese. While multilingual models like Whisper offer versatility, they often lack precision in specific languages. Conversely, monolingual models like ReazonSpeech excel in language-specific tasks but are less adaptable. Using Japanese-specific datasets and Low-Rank Adaptation (LoRA) along with end-to-end (E2E) training, we fine-tuned Whisper-Tiny to bridge this gap. Our results show that fine-tuning reduced Whisper-Tiny's Character Error Rate (CER) from 32.7 to 20.8 with LoRA and to 14.7 with end-to-end fine-tuning, surpassing Whisper-Base's CER of 20.2. However, challenges with domain-specific terms remain, highlighting the need for specialized datasets. These findings demonstrate that fine-tuning multilingual models can achieve strong language-specific performance while retaining their flexibility. This approach provides a scalable solution for improving ASR in resource-constrained environments and languages with complex writing systems like Japanese.
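The Character Error Rate (CER) used to compare the fine-tuned models above is the Levenshtein edit distance between hypothesis and reference transcripts, normalized by reference length. A minimal sketch of the standard metric (not the paper's evaluation code):

```python
# Character Error Rate: Levenshtein edit distance between hypothesis and
# reference transcripts, divided by reference length. Standard metric
# sketch, not the paper's evaluation code.

def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn hyp into ref (dynamic programming,
    one row at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """CER as a percentage of the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

Because Japanese has no whitespace word boundaries, character-level rather than word-level error rate is the natural metric, which is why the study reports CER instead of WER.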
As part of cultural documentation, literary works have the ability to record the conditions of the times and society in a nation, including gender issues as values that are built, emphasized, and disseminated in the community. A period characterized by feudal society under the leadership of the Tokugawa clan, the Edo period (1603-1868) is known as the golden age of the development of traditional Japanese culture. Through a study of the play Sugawara Denju Tenarai Kagami (1746), one of the masterpieces of the Edo period, this study reveals a representation of masculinity that shows a hierarchical social construction between men and women. The method used in this study is qualitative analysis through the gender approaches of Wharton (2005) and Lindsey (2016), as well as the feminist criticism approach of Tyson (2015). The construction of masculinity in this play highlights the depiction of men as knights, associated with courage, loyalty, integrity, toughness, and self-respect. In addition, the concept of masculinity is depicted as strongly tied to hierarchical and patriarchal social structures, reflecting the gender ideology of the Edo period, which places men as superior, central figures in socio-cultural life. The depiction of male qualities and characters that outshine female characters in this text shows the text’s strategy of reinforcing the patriarchal paradigm, and clearly shows the function of the text as a locus for strengthening the implementation of patriarchy in the Edo period.
This article, based on Japanese sources, discusses how the Ainu language interpreters’ guild was formed, what functions translators performed, and how their status changed from the 17th to the 18th centuries. During this time, Japan pursued a policy of self-isolation, and all contacts with the outside world were closely controlled by the government. However, in the places where contact with foreign culture did occur, interpreters were needed; thus, there were interpreters of the Chinese, Korean, and Dutch languages. On the island of Hokkaido, where trade with the local Ainu took place, interpreters of the Ainu language were needed. In this article, the history of Ainu language interpreters and their first appearance is researched based on Japanese archive materials. The research also focuses on the functions the interpreters performed and their status in Japanese society at the time. There was a separate category of interpreters of the Ainu language in Matsumae, who were involved exclusively in important official events of the Matsumae clan. Their functions and positions in society, as well as the first mentions of Ainu language experts who succeeded in their profession, are also examined in detail. Particular attention is paid to the status and functions of the interpreters of the Ainu language in Ezo at the beginning of the 18th century, when a new basho trading system was introduced in Japan. The Ezo interpreters’ level of command of the Ainu language is also a focus of the research. The study mentions the attempts to compile the first dictionaries of the Ainu language and the difficulties that came with them. The author concludes that the functions of interpreters of the Ainu language underwent tremendous changes. In the 17th century, the services of interpreters were used only on occasions of trade, as well as ceremonies of welcoming or escorting a ship. By the end of the 18th century, they stood at the forefront of the Japanese control of the Ainu. Their rights and obligations were so extensive that, as representatives of the local authorities, they in fact completely controlled the Ainu people.
Safety lies at the core of developing and deploying large language models (LLMs). However, previous safety benchmarks only concern safety in one language, e.g. the majority language in the pretraining data, such as English. In this work, we build the first multilingual safety benchmark for LLMs, XSafety, in response to the global deployment of LLMs in practice. XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families. We utilize XSafety to empirically study multilingual safety for 4 widely used LLMs, including both closed-API and open-source models. Experimental results show that all LLMs produce significantly more unsafe responses for non-English queries than for English ones, indicating the necessity of developing safety alignment for non-English languages. In addition, we propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT by evoking safety knowledge and improving cross-lingual generalization of safety alignment. Our prompting method can significantly reduce the ratio of unsafe responses from 19.1% to 9.7% for non-English queries. We release our data at https://github.com/Jarviswang94/Multilingual_safety_benchmark.
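The unsafe-response ratios reported above (e.g. 19.1% vs 9.7%) are simple per-language proportions over judged responses; a minimal sketch of that tally (the record format is an illustrative assumption, not the benchmark's code):

```python
from collections import defaultdict

# Per-language unsafe-response ratio, the quantity reported in
# multilingual safety evaluations. Illustrative sketch; the record
# format is an assumption, not the benchmark's actual data schema.

def unsafe_ratios(records):
    """records: iterable of (language, is_unsafe) pairs, one per
    judged model response. Returns {language: fraction unsafe}."""
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for language, is_unsafe in records:
        totals[language] += 1
        if is_unsafe:
            unsafe[language] += 1
    return {lang: unsafe[lang] / totals[lang] for lang in totals}
```

Comparing these per-language ratios before and after applying a safety prompt yields exactly the kind of reduction the abstract reports for non-English queries.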