The success of pretrained transformer language models (LMs) in natural language processing has led to a wide range of pretraining setups. In particular, these models employ a variety of subword tokenization methods, most notably byte-pair encoding (BPE) (Sennrich et al., 2016; Gage, 1994), the WordPiece method (Schuster and Nakajima, 2012), and unigram language modeling (Kudo, 2018), to segment text. However, to the best of our knowledge, the literature does not contain a direct evaluation of the impact of tokenization on language model pretraining. We analyze differences between BPE and unigram LM tokenization, finding that the latter method recovers subword units that align more closely with morphology and avoids problems stemming from BPE’s greedy construction procedure. We then compare the fine-tuned task performance of identical transformer masked language models pretrained with these tokenizations. Across downstream tasks and two languages (English and Japanese), we find that the unigram LM tokenization method matches or outperforms BPE. We hope that developers of future pretrained LMs will consider adopting the unigram LM method over the more prevalent BPE.
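The greedy construction procedure that this abstract contrasts with unigram LM tokenization can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a toy BPE trainer (omitting details such as end-of-word markers) showing how the most frequent symbol pair is merged greedily at each step, on the classic low/lower/newest/widest example.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Greedy BPE training on a word-frequency corpus.

    corpus: dict mapping a tuple of symbols (initially characters) to its count.
    Returns the ordered list of merge operations learned.
    """
    vocab = Counter(corpus)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, count in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # greedy: most frequent pair wins
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for word, count in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += count
        vocab = new_vocab
    return merges

corpus = {
    tuple("low"): 5,
    tuple("lower"): 2,
    tuple("newest"): 6,
    tuple("widest"): 3,
}
merges = train_bpe(corpus, 10)
```

Because merges are committed greedily and never revised, a single frequent pair can split a morpheme; the unigram LM method instead scores whole segmentations probabilistically, which is the contrast the abstract draws.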
As large language models (LLMs) are increasingly deployed in healthcare, it becomes essential to carefully evaluate their medical safety before clinical use. However, existing safety benchmarks remain predominantly English-centric and test with only single-turn prompts, even though clinical consultations unfold over multiple turns. To address these gaps, we introduce JMedEthicBench, the first multi-turn conversational benchmark for evaluating the medical safety of LLMs in Japanese healthcare. Our benchmark is based on 67 guidelines from the Japan Medical Association and contains over 50,000 adversarial conversations generated using seven automatically discovered jailbreak strategies. Using a dual-LLM scoring protocol, we evaluate 27 models and find that commercial models maintain robust safety, while medical-specialized models exhibit increased vulnerability. Furthermore, safety scores decline significantly across conversation turns (median: 9.5 to 5.0, $p < 0.001$). Cross-lingual evaluation on both Japanese and English versions of our benchmark reveals that medical-model vulnerabilities persist across languages, indicating inherent alignment limitations rather than language-specific factors. These findings suggest that domain-specific fine-tuning may inadvertently weaken safety mechanisms and that multi-turn interactions represent a distinct threat surface requiring dedicated alignment strategies.
Yoshihiro Nakano, Georg Langeder, Daniel Taubinger
The Regionalism of Tamanoi Yoshirō: Its Timeliness and Potential for the Anthropocene
玉野井芳郎の地域主義:人新世におけるその現代性と可能性
Written by Nakano Yoshihiro
Translation by Georg Langeder and Daniel Taubinger
In 2024, a two-volume edition of N.N. Trubnikova’s translation and study of Shasekishū was republished. Shasekishū is a 13th-century Buddhist collection of setsuwa didactic tales. The compiler of the anthology, the monk Mujū Ichien (1226–1312), presents diverse narratives borrowed from numerous sources, accompanying them with vivid and often highly detailed religious-philosophical commentary. The second volume includes the researcher’s essays on Mujū Ichien himself, the historical and cultural context of the compilation’s creation, the setsuwa genre as a whole, and various topics addressed in the monk’s discourses, alongside other supplementary materials.
Andrew Gambardella, Takeshi Kojima, Yusuke Iwasawa
et al.
Typical methods for evaluating the performance of language models assess their ability to answer questions accurately. These metrics are acceptable for determining the extent to which language models can understand and reason about text in a general sense, but they fail to capture nuanced capabilities, such as the ability of language models to recognize and obey rare grammar points, particularly in languages other than English. We measure the perplexity of language models when confronted with the "first person psych predicate restriction" grammar point in Japanese. Weblab is the only tested open-source model in the 7-10B parameter range which consistently assigns higher perplexity to ungrammatical psych predicate sentences than to grammatical ones. We give evidence that Weblab's uniformly bad tokenization is a possible root cause of its good performance, and show that Llama 3's perplexity on grammatical psych predicate sentences can be reduced by orders of magnitude (a 28x difference) by restricting test sentences to those with uniformly well-behaved tokenizations. We show in further experiments on machine translation tasks that language models will use alternative grammar patterns in order to produce grammatical sentences when tokenization issues prevent the most natural sentence from being output.
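Perplexity, the quantity measured here, can be computed from the per-token log-probabilities a language model assigns to a sentence. The sketch below is an illustrative helper, not the paper's evaluation code; the toy numbers are invented to show that a sentence whose tokens receive higher probability gets lower perplexity, which is the direction of the grammatical/ungrammatical comparison.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sentence from per-token natural-log probabilities.

    token_logprobs: log p(token_i | tokens_<i) for each token, as assigned
    by a language model. Lower perplexity means the model finds the
    sentence more natural.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Toy comparison with invented probabilities: each token of the
# "grammatical" sentence gets p=0.25, each token of the "ungrammatical"
# one gets p=0.05, over four tokens each.
grammatical = [math.log(0.25)] * 4     # perplexity 1/0.25 = 4.0
ungrammatical = [math.log(0.05)] * 4   # perplexity 1/0.05 = 20.0
```

A model that respects the grammar point should systematically produce the second pattern (higher perplexity on the ungrammatical variant), which is what only Weblab does consistently in the tested range.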
Badr AlKhamissi, Greta Tuckute, Yingtian Tang
et al.
Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language that shape brain-like representations, and how they evolve during training as a function of different tasks, remain unclear. Here we benchmark 34 training checkpoints spanning 300B tokens across 8 different model sizes to analyze how brain alignment relates to linguistic competence. Specifically, we find that brain alignment tracks the development of formal linguistic competence -- i.e., knowledge of linguistic rules -- more closely than functional linguistic competence. While functional competence, which involves world knowledge and reasoning, continues to develop throughout training, its relationship with brain alignment is weaker, suggesting that the human language network primarily encodes formal linguistic structure rather than broader cognitive functions. We further show that model size is not a reliable predictor of brain alignment when controlling for feature size, and find that the correlation between next-word prediction, behavioral alignment, and brain alignment fades once models surpass human language proficiency. Finally, using the largest set of rigorous neural language benchmarks to date, we show that brain-alignment benchmarks remain unsaturated, highlighting opportunities for improving future models. Taken together, our findings suggest that the human language network is best modeled by formal, rather than functional, aspects of language.
According to Futrell and Mahowald [arXiv:2501.17047], both infants and language models (LMs) find attested languages easier to learn than impossible languages that have unnatural structures. We review the literature and show that LMs often learn attested languages and many impossible languages equally well. The impossible languages that are difficult to learn are simply more complex (or random). LMs lack the human inductive biases that support language acquisition.
Contrastive pre-training on large-scale image-text pair datasets has driven major advances in vision-language representation learning. Recent work shows that pretraining on global data followed by language- or culture-specific fine-tuning is effective for improving performance in target domains. With the availability of strong open-weight multilingual models such as SigLIP2, this paradigm has become increasingly practical. However, for Japanese, the scarcity of large-scale, high-quality image-text pair datasets tailored to Japanese language and cultural content remains a key limitation. To address this gap, we introduce WAON, the largest Japanese image-text pair dataset constructed from Japanese web content in Common Crawl, containing approximately 155 million examples. Our dataset construction pipeline employs filtering and deduplication to improve dataset quality. To improve the quality and reliability of evaluation on Japanese cultural tasks, we also construct WAON-Bench, a manually curated benchmark for Japanese cultural image classification comprising 374 classes, which addresses issues in existing benchmarks such as category imbalance and label-image mismatches. Our experiments demonstrate that fine-tuning on WAON improves model performance on Japanese cultural benchmarks more efficiently than existing datasets, achieving state-of-the-art results among publicly available models of comparable architecture. We release our dataset, model, and code.
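The deduplication step mentioned in the pipeline can be illustrated with a minimal exact-match sketch. This is not the WAON pipeline itself, whose details are not specified here; the function and example captions are illustrative, and large-scale pipelines typically layer near-duplicate detection (e.g. MinHash or embedding similarity) on top of exact matching like this.

```python
import hashlib

def exact_dedupe(pairs):
    """Drop image-text pairs whose normalized caption was seen before.

    pairs: iterable of (image_url, caption) tuples.
    Keeps the first occurrence of each distinct caption, comparing
    captions case-insensitively with whitespace collapsed.
    """
    seen, kept = set(), []
    for url, caption in pairs:
        normalized = " ".join(caption.split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append((url, caption))
    return kept

# Hypothetical example pairs: the second caption is a duplicate of the
# first after normalization, so it is filtered out.
pairs = [
    ("a.jpg", "Mount Fuji at sunrise"),
    ("b.jpg", "mount  fuji at sunrise"),
    ("c.jpg", "A bowl of ramen"),
]
kept = exact_dedupe(pairs)
```

Hashing the normalized caption keeps memory proportional to the number of distinct captions rather than their total length, which matters at the hundred-million-pair scale the dataset describes.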
This paper presents ElliottAgents, a multi-agent system leveraging natural language processing (NLP) and large language models (LLMs) to analyze complex stock market data. The system combines AI-driven analysis with the Elliott Wave Principle to generate human-comprehensible predictions and explanations. A key feature is the natural language dialogue between agents, enabling collaborative analysis refinement. The LLM-enhanced architecture facilitates advanced language understanding, reasoning, and autonomous decision-making. Experiments demonstrate the system's effectiveness in pattern recognition and generating natural language descriptions of market trends. ElliottAgents contributes to NLP applications in specialized domains, showcasing how AI-driven dialogue systems can enhance collaborative analysis in data-intensive fields. This research bridges the gap between complex financial data and human understanding, addressing the need for interpretable and adaptive prediction systems in finance.
The archaeological direction in Japanese studies in Russia originated at the end of the 19th century, on the basis of the first trips to Japan and acquaintance with its antiquities (M.I. Venyukov, A.V. Grigoriev, I.S. Polyakov, D.M. Pozdneev), and developed into a distinct field in the Soviet period. The fruitful dialogue between Russian and Japanese archaeologists owes much both to territorial proximity and the common roots of ancient cultures, starting from the Stone Age, as well as to mutual interest in the archaeology of the Pacific basin as a whole. Since the early 1960s, one of the leading roles in this collaboration has been played by the Novosibirsk Scientific Center (Institute of History, Philology and Philosophy, Siberian Branch of the Academy of Sciences of the USSR, Faculty of Humanities of NSU) and such specialists as A.P. Okladnikov, A.P. Derevyanko, R.S. Vasilevsky, and V.E. Larichev. In the first post-Soviet decade, there was a transition to new formats – long-term joint projects and archaeological expeditions carried out on the basis of bilateral agreements between research organizations in Japan and Russian institutes (universities, museums) from a number of cities in Siberia and the Russian Far East. Cooperation reached its peak in 2007/8–2019, enjoying the support of Russian (RSFH, RFBR, RSF) and Japanese scientific foundations, and was implemented in a variety of formats (projects, exchanges, internships, symposiums, exhibitions, publications, etc.) and geographical areas, both in Russia and Japan, as well as in third countries – in Central Asia (Mongolia) and South America (Ecuador).
One of the striking examples of such interaction is the fruitful cooperation of the Division of Foreign Archaeology (Institute of Archaeology and Ethnography SB RAS, Novosibirsk) and the Laboratory of Archaeology (Tohoku University, Sendai), resulting in a large number of publications in leading scientific journals and several dissertation studies on the Jomon and Kofun periods.
Japan and China are both resource-deficient countries, a fact that largely determines the importance of pursuing a targeted policy for the development of low-carbon energy sectors. This refers, first and foremost, to renewable energy and hydrogen, but the growing role of civilian nuclear energy is also worth taking into account – Japan, despite the consequences of the Fukushima accident, is gradually increasing the share of nuclear power plants in its generation structure. The authors show that the tasks of achieving “net zero emissions” facing Japan and China have many similar features. The countries under consideration undoubtedly have different financial and economic resources and different potential and available capacity in their domestic markets for the implementation of low-carbon energy technologies and products in a broad sense. However, both build public policy in this direction on incentive mechanisms that create their own technological foundation, taking into account the advanced solutions being developed and implemented by their closest competitors. Over the past two decades, Japan and China have developed a format of energy cooperation in which the emphasis was placed on the supply of energy and transport equipment from Japan, as well as on investments by Japanese companies in the construction of various energy infrastructure facilities in China. However, at present, China already has sufficient technological potential not only to meet its own needs in the production of equipment and components for low-carbon energy, but also to export products of this type. Accordingly, Japanese corporations that used to hold leading positions in the renewable energy segment as suppliers of necessary equipment and initiators of technology transfer to developing countries (mainly in Southeast and South Asia) now face competition from Chinese producers.
It is shown that, despite existing tensions, Japan and China maintain a high level of bilateral contacts through intergovernmental organizations and funds, joint research centers, and various private business cooperation mechanisms and schemes, all of which foster the implementation of large projects.
Secular book printing began to spread in Japan from the beginning of the 17th century, and from the middle of that century woodblock printing was completely dominant. The repertoire of publications was wide, including old texts written long before the Tokugawa period. Since commercial printing assumed that the book would be bought, only old texts that remained relevant were published. Printed editions significantly expanded the circle of book readers. The Seiashō (Notes by a Frog from a Well) by Tonna (1289–1372) belongs to the karon genre (treatises on poetry) and is a guide for aspiring poets writing waka (Japanese songs). The text was published for the first time in 1648, and the first illustrated edition appeared in 1686 and was reprinted in 1709. The illustrator is believed to be Hishikawa Moronobu (1618–1694), although the book does not contain the artist’s name. The second illustrated edition dates back to 1752 and uses illustrations by Tachibana Morikuni (1679–1748). In both editions, the illustrations are made on separate sheets and occupy a whole page; they are monochrome and combine a drawing (a landscape illustrating the poem) with an inscription of the poem at the top. An analysis and comparison of these two editions makes it possible to discern trends related both to printing itself and to a number of broader cultural issues. The understanding of authorship receives a “visible” embodiment: in the first edition, neither the author of the text nor the artist is identified, while the colophon of the second edition contains the names of both. In the time that elapsed between these two editions, the role of illustrations grew significantly. The edition from the end of the 17th century contains 24 illustrations, and the book was made in such a way that it could exist in a version without illustrations; there, the illustrations play a supporting role.
The edition of the mid-18th century contains 80 illustrations, and they can be distributed in the text of the book or concentrated in one place, making this edition close to the ehon books.
This special issue examines representations and constructions of pregnancy, childbirth, and breastfeeding in contemporary Japanese fiction in a selection of literary texts from the 2010s to the 2020s. It thus joins ongoing conversations and existing studies concerned with the representation of reproduction and motherhood in modern and contemporary Japanese culture (Saito, 1994; Seaman, 2016; Castellini, 2017; Harada, 2021). However, the essays in this section focus on depictions of pregnancy, childbirth, and breastfeeding in terms of narrating bodies as a way to articulate women’s experiences of physical and psychological oppression within Japanese society and redefine new forms of mothering, fathering, and parenting. This research investigates the ambivalence and complexity around motherhood and embodiment in contemporary women’s fiction. At the same time, it explores the connections between literary studies and contemporary sociocultural dynamics of gender and family.
Document question answering is the task of answering questions about given documents such as reports, slides, pamphlets, and websites; it is a truly demanding task, as paper and electronic documents are ubiquitous in our society. It is known to be quite challenging because it requires not only text understanding but also understanding of figures and tables, and hence visual question answering (VQA) methods are often examined in addition to textual approaches. We introduce Japanese Document Question Answering (JDocQA), a large-scale document-based QA dataset that essentially requires both visual and textual information to answer questions, comprising 5,504 documents in PDF format and 11,600 annotated question-and-answer instances in Japanese. Each QA instance includes references to the document pages and bounding boxes for the answer clues. We incorporate multiple categories of questions, as well as unanswerable questions, for realistic question-answering applications. We empirically evaluate the effectiveness of our dataset with text-based large language models (LLMs) and multimodal models. Incorporating unanswerable questions during fine-tuning may help mitigate so-called hallucination generation.
This research explores the applicability of cross-lingual transfer learning from English to Japanese and Indonesian using the pretrained XLM-R model. We compare the results with several previous works, covering both models using a similar zero-shot approach and fully supervised approaches, to provide an overview of the capability of zero-shot transfer learning with XLM-R relative to existing models. Our models achieve the best result on one Japanese dataset and comparable results on other Japanese and Indonesian datasets without being trained on the target language. Furthermore, the results suggest that it is possible to train a single multilingual model, instead of one model per language, and achieve promising results.
Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated, controlled environment. Several methods have been used to investigate the origin of our language, including agent-based systems, Bayesian agents, genetic algorithms, and rule-based systems. This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models. The chapter introduces the basic concepts of deep and reinforcement learning methods and summarizes their usefulness for simulating language emergence. It also discusses the key findings, limitations, and recent attempts to build realistic simulations. This chapter targets linguists and cognitive scientists seeking an introduction to deep learning as a tool to investigate language evolution.
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson
et al.
The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to provide a forum for researchers to share and discuss their ongoing work on language models (LMs) focusing on low-resource languages, following the recent advancements in neural language models and their linguistic biases towards high-resource languages. LoResLM 2025 attracted notable interest from the natural language processing (NLP) community, resulting in 35 accepted papers from 52 submissions. These contributions cover a broad range of low-resource languages from eight language families and 13 diverse research areas, paving the way for future possibilities and promoting linguistic inclusivity in NLP.
There is a rich body of literature that details the effects of automated writing evaluation (AWE) on second language (L2) students. However, these studies mostly focus on the impact that automated feedback has on writing performance; that is, there is a dearth of research on its influence on affective factors. Hence, this study was conducted to fill this gap in the literature. The study explored the impact of Grammarly, a popular AWE tool, on English as a foreign language (EFL) students’ foreign language anxiety (FLA) and learner autonomy (LA). EFL students in four separate academic writing courses (N = 58) taught by one of the researchers at a public Japanese university participated in the study. The students received training on Grammarly at the start of the Fall 2022 semester and were required to use the tool while editing their English writing during the 16-week course. Pre- and post-surveys were administered to measure the effects that Grammarly had on FLA and LA. Qualitative data in the form of written reflective reports were also collected from the participants to gain deeper insight into their perceptions of using Grammarly to improve their writing. Results from the analyses indicated that Grammarly had a significant positive effect on both FLA and LA. The students also had largely positive perceptions of Grammarly as an English writing tool. These findings have important implications for the L2 writing classroom and demonstrate that AWE can be used to reduce anxiety and promote autonomy among language learners.
The burden of respiratory syncytial virus (RSV), which causes acute respiratory illness, is well recognized among the pediatric population but also poses a significant risk to the elderly (age ≥ 60) and those with underlying comorbidities. The study aimed to review the most recent data on the epidemiology and burden (clinical and economic) of RSV in the elderly/high-risk populations in China, Japan, South Korea, Taiwan, and Australia. A targeted review was conducted of English, Japanese, Korean, and Chinese language articles published from 1 January 2010 to 7 October 2020 relevant to this purpose. A total of 881 studies were identified, and 41 were included. The median proportion of elderly patients with RSV among all adult patients with acute respiratory infection (ARI) or community-acquired pneumonia was 79.78% (71.43–88.12%) in Japan, 48.00% (3.64–80.00%) in China, 41.67% (33.33–50.00%) in Taiwan, 38.61% in Australia, and 28.57% (22.76–33.33%) in South Korea. RSV was associated with a high clinical burden in patients with comorbidities such as asthma and chronic obstructive pulmonary disease. In China, inpatients with ARI showed a significantly higher rate of RSV-related hospitalization than outpatients (13.22% versus 4.08%, p < 0.01). The median length of hospital stay among elderly patients with RSV was longest in Japan (30 days) and shortest in China (7 days). Mortality data varied by region, with some studies reporting rates as high as 12.00% (9/75) in hospitalized elderly patients. Finally, data on the economic burden were only available for South Korea, where the median cost of a medical admission for an elderly patient with RSV was US dollar (USD) 2933. RSV infection is a major source of disease burden among elderly patients, especially in regions with aging populations. It also complicates the management of those with underlying diseases.
Appropriate prevention strategies are required to reduce the burden among the adult population, especially the elderly. Data gaps regarding the economic burden of RSV infection in the Asia-Pacific region indicate the need for further research to increase our understanding of the burden of this disease in this region.