The article analyses the role of authorial commentary, realized in the form of page footnotes, within the framework of a work of fiction. Shirō Masamune’s manga “Ghost in the Shell” is chosen for analysis because here the author creates a special textual situation in which the commentaries are neither the dominant aspect of the text (as, for example, in V. Nabokov’s “Pale Fire”) nor an equal one (as in D. F. Wallace’s “Infinite Jest”), but an aspect which, aware of its marginal status, strives to perform several functions in the text. First, the author’s commentaries, as a specific paratext, emphasize the status of the manga as a book and give the reader the opportunity to choose one of several possible reading paths, thus including the reader in the unfolding of the text. Second, the space of the commentaries becomes a space for the manifestation of the author’s presence: through the footnotes he creates essays on science and religion and on the interaction between man and technology, and thus seeks to articulate more fully the main philosophical theses realized through drawing and narrative in “Ghost in the Shell.” Third, the commentaries allow the author to shift the focus from the narrative to the a-narrative aspects of the text, the very space and world in which the action unfolds. Fourth, the commentaries emphasize the changing role of the author, who turns from an omnipotent demiurge and creator into a chronicler, a reporter, whose task is to record what is happening as it unfolds according to the laws formed by the interactions between characters and spaces, spaces and spaces, spaces and things, things and characters. Thus, through the commentaries, a situation is formed in which the diegetic world seems to acquire a conditional autonomy, an ability to exist outside the narrative that tells of the adventures of Major Kusanagi Motoko and her subordinates.
The author’s commentaries thus, through the clarification of illustrations and characters’ actions and through attention to detail, strengthen the base that facilitates the transfer of the plot into other media.
The claim that the legendary thief Ishikawa Goemon attempted to assassinate the warlord Oda Nobunaga by dripping poison down a thread into the latter’s mouth is a staple of English-language histories of the so-called ‘ninja.’ Despite its widespread circulation in popular histories of Japan, there is good reason to believe that this famous assassination attempt never actually happened. In this article, I trace the Ishikawa Goemon legend through a range of Japanese-language documentary and literary sources, attempting to find a source for the poison-thread tale. I conclude that the story is not only fiction but modern fiction, resulting from a misunderstanding of the climactic scene of a 1962 ninja movie, Shinobi no mono, as depicting an historical event. The poison-thread technique, I also suggest, is not an authentic historical technique at all but a borrowing from a 1925 novel by the mystery writer Edogawa Ranpo. The article concludes by exploring how the poison-thread story managed to circulate unchallenged for more than fifty years, and by offering some observations on the serious methodological flaws of English-language ‘ninja’ histories to date.
Language and Literature, Japanese language and literature
Fernando Rodríguez-Izquierdo y Gavala (Seville, 1937 – Seville, January 8, 2025), who earned a degree in Japanese Language and Culture from Sophia University in Tokyo, held a bachelor’s degree in Philosophy and Letters, as well as in Hispanic Philology and Classical Philology, and a PhD from the University of Seville. He was one of the most important pioneers of Japanese studies in Spain. He lived in Japan for three and a half years and, after returning to Spain, worked as a Senior Lecturer in Hispanic Philology at the University of Seville (1975–2006). His doctoral thesis, The Japanese Haiku, marked the beginning of an extensive body of research on this subject, about which he published numerous works and gave countless lectures, presentations, and papers. He was the foremost haiku specialist in the Spanish-speaking world and its greatest promoter. In addition, he gifted us with outstanding translations, not only of haiku but also of numerous significant works by exceptional authors of both classical and contemporary Japanese literature.
Social sciences and state - Asia (Asian studies only), Social sciences (General)
Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou
et al.
Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric. We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages. EuroGEST builds on an existing expert-informed benchmark covering 16 gender stereotypes, expanded in this work using translation tools, quality estimation metrics, and morphological heuristics. Human evaluations confirm that our data generation method results in high accuracy of both translations and gender labels across languages. We use EuroGEST to evaluate 24 multilingual language models from six model families, demonstrating that the strongest stereotypes in all models across all languages are that women are 'beautiful', 'empathetic' and 'neat' and men are 'leaders', 'strong, tough' and 'professional'. We also show that larger models encode gendered stereotypes more strongly and that instruction finetuning does not consistently reduce gendered stereotypes. Our work highlights the need for more multilingual studies of fairness in LLMs and offers scalable methods and resources to audit gender bias across languages.
Recent advances in large language models (LLMs) have demonstrated notable performance in medical licensing exams. However, comprehensive evaluation of LLMs across various healthcare roles, particularly in high-stakes clinical scenarios, remains a challenge. Existing benchmarks are typically text-based, English-centric, and focused primarily on medicine, which limits their ability to assess broader healthcare knowledge and multimodal reasoning. To address these gaps, we introduce KokushiMD-10, the first multimodal benchmark constructed from ten Japanese national healthcare licensing exams. This benchmark spans multiple fields, including Medicine, Dentistry, Nursing, Pharmacy, and allied health professions. It contains 11,588 real exam questions, incorporating clinical images and expert-annotated rationales to evaluate both textual and visual reasoning. We benchmark over 30 state-of-the-art LLMs, including GPT-4o, Claude 3.5, and Gemini, across both text and image-based settings. Despite promising results, no model consistently meets passing thresholds across domains, highlighting the ongoing challenges in medical AI. KokushiMD-10 provides a comprehensive and linguistically grounded resource for evaluating and advancing reasoning-centric medical AI across multilingual and multimodal clinical tasks.
Similar to LLMs, the development of vision language models is mainly driven by English datasets and by models trained on English and Chinese, whereas support for other languages, even high-resource languages such as German, remains significantly weaker. In this work we present an analysis of open-weight VLMs' factual knowledge in German and English. We disentangle the image-related aspects from the textual ones by analyzing accuracy with jury-as-a-judge in both prompt languages and with images from German and international contexts. We found that for celebrities and sights, VLMs struggle because they lack visual cognition of German image contents. For animals and plants, the tested models can often correctly identify the image contents according to the scientific name or English common name but fail in German. Cars and supermarket products were identified equally well in English and German images across both prompt languages.
Rapid advancements in large language model (LLM) technologies have led to the introduction of powerful open-source instruction-tuned LLMs with the same text generation quality as state-of-the-art counterparts such as GPT-4. While the emergence of such models accelerates the adoption of LLM technologies in sensitive-information environments, the authors of such models do not disclose the training data necessary to replicate the results, making the achievements model-exclusive. Since these open-source models are also multilingual, this in turn reduces the benefits of training language-specific LLMs, as improved inference computation efficiency becomes the only guaranteed advantage of such a costly procedure. More cost-efficient options, such as vocabulary extension and subsequent continued pre-training, are also inhibited by the lack of access to high-quality instruction-tuning data, since such data is the major factor behind the resulting LLM's task-solving capabilities. To address these limitations and cut the costs of the language adaptation pipeline, we propose Learned Embedding Propagation (LEP). Unlike existing approaches, our method has lower training data size requirements due to its minimal impact on existing LLM knowledge, which we reinforce using a novel ad hoc embedding propagation procedure that allows skipping the instruction-tuning step and instead implants the new language knowledge directly into any existing instruction-tuned variant. We evaluated four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B, showing that LEP is competitive with traditional instruction-tuning methods, achieving performance comparable to OpenChat 3.5 and LLaMa-3-8B-Instruct, with self-calibration and continued tuning further enhancing task-solving capabilities.
Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved through the retrieval of domain knowledge, questioning whether they achieve their purpose. In this review, we present a comprehensive overview of LLM benchmarks for causality. We highlight how recent benchmarks move towards a more thorough definition of causal reasoning by incorporating interventional or counterfactual reasoning. We derive a set of criteria that a useful benchmark or set of benchmarks should aim to satisfy. We hope this work will pave the way towards a general framework for the assessment of causal understanding in LLMs and the design of novel benchmarks.
Japan is famous for politeness in its language, and politeness in Japanese is inseparable from culture. Anime is a form of global culture, and it is also well known as one of the reasons learners become interested in Japanese. This background led the writer to research "Strategy of Politeness through Japanese Respect in the Anime My Next Life as a Villainess: All Routes Lead to Doom Season 1". This data source was chosen because the conversations of the nobility contain many uses of Japanese honorifics. The honorifics found in the data are analyzed using Yule's speech act theory (1996), Brown and Levinson's politeness strategies (1987), and Kabaya's theory of Japanese honorifics (2008). Based on the analysis, many on-record strategies with positive politeness were found, owing to the status of the speech participants, who, as members of the nobility, are used to various forms of respect. The "don't do the FTA" strategy was not found because the participants stated their intentions clearly. The most widely used form of respect is sonkeigo, because it honors the speech partner and there is a difference in status. The factors that influence the use of sonkeigo and kenjougo are human relations, feelings, and forms of delivery.
In this technological era, social media plays an important role in all fields. Various types of social media are used by the public, one of which is TikTok. TikTok's diverse video content provides benefits for its users. One type of video content that can be found on TikTok is learning video content, which can be used as an alternative learning resource or learning medium. Among the many kinds of learning video content found on TikTok is language learning content, for example, Japanese language learning video content. Such learning content needs to be sorted first so that its use can be maximized. For this reason, this study aims to determine the types of learning and the levels of Japanese language proficiency contained in Japanese language learning video content on TikTok. The method used in this study is a qualitative descriptive method, which provides detailed and factual data interpretation. Based on the research, there are 7 types of Japanese language learning (Aisatsu, Bunka, Bunpou, Goi, Kaiwa, Linguistics, and Moji) that can be found in TikTok video content. Meanwhile, the levels of Japanese language proficiency found range from level A1 to level B2 and from level N5 to level N1.
Elmurod Kuriyozov, Ulugbek Salaev, Sanatbek Matlatipov
et al.
Text classification is an important task in Natural Language Processing (NLP), where the goal is to categorize text data into predefined classes. In this study, we analyse the dataset creation steps and evaluation techniques of a multi-label news categorisation task as part of text classification. We first present a newly obtained dataset for Uzbek text classification, which was collected from 10 different news and press websites and covers 15 categories of news, press and law texts. We also present a comprehensive evaluation of different models, ranging from traditional bag-of-words models to deep learning architectures, on this newly created dataset. Our experiments show that the Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) based models outperform the rule-based models. The best performance is achieved by the BERTbek model, a transformer-based BERT model trained on the Uzbek corpus. Our findings provide a good baseline for further research in Uzbek text classification.
Jungo Kasai, Yuhei Kasai, Keisuke Sakaguchi
et al.
As large language models (LLMs) gain popularity among speakers of diverse languages, we believe that it is crucial to benchmark them to better understand model behaviors, failures, and limitations in languages beyond English. In this work, we evaluate LLM APIs (ChatGPT, GPT-3, and GPT-4) on the Japanese national medical licensing examinations from the past five years, including the current year. Our team comprises native Japanese-speaking NLP researchers and a practicing cardiologist based in Japan. Our experiments show that GPT-4 outperforms ChatGPT and GPT-3 and passes all six years of the exams, highlighting LLMs' potential in a language that is typologically distant from English. However, our evaluation also exposes critical limitations of the current LLM APIs. First, LLMs sometimes select prohibited choices that should be strictly avoided in medical practice in Japan, such as suggesting euthanasia. Further, our analysis shows that the API costs are generally higher and the maximum context size is smaller for Japanese because of the way non-Latin scripts are currently tokenized in the pipeline. We release our benchmark as Igaku QA as well as all model outputs and exam metadata. We hope that our results and benchmark will spur progress on more diverse applications of LLMs. Our benchmark is available at https://github.com/jungokasai/IgakuQA.
Kunal Handa, Margaret Clapper, Jessica Boyle
et al.
Teachers' growth mindset supportive language (GMSL)--rhetoric emphasizing that one's skills can be improved over time--has been shown to significantly reduce disparities in academic achievement and enhance students' learning outcomes. Although teachers espouse growth mindset principles, most find it difficult to adopt GMSL in their practice due to the lack of effective coaching in this area. We explore whether large language models (LLMs) can provide automated, personalized coaching to support teachers' use of GMSL. We establish an effective coaching tool to reframe unsupportive utterances to GMSL by developing (i) a parallel dataset containing GMSL-trained teacher reframings of unsupportive statements with an accompanying annotation guide, (ii) a GMSL prompt framework to revise teachers' unsupportive language, and (iii) an evaluation framework grounded in psychological theory for evaluating GMSL with the help of students and teachers. We conduct a large-scale evaluation involving 174 teachers and 1,006 students, finding that both teachers and students perceive GMSL-trained teacher and model reframings as more effective in fostering a growth mindset and promoting challenge-seeking behavior, among other benefits. We also find that model-generated reframings outperform those from the GMSL-trained teachers. These results show promise for harnessing LLMs to provide automated GMSL feedback for teachers and, more broadly, LLMs' potential for supporting students' learning in the classroom. Our findings also demonstrate the benefit of large-scale human evaluations when applying LLMs in educational domains.
Javier de la Rosa, Álvaro Pérez Pozo, Salvador Ros
et al.
The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated as scansion and rhyme systems only exist for individual languages, making comparative studies very challenging and time-consuming. In this work, we present \textsc{Alberti}, the first multilingual pre-trained large language model for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English and German. In both cases, \textsc{Alberti} outperforms multilingual BERT and other transformer-based models of similar sizes, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.
Natsume Fusanosuke is one of the founding critics of manga, who pioneered a style of formal analysis of manga in the 1990s. Natsume’s first important foray into his “theory of expression” (hyōgenron) was seen in the collaborative work Manga no yomikata (How to read manga) in 1995. He later streamlined those ideas in a twelve-episode series Manga wa naze omoshiroi no ka: sono bunpō to hyōgen (Why is manga so interesting?: Its grammar and expression) for NHK television in 1996. The accompanying expanded book (1997) consists of well-ordered, individual essays on elements of manga such as line, character creation, and panels. In the present translated essay, the eighth chapter of part one, Natsume explores how hand-drawn onomatopoeia—or comic-book interjections—are quite nuanced, conveying additional information about time and space as a part of the larger narrative flow, which Natsume asserts is uniquely characteristic of Japanese comic books. This early essay, representing the beginning of Natsume’s scholarly arc, is important for its examination of how hand-drawn onomatopoeia are vital tools for the manga storyteller. Natsume argues that these graphic giongo, gitaigo, and other mimetic expressions also reveal how Japanese audiences are predisposed to reading and processing verbal information both as words and as pictures. The translation and introduction make available in English for the first time a part of a key text in the history of manga studies in Japan.
This study aims to identify and describe the translation of Indonesian cultural terms in the film Battle of Surabaya; it did not intend to pinpoint the exact translation techniques used. The animated depiction of Indonesian history and the variety of languages used in the movie, such as Dutch, English, Japanese, Indonesian, and local languages, motivated the researcher to conduct the research. The research used a mixed method, with translation theory as the grounding theory for analyzing the data. The analysis found 24 vocabulary items containing Indonesian cultural terms, spread across every cultural aspect: socio-culture accounted for 17 items, or 71% of the data; material culture for only 1 item; ecology for 3; organization for 2; and gestures or habits likewise for 1. The difference in translating the cultural terms lay only in establishing equivalence of meaning between ST and TT. Conveying cultural aspects in various ways is a form of appreciating a culture, but how to show and transfer them becomes a challenge for the translator. The true meaning of a cultural term may not be fully conveyed due to a lack of data, or because it is communicated only within a cultural group; this can be resolved if the cultural term has a glossary and the translator uses it. This study also found that foreign cultures that have long been present in the local culture can grow and become part of the local culture itself.
Background. Studies have shown that the collaborative processing of feedback on a jointly produced text facilitates language learning in a traditional classroom. However, it is still unknown whether there are similar learning benefits when the feedback is provided through an online modality from an expert peer during an international virtual exchange (IVE). Purpose. The present study fills this gap in the literature by investigating Japanese learners engaged in processing written corrective feedback from expert language users in the United States. Methods. Qualitative data concerning students’ perceptions of learning outcomes were collected via retrospective interviews and narrative frames, then triangulated with their first and final drafts of written texts and analyzed using activity theory (AT). Results. Findings indicate that learning benefits accrued in areas of language skills such as vocabulary, spelling, and grammar, as well as in deepening learners’ reflexive awareness of themselves as language users. Conclusion. A discussion of these findings, informed by sociocultural theory and shaped by the categories of AT, brings to light some of the interactional dynamics that contributed to the creation of these outcomes. These interactional dynamics show that the learning benefits of the activity primarily resided in the peer-to-peer interactions rather than in interactions with the expert peer.
This study aims to explain the forms of morphological and syntactic interference from Indonesian in Japanese language learners' production of Japanese passive sentences. The data were then analyzed as errors resulting from these two types of interference. The research data were obtained through an open questionnaire distributed to 4th-semester students of the Japanese Literature Study Program at Brawijaya University. Based on the research, 47 forms of morphological interference were found. Morphological interference in the transfer of morphemes was the most common, at 42 items or 89%, followed by the removal of grammatical categories at 4 items or 9%, and, least of all, the replication of grammatical functions at 1 item or 2%. Meanwhile, 33 forms of syntactic interference were found: syntactic interference in phrase patterns accounted for 22 items or 69%, and syntactic interference in sentence patterns for 11 items or 31%. Based on these findings of morphological and syntactic interference from Indonesian, 55 errors were classified following Parera's (1997) theory: production errors were the most numerous at 52 items or 94%, followed by reduction at 2 items or 4%, with overproduction the least common at 1 item or 2%.
First-time conversations play an important role as a starting point in building relationships. However, the lack of information about interlocutors makes it difficult to decide what topic to take up. In this study, we targeted 10 pairs of native Japanese speakers and native Sundanese speakers, clarified what topics would be selected in first-time conversations between university students, and examined similarities and differences between the two groups of native speakers. As a result, in the conversation data of the Japanese pairs, 20 topics out of 83 were recognized as “topic items”, and, depending on the relationships between the topic items, they could then be classified into eight categories, namely “affiliation”, “origin”, “university life”, “hobbies/enjoyment”, “living”, “commonalities”, “specialty”, and “society”. On the other hand, in the conversation data of the Sundanese pairs, 16 topics out of 95 were recognized as “topic items”, and they could be classified into seven categories, namely “university life”, “affiliation”, “residential”, “origin”, “commonalities”, “specialty”, and “society”. The overall picture of the classified categories and the topic items corresponding to their subclasses constitutes the “topic selection list”, and the set of culturally shared knowledge about this “list” is the “first-time conversation topic selection schema”. The results of this study can serve as a reference for topic selection in first conversations with native Japanese or Sundanese speakers, especially between university students. They can also serve as a repository of scientific knowledge for related fields, such as sociolinguistic studies on conversation analysis, and as a reference for Japanese language education studies in general.