Results for "Japanese language and literature"

Showing 20 of ~3,334,813 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

CrossRef Open Access 2025
Japanese Language Learners Benefit from Information on Verb Structures

J.-R. Hayashishita, Daiki Tanaka, Yuko Miyoshi et al.

The writing errors made by second language learners of Japanese often reflect insufficient knowledge of verbs, specifically in choosing appropriate verbs and using them correctly in given contexts.  To support these learners, we have been developing a Japanese verb database, Don 動詞どん (www.dondoushidon.org), which explicitly provides information about verb structures, including types of entities expressed by co-occurring elements, their semantic roles, and accompanying particles. To investigate whether having access to such information improves learners’ accuracy in sentence production, we conducted a study with forty-three learners of Japanese using two tools: Don 動詞どん and, for comparison, Jisho (jisho.org). While the former highlights verb structures explicitly, the latter does not. The results showed that participants performed significantly better when using Don 動詞どん. Responses to a post-experiment questionnaire further revealed that learners of Japanese believe that information about verb structures would be helpful for their learning.

DOAJ Open Access 2025
Book review: “Worlds of Japanese Culture” by Elena L. Skvortsova and Alexander L. Lutsky

S. A. Polkhov

The article examines the contents of the book “Worlds of Japanese Culture” by E.L. Skvortsova and A.L. Lutsky (Moscow, St. Petersburg: Center for Humanitarian Initiatives, 2025. 582 p. Series “Tree of Meanings.” ISBN 978-5-98712-499-4). The review outlines the main topics addressed by the authors of the book: the development of Japanese philosophy and sociology in the modern period, the problems faced by Japanese society in the 21st century, the essence and characteristic features of Japanese civilization. According to the author of the review, the book is a valuable contribution to the study of modern philosophy and culture of Japan as a whole.

Japanese language and literature
DOAJ Open Access 2025
Compass Japanese 1 Interactive Workbook Compass Japanese 1, Supplemental Resource: Japanese Writing Practice Book for Novice Learners

Noriko Sugimori

Compass Japanese 1 Interactive Workbook and its accompanying Japanese Writing Practice Book for Novice Learners mark an important contribution to Japanese language instruction. Drawing on the Global Competence framework, the workbook integrates reflection, collaboration, and authentic communication into a learner-centered design. With its diverse representation, inclusive visuals, and wide range of interactive tasks, the series encourages students not only to acquire Japanese but also to explore cultural and social themes. While instructors may need to guide learners through some vocabulary and pitch accent nuances, the workbook’s emphasis on inclusivity, creativity, and intercultural awareness makes it a valuable and inspiring resource for novice-level classrooms.

Language and Literature, Japanese language and literature
arXiv Open Access 2025
Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs

Yingjian Chen, Feiyang Li, Xingyu Song et al.

Large language models (LLMs) perform well in medical QA, but their effectiveness in Japanese contexts is limited due to privacy constraints that prevent the use of commercial models like GPT-4 in clinical settings. As a result, recent efforts focus on instruction-tuning open-source LLMs, though the potential of combining them with retrieval-augmented generation (RAG) remains underexplored. To bridge this gap, we are the first to explore a knowledge graph-based (KG) RAG framework for Japanese medical QA using small-scale open-source LLMs. Experimental results show that KG-based RAG has only a limited impact on Japanese medical QA using small-scale open-source LLMs. Further case studies reveal that the effectiveness of the RAG is sensitive to the quality and relevance of the external retrieved content. These findings offer valuable insights into the challenges and potential of applying RAG in Japanese medical QA, while also serving as a reference for other low-resource languages.
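The knowledge-graph-based RAG step the abstract describes, retrieving graph facts relevant to a question and prepending them to the LLM prompt, can be sketched minimally as follows. The toy graph, entity-matching retrieval, and prompt template are illustrative assumptions, not the paper's actual resources or code:

```python
# A toy knowledge graph as (subject, relation, object) triples.
KG = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("ibuprofen", "treats", "fever"),
]

def retrieve_triples(question, kg):
    """Return triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in kg if t[0] in q or t[2] in q]

def build_prompt(question, kg):
    """Prepend retrieved facts to the question for a downstream LLM."""
    facts = retrieve_triples(question, kg)
    context = "\n".join(f"{s} {r} {o}" for s, r, o in facts)
    return f"Facts:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What does aspirin treat?", KG)
```

The paper's finding that retrieval quality governs RAG effectiveness corresponds to the `retrieve_triples` step: if it surfaces irrelevant triples, the prompt context misleads rather than helps.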

en cs.CL, cs.AI
DOAJ Open Access 2024
Intercultural approach to studying English for Japanese studies students: a textbook review

Tetiana Druzhchenko, Nataliya Semyan

This publication is a review of the textbook by Nataliia Semian and Tetiana Druzhchenko, "Exploring Cultural Contrasts: Japan and Ukraine in English Classes" (2024). This textbook was developed for teaching English at the university level using comparative strategies as part of the foreign language curriculum for second- and third-year undergraduate students majoring in Japanese studies. Grounded in an intercultural approach, interdisciplinary connections, and a cognitive-communicative concept of foreign language acquisition, the authors encourage students to explore and analyze cultural contrasts and similarities between Japan and Ukraine. The textbook is designed to provide students with specialized knowledge for practical use in professional English for students of Eastern studies, satisfy their intellectual and cultural needs, and promote the development of their professional competencies. The textbook employs contrast and comparison strategies aimed at developing monologic and dialogic English skills, which can be used in discussions about similar and different cultural objects, daily life, literature, art, and the histories of Ukraine and Japan. By exploring linguistic features and cultural realities, students can enhance their language skills and gain a deeper understanding of the cultural context of communication. Thus, the intercultural approach is not only a tool for scholarly research but also a comprehensive and systematic method of teaching English. It helps students appreciate cultural diversity and ensures their success in their professional and academic growth.

Education, Language and Literature
arXiv Open Access 2024
A Comprehensive Evaluation of Semantic Relation Knowledge of Pretrained Language Models and Humans

Zhihan Cao, Hiroaki Yamada, Simone Teufel et al.

Recently, much work has concerned itself with the enigma of what exactly pretrained language models~(PLMs) learn about different aspects of language, and how they learn it. One stream of this type of research investigates the knowledge that PLMs have about semantic relations. However, many aspects of semantic relations were left unexplored. Generally, only one relation has been considered, namely hypernymy. Furthermore, previous work did not measure humans' performance on the same task as that performed by the PLMs. This means that at this point in time, there is only an incomplete view of the extent of these models' semantic relation knowledge. To address this gap, we introduce a comprehensive evaluation framework covering five relations beyond hypernymy, namely hyponymy, holonymy, meronymy, antonymy, and synonymy. We use five metrics (two newly introduced here) for recently untreated aspects of semantic relation knowledge, namely soundness, completeness, symmetry, prototypicality, and distinguishability. Using these, we can fairly compare humans and models on the same task. Our extensive experiments involve six PLMs, four masked and two causal language models. The results reveal a significant knowledge gap between humans and models for all semantic relations. In general, causal language models, despite their wide use, do not always perform significantly better than masked language models. Antonymy is the outlier relation where all models perform reasonably well. The evaluation materials can be found at https://github.com/hancules/ProbeResponses.
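One of the abstract's metrics, symmetry, has a natural operationalization: a relation is symmetric to the extent that accepting a pair (a, b) implies accepting (b, a). The sketch below is an illustrative assumption of how such a score might be computed, not the paper's exact formulation:

```python
def symmetry_score(accepted_pairs):
    """Fraction of accepted (a, b) pairs whose reverse (b, a) is also accepted."""
    pairs = set(accepted_pairs)
    if not pairs:
        return 0.0
    return sum((b, a) in pairs for a, b in pairs) / len(pairs)

# Antonymy is symmetric: if "hot" is an antonym of "cold", the reverse holds.
antonyms = [("hot", "cold"), ("cold", "hot"), ("up", "down")]
# Hypernymy is not: "dog" IS-A "animal" does not imply the reverse.
hypernyms = [("dog", "animal"), ("rose", "flower")]
```

Applied to a model's accepted pairs, a high score on antonymy and a low score on hypernymy would indicate the model has internalized each relation's directionality.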

arXiv Open Access 2024
Grounding Toxicity in Real-World Events across Languages

Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered Reddit data comprising 4.5 million comments from 31 thousand posts in six different languages (Dutch, English, German, Arabic, Turkish and Spanish). We target fifteen major social and political world events that occurred between 2020 and 2023. We observe significant variations in toxicity, negative sentiment, and emotion expressions across different events and language communities, showing that toxicity is a complex phenomenon in which many different factors interact and still need to be investigated. We will release the data for further research along with our code.

en cs.CL
arXiv Open Access 2024
L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages

Aishwarya Mirashi, Srushti Sonavane, Purva Lingayat et al.

In this work, we introduce L3Cube-IndicNews, a multilingual text classification corpus aimed at curating a high-quality dataset for Indian regional languages, with a specific focus on news headlines and articles. We have centered our work on 10 prominent Indic languages, including Hindi, Bengali, Marathi, Telugu, Tamil, Gujarati, Kannada, Odia, Malayalam, and Punjabi. Each of these news datasets comprises 10 or more classes of news articles. L3Cube-IndicNews offers 3 distinct datasets tailored to handle different document lengths that are classified as: Short Headlines Classification (SHC) dataset containing the news headline and news category, Long Document Classification (LDC) dataset containing the whole news article and the news category, and Long Paragraph Classification (LPC) containing sub-articles of the news and the news category. We maintain consistent labeling across all 3 datasets for in-depth length-based analysis. We evaluate each of these Indic language datasets using 4 different models including monolingual BERT, multilingual Indic Sentence BERT (IndicSBERT), and IndicBERT. This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages. This also serves as an excellent resource for cross-lingual analysis owing to the high overlap of labels among languages. The datasets and models are shared publicly at https://github.com/l3cube-pune/indic-nlp

en cs.CL, cs.LG
arXiv Open Access 2024
Fotheidil: an Automatic Transcription System for the Irish Language

Liam Lonergan, Ibon Saratxaga, John Sloan et al.

This paper sets out the first web-based transcription system for the Irish language - Fotheidil, a system that utilises speech-related AI technologies as part of the ABAIR initiative. The system includes both off-the-shelf pre-trained voice activity detection and speaker diarisation models and models trained specifically for Irish automatic speech recognition and capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements for out-of-domain test sets and dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration involving sequence-to-sequence models is compared with the conventional approach using a classification model. Experimental results here also show substantial improvements in performance. The system will be made freely available for public use, and represents an important resource for researchers and others who transcribe Irish language materials. Human-corrected transcriptions will be collected and included in the training dataset as the system is used, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.

en cs.CL, cs.SD
arXiv Open Access 2024
Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models

Sina Bagheri Nezhad, Ameeta Agrawal, Rhitabrat Pokharel

Multilingual language models (MLLMs) are crucial for handling text across various languages, yet they often show performance disparities due to differences in resource availability and linguistic characteristics. While the impact of pre-train data percentage and model size on performance is well-known, our study reveals additional critical factors that significantly influence MLLM effectiveness. Analyzing a wide range of features, including geographical, linguistic, and resource-related aspects, we focus on the SIB-200 dataset for classification and the Flores-200 dataset for machine translation, using regression models and SHAP values across 204 languages. Our findings identify token similarity and country similarity as pivotal factors, alongside pre-train data and model size, in enhancing model performance. Token similarity facilitates cross-lingual transfer, while country similarity highlights the importance of shared cultural and linguistic contexts. These insights offer valuable guidance for developing more equitable and effective multilingual language models, particularly for underrepresented languages.

en cs.CL, cs.AI
arXiv Open Access 2024
Soft Language Prompts for Language Transfer

Ivan Vykopal, Simon Ostermann, Marián Šimko

Cross-lingual knowledge transfer, especially between high- and low-resource languages, remains challenging in natural language processing (NLP). This study offers insights for improving cross-lingual NLP applications through the combination of parameter-efficient fine-tuning methods. We systematically explore strategies for enhancing cross-lingual transfer through the incorporation of language-specific and task-specific adapters and soft prompts. We present a detailed investigation of various combinations of these methods, exploring their efficiency across 16 languages, focusing on 10 mid- and low-resource languages. We further present to our knowledge the first use of soft prompts for language transfer, a technique we call soft language prompts. Our findings demonstrate that in contrast to claims of previous work, a combination of language and task adapters does not always work best; instead, combining a soft language prompt with a task adapter outperforms most configurations in many cases.

S2 Open Access 2023
(M)Other Tongue; or, Exophony

Keith L. Johnson

KEITH LESLIE JOHNSON is director of film and media studies at William and Mary, where he is senior lecturer of English and affiliated faculty of Japanese studies. He is the author of Jan Švankmajer: Animist Cinema (U of Illinois P, 2017); essays on Aldous Huxley, Franz Kafka, and other modernist figures; and translations of Haruki Murakami and Akira Yoshimura. I want to make a modest case for exophony as a term deserving of wider application and scrutiny. The phenomenon of exophony is familiar enough, even if the term itself is not. Put simply, it refers to composition in a nonnative language—which, at first blush, might seem a rather exotic state of literary matter. However, since appearing in Susan Arndt, Dirk Naguschewski, and Robert Stockhammer’s 2007 edited collection, Exophonie: Anderssprachigkeit (in) der Literatur (Exophony: Otherlanguaged-ness in/of Literature), exophony has become an increasingly widespread and galvanizing concept in literary studies, of obvious interest to those working on translation but also more generally to those working on migrant or exile literatures, postcolonial literatures, and transnational literatures. Beyond these direct applications of the term, however, I argue that exophony represents not just an exception or special case of translation but the paradigm of literary production as such. To flesh out that thesis, I want to briefly address (and push back on) three related assumptions one often sees at the scene of translation: the idea that translations are secondary or subordinate to the composed literary object (i.e., the original), the idea that exophonic writers represent a vanishingly small minority, and the idea that self-translation is a special case, even among exophonic writers. 
Assumptions regarding translation as such, then exophony, and finally self-translation: obviously, these are not the only ones we might think about—and I do not treat them in any systematic, sequential way in what follows—but they nonetheless help us begin zeroing in on why translation matters, integrally, for literary studies as a whole. As a preemptive exercise, maybe we can think about how many exophonic writers we can name off the top of our heads. Here goes: Jhumpa Lahiri; Gary Shteyngart and Kazuo Ishiguro, both of whom moved to English-speaking countries as children (though Ishiguro claims to remember little to no Japanese); Aleksandar

5 citations · en
S2 Open Access 2023
‘Casual Friday’

A. Ohta

There is increasing research literature on instructional pragmatics, including work on Japanese, but little research on naturally occurring classroom innovations. This article presents a study of an instructional innovation called Casual Friday, where the professor of a university multi-section advanced-beginning (2nd year) Japanese language course designated certain lessons as spaces for graduate student teaching assistants (TAs) to involve students in using Japanese casual register. Analysis of interviews with instructional staff, student survey results, and classroom and meeting observations, shows how Casual Friday, an organizational transformation of the course, transformed activity systems (Engeström, 1987, 1999, 2003). Transformed TA roles created a pedagogical safe house (Canagarajah, 2004; Pomerantz and Bell, 2011; Pratt, 1991) on Casual Fridays by providing TAs instructional autonomy, stronger horizontal connections with students, and temporary freedom from the restraints of the course-as-usual. The re-organization thus promoted TA innovation, as they creatively used language, designed materials, taught dialect, introduced Japanese youth culture, etc. Triangulation with student surveys confirms findings of the interviews and observations, while also showing that students reported languaculture learning. Results suggest the benefits of carving out spaces within normally textbook-and-grammar-focused courses for TAs to have free rein in presenting and involving students with languaculture.

DOAJ Open Access 2023
Sociological Aspects of the Tokyo Olympics

A. V. Belov

The Olympic and Paralympic Games in Tokyo in July–September 2021 took place in a challenging social environment that seriously affected the public perception of the events. When preparing for the Olympics from 2013–2019, the Japanese people actively supported the Games, which was confirmed by the results of numerous sociological studies. In March 2020, the COVID-19 pandemic began, followed by several waves of infection spread. The competition was postponed for a year. Vaccination in Japan was delayed compared with most of the G7 countries. Against this background, in the summer of 2021, the most dangerous Delta strain of coronavirus began to spread in the country, bringing a rise in mortality rates and overcrowding in hospitals in large cities. In this difficult epidemiological and social situation, surveys recorded a negative attitude towards the Olympics. However, during the competition, the majority opinion once again turned positive, mainly due to the athletic successes of the Japanese team and effective anti-virus control measures. The absence of spectators in the venues, most probably, did not affect the sporting achievements significantly. At least, the Japanese Olympic team won a record number of medals. Infection prevention measures proved effective in limiting the transmission of the virus among the athletes and the Japanese service personnel. The economic and symbolic achievements of the Games did not meet expectations, as, during the Olympics, it was not possible to properly address its significance as the end point of the low-growth “lost decades”, evidence of economic recovery after the triple disaster of 2011, and as a tool to increase Japan’s tourist attractiveness. Therefore, during the pandemic, major sports events should be held primarily to train top-class athletes and to increase populace satisfaction with the success of the national team rather than to obtain direct economic benefits or improve the host country’s image.

Japanese language and literature
DOAJ Open Access 2023
The Complete Works of World Literature (Sekai Bungaku Zenshū, 1927–1932) by Shinchōsha publishing house in the context of the history of Japanese book

M. V. Toropygina

The last years of the 1920s and the beginning of the 1930s are known as the “era of one-yen books” in the history of Japanese book printing. One-yen books were serial subscription publications, with the price of one yen per volume. The first such publication was the Complete Works of Contemporary Japanese Literature (Kindai Nihon Bungaku Zenshū), launched by Kaizōsha publishing house in 1926. The series was very successful, with at least 250,000 subscribers. A “one-yen editions race” ensued: many publishing houses began releasing their own one-yen series as early as the following year. The most commercially successful among the one-yen books (at least 400,000 copies) was the Complete Works of World Literature (Sekai bungaku zenshū) published by the Shinchōsha publishing house in 1927–1932. The Complete Works of World Literature consists of 57 volumes in two parts (38 volumes of the first part were published in 1927–1930, and 19 more volumes were then added, composing the second part of the publication). The books of the series had a hard cover and a thought-out design and were supposed to serve not only for reading, but also for the decoration of the house and the demonstration of the owner’s status. The series represents one of the possible canons of world literature. The time frame of the presented works is from the 14th century (the first volume is Dante’s Divine Comedy) to the present (the last volume contains six works, five of them written in the 1920s, while the volume was released in 1929). The series includes prose, drama, and, to a lesser extent, poetry. The volumes of the series have a fairly extensive apparatus (prefaces, comments in some volumes, portraits of authors, monthly attachments tucked into the pages of the volumes). World literature is presented as Western literature. Translations of the works of Western literature played an important role in the formation of national Japanese literature. The success of this series also demonstrated the readers’ great interest in literary translations, especially in the translations of modern literature.

Japanese language and literature
arXiv Open Access 2023
An Evaluation on Large Language Model Outputs: Discourse and Memorization

Adrian de Wynter, Xun Wang, Alex Sokolov et al.

We present an empirical evaluation of various outputs generated by nine of the most widely-available large language models (LLMs). Our analysis is done with off-the-shelf, readily-available tools. We find a correlation between percentage of memorized text, percentage of unique text, and overall output quality, when measured with respect to output pathologies such as counterfactual and logically-flawed statements, and general failures like not staying on topic. Overall, 80.0% of the outputs evaluated contained memorized data, but outputs containing the most memorized content were also more likely to be considered of high quality. We discuss and evaluate mitigation strategies, showing that, in the models evaluated, the rate of memorized text being output is reduced. We conclude with a discussion on potential implications around what it means to learn, to memorize, and to evaluate quality text.

en cs.CL, cs.AI
arXiv Open Access 2023
Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency

Shigeki Karita, Richard Sproat, Haruko Ishikawa

Word error rate (WER) and character error rate (CER) are standard metrics in Speech Recognition (ASR), but one problem has always been alternative spellings: If one's system transcribes adviser whereas the ground truth has advisor, this will count as an error even though the two spellings really represent the same word. Japanese is notorious for "lacking orthography": most words can be spelled in multiple ways, presenting a problem for accurate ASR evaluation. In this paper we propose a new lenient evaluation metric as a more defensible CER measure for Japanese ASR. We create a lattice of plausible respellings of the reference transcription, using a combination of lexical resources, a Japanese text-processing system, and a neural machine translation model for reconstructing kanji from hiragana or katakana. In a manual evaluation, raters rated 95.4% of the proposed spelling variants as plausible. ASR results show that our method, which does not penalize the system for choosing a valid alternate spelling of a word, affords a 2.4%-3.1% absolute reduction in CER depending on the task.
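The core idea, scoring a hypothesis against the best-matching plausible respelling of the reference rather than a single fixed spelling, can be sketched as follows. The hand-made variant list is a stand-in for the paper's respelling lattice, and the helper names are illustrative:

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # delete ca
                           cur[j - 1] + 1,         # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def lenient_cer(hypothesis, respellings):
    """Minimum CER over all plausible spellings of the reference."""
    return min(edit_distance(hypothesis, r) / len(r) for r in respellings)

# If the reference すもも may also be written 李, a system that outputs
# either spelling scores a CER of 0 under the lenient metric.
variants = ["すもも", "李"]
```

Conventional CER is the special case where `respellings` contains only the single ground-truth spelling, which is why the lenient score can only be lower or equal.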

en cs.CL
arXiv Open Access 2023
covLLM: Large Language Models for COVID-19 Biomedical Literature

Yousuf A. Khan, Clarisse Hokia, Jennifer Xu et al.

The COVID-19 pandemic led to 1.1 million deaths in the United States, despite the explosion of coronavirus research. These new findings are slow to translate to clinical interventions, leading to poorer patient outcomes and unnecessary deaths. One reason is that clinicians, overwhelmed by patients, struggle to keep pace with the rate of new coronavirus literature. A potential solution is developing a tool for evaluating coronavirus literature using large language models (LLMs) -- neural networks that are deployed for natural language processing. LLMs can be used to summarize and extract user-specified information. The greater availability and advancement of LLMs and pre-processed coronavirus literature databases provide the opportunity to assist clinicians in evaluating coronavirus literature through a coronavirus literature specific LLM (covLLM), a tool that directly takes an inputted research article and a user query to return an answer. Using the COVID-19 Open Research Dataset (CORD-19), we produced two datasets: (1) synCovid, which uses a combination of handwritten prompts and synthetic prompts generated using OpenAI, and (2) real abstracts, which contains abstract and title pairs. covLLM was trained with LLaMA 7B as a baseline model to produce three models trained on (1) the Alpaca and synCovid datasets, (2) the synCovid dataset, and (3) the synCovid and real abstract datasets. These models were evaluated by two human evaluators and ChatGPT. Results demonstrate that training covLLM on the synCovid and abstract pairs datasets performs competitively with ChatGPT and outperforms covLLM trained primarily using the Alpaca dataset.

en cs.CL, cs.AI

Page 16 of 166,741