Results for "Japanese language and literature"

arXiv Open Access 2025

Adding Alignment Control to Language Models

Wenhong Zhu, Weinan Zhang, Rui Wang

Post-training alignment has increasingly become a crucial factor in enhancing the usability of language models (LMs). However, the desired strength of alignment varies with individual preferences. This paper proposes a method, referred to as CLM, that incorporates alignment control into a single model. The approach adds one identity layer before the initial layers and performs preference learning only on this layer, mapping unaligned input token embeddings into the aligned space. Experimental results demonstrate that this efficient fine-tuning method performs comparably to full fine-tuning. During inference, the input embeddings are processed through both the aligned and the unaligned layer, and the two outputs are merged using an interpolation coefficient. By controlling this parameter, the alignment exhibits a clear interpolation and extrapolation phenomenon.

en cs.CL
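
A minimal sketch of the inference-time merge described above. It assumes a plain linear interpolation between the output of the unaligned (identity) layer and the output of the preference-trained layer. The function names, the coefficient name `lam`, the toy 2-dimensional embedding and the "trained" weights are all hypothetical. The abstract does not give the exact merge formula.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def identity_init(dim: int) -> Matrix:
    """Identity weights: before preference learning the extra layer is a no-op."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def apply_layer(weight: Matrix, embedding: Vector) -> Vector:
    """Linear map of one token embedding (no bias, for simplicity)."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def merge(unaligned: Vector, aligned: Vector, lam: float) -> Vector:
    """Assumed linear merge: lam=0 keeps the unaligned path, lam=1 the aligned
    path, and lam>1 extrapolates beyond the aligned output."""
    return [(1.0 - lam) * u + lam * a for u, a in zip(unaligned, aligned)]


if __name__ == "__main__":
    e = [1.0, 2.0]                        # hypothetical token embedding
    w_aligned = [[1.0, 0.5], [0.0, 1.0]]  # hypothetical trained layer weights
    unaligned = apply_layer(identity_init(2), e)
    aligned = apply_layer(w_aligned, e)
    for lam in (0.0, 0.5, 1.0, 2.0):
        print(lam, merge(unaligned, aligned, lam))
```

With these toy values, `lam=0.5` lands halfway between the two paths. `lam=2.0` moves past the aligned output, which is the kind of extrapolation the abstract mentions.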

arXiv Open Access 2025

Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish

Lujun Li, Yewei Song, Lama Sleem et al.

Grammar refers to the system of rules that governs the structural organization and the semantic relations among linguistic units such as sentences, phrases, and words within a given language. In natural language processing, grammar-focused evaluation protocols remain notably scarce, a gap that is even more pronounced for low-resource languages. Moreover, the extent to which large language models genuinely comprehend grammatical structure, especially the mapping between syntactic structures and meanings, remains under debate. To investigate this issue, we propose a four-stage Grammar-Book-Guided evaluation pipeline intended to provide a systematic and generalizable framework for grammar evaluation. In this work, we take Luxembourgish as a case study. The results show a weak positive correlation between translation performance and grammatical understanding, indicating that strong translation does not necessarily imply deep grammatical competence. Larger models perform well overall owing to their semantic strength but remain weak in morphology and syntax, struggling particularly with Minimal Pair tasks, while strong reasoning ability offers a promising way to enhance their grammatical understanding.

en cs.CL
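
A sketch of the Pearson correlation coefficient between per-model translation scores and per-model grammar scores makes the "weak positive correlation" claim concrete. The abstract does not state which correlation measure was used, so Pearson is an assumption. Any scores passed in would be placeholders, not the paper's data.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson r between two equally long score lists (e.g. translation vs. grammar)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)
```

An r only slightly above zero means that models which translate better tend to score higher on grammar, but only loosely. A value near 1 would mean translation quality almost fully predicts grammar scores.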

DOAJ Open Access 2024

“Searching for order at all levels”. Antonio Lima-de-Faria (July 4, 1921 – December 27, 2023)

Stefano Serafini, Tatyana S. Turova

Professor Antonio Lima-de-Faria was our friend and, in a sense, a teacher. Despite our different fields of study, this master of scientific thought deeply influenced both of us. Dr. Stefano Serafini came to know the work of Antonio Lima-de-Faria as a teenager, thanks to a popular-science article by the late Italian geneticist Giuseppe Sermonti. Lima-de-Faria’s elegant vision of a universal order at all levels of nature opened his eyes to the consistency of patterns, forms, and functions throughout the mineral, vegetable, and animal realms, a concept that has influenced his work in urban studies. Prof. Tatyana Turova met Antonio Lima-de-Faria on a museum tour of the Royal Physiographic Society (Lund). He was 95. When Antonio learned that she was a mathematician working in probability, the discussion went straight to a critical analysis of the concept of randomness. That conversation kept going over the years. Professor Emeritus of Molecular Cytogenetics at Lund University (Sweden), Antonio Lima-de-Faria was a scientist of rare character. He had an innate gift of courage and the ability to tackle big problems despite dominant opinions. He was rigorous and tenacious in his method, and he had immense knowledge and a sharp rationality. Antonio Lima-de-Faria described himself to both of us as “a surviving dinosaur.” He was a magnificent old man, but that “dinosaur” had been ahead of his time since the beginning of his career. This was a constant. In the early 1960s, a multinational company discreetly asked him to develop a futuristic agrifood bioengineering program, a vision that is today’s reality of genetically modified organisms.
Known to the scientific world as a pioneer and one of the most relevant exponents of molecular cytogenetics (his 1969 Handbook of Molecular Cytology is a classic), and author of over 200 research articles and influential monographs, Lima-de-Faria became a member of some of the world’s top scientific societies. He also taught at some of the most prestigious universities. He received awards and recognition for his extraordinary activity, including appointment as Knight of the Order of the North Star by the Swedish King and as Grand Officer of the Order of Santiago by the President of Portugal. He held scientific consultancy positions for governments and institutions, including the European Space Agency, the United Nations Educational, Scientific and Cultural Organization, and the World Bank Group. He never stopped working and studying; in fact, he focused on the molecular organization of the chromosome until the end of his long life. Despite all of this, his endeavor was not always understood. His famous book, Evolution without Selection: Form and Function by Autoevolution (Elsevier, 1988, translated into Russian, Japanese, and Italian), is not only fundamental and revolutionary but also a case study in the sociology of science. This book, which anticipated current trends in molecular biology, even got him branded as an anti-evolutionist. Such a label reduced the essence of his work to a mere attack against natural selection, “a parlor game to explain life,” as Giuseppe Sermonti would say. Rather, this treatise, based on his vast physical, chemical, crystallographic, botanical, and zoological expertise, proposed to overcome the concept of natural selection. It downsized the role of genes and chromosomes in the architecture of living things by presenting a plethora of biological forms that arise directly from physical constraints. His self-evolutionism united the biological and inorganic worlds.
This echoed Aristotelian and Goethean intuitions of morphofunctional homologies, that is, a sort of “non-genic kinship” between the spin of the ultramicroscopic electron, the shell of a Limnaea, and the spirals of immense galaxies. Indeed, selectionism (identifying natural selection not as a contributing cause but as the main engine of biological development) is the major methodological obstacle to the recognition and explanation of Lima-de-Faria’s morphofunctional homology, which is the true protagonist of his book. An order runs through and defines the subatomic, chemical, and physical worlds at all of their scales through progressive and deterministic channels. The form of Chitoniscus feedjeanus, traditionally explained as a classic example of mimetic imitation of leaves, has a precedent in the arrangement of the crystals of pure bismuth. The same structure appears in the patterns of chlorite crystals, several plant hooks, the shells of ancient ammonites, and goat horns. The bird’s-eye view of an estuary, the branches of a tree, and the vascularization of a mammal follow a single dendritic development pattern, so much so that their images, once reduced to the same size, are difficult to distinguish. Constant chemical commonalities actually underlie these and countless other, more apparent natural oddities. Now, selection is not only powerless to account for them but also logically incompatible with any attempt to explain them. Like all strong theoretical systems faced with a fact that resists integration, selectionism ignores homology. And when it cannot avoid dealing with it, it defines it as mere analogy, thereby relegating it to that metaphor of annihilation, accidentality. Therefore, demolishing selectionism in biology was the necessary premise for developing a theory of self-evolution, towards which Lima-de-Faria has led us with a firm, methodical hand.
Indeed, he deploys a set of images and observations rarely rivalled in modern scientific literature. Beyond classic studies on the subject, from D’Arcy Thompson (On Growth and Form, 1917) onwards, recent molecular biology has undoubtedly continued to confirm, with ever greater evidence, the importance of elements complementary to classical theoretical genetics in the formation of living organisms. Lima-de-Faria had already begun to identify and systematize these elements 40 years ago in Molecular Evolution and Organization of the Chromosome (1983). In fact, as the author himself recalled, Evolution without Selection is the consequence of applying those premises to evolutionism. The last writing of Antonio Lima-de-Faria, printed in this very issue of Caryologia, develops and complements his marvelous treatise Praise of Chromosome “Folly”: Confessions of an Untamed Molecular Structure (2008). This masterpiece continues the great tradition of scientific giants such as Schrödinger and Feynman (authors whom Antonio Lima-de-Faria highly regarded) of explaining the most advanced theories clearly to the public. It is written with such wit and humor, and such elegant references to art, that any reader with a background in the natural sciences or mathematics, having read the first sentence, will not stop until the last. The book summarizes results of chromosome research and offers directions and ideas for further studies. It clearly confirms that understanding evolution requires deep knowledge not only of chemistry and physics but also of mathematics, especially at the atomic level. One of the authors began long discussions with Antonio Lima-de-Faria soon after Molecular Origins of Brain and Body Geometry: Plato’s Concept of Reality is Reversed (2014) was published. In an intriguing manner, this work unveils and explains the emergence of body patterns in animals by tracing them to the origin of the brain.
For Antonio Lima-de-Faria, “geometry” manifests an “utter simplicity coupled to rigorous order that underlines the phenomenon.” He did not use the language of mathematics, as he was not trained in it. However, even if this may sound paradoxical for a non-mathematician, his search for order, for “a common denominator,” and for a unifying theory makes his work akin to fundamental mathematics. Remarkably, in his early nineties, Antonio Lima-de-Faria completed an extensive analysis of the structures and functions of living organisms at the molecular level. The result was a new book, Periodic Tables Unifying Living Organisms at the Molecular Level: The Predictive Power of the Law of Periodicity (2017). This truly fascinating work provides a new perspective on the relations between matter and energy. Its logical, systematic approach links different levels, from atoms to macromolecules to organisms. As Lima-de-Faria stated, his books do not give ultimate answers or immediate solutions to the questions posed. Instead, readers are invited to use the tools, methods, and ideas that he generously expressed in his late works. “Order allows variation but imposes in the same time a canalization that is patent in what we call evolution, being that of galaxies or of living organisms.” Antonio Lima-de-Faria was almost 100 years old when he released his last book, Science and Art are Based on the Same Principles and Values (2020), something he had thought about “for 30 years.” It was his scientific testament, encompassing his lifelong love for art, beauty, and truth. There, as a “lonely wolf howling in the immensity of the night,” he launched a straightforward warning: “At present a wave of obscurantism is spreading over Western countries affecting both science and art in a deadly way. (…) Modern technology has been most successful in transforming our daily lives and in allowing us to conquer outer space.
These impressive achievements have, to a large extent, made us dumb, making it difficult to perceive the danger that lies ahead. Hence, there is a pressing need to bring forward the original sources in which, leading scientists and renowned artists, explained the principles that they followed in their discovery of novel phenomena and in the creation of unique works of art. It turns out that both types of minds speak the same language. There is a basic denominator that unites the human endeavor.” Lima-de-Faria’s works are jewels for scientific and aesthetic minds. The beauty of Nature absorbed him completely, and he devoted himself passionately to it. He was an admirer and a true connoisseur of the arts, music, and ballet. He was a passionate gardener and loved roses and the fragrance of flowers. Antonio Lima-de-Faria was a man of enlightenment, dedication, will, and truth. With his gentle and generous attitude towards anyone around him, Antonio Lima-de-Faria radiated love. He knew what happiness is (“What is Happiness?”, Journal of Biourbanism, IX, 2021). Antonio Lima-de-Faria is an endless source of inspiration and admiration for us.

Biology (General), Cytology

DOAJ Open Access 2024

Complicated history with huge potential: Israeli-Japanese relations’ development

D. A. Maryasis, E. A. Iakimova

Based on an analysis of Russian, Israeli, Japanese, and various English-language sources, the authors conclude that this paper is one of the few studies of Japanese-Israeli relations in the field of Oriental studies. In particular, it is the first article in Russian Oriental studies to focus on Japanese-Israeli relations. The authors aim not only to trace the dynamics of bilateral relations in their socio-political and economic aspects but also to identify the reasons for their complex history. The study also explores potential future developments. This goal has shaped the article’s structure. It includes a literature review, an examination of the prerequisites for cooperation, an analysis of the political and economic components, and a conclusion. The study’s analysis of historical factors, primarily the Japanese Empire’s attitude towards Jewry, reveals that before diplomatic relations were established there were no significant obstacles that would hinder current bilateral and multilateral interaction. Moreover, Japan’s Prime Minister Abe Shinzō relied on positive past experiences to foster increased contacts between Tokyo and Jerusalem. For both states, economic considerations and dependence on external energy resources have played a crucial role in the political and diplomatic sphere. This has led the Japanese authorities to maintain a balanced course in the Middle East. Over time, this task has grown in significance as Tokyo seeks to enhance its international status, particularly within the UN and its specialized agencies. In the realm of economic cooperation, internal crises and differing economic models hindered progress for a considerable period. However, the distinctive national economies and entrepreneurial cultures of Japan and Israel now present significant opportunities for collaboration.

Japanese language and literature

DOAJ Open Access 2024

The perception of “insular” England in “insular” Japan

A. N. Meshcheryakov

An insular position has a serious influence on history and mentality. However, this factor “works” only in conjunction with others. Japan and England are both island nations, but the history of England is characterized by the maximum number of foreign contacts, while that of Japan, until the middle of the 19th century, is characterized by the minimum. The passive approach to space in the Tokugawa period is explained by the following factors: the high productivity of rice cultivation, the absence of livestock farming, the conviction that Japan had the best climate, and the “closed country” policy. During the Meiji period, under the influence of the West (primarily Great Britain), the attitude towards space changed radically. The sea came to be conceptualized as a “conducting” rather than an “isolating” environment. Great Britain was chosen as a role model primarily because of its experience in conquering maritime space and creating a powerful colonial empire. The transition to a new model of “expanding space” was also justified by references to ancient times, when the Japanese had an “active” character that the “closed country” policy had supposedly “spoiled.” After the military victories over China (1894–1895) and Russia (1904–1905), Japan began to be called the “England of the East.” Great Britain ceased to be a role model after its withdrawal from the Japanese-British Alliance Treaty in 1922, and public discourse turned towards justifying the uniqueness of the Japanese. The established characterization of the Japanese as adherents of tradition is dubious. The appeal to antiquity was indeed of great importance for the Japanese and, in this sense, they can be considered “traditionalists.” However, after the Meiji Revolution, they demonstrated an amazing ability to embrace the “new” and destroy the “old,” while often boasting that they were “merely” recollecting their past. The concept of “traditionality” is too broad.
Upon closer examination, it contributes little to understanding historical and cultural processes, which require careful division into components with an exact chronological and situational reference.

Japanese language and literature

arXiv Open Access 2024

Efficacy of Large Language Models in Systematic Reviews

Aaditya Shah, Shridhar Mehendale, Siddha Kanthi

This study investigates the effectiveness of Large Language Models (LLMs) in interpreting existing literature, using a systematic review of the relationship between Environmental, Social, and Governance (ESG) factors and financial performance. The primary objective is to assess how well LLMs can replicate a systematic review on a corpus of ESG-focused papers. We compiled and hand-coded a database of 88 relevant papers published from March 2020 to May 2024. Additionally, we used a set of 238 papers from a previous systematic review of ESG literature from January 2015 to February 2020. We evaluated two current state-of-the-art LLMs, Meta AI's Llama 3 8B and OpenAI's GPT-4o, on the accuracy of their interpretations relative to human-made classifications on both sets of papers. We then compared these results to a "Custom GPT" and a fine-tuned GPT-4o Mini model, using the corpus of 238 papers as training data. The fine-tuned GPT-4o Mini model outperformed the base LLMs by 28.3% on average in overall accuracy on prompt 1. Meanwhile, the "Custom GPT" showed improvements of 3.0% and 15.7% on average in overall accuracy on prompts 2 and 3, respectively. Our findings are promising for investors and agencies seeking to leverage LLMs to summarize complex evidence on ESG investing, thereby enabling quicker decision-making and a more efficient market.

en cs.CL, cs.LG
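
The evaluation compares model labels with the human-made classifications. The sketch below computes that accuracy. It also shows the two common readings of a gain such as "28.3%": absolute percentage points and relative change. The abstract does not say which reading applies, so the helper reports both. The label strings in the usage are invented.

```python
from typing import Sequence, Tuple


def accuracy(predicted: Sequence[str], human: Sequence[str]) -> float:
    """Share of papers where the model's label matches the human-coded label."""
    if len(predicted) != len(human) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


def improvement(base_acc: float, new_acc: float) -> Tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    if base_acc <= 0:
        raise ValueError("base accuracy must be positive for a relative gain")
    points = (new_acc - base_acc) * 100
    relative = (new_acc - base_acc) / base_acc * 100
    return points, relative


if __name__ == "__main__":
    human = ["positive", "mixed", "negative", "positive"]    # invented labels
    model = ["positive", "positive", "negative", "positive"]
    print(accuracy(model, human))     # 0.75
    print(improvement(0.50, 0.60))    # roughly 10 points, 20% relative
```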

arXiv Open Access 2024

Massively Multilingual Text Translation For Low-Resource Languages

Zhong Zhou

Translation into severely low-resource languages serves both the cultural goal of saving and reviving those languages and the humanitarian goal of meeting the everyday needs of local communities, needs that the recent COVID-19 pandemic has made more pressing. In many humanitarian efforts, translation into severely low-resource languages does not require a universal translation engine but rather a dedicated, text-specific one. For example, healthcare records, hygiene procedures, government communications, emergency procedures, and religious texts are all limited texts. While generic translation engines for all languages do not exist, translating multilingually known limited texts into new, low-resource languages may be possible and may reduce human translation effort. We attempt to leverage translation resources from rich-resource languages to efficiently produce the best possible translation quality, in a new low-resource language, for well-known texts that are available in multiple languages. To reach this goal, we argue that in translating a closed text into low-resource languages, generalization to out-of-domain texts is not necessary, but generalization to new languages is. Performance gains come from massive source parallelism through a careful choice of close-by language families, style-consistent corpus-level paraphrases within the same language, and strategic adaptation of existing large pretrained multilingual models first to the domain and then to the language. These gains make it possible for machine translation systems to collaborate with human translators to expedite translation into new, low-resource languages.

en cs.CL
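
One way to picture the "careful choice of close-by language families" is to rank candidate source languages by their distance to the target and keep the closest few as parallel sources. The sketch assumes a precomputed distance per language. How that distance is measured, and the language codes and numbers used below, are hypothetical and not taken from the work.

```python
from typing import Dict, List


def pick_sources(distances: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate source languages closest to the target language.

    `distances` maps a language code to a hypothetical distance from the
    target (smaller = closer); ties are broken alphabetically for determinism.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(distances.items(), key=lambda item: (item[1], item[0]))
    return [lang for lang, _ in ranked[:k]]


if __name__ == "__main__":
    candidates = {"lang_a": 0.42, "lang_b": 0.15, "lang_c": 0.27, "lang_d": 0.80}
    print(pick_sources(candidates, 2))  # ['lang_b', 'lang_c']
```

In this reading, each selected language would contribute its own parallel version of the same closed text as model input.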

arXiv Open Access 2024

Attacks on Third-Party APIs of Large Language Models

Wanru Zhao, Vidit Khazanchi, Haodi Xing et al.

Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as plugins developed by various third parties cannot easily be trusted. This paper proposes a new attack framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.

en cs.CR, cs.AI
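
One generic mitigation for responses altered in transit between a third-party API and the LLM platform is for the API to sign its responses, so the platform can detect tampering. The sketch uses Python's standard `hmac` module with a shared secret. The key exchange, payload shape and function names are assumptions for illustration. This is neither the paper's attack framework nor its proposed defenses. It also cannot help if the third-party service is itself the malicious party, since that service would simply sign its own altered output.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign(payload: Dict[str, Any], key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the API response."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(payload: Dict[str, Any], signature: str, key: bytes) -> bool:
    """Constant-time check that the payload still matches its signature."""
    return hmac.compare_digest(sign(payload, key), signature)


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"
    response = {"city": "Lund", "temp_c": 18}
    sig = sign(response, key)
    tampered = dict(response, temp_c=28)  # a subtle, hard-to-notice edit
    print(verify(response, sig, key), verify(tampered, sig, key))  # True False
```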

arXiv Open Access 2024

Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language

Anastasia Zhukova, Christian E. Matt, Bela Gipp

Domain-specific languages that use a lot of specialized terminology often fall into the category of low-resource languages. Collecting test datasets in a narrow domain is time-consuming and requires skilled annotators with domain knowledge and training for the annotation task. This study addresses the challenge of automatically collecting test datasets to evaluate semantic search in the low-resource, domain-specific German language of the process industry. Our approach proposes an end-to-end annotation pipeline, from automated query generation to the reassessment of query-document pair scores. To overcome the lack of text encoders trained in the German chemistry domain, we explore an ensemble of "weak" text encoders trained on common-knowledge datasets. We combine individual relevance scores from diverse models to retrieve document candidates, together with relevance scores generated by an LLM, aiming to reach consensus on query-document alignment. Evaluation results demonstrate that the ensemble method significantly improves alignment with human-assigned relevance scores, outperforming individual models in both inter-coder agreement and accuracy metrics. These findings suggest that ensemble learning can effectively adapt semantic search systems to specialized, low-resource languages, offering a practical solution to resource limitations in domain-specific contexts.

en cs.CL
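
Combining relevance scores from several "weak" encoders requires putting them on a common scale first, because different models produce scores in different ranges. A minimal sketch follows. It assumes per-query min-max normalization and an unweighted mean. The paper's actual fusion rule, and how the LLM-generated scores are weighted in, may differ. The scores and document IDs below are made up.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> relevance score for one query


def min_max(scores: Scores) -> Scores:
    """Rescale one model's scores for a query to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(per_model: List[Scores]) -> Scores:
    """Average normalized scores; a document missing from a model counts as 0."""
    if not per_model:
        raise ValueError("need at least one model's scores")
    normalized = [min_max(s) for s in per_model]
    docs = set().union(*normalized)
    return {d: sum(n.get(d, 0.0) for n in normalized) / len(normalized) for d in docs}


def rank(fused: Scores) -> List[str]:
    """Document ids by descending fused score (ties broken by id)."""
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    encoder_a = {"d1": 9, "d2": 5, "d3": 1}     # hypothetical scores, encoder A
    encoder_b = {"d1": 10, "d2": 30, "d3": 20}  # encoder B, on a different scale
    print(rank(fuse([encoder_a, encoder_b])))   # ['d2', 'd1', 'd3']
```

LLM-generated relevance judgments could enter as one more entry in `per_model`, though that is an assumption about the pipeline.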

DOAJ Open Access 2023

Grammatical Meaning of Dajare Punning Phrases on the Shakishakinashi-Zu YouTube Channel

Desy Irmayanti, Vira Yuniar Anggraini

Dajare is a kind of Japanese wordplay in which a bad or unfunny joke is created by using identical or similar-sounding phrases or words. This peculiarity makes dajare worth analyzing further in terms of the grammatical meaning of their punning phrases. The data for this study come from a YouTube channel about Japanese prefectures, where dajare about the prefectures would be very helpful in memorizing their names. On this basis, the study aims to describe the meaning of the dajare punning phrases on the Shakishakinashi-zu YouTube channel. The researchers apply Sutedi's theory of grammatical meaning and use a qualitative descriptive method to analyze the data. Of the 47 data items, 42 carried grammatical meaning. In the dajare about Japanese prefectures, the most frequent type of grammatical meaning arose from the formation of language units (19 items), followed by affixation (16), compounding (4), and reduplication (2).

Japanese language and literature

arXiv Open Access 2023

LAraBench: Benchmarking Arabic AI with Large Language Models

Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury et al.

Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tagging and content classification across different domains. We used models such as GPT-3.5-turbo, GPT-4, BLOOMZ, Jais-13b-chat, Whisper, and USM, employing zero- and few-shot learning techniques to tackle 33 distinct tasks across 61 publicly available datasets. This involved 98 experimental setups, encompassing ~296K data points, ~46 hours of speech, and 30 sentences for Text-to-Speech (TTS), resulting in 330+ sets of experiments. Our analysis focused on measuring the performance gap between SOTA models and LLMs. The overarching trend was that SOTA models generally outperformed LLMs in zero-shot learning, with a few exceptions. Notably, larger models using few-shot learning techniques managed to reduce these performance gaps. Our findings provide valuable insights into the applicability of LLMs to Arabic NLP and speech processing tasks.

en cs.CL, cs.AI
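
The difference between zero-shot and few-shot prompting is whether labeled demonstrations precede the test input. A sketch of a prompt builder follows. The "Input:/Output:" template, the task wording and the function name are assumptions, not the prompts used in LAraBench.

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str, query: str,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Zero-shot when `examples` is empty; few-shot otherwise."""
    parts = [instruction.strip()]
    for text, label in examples:              # labeled demonstrations
        parts.append(f"Input: {text}\nOutput: {label}")
    parts.append(f"Input: {query}\nOutput:")  # the model completes this line
    return "\n\n".join(parts)


if __name__ == "__main__":
    task = "Classify the sentiment of the Arabic tweet as positive or negative."
    print(build_prompt(task, "<tweet to classify>"))
    print(build_prompt(task, "<tweet to classify>",
                       [("<example tweet 1>", "positive"),
                        ("<example tweet 2>", "negative")]))
```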

arXiv Open Access 2023

Japanese Lexical Complexity for Non-Native Readers: A New Dataset

Yusuke Ide, Masato Mita, Adam Nohejl et al.

Lexical complexity prediction (LCP) is the task of predicting the complexity of words in a text on a continuous scale. It plays a vital role in simplifying or annotating complex words to assist readers. To study lexical complexity in Japanese, we construct the first Japanese LCP dataset. Our dataset provides separate complexity scores for Chinese/Korean annotators and others to address the readers' L1-specific needs. In the baseline experiment, we demonstrate the effectiveness of a BERT-based system for Japanese LCP.

en cs.CL
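
Keeping separate complexity scores per reader group can be pictured as averaging annotator ratings within each L1 group instead of across all annotators. The sketch assumes ratings already normalized to [0, 1] and uses hypothetical language codes and group names. The dataset's actual rating scale and aggregation may differ.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

ZH_KO = {"zh", "ko"}  # hypothetical L1 codes for the Chinese/Korean group


def group_scores(ratings: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Map each word to its mean complexity per annotator group.

    `ratings` holds (word, annotator_l1, score) with score in [0, 1].
    """
    buckets: Dict[str, Dict[str, list]] = defaultdict(lambda: defaultdict(list))
    for word, l1, score in ratings:
        group = "zh_ko" if l1 in ZH_KO else "other"
        buckets[word][group].append(score)
    return {word: {g: mean(v) for g, v in groups.items()}
            for word, groups in buckets.items()}


if __name__ == "__main__":
    ratings = [("語彙", "zh", 0.0), ("語彙", "ko", 0.5),
               ("語彙", "en", 1.0), ("語彙", "es", 0.5)]
    print(group_scores(ratings))  # {'語彙': {'zh_ko': 0.25, 'other': 0.75}}
```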

arXiv Open Access 2023

Using Large Language Models to Provide Explanatory Feedback to Human Tutors

Jionghao Lin, Danielle R. Thomas, Feifei Han et al.

Research demonstrates that learners' engagement in producing explanations to support their reasoning can have a positive impact on learning. However, providing learners with real-time explanatory feedback often presents challenges related to classification accuracy, particularly in domain-specific environments containing situationally complex and nuanced responses. We present two approaches for supplying tutors with real-time feedback within an online lesson on how to give students effective praise. This work-in-progress demonstrates considerable accuracy in binary classification for corrective feedback on effective, or effort-based (F1 score = 0.811), and ineffective, or outcome-based (F1 score = 0.350), praise responses. More notably, we introduce progress towards an enhanced approach to providing explanatory feedback using large language model-facilitated named entity recognition, which can not only give tutors feedback while they engage in lessons but can also potentially suggest real-time tutor moves. Future work involves leveraging large language models for data augmentation to improve accuracy, while also developing an explanatory feedback interface.

en cs.CL, cs.AI
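
The two F1 scores appear to be per-class figures for the same binary classifier. One treats effort-based praise as the positive class, the other outcome-based praise. The sketch below shows the standard per-class F1 computation. The label strings are assumptions. It illustrates how one classifier can score very differently on each class.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 = 2PR / (P + R), treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["effort", "effort", "outcome", "outcome"]
    pred = ["effort", "effort", "effort", "effort"]  # a classifier biased toward "effort"
    print(f1_for_class(gold, pred, "effort"))   # about 0.667
    print(f1_for_class(gold, pred, "outcome"))  # 0.0
```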

arXiv Open Access 2022

Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks

Barack Wanjawa, Lilian Wanzare, Florence Indede et al.

Indigenous African languages are categorized as under-served in Natural Language Processing. They therefore suffer from poor digital inclusivity and information access. The processing challenge with such languages has been how to use machine learning and deep learning models without the requisite data. The Kencorpus project intends to bridge this gap by collecting and storing text and speech data that is good enough for data-driven solutions in applications such as machine translation, question answering, and transcription in multilingual communities. The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo, and Luhya. Researchers collected the data from communities, schools, media, and publishers. The Kencorpus dataset contains 5,594 items: 4,442 texts (5.6M words) and 1,152 speech files (177 hrs). Based on this data, part-of-speech tagging sets for Dholuo and Luhya (50,000 and 93,000 words, respectively) were developed. We developed 7,537 question-answer pairs for Swahili and created a text translation set of 13,400 sentences from Dholuo and Luhya into Swahili. The datasets are useful for downstream machine learning tasks such as model training and translation. We also developed two proof-of-concept systems, one for Kiswahili speech-to-text and one for question answering, with results of 18.87% word error rate and 80% Exact Match (EM), respectively. These initial results are promising for the usability of Kencorpus in the machine learning community. Kencorpus is one of the few public-domain corpora for these three low-resource languages and provides a basis for learning and sharing experiences in similar work, especially for low-resource languages.

en cs.CL
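
Word error rate, the speech-to-text metric reported above, is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The edit distance counts substitutions, deletions and insertions. A sketch of WER follows, together with a simplified exact-match score. The EM normalization here (trim and lowercase only) is an assumption, and many QA benchmarks normalize more aggressively.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def exact_match(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Share of predictions equal to the gold answer after trim + lowercase."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("need non-empty lists of equal length")
    hits = sum(p.strip().lower() == a.strip().lower()
               for p, a in zip(predictions, answers))
    return hits / len(answers)


if __name__ == "__main__":
    print(word_error_rate("habari ya asubuhi", "habari za asubuhi"))  # one substitution in three words
    print(exact_match(["Nairobi", " kisumu"], ["nairobi", "Mombasa"]))  # 0.5
```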

arXiv Open Access 2022

Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi

Devansh Mehta, Harshita Diddee, Ananya Saxena et al.

The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a vulnerable low-resource language spoken by around 2.3 million tribal people in south and central India. During this process, we helped expand information access in Gondi along two dimensions: (a) the creation of linguistic resources that the community can use, such as a dictionary, children's stories, Gondi translations from multiple sources, and an Interactive Voice Response (IVR) based mass awareness platform; and (b) enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, compressed nearly fourfold to enable its deployment on low-resource edge devices and in areas with little to no internet connectivity. We also present preliminary evaluations of using the developed machine translation model to assist volunteers who are collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations, which was used to build the translation model, but also engaged nearly 850 community members who can help bring Gondi onto the internet.

en cs.CL, cs.CY
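
The abstract does not say how the roughly fourfold compression was achieved. One common route to about 4x is to store 32-bit floating-point weights as 8-bit integers plus a scale factor. The sketch below shows symmetric per-tensor int8 quantization purely as an assumed illustration, not as the authors' method.

```python
from array import array
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[array, float]:
    """Symmetric per-tensor quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> List[float]:
    """Recover approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27, 0.333]
    q, scale = quantize_int8(weights)
    fp32_bytes = array("f", weights).itemsize * len(weights)
    print(fp32_bytes / (q.itemsize * len(q)))  # 4.0 on platforms with 4-byte C floats
    print(max(abs(a - b) for a, b in zip(weights, dequantize(q, scale))))  # at most scale / 2
```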

arXiv Open Access 2022

Gender Biases and Where to Find Them: Exploring Gender Bias in Pre-Trained Transformer-based Language Models Using Movement Pruning

Przemyslaw Joniak, Akiko Aizawa

Language model debiasing has emerged as an important field of study in the NLP community. Numerous debiasing techniques have been proposed, but bias ablation remains an unaddressed issue. We demonstrate a novel framework for inspecting bias in pre-trained transformer-based language models via movement pruning. Given a model and a debiasing objective, our framework finds a subset of the model containing less bias than the original model. We implement our framework by pruning the model while fine-tuning it on the debiasing objective. Only the pruning scores, which are parameters coupled with the model's weights that act as gates, are optimized. We experiment with pruning attention heads, an important building block of transformers: we prune square blocks and also establish a new way of pruning entire heads. Lastly, we demonstrate our framework on gender bias and, based on our findings, propose an improvement to an existing debiasing method. Additionally, we rediscover a bias-performance trade-off: the better the model performs, the more bias it contains.

en cs.CL

DOAJ Open Access 2021

Disadvantage or Blessing in Disguise? Field Research in Japan during COVID-19

Swati Arora

Area studies is an interpretive research field, and fieldwork is a key enabler for area studies research projects. However, field research also poses some fundamental challenges, which are described in a varied literature available to scholars of anthropology, geography, the social sciences and various other fields. Within the area studies literature, little deals with how to manage fieldwork without being present in the field. This paper reflects on my experience of conducting fieldwork in Japan during the global COVID-19 pandemic. It shares my experiences during 2020 and early 2021 and discusses how COVID-19 affected various aspects of fieldwork in Japan, including unexpected challenges, new opportunities, institutional support and access to academic texts. The paper aims to give a concrete picture of fieldwork in Japan for other scholars who have yet to conduct research in the COVID-19 context. It maps out how the pandemic has affected the field, why it has done so, and the future implications, while also unpacking field research challenges and offering achievable solutions.

Japanese language and literature

arXiv Open Access 2021

Case Studies on using Natural Language Processing Techniques in Customer Relationship Management Software

Şükrü Ozan

How can a text corpus stored in a customer relationship management (CRM) database be used for data mining and segmentation? To answer this question, we adopted state-of-the-art methods commonly used in the natural language processing (NLP) literature, such as word embeddings, and in the deep learning literature, such as recurrent neural networks (RNNs). We used text notes from a CRM system, written by the customer representatives of an internet ads consultancy agency between 2009 and 2020. We trained word embeddings on this text corpus and showed that they can be used not only directly for data mining but also in RNN architectures, deep learning frameworks built with long short-term memory (LSTM) units, for more comprehensive segmentation objectives. The results prove that structured text data in a CRM can be used to mine very valuable information, and that any CRM can be equipped with useful NLP features once the problem definitions are properly built and the solution methods are conveniently implemented.

en cs.CL

DOAJ Open Access 2020

Exploring the Iconicity of Godzilla in Popular Culture. A Comparative Intercultural Perspective: Japan-America

Crînguța Irina Pelea

The present study aims to compare the representation of Godzilla, or Gojira, considered one of the most representative cultural icons of Japanese cinematography, within the intertwined, fluid, versatile and dynamic context of the contemporary Japanese and North American film industries. The enduring popularity of Godzilla is puzzling, and one may ask where the appeal of this irradiated, dinosaur-like fictional monster lies. The author adopts a comparative intercultural perspective, one that integrates research into a much broader sociohistorical context, with particular attention to how the culturally enhanced linguistic component influences the symbolism embodied by Godzilla in Japan and how it is reimagined in its Hollywood counterpart. Hence, the theoretical section brings into discussion relevant and previously unpublished Japanese-language literature on Godzilla, thus trying to balance Western and Japanese academic perspectives. The present research applies the methodology of narrative analysis to investigate, from a comparative perspective, significant differences in the narrative development and portrayal of the iconic monster in “Shin Godzilla” (Japan, 2016) versus “Godzilla: King of the Monsters” (USA, 2019). One of the most relevant findings is that the cultural specificity of the iconic beast ultimately cannot be transferred or translated into the North American media context, despite the recycling of almost the same film narrative: therefore, Gojira is inherently Japanese.

Social history and conditions. Social problems. Social reform

DOAJ Open Access 2020

EMBBED DAJARE WORD PLAY PROCESS IN “SHIROKUMA CAFÉ”

Talin Salisah, Nani Sunarni

The pun, a form of word play, is well suited to Japanese, which has many homonyms. In Japanese, this word play is called dajare. Many experts have described types of dajare, and one researcher distinguished three types: homophonic dajare, near-homophonic dajare, and embedded dajare. This research focuses on the last of these, analyzing embedded dajare in the animated show “Shirokuma Café”, which features many dajare in the conversations between Shirokuma and his friends, as seen from the vocabulary used. The purpose of this research was to study how the embedded dajare word play process is used in the conversations between Shirokuma and the people he talks to. The research uses a descriptive qualitative method. The results show that identical or similar sound and form in a dajare does not guarantee similar meaning between the referent word and the target word.

Japanese language and literature
