These two volumes contain 16 chapters together with an editorial introduction. One of the papers in Volume II is also by Slobin, and he is coauthor of the chapter on Turkish in Volume I. Volume I is oriented towards the acquisition of specific languages (namely, English, German, Hebrew, Japanese, Kaluli, Polish, Romance [with special reference to French], Samoan, Turkish, American Sign Language), whereas the second focuses on theoretical issues. MacWhinney's paper in the second volume adds Hungarian to the list of languages from which the data are drawn. It should be obvious that this is an important collection, since nothing of this scope and type exists. It is the culmination of some 15 years of research on language acquisition motivated largely by Slobin's notion of the operating principles which guide language acquisition. The contributors address themselves to a common set of issues and sum up the research that has been done on particular languages. At the same time, however, these volumes draw our attention to how few languages we have adequate acquisitional data for. One useful function of this collection is thus to identify lacunae in the literature and to isolate particular problems that require elucidation from a particular type of data. There are still many aspects of acquisition for which we have no data, even in relatively well-investigated languages like English and French. At the moment, cross-linguistic comparison can be realistically carried out for only a handful of constructions and/or categories, for example, passives, locative prepositions, and relative clauses. And even for features that have been extensively studied, there is often little agreement across different studies. The subsystems chosen for cross-linguistic comparison are generally biased towards the typological distinctions made within Indo-European languages. 
Thus, not surprisingly, there have been no comparative studies of children's acquisition of switch-reference systems, as far as I know. The evidence presented from languages like Japanese, Samoan, and Kaluli indicates, too, how biased our notions are of what children's language is like. Linguists generally refer to the early stages of child language as telegraphic because much that would have to be present in the presumed adult equivalent has not been expressed. In other languages, however, ellipsis is the norm. As Ochs (p. 808) points out, in Samoan the relative nonexpression of major constituents is a sign of competence, since the presence of a subject or object would represent a marked strategy of expression. In English, by contrast, telegraphic speech is indicative of incompetence. In still other cases, like Turkish, child speech is not telegraphic because children acquire most of the inflectional system by 2 years of age or earlier. Although utterances are short
Maritime trade is a key component of the economies of China and Japan, the two leading powers in East Asia. In the context of globalization and increasing international trade flows, the ports of these countries play a central role in ensuring efficient logistics and transportation of goods and act as a barometer of foreign trade. The importance of maritime trade for both countries stems from their geographical location, the development of their port infrastructure, and growing needs for international supplies. Sino-Japanese trade relations and interaction in the port and related areas constitute a complex and multifaceted process and an important aspect of economic and geopolitical interaction in the Asia-Pacific region. On the one hand, the countries seek joint projects and technological innovations to improve the efficiency of ports and logistics. On the other hand, geopolitical factors and competition for influence create tensions in their relations. Despite the historical and political differences between Japan and China, complicated by the tense U.S.-China confrontation, the potential for improving trade ties and for joint work in the fields of maritime infrastructure development, logistics, and the port economy has not been exhausted.
Based on the historical context, the article analyzes the current state and potential directions for the development of bilateral cooperation between China and Japan in the areas of mutual trade and modernization of seaport infrastructure as a tool for this cooperation. The study includes a periodization of the relations of both countries in the field of maritime transport and foreign trade. Particular attention is paid to analyzing the maritime profiles of China and Japan and identifying key factors of economic and geopolitical competition between the two countries, taking into account the peculiarities of their economies, geography, and strategies for the development of port facilities. The current indicators of bilateral trade are assessed, regional and global economic and strategic aspects of competition between the countries in the field of port infrastructure are studied, and the political risks and opportunities arising from the complex geopolitical situation in the Asia-Pacific region are examined.
The relevance of this research is underpinned by the importance of supporting and promoting the Russian language as an instrument of the Russian Federation’s humanitarian policy abroad, coupled with the strategic significance of the comprehensive partnership of our country with the People’s Republic of China (PRC). Against the backdrop of rapidly intensifying Russian-Chinese relations, the training of highly qualified Chinese specialists proficient in Russian has emerged as a critical factor for effective cooperation in economic, scientific, technological and cultural spheres. In this context, the purpose of this article is to evaluate the current state of Russian language education in China. In this research the authors employed a combination of general scientific and historical methods. The foundation was formed by the historical and problem-chronological methods and approaches. The history of Russian language learning in China spans over three centuries, commencing with the establishment of the Russian Language Institution in Beijing in 1708. A peak in popularity of Russian language study occurred in the 1950s, fueled by the Soviet-Chinese alliance relations, followed by a decline during the subsequent deterioration of bilateral ties. A renewed phase of growth began after China introduced its reform and opening-up policy in 1978, gaining further momentum with the development of the strategic partnership between Russia and China at the beginning of the 21st century. At present, the Russian language maintains a stable position as one of the most sought-after foreign languages in China, trailing only English and Japanese in terms of prevalence. The Chinese Association of Teachers of the Russian Language and Literature plays a leading role in consolidating the community of Russists and providing methodological support. 
The authors highlight several key challenges, including the persistence of outdated teaching methodologies prioritizing grammatical accuracy over communicative competence; a deficit of contemporary teaching materials; the geographical concentration of Russian language centers in Northeast China; and the limited career prospects for Russian studies graduates lacking supplementary specializations. As a viable solution, the study proposes a wider adoption of the “Language + Specialty” model. This approach cultivates specialists who possess not only Russian language proficiency but also competencies in specific professional fields (e.g., economics, law, technical sciences), thereby better aligning with the demands of Russian-Chinese practical cooperation.
International relations, Political science (General)
Language Models (LMs) have revolutionized natural language processing, enabling high-quality text generation through prompting and in-context learning. However, models often struggle with long-context summarization due to positional biases, leading to suboptimal extraction of critical information. Existing remedies rely on fine-tuning, pipelining, or other complex techniques, each of which brings its own challenges. To solve these challenges, we propose QA-prompting - a simple prompting method for summarization that utilizes question-answering as an intermediate step prior to summary generation. Our method extracts key information and enriches the context of text to mitigate positional biases and improve summarization in a single LM call per task without requiring fine-tuning or pipelining. Experiments on multiple datasets belonging to different domains using ten state-of-the-art pre-trained models demonstrate that QA-prompting outperforms baseline and other state-of-the-art methods, achieving up to 29% improvement in ROUGE scores. This provides an effective and scalable solution for summarization and highlights the importance of domain-specific question selection for optimal performance.
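The single-call structure the abstract describes can be sketched as a prompt template: the model is first asked to answer key questions about the document and then to write the summary from those answers. This is a minimal illustration; the question wording and prompt layout here are assumptions, not the paper's actual prompt.

```python
# Sketch of QA-prompting: question-answering as an intermediate step before
# summarization, composed into ONE prompt so only a single LM call is needed.
# The questions below are illustrative stand-ins for domain-specific ones.

QUESTIONS = [
    "What is the main topic of the document?",
    "What are the key findings or events?",
    "Who are the main entities involved?",
]

def build_qa_prompt(document: str, questions=QUESTIONS) -> str:
    """Compose a single prompt: answer the questions, then summarize."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    return (
        "Read the document below.\n"
        f"Document:\n{document}\n\n"
        "Step 1: Answer each question briefly.\n"
        f"{numbered}\n\n"
        "Step 2: Using your answers, write a concise summary of the document."
    )

prompt = build_qa_prompt("Storms caused flooding across the region on Monday.")
```

Because the intermediate answers pull salient facts into the model's working context before the summary is written, information buried mid-document is less likely to be lost to positional bias.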
In recent years, training methods centered on Reinforcement Learning (RL) have markedly enhanced the reasoning and alignment performance of Large Language Models (LLMs), particularly in understanding human intents, following user instructions, and bolstering inferential strength. Although existing surveys offer overviews of RL-augmented LLMs, their scope is often limited, failing to provide a comprehensive summary of how RL operates across the full lifecycle of LLMs. We systematically review the theoretical and practical advancements whereby RL empowers LLMs, especially Reinforcement Learning with Verifiable Rewards (RLVR). First, we briefly introduce the basic theory of RL. Second, we thoroughly detail application strategies for RL across various phases of the LLM lifecycle, including pre-training, alignment fine-tuning, and reinforced reasoning. In particular, we emphasize that RL methods in the reinforced reasoning phase serve as a pivotal driving force for advancing model reasoning to its limits. Next, we collate existing datasets and evaluation benchmarks currently used for RL fine-tuning, spanning human-annotated datasets, AI-assisted preference data, and program-verification-style corpora. Subsequently, we review the mainstream open-source tools and training frameworks available, providing clear practical references for subsequent research. Finally, we analyse the future challenges and trends in the field of RL-enhanced LLMs. This survey aims to present researchers and practitioners with the latest developments and frontier trends at the intersection of RL and LLMs, with the goal of fostering the evolution of LLMs that are more intelligent, generalizable, and secure.
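The defining feature of RLVR, as the survey frames it, is that the reward comes from a programmatic checker rather than a learned reward model. A minimal sketch, assuming a math-style task where completions end with an "Answer:" line (an illustrative output convention, not a standard one):

```python
# Sketch of a verifiable reward for RLVR: a deterministic checker scores a
# model completion against a known ground-truth answer, yielding a binary
# reward signal usable by any RL fine-tuning algorithm.

def extract_final_answer(completion: str) -> str:
    """Take the text after the last 'Answer:' marker (illustrative format)."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    return completion[idx + len(marker):].strip() if idx != -1 else ""

def verifiable_reward(completion: str, gold: str) -> float:
    """Binary reward: 1.0 if the verifiable answer matches, else 0.0."""
    return 1.0 if extract_final_answer(completion) == gold else 0.0

r = verifiable_reward("Let's compute step by step. 6*7=42. Answer: 42", "42")
```

Program-verification-style corpora generalize the same idea: instead of string matching, the checker runs unit tests or a proof verifier over the completion.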
We present CrossTL, a universal programming language translator enabling bidirectional translation between multiple languages through a unified intermediate representation called CrossGL. Traditional approaches require separate translators for each language pair, leading to exponential complexity growth. CrossTL uses a single universal IR to facilitate translations between CUDA, HIP, Metal, DirectX HLSL, OpenGL GLSL, Vulkan SPIR-V, Rust, and Mojo, with Slang support in development. Our system consists of: language-specific lexers/parsers converting source code to ASTs, bidirectional CrossGL translation modules implementing ToCrossGLConverter classes for importing code and CodeGen classes for target generation, and comprehensive backend implementations handling full translation pipelines. We demonstrate effectiveness through comprehensive evaluation across programming domains, achieving successful compilation and execution across all supported backends. The universal IR design enables adding new languages with minimal effort, requiring only language-specific frontend/backend components. Our contributions include: (1) a unified IR capturing semantics of multiple programming paradigms, (2) a modular architecture enabling extensibility, (3) a comprehensive framework supporting GPU compute, graphics programming, and systems languages, and (4) empirical validation demonstrating practical viability of universal code translation. CrossTL represents a significant step toward language-agnostic programming, enabling write-once, deploy-everywhere development.
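The frontend/IR/backend split described above can be illustrated with a toy version of the pattern: a converter class lowers source text into a shared IR node, and a per-target code generator emits from that IR. The class and node names below are illustrative only, not CrossTL's actual `ToCrossGLConverter`/`CodeGen` API.

```python
# Toy sketch of the universal-IR translation pattern: adding a new language
# requires only a new frontend (source -> IR) or backend (IR -> target),
# not a translator per language pair.
from dataclasses import dataclass

@dataclass
class IRAdd:
    """One hypothetical universal-IR node: a binary addition."""
    left: str
    right: str

class ToIRConverter:
    """Frontend: a deliberately tiny 'parser' for expressions like 'a + b'."""
    def convert(self, src: str) -> IRAdd:
        left, right = (s.strip() for s in src.split("+"))
        return IRAdd(left, right)

class RustCodeGen:
    """Backend: emit the IR node as Rust source text."""
    def generate(self, node: IRAdd) -> str:
        return f"let result = {node.left} + {node.right};"

ir = ToIRConverter().convert("x + y")
rust = RustCodeGen().generate(ir)
```

With N languages, this design needs N frontends plus N backends rather than the N×(N−1) pairwise translators of the traditional approach.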
This report offers a brief reflection on a graduate student forum held under the theme “The Institutions and Ethics of Care and Self-Care.” The presentations explored how contemporary Japanese literature engages with caregiving, emotion, and relationality through various narrative voices and critical perspectives.
Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the text, such as occurrence rates of noncontextual words. In previous work, these techniques have been used, for example, to determine authorship of all of The Federalist Papers. Such methods may also be useful in modern times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship using grammatical structural information extracted with a statistical natural language parser. This paper provides a proof of concept, testing author classification based on grammatical structure on a set of "proof texts," The Federalist Papers and Sanditon, which have been used as test cases in previous authorship detection studies. Several features extracted from the statistical natural language parser were explored: all subtrees of some depth starting from any level; rooted subtrees of some depth; part of speech; and part of speech by level in the parse tree. It was found to be helpful to project the features into a lower-dimensional space. Statistical experiments on these documents demonstrate that information from a statistical parser can, in fact, assist in distinguishing authors.
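One of the feature families named above, rooted subtrees of bounded depth, can be sketched concretely. In this sketch parse trees are nested tuples of the form `(label, child, ...)`, a simplification of real parser output, and the function names are illustrative rather than taken from the paper.

```python
# Sketch: count depth-limited rooted subtrees over every node of a parse
# tree. Each author's documents yield a feature vector of subtree counts,
# which can then be projected to a lower dimension and classified.
from collections import Counter

def rooted_subtree(tree, depth):
    """Truncate a tree to the given depth, keeping node labels."""
    label, *children = tree
    if depth == 0 or not children:
        return (label,)
    return (label,) + tuple(rooted_subtree(c, depth - 1) for c in children)

def subtree_features(trees, depth=1):
    """Count depth-limited subtrees rooted at every node of every tree."""
    counts = Counter()
    def visit(t):
        counts[rooted_subtree(t, depth)] += 1
        for child in t[1:]:
            visit(child)
    for t in trees:
        visit(t)
    return counts

# A toy parse of "the dog barks": (S (NP (DT) (NN)) (VP (VBZ)))
parse = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBZ",)))
feats = subtree_features([parse], depth=1)
```

Counting subtrees at every node (not just the root) is what makes the feature space large enough that dimensionality reduction, as the abstract notes, becomes helpful.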
We present a novel extension to Retrieval Augmented Generation with the goal of mitigating factual inaccuracies in the output of large language models. Specifically, our method draws on the cognitive linguistic theory of frame semantics for the indexing and retrieval of factual information relevant to helping large language models answer queries. We conduct experiments to demonstrate the effectiveness of this method both in terms of retrieval effectiveness and in terms of the relevance of the frames and frame relations automatically generated. Our results show that this novel mechanism of Frame Semantic-based retrieval, designed to improve Retrieval Augmented Generation (FS-RAG), is effective and offers potential for providing data-driven insights into frame semantics theory. We provide open access to our program code and prompts.
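The indexing-and-retrieval idea can be sketched in miniature: facts are indexed under the semantic frames they evoke, and a query retrieves facts that share its frames. The frame names and the word-to-frame lexicon below are illustrative stand-ins for a real frame-semantic parser, not the FS-RAG implementation.

```python
# Toy sketch of frame-semantic retrieval: a query about purchasing retrieves
# a fact phrased with "bought" because both evoke the same frame, even
# though they share no keyword.

FRAME_LEXICON = {  # word -> evoked frame (tiny stand-in for a frame parser)
    "bought": "Commerce_buy", "purchase": "Commerce_buy",
    "moved": "Motion", "traveled": "Motion",
}

def frames_of(text: str) -> set:
    """Return the set of frames evoked by words in the text."""
    return {FRAME_LEXICON[w] for w in text.lower().split() if w in FRAME_LEXICON}

def build_index(facts):
    """Index each fact under every frame it evokes."""
    index = {}
    for fact in facts:
        for frame in frames_of(fact):
            index.setdefault(frame, []).append(fact)
    return index

def retrieve(query, index):
    """Retrieve all facts sharing a frame with the query."""
    hits = []
    for frame in frames_of(query):
        hits.extend(index.get(frame, []))
    return hits

index = build_index(["Alice bought a car in 2020.", "Bob traveled to Kyoto."])
results = retrieve("When did Alice purchase the car?", index)
```

Matching on frames rather than surface tokens is what lets retrieval bridge paraphrases such as "bought" versus "purchase."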
Neris Özen, Wenjuan Mu, Esther D. van Asselt
et al.
The number of scientific articles published in the domain of food safety has consistently been increasing over the last few decades. It has therefore become unfeasible for food safety experts to read all relevant literature related to food safety and the occurrence of hazards in the food chain. However, it is important that food safety experts are aware of the newest findings and can access this information in an easy and concise way. In this study, an approach is presented to automate the extraction of chemical hazards from the scientific literature through large language models. The large language model was used out-of-the-box and applied to scientific abstracts; no extra training of the model or large computing cluster was required. Three different styles of prompting the model were tested to assess which was best suited to the task at hand. The prompts were optimized with two validation foods (leafy greens and shellfish) and the final performance of the best prompt was evaluated using three test foods (dairy, maize and salmon). The specific wording of the prompt was found to have a considerable effect on the results. A prompt breaking the task down into smaller steps performed best overall. This prompt reached an average accuracy of 93% and retrieved many chemical contaminants already included in food monitoring programs, validating the successful retrieval of relevant hazards for the food safety domain. The results showcase how valuable large language models can be for the task of automatic information extraction from the scientific literature.
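The step-decomposed prompting style that performed best can be sketched as a template plus a parser for the model's reply. The step wording and the "Hazards:" output convention below are illustrative assumptions, not the study's actual prompt.

```python
# Sketch of "break the task into smaller steps" prompting for hazard
# extraction from an abstract, with a parser that reads the model's final
# structured line back into a Python list.

def build_hazard_prompt(abstract: str, food: str) -> str:
    """Compose a stepwise extraction prompt for one abstract and food."""
    return (
        f"Abstract:\n{abstract}\n\n"
        f"Step 1: Decide whether the abstract concerns {food}.\n"
        "Step 2: List every chemical hazard measured or detected.\n"
        "Step 3: Reply on one line as 'Hazards: <comma-separated list>' "
        "or 'Hazards: none'.\n"
    )

def parse_hazards(reply: str):
    """Extract the hazard list from the model's last 'Hazards:' line."""
    for line in reversed(reply.splitlines()):
        if line.startswith("Hazards:"):
            body = line[len("Hazards:"):].strip()
            if body.lower() == "none":
                return []
            return [h.strip() for h in body.split(",")]
    return []

hazards = parse_hazards("Step 1: yes, shellfish.\nHazards: cadmium, lead")
```

Forcing a fixed one-line output format is what makes the extraction machine-readable, so results can be compared directly against substances in existing monitoring programs.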
Sequential recommendation models user interests based on historical behaviors to provide personalized recommendations. Previous sequential recommendation algorithms primarily employ neural networks to extract features of user interests, achieving good performance. However, due to the sparsity of recommendation datasets, these algorithms often employ small-scale network architectures, resulting in weaker generalization capability. Recently, a series of sequential recommendation algorithms based on large pre-trained language models have been proposed. Nonetheless, given the real-time demands of recommendation systems, applying pre-trained language models for rapid recommendation in real scenarios remains a challenge. To address this, we propose a sequential recommendation algorithm based on a pre-trained language model and knowledge distillation. The key idea of the proposed algorithm is to transfer pre-trained knowledge across domains and to achieve lightweight inference through knowledge distillation. The algorithm operates in two stages: in the first stage, we fine-tune the pre-trained language model on the recommendation dataset to transfer the pre-trained knowledge to the recommendation task; in the second stage, we distill the trained language model to transfer the learned knowledge to a lightweight model. Extensive experiments on multiple public recommendation datasets show that the proposed algorithm improves recommendation accuracy while providing timely recommendation services.
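The second-stage objective can be sketched with the standard distillation formulation: the lightweight student is trained to match the fine-tuned teacher's temperature-softened output distribution via KL divergence. This is the common Hinton-style loss as a general illustration; the paper's exact loss may differ.

```python
# Sketch of a knowledge-distillation objective: KL(teacher || student) over
# temperature-softened logit distributions. Pure Python for clarity; a real
# training loop would use a tensor library and backpropagate through q.
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's to the teacher's soft targets."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it transfers the teacher's learned item-ranking behavior into a model small enough for real-time serving.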
Fernando Gabriela Garcia, Spencer Burns, Harrison Fuller
In this paper, we introduce ChatCite, a novel method leveraging large language models (LLMs) for generating comparative literature summaries. The ability to summarize research papers with a focus on key comparisons between studies is an essential task in academic research. Existing summarization models, while effective at generating concise summaries, fail to provide deep comparative insights. ChatCite addresses this limitation by incorporating a multi-step reasoning mechanism that extracts critical elements from papers, incrementally builds a comparative summary, and refines the output through a reflective memory process. We evaluate ChatCite on a custom dataset, CompLit-LongContext, consisting of 1000 research papers with annotated comparative summaries. Experimental results show that ChatCite outperforms several baseline methods, including GPT-4, BART, T5, and CoT, across various automatic evaluation metrics such as ROUGE and the newly proposed G-Score. Human evaluation further confirms that ChatCite generates more coherent, insightful, and fluent summaries compared to these baseline models. Our method provides a significant advancement in automatic literature review generation, offering researchers a powerful tool for efficiently comparing and synthesizing scientific research.
An author's creative process cannot be separated from the store of knowledge obtained from reading, hearing, or observing the events around him. This store of knowledge, when juxtaposed with the concept introduced by Wolfgang Iser in his book The Act of Reading: A Theory of Aesthetic Response (1987), can be called a repertoire. In short, a repertoire can be understood as the basis for creating a work, the background against which the author establishes the foreground he aims at through his work. This process also applies to Akutagawa Ryunosuke's short story Rashomon, which takes as its background the 18th story of the 29th volume of Konjakumonogatari. This research aims to describe the social, historical, and cultural norms in Akutagawa Ryunosuke's Rashomon and to compare them with Konjakumonogatari, using the aesthetic repertoire theory proposed by Wolfgang Iser. The analysis proceeds by, among other steps, grouping the data related to the social, historical, and cultural norms of Japanese society and then comparing the data to see the relationship between Rashomon and Konjakumonogatari. The results showed that (1) there are social, historical, and cultural similarities between the literature and reality; (2) the social norms indicate the life of the Japanese lower class in the Heian period, called genin; (3) the historical norms show the dark conditions that Japanese people went through in the Heian era, because of the many problems that occurred at that time; and (4) the cultural norms show the efforts made by Japanese people in the Heian period to survive despite hurting others.
Large Language Models (LLMs) represent a revolution in AI. However, they also pose many significant risks, such as the presence of biased, private, copyrighted or harmful text. For this reason we need open, transparent and safe solutions. We introduce a complete open-source ecosystem for developing and testing LLMs. The goal of this project is to boost open alternatives to closed-source approaches. We release h2oGPT, a family of fine-tuned LLMs of diverse sizes. We also introduce H2O LLM Studio, a framework and no-code GUI designed for efficient fine-tuning, evaluation, and deployment of LLMs using the most recent state-of-the-art techniques. Our code and models are fully open-source. We believe this work helps to boost AI development and make it more accessible, efficient and trustworthy. The demo is available at: https://gpt.h2o.ai/
The Olympic and Paralympic Games in Tokyo in July–September 2021 took place in a challenging social environment that seriously affected the public perception of events. When preparing for the Olympics in 2013–2019, the Japanese people actively supported the Games, which was confirmed by the results of numerous sociological studies. In March 2020, the COVID-19 pandemic began, followed by several waves of infection. The competition was postponed for a year. Vaccination in Japan was delayed compared to most G7 countries. Against this background, in the summer of 2021, the most dangerous Delta strain of coronavirus began to spread in the country, bringing a rise in mortality rates and overflowing hospitals in large cities. In such a difficult epidemiological and social situation, surveys recorded a negative attitude towards the Olympics. However, during the competition, the majority opinion once again turned positive, mainly due to the athletic successes of the Japanese team and effective anti-virus control measures. The absence of spectators in the venues most probably did not affect the sporting achievements significantly; at least, the Japanese Olympic team won a record number of medals. Infection prevention measures proved effective in limiting the transmission of the virus among the athletes and the Japanese service personnel. The economic and symbolic achievements of the Games did not meet expectations, as, during the Olympics, it was not possible to properly convey their significance as the end point of the low-growth “lost decades”, as evidence of economic recovery after the triple disaster of 2011, and as a tool to increase Japan’s tourist attractiveness. Therefore, during a pandemic, major sports events should be held primarily to train top-class athletes and to increase populace satisfaction with the success of the national team rather than to obtain direct economic benefits or improve the host country’s image.
The article addresses the Russian vector of Japan’s Arctic policy. The main areas of Japan’s interest in cooperation with Russia in the Arctic region are energy, transport, and security. The article focuses on the developments that took place in these areas in 2019-2020, which have not yet received proper coverage in Russian historiography. Pursuing the policy of diversification of energy supply sources, Japan turns its attention to the Russian Arctic as one of the promising areas of cooperation in the gas sector. In 2019, Japanese companies signed a contract for the purchase of a 10-percent stake in the Arctic LNG-2 project, which provides for Japanese investment worth almost $3 billion. As one of the primary areas of cooperation with Russia, Japan also considers participation in the transport and logistics development of the Northern Sea Route, which is indispensable for the implementation of gas production projects on the Yamal Peninsula. In addition, Japan is interested in establishing clear and stable “game rules” in the Arctic, and, in this sense, the security sphere in the Arctic region is becoming one of the most important areas of cooperation with Russia. The Russian vector of Japan’s Arctic policy received an additional impetus in connection with the policy of rapprochement with Moscow conducted by the Abe cabinets in 2012-2020. The Arctic projects have become an integral part of the Eight-Point Plan, contributing to Japan’s energy and economic security. Cooperation in the Arctic is directly linked not only to the projects of the development of the Northern Sea Route and Arctic projects for the extraction and liquefaction of natural gas, but also to bilateral projects in the fields of “green energy”, development of port infrastructure, urban construction, fish processing, ecology, improving people’s living conditions, medicine, tourism, etc.
Yanjun Gao, Dmitriy Dligach, Timothy Miller
et al.
Applying methods in natural language processing to electronic health records (EHR) data is a growing field. Existing corpora and annotations focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpora built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction, and reasoning. This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The conventional format for a progress note follows the Subjective, Objective, Assessment, and Plan (SOAP) headings. We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.
Kawahara, Noto, and Kumagai (2018b) found that within the corpus of existing Pokémon names, the number of voiced obstruents in the characters’ names correlates positively with their weight, height, evolution levels and attack values. While later experimental studies to some extent confirmed the productivity of these sound symbolic relationships (e.g. Kawahara and Kumagai 2019a), they are limited, due to the fact that the visual images presented to the participants primarily differed with regard to evolution levels. The current experiments thus for the first time directly explored how each of these semantic dimensions—weight, height, evolution levels, and attack values—correlates with the number of voiced obstruents in nonce names. The results of two judgment experiments show that all of these parameters indeed correlate positively with the number of voiced obstruents in the names. Overall, the results show that a particular class of sounds—in our case, a set of voiced obstruents—can signal different semantic meanings within a single language, supporting the pluripotentiality of sound symbolism (Winter, Pérez-Sobrino, and Brown 2019). We also address another general issue that has been under-explored in the literature on sound symbolism; namely, its cumulative nature. In both of the experiments, we observe that two voiced obstruents evoke stronger images than one voiced obstruent, instantiating what is known as the counting cumulativity effect (Jäger and Rosenbach 2006).
Technological development has to be used as an opportunity to introduce the Indonesian language as the identity of the Indonesian nation abroad. YouTube is a heterogeneous virtual platform that involves people all over the world. This situation means that Indonesian must continue to compete with foreign languages, so efforts to introduce and maintain Indonesian should be intensified. This qualitative descriptive study aims to describe the phenomenon of the Indonesian language pride of Jerome Polin Sijabat on the YouTube channel Nihongo Mantappu. The data were the utterances of Jerome Polin Sijabat and other supporting YouTubers who use Indonesian, as well as written comments from netizens, obtained from the Nihongo Mantappu YouTube channel, which served as the data source of this research. Data were collected through downloading, screen capturing, note-taking, and viewing techniques. The results of this study indicated that Jerome Polin Sijabat takes pride in speaking Indonesian, which is evident in his use of Indonesian as a marker of his identity. Jerome Polin Sijabat uses Indonesian when asking, inviting, informing, and promoting to his friends who are originally from Japan on the YouTube channel Nihongo Mantappu.