The article examines how China is developing and deploying artificial intelligence (AI) for its armed forces, a central element of its drive toward military modernization and "intelligentization." The research analyzes the core tenets of China's AI strategy, its global implications, and the ethical problems raised by China's ambition to lead the world in military AI. It shows that China pursues a centralized model of AI development, backed by extensive state support and ready access to data, and describes how China employs AI in worldwide influence campaigns aimed at shaping public opinion. China faces constraints, including dependence on foreign semiconductor manufacturing technology and problems of bureaucracy and corruption, but it also holds advantages such as a vast workforce and the capacity to mobilize resources rapidly. The article offers a clear picture of AI's growth in China and its consequences for global security, showing that the use of AI in information and cognitive warfare poses a serious threat: China seeks to alter public perceptions and adversaries' decisions by acting directly on their minds. The war in Ukraine demonstrated that such AI-based tactics are dangerous but cannot win wars on their own; conventional battlefield operations remain decisive. Accordingly, other countries, above all the United States, must improve their own AI systems and build robust strategies that integrate physical, cyber, and cognitive operations to counter China effectively. It is concluded that China's use of artificial intelligence is one of the key factors strengthening its military power and serves as a catalyst for reshaping the modern geopolitical map of the world.
Balázs Szigethy's review of the volume A család szerepe a távol-keleti és belső-ázsiai régióban. Tanulmányok a család mint közösség társadalmi, vallási és rituális aspektusairól [The Role of the Family in the Far Eastern and Inner Asian Region: Studies on the Social, Religious, and Ritual Aspects of the Family as a Community] (eds. Ágnes Birtalan and Krisztina Teleki).
This systematic review examined research on the self-efficacy of teachers of Chinese as a foreign language (TCFL) from 2004 to 2024. Guided by social cognitive theory and Bandura’s concept of self-efficacy, 15 empirical studies were synthesized following PRISMA guidelines, employing both qualitative and quantitative analyses. The review identified key factors influencing TCFL teacher self-efficacy, including personal, student, and environmental factors. It further showed that teacher self-efficacy predicts important outcomes such as technology use and integration, career development and retention, and emotional and psychological resources. The findings underscore the need for targeted professional development, supportive institutional policies, and cross-cultural adaptation resources, and they point to future research directions on emerging technologies and diverse teaching contexts.
Traditional Chinese Medicine (TCM) has seen increasing adoption in healthcare, with specialized Large Language Models (LLMs) emerging to support clinical applications. A fundamental requirement for these models is accurate identification of TCM drug ingredients. In this paper, we evaluate how general and TCM-specialized LLMs perform when identifying ingredients of Chinese drugs. Our systematic analysis reveals consistent failure patterns: models often interpret drug names literally, overuse common herbs regardless of relevance, and exhibit erratic behaviors when faced with unfamiliar formulations. LLMs also fail to understand the verification task. These findings demonstrate that current LLMs rely primarily on drug names rather than possessing systematic pharmacological knowledge. To address these limitations, we propose a Retrieval Augmented Generation (RAG) approach focused on ingredient names. Experiments across 220 TCM formulations show our method significantly improves accuracy from approximately 50% to 82% in ingredient verification tasks. Our work highlights critical weaknesses in current TCM-specific LLMs and offers a practical solution for enhancing their clinical reliability.
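The core idea of the proposed fix, grounding ingredient verification in retrieved formulary data rather than the model's name-based guesses, can be sketched as follows. The formulary entries, function names, and lookup logic here are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of ingredient-focused retrieval for TCM formulations.
# The knowledge base below is a toy stand-in for a real formulary.
FORMULARY = {
    "Liu Wei Di Huang Wan": [
        "Rehmannia", "Cornus", "Dioscorea",
        "Alisma", "Moutan", "Poria",
    ],
}

def retrieve_ingredients(formulation: str) -> list[str]:
    """Look up the authoritative ingredient list, instead of letting
    the LLM infer ingredients literally from the drug name."""
    return FORMULARY.get(formulation, [])

def verify_ingredient(formulation: str, ingredient: str) -> bool:
    """Answer a verification query grounded in retrieved ingredients."""
    return ingredient in retrieve_ingredients(formulation)

print(verify_ingredient("Liu Wei Di Huang Wan", "Poria"))    # True
print(verify_ingredient("Liu Wei Di Huang Wan", "Ginseng"))  # False
```

In a full RAG setup, the retrieved list would be injected into the LLM prompt as context; the deterministic lookup above shows why retrieval sidesteps the name-literalism failure mode the paper identifies.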
Disinformation spreads rapidly across linguistic boundaries, yet most AI models are still benchmarked only on English. We address this gap with a systematic comparison of five multilingual transformer models (mBERT, XLM, XLM-RoBERTa, RemBERT, and mT5) on a common fake-versus-true classification task. While transformer-based language models have demonstrated notable success in detecting disinformation in English, their effectiveness in multilingual contexts remains open to debate. To facilitate evaluation, we introduce the PolyTruth Disinfo Corpus, a novel corpus of 60,486 statement pairs (false claim vs. factual correction) spanning over twenty-five languages that collectively cover five language families and a broad topical range including politics, health, climate, finance, and conspiracy; half of the statements are fact-checked disinformation claims verified against an augmented MindBugs Discovery dataset. Our experiments reveal clear performance variations: RemBERT achieved the best overall accuracy, particularly excelling in low-resource languages, whereas mBERT and XLM exhibited considerable limitations when training data is scarce. We provide a discussion of these performance patterns and their implications for real-world deployment. The dataset is publicly available on our GitHub repository to encourage further experimentation and advancement. Our findings illuminate both the potential and the current limitations of AI systems for multilingual disinformation detection.
Traditional Chinese Medicine (TCM), with a history spanning over two millennia, plays a role in global healthcare. However, applying large language models (LLMs) to TCM remains challenging due to its reliance on holistic reasoning, implicit logic, and multimodal diagnostic cues. Existing TCM-domain LLMs have made progress in text-based understanding but lack multimodal integration, interpretability, and clinical applicability. To address these limitations, we developed BenCao, a ChatGPT-based multimodal assistant for TCM, integrating structured knowledge bases, diagnostic data, and expert feedback refinement. BenCao was trained through natural language instruction tuning rather than parameter retraining, aligning with expert-level reasoning and ethical norms specific to TCM. The system incorporates a comprehensive knowledge base of over 1,000 classical and modern texts, a scenario-based instruction framework for diverse interactions, a chain-of-thought simulation mechanism for interpretable reasoning, and a feedback refinement process involving licensed TCM practitioners. BenCao connects to external APIs for tongue-image classification and multimodal database retrieval, enabling dynamic access to diagnostic resources. In evaluations across single-choice question benchmarks and multimodal classification tasks, BenCao achieved superior accuracy to general-domain and TCM-domain models, particularly in diagnostics, herb recognition, and constitution classification. The model was deployed as an interactive application on the OpenAI GPTs Store, accessed by nearly 1,000 users globally as of October 2025. This study demonstrates the feasibility of developing a TCM-domain LLM through natural language-based instruction tuning and multimodal integration, offering a practical framework for aligning generative AI with traditional medical reasoning and a scalable pathway for real-world deployment.
Idioms are crucial components that illustrate the expressive capabilities of a society's language and enhance stylistic richness, and their presence significantly enriches the narrative quality of literary works. Throughout Chinese history, literary creations have emerged in various periods, with distinct genres gaining prominence; during the Tang Dynasty (618-907), poetry notably ascended as a leading literary form. This study investigates the idioms found in selected poems by Li Bai, the "Immortal Poet" emblematic of Tang poetry, whose body of work includes over a thousand poems. The analysis focuses on his poems featured in Three Hundred Tang Poems (唐诗三百首), compiled by Gu Qing in 2009 and published by Zhonghua Shuju (中华书局), a Chinese-language volume that is a vital resource for scholars of Tang poetry. Employing a qualitative research design, the study utilizes literature screening for data collection, followed by content analysis. The volume encompasses twenty-six poems attributed to Li Bai, eight of which contain a total of fifteen distinct idioms. The research includes translations of these poems into Turkish, provides a thematic overview, analyzes and categorizes the idioms by meaning, and offers suggestions for closely related Turkish idioms.
Patronizing and Condescending Language (PCL) is a form of discriminatory toxic speech targeting vulnerable groups, threatening both online and offline safety. While toxic speech research has mainly focused on overt toxicity, such as hate speech, microaggressions in the form of PCL remain underexplored. Additionally, dominant groups' discriminatory facial expressions and attitudes toward vulnerable communities can be more impactful than verbal cues, yet these frame features are often overlooked. In this paper, we introduce the PCLMM dataset, the first Chinese multimodal dataset for PCL, consisting of 715 annotated videos from Bilibili, with high-quality PCL facial frame spans. We also propose the MultiPCL detector, featuring a facial expression detection module for PCL recognition, demonstrating the effectiveness of modality complementarity in this challenging task. Our work makes an important contribution to advancing microaggression detection within the domain of toxic speech.
In recent years, the breakthrough of Large Language Models (LLMs) offers new ideas for achieving universal methods on graph data. The common practice of converting graphs into natural language for LLMs, known as graph flattening, exhibits good generalizability and interpretability. However, poorly organized textual formats lead to weak performance in long-distance scenario understanding. Inspired by human cognitive reasoning habits, we propose a novel method for graph flattening to fit LLMs, termed End-to-End DAG-Path prompting (EEDP). Experiments on real-world datasets show that EEDP enhances the reasoning performance of LLMs in long-distance scenarios while maintaining excellent performance in short-distance scenarios, demonstrating good robustness in the face of distance variations.
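The intuition behind path-based flattening, serializing a DAG as complete root-to-leaf paths so that long-distance relations appear on a single line of text, can be sketched as follows. The function names and the `->` textual format are illustrative assumptions, not EEDP's exact prompt template.

```python
# Hypothetical sketch of DAG-path flattening for an LLM prompt.
from collections import defaultdict

def dag_paths(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Enumerate every root-to-leaf path in a DAG given as an edge list."""
    children = defaultdict(list)
    has_parent = set()
    for u, v in edges:
        children[u].append(v)
        has_parent.add(v)
    roots = [n for n in children if n not in has_parent]

    paths = []
    def walk(node, path):
        if not children[node]:          # leaf node: record the full path
            paths.append(path)
            return
        for nxt in children[node]:
            walk(nxt, path + [nxt])
    for r in roots:
        walk(r, [r])
    return paths

def flatten(edges: list[tuple[str, str]]) -> str:
    """Render each path as one line of text, one hop per arrow."""
    return "\n".join(" -> ".join(p) for p in dag_paths(edges))

edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
print(flatten(edges))
# A -> B -> C -> E
# A -> D -> C -> E
```

Compared with listing edges one by one, this keeps multi-hop relations (e.g. A to E) within a single line, which is the kind of long-distance structure the abstract says edge-list flattening loses.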
The surge of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation due to substantial differences between TCM and modern medical theory, and the scarcity of specialized, high-quality corpora. To this end, in this paper we propose BianCang, a TCM-specific LLM, using a two-stage training process that first injects domain-specific knowledge and then aligns it through targeted stimulation to enhance diagnostic and differentiation capabilities. Specifically, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 31 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available on https://github.com/QLU-NLP/BianCang.
As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense knowledge in everyday conversations into three dimensions: entity, event, and social interaction. For easy and consistent annotation, we standardize the form of commonsense knowledge annotation in open-domain dialogues as "domain: slot = value". A total of 9 domains and 37 slots are defined to capture diverse commonsense knowledge. With these pre-defined domains and slots, we collect 76,787 commonsense knowledge annotations from 19,700 dialogues through crowdsourcing. To evaluate and enhance the commonsense reasoning capability of LLMs on the curated dataset, we establish a series of dialogue-level reasoning and detection tasks, including commonsense knowledge filling, commonsense knowledge generation, commonsense conflict phrase detection, domain identification, slot identification, and event causal inference. A wide variety of existing open-source Chinese LLMs are evaluated with these tasks on our dataset. Experimental results demonstrate that these models struggle with CORECODE's rich reasoning content, with even ChatGPT achieving only 0.275 and 0.084 accuracy on the domain identification and slot identification tasks in the zero-shot setting. We release the data and code of CORECODE at https://github.com/danshi777/CORECODE to promote commonsense reasoning evaluation and study of LLMs in the context of daily conversations.
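The "domain: slot = value" annotation form described above is simple enough to sketch directly. The example annotation string below is invented for illustration; it is not drawn from the dataset's actual schema of 9 domains and 37 slots.

```python
# Illustrative parser for CORECODE-style "domain: slot = value"
# annotation strings. The domain/slot/value content is a made-up
# example, not an actual CORECODE annotation.

def parse_annotation(ann: str) -> tuple[str, str, str]:
    """Split one annotation string into (domain, slot, value)."""
    domain, rest = ann.split(":", 1)
    slot, value = rest.split("=", 1)
    return domain.strip(), slot.strip(), value.strip()

ann = "event: precondition = the speaker has bought a ticket"
print(parse_annotation(ann))
# ('event', 'precondition', 'the speaker has bought a ticket')
```

Standardizing annotations to a single flat string format like this is what makes tasks such as domain identification and slot identification well defined: the model's prediction can be scored by exact match against the parsed fields.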
Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach
AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguards through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can move users toward their harmful goals 79% of the time, which is on par with or even surpasses state-of-the-art jailbreaking attacks. Other high- and mid-resource languages have significantly lower attack success rates, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affected speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLM users, since publicly available translation APIs enable anyone to exploit LLMs' safety vulnerabilities. Therefore, our work calls for more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.
Bhavyajeet Singh, Pavan Kandru, Anubhav Sharma, et al.
Massive knowledge graphs like Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, much information present as natural text in low-resource languages is often missed. Cross-lingual information extraction aims at extracting factual information in the form of English triples from low-resource Indian-language text. Despite its massive potential, progress on this task lags behind monolingual information extraction. In this paper, we propose the task of Cross Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.
Bram M. A. van Dijk, Tom Kouwenhoven, Marco R. Spruit, et al.
Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.
Shaohua Lyu, Claire Shuiqing Zhang, et al.
Background: Migraine is a prevalent headache disorder with significant impacts on patients' quality of life and economic burden. Chinese herbal medicine (CHM) is commonly prescribed for migraine in China. This review aimed to provide a rigorous evaluation of evidence on the efficacy of oral CHM for migraine and explore the correlation between its effect size and treatment duration. Methods: We searched nine digital databases (PubMed, EMBASE, CINAHL, Cochrane Central Register of Controlled Trials, AMED, BioMedical Literature, CNKI, CQVIP, and Wanfang Data) from their inceptions to May 2021, with the language restricted to Chinese and English. Randomized, placebo-controlled trials using oral CHM to treat adult migraine were included. Data screening and extraction were conducted by two independent reviewers. The methodological quality of randomized controlled trials (RCTs) was assessed using the Cochrane Risk of Bias tool. Meta-analyses were conducted to estimate the effect size using a random-effects model, and a robust variance estimation (RVE) model was constructed to explore the correlation between treatment effects and treatment duration. The certainty of the evidence was assessed with the Grading of Recommendations Assessment, Development, and Evaluation. Publication bias was tested using a funnel plot and Egger's test. Results: A total of 18 RCTs involving 3,015 participants were included. Results of the meta-analyses showed that, at the end of the treatment phase, CHM was more efficacious than placebo in reducing migraine frequency, migraine days, and pain severity, and in increasing response rate. Additionally, CHM showed superior effects to placebo in lowering migraine frequency and pain severity at the end of the 4-week follow-up. The RVE model suggested that the benefits of CHM for migraine frequency and pain intensity increased as treatment duration extended. The number of adverse events reported by the CHM and placebo groups was comparable. The certainty of the evidence was graded as "moderate." No publication bias was detected. Conclusion: Oral CHM appeared to be more efficacious than placebo for reducing migraine frequency and pain severity. Greater treatment effects were associated with longer treatment duration. Oral CHM was well tolerated. Systematic Review Registration: https://www.crd.york.ac.uk/prospero/#recordDetails, identifier: CRD42021270719.
Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and generalization to novel tasks. Recent advances in architectures have allowed for improved scaling along one or two of these axes, but are still computationally prohibitive to use. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long-horizon control problems conditioned on natural language instructions, as a step towards generalist agents. Comparing LCD with other state-of-the-art models on the CALVIN language robotics benchmark shows that LCD outperforms other SOTA methods in multi-task success rate, while improving inference speed over other comparable diffusion models by 3.3x to 15x. We show that LCD can successfully leverage the unique strength of diffusion models to produce coherent long-range plans while addressing their weakness in generating low-level details and control.
This paper describes the University of Maryland's submission to the Special Task on Formality Control for Spoken Language Translation at IWSLT, which evaluates translation from English into 6 languages with diverse grammatical formality markers. We investigate to what extent this problem can be addressed with a single multilingual model, simultaneously controlling its output for target language and formality. Results show that this strategy can approach the translation quality and formality control achieved by dedicated translation models. However, the nature of the underlying pre-trained language model and of the finetuning samples greatly impact results.
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.