Results for "Chinese language and literature"

DOAJ Open Access 2026

Quality appraisal of clinical guidelines for newborns in China: a systematic review protocol

Li Jing, Zengzhen Liao, Luqin Liao et al.

Introduction: Neonatal guidelines are critical for enhancing healthcare quality and reducing neonatal mortality in China. However, the methodological quality of existing guidelines varies considerably, and the credibility and implementability (CI) of their recommendations have not been systematically evaluated. Consequently, key areas for improvement remain unclear. This protocol aims to appraise the methodological quality of Chinese neonatal guidelines and assess the CI of their recommendations using two evaluation frameworks.
Methods and analysis: This study will adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Protocols statement. We will search multiple electronic databases, including PubMed, Embase, Web of Science, China National Knowledge Infrastructure, Wanfang Database, the Chinese Science and Technology Periodical Database (VIP) and the Chinese Biomedical Literature Database, from inception to 31 December 2025. The search will use keywords such as ‘neonatal’, ‘clinical practice guidelines’ and ‘China’. The language of publication will be limited to Chinese or English. Two researchers will independently screen titles, abstracts and full texts. Chinese-language clinical practice guidelines focused on neonatal disease management (rather than assessment or diagnosis) will be included. Relevant information (including guideline name, publication year, disease type, leading institution and guideline development group composition) will be extracted using a pre-designed form. Cross-verification will be performed, and discrepancies will be resolved by a third reviewer. Final data will be presented in tables. We will use a narrative synthesis to address two objectives: (1) to evaluate the methodological quality of Chinese neonatal guidelines (especially those published in the past 5 years) using AGREE II (6 domains, 23 items, scored 1–7) and identify critical deficiencies in the guideline development process; and (2) to appraise the CI of recommendations for guidelines scoring ≥60% in domain 3 (‘Rigour of Development’) of AGREE II using the AGREE-REX (Appraisal of Guidelines for Research and Evaluation – Recommendation EXcellence) tool, deriving context-specific, evidence-based optimisation strategies. This review will explore the strengths and limitations of Chinese neonatal clinical practice guidelines. The findings will provide evidence-based direction for future guideline development and recommendation formulation.
Ethics and dissemination: No ethical approval is required. Results will be disseminated through open-access, peer-reviewed journal publications and conference presentations. The results will inform efforts to improve the methodological quality of future neonatal guidelines in China and the CI of their recommendations.
PROSPERO registration number: This systematic review protocol was registered on PROSPERO (registration number CRD420251106060).
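The ≥60% threshold for domain 3 refers to a scaled domain score. The sketch below uses the usual AGREE II scaling, (obtained − minimum possible) / (maximum possible − minimum possible), for the 8 items of ‘Rigour of Development’. The function name, the two appraisers and all ratings are hypothetical illustrations, not data from the protocol.

```python
def agree_domain_score(ratings):
    """Scaled AGREE II domain score in percent.

    ratings: one list per appraiser, each holding that appraiser's
    1-7 ratings for every item in the domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate the same items")
    if any(not 1 <= x <= 7 for r in ratings for x in r):
        raise ValueError("ratings must be between 1 and 7")
    obtained = sum(sum(r) for r in ratings)
    min_possible = 1 * n_items * n_appraisers
    max_possible = 7 * n_items * n_appraisers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings from two appraisers for the 8 items of domain 3.
domain3 = [
    [5, 6, 4, 5, 6, 5, 4, 5],
    [6, 5, 5, 4, 6, 5, 5, 4],
]
score = agree_domain_score(domain3)      # (80 - 16) / (112 - 16) = 66.7%
eligible_for_agree_rex = score >= 60.0   # threshold stated in the protocol
print(f"{score:.1f}% eligible={eligible_for_agree_rex}")
```

Scaling by the minimum and maximum possible totals keeps scores comparable across domains with different item counts and across reviews with different numbers of appraisers.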

Medicine

DOAJ Open Access 2025

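A Brief Analysis of the Intertextuality between Mao Zedong's "Problems of Strategy in Guerrilla War Against Japan" and the Indonesian General's "Fundamentals of Guerrilla Warfare"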

Woro Januarti

https://docs.google.com/document/d/1xwu7D52-z5dxlPnQ-4HE_EqqFGOoqAd0/edit?usp=share_link&ouid=106136773958812733977&rtpof=true&sd=true

Chinese language and literature

DOAJ Open Access 2025

Chinese EFL students' perceptions about the role of artificial intelligence (AI) technologies in their second language (L2) self-concept

Shiyuan Huang

The applications of Artificial Intelligence (AI) technologies to second or foreign language (L2) education have recently been the focus of several studies in the literature. However, the impact of AI tools on students' psychological-affective states has remained under-explored in many contexts. To address this gap, the present qualitative study examined the role of AI technologies in the self-concept of Chinese English as a foreign language (EFL) students. To this end, 62 students were interviewed online. Thematic analysis identified four major areas in which AI tools could contribute to EFL students' self-concept: ‘removing negative academic emotions’, ‘developing students’ skills, competency, and knowledge’, ‘increasing learning efficacy’, and ‘fostering autonomous and self-optimized learning’. The findings are discussed separately, and practical implications are provided for EFL students and practitioners considering the role of AI technologies in L2 learning and learners' psycho-emotional states.

Psychology

CrossRef Open Access 2025

Study on the usage of Chinese place names and Pinyin spelling

Zexia Zheng

Place names are a significant window for understanding and exploring the language, culture and social development of a region. In recent years, the naming, renaming, use and cultural protection of place names have increasingly attracted public attention, with relevant national and provincial laws being enacted one after another. This study centers on optimizing the application of Pinyin in standardization practices, with particular emphasis on three critical dimensions. Additionally, in special cities, matters concerning the addition and modification of existing place names also deserve attention. Finally, the study proposes three suggestions: encouraging citizens to participate in policy-making, improving legal regulations for place names through moderate pilot programs, and strengthening the protection of cultural heritage.

en

CrossRef Open Access 2025

Analysis of Julie's Feminist Thoughts in the Chinese Translation of Flipped

Sijia Qiu

Flipped is a full-length novel by the American writer Wendelin Van Draanen, which was later adapted into a well-known film of the same name. It tells the humorous story of a boy’s and a girl’s first love as they grow up in adolescence. Although the plot is simple, the novel is rich in meaning. Built around the story of Julie and Bryce’s first love, it touches on a series of practical problems, such as family affection and ethics, tolerance and prejudice, and dreams and reality, that prompt readers’ empathy and reflection. Feminism is a social and political movement based on women’s social and life experiences and related theoretical research. From an early period in which its main purpose was political appeal, it developed into theoretical research and social appeal permeating every aspect of women’s lives. The purpose of this thesis is to study Julie’s feminist thoughts in Flipped in depth by drawing on the connotation, significance and historical development of feminism in British and American literary works, and to summarize the characteristics of these female images in order to further analyze the influence of feminism on novel writing.

en

CrossRef Open Access 2025

An Analysis of Errors in the Use of “Danshi”, “Keshi” and “Buguo” by Learners of Mandarin in Indonesia, and Strategies for Teaching Them

Eugenes Yenadiputri, Steffi Thanissa Halim

Mandarin often relies on conjunctions to form words, phrases, or sentences. However, Indonesian students often find conjunctions difficult to use, especially adversative conjunctions. This study therefore analyzes the errors made by students of the Chinese Language Study Program at Universitas Kristen Petra in using the adversative conjunctions “但是dànshì”, “可是kěshì” and “不过búguò”, based on a questionnaire. Analysis of the questionnaire data shows that students often make errors in word collocation and when using “但是dànshì” to connect two elements. With “可是kěshì”, students sometimes make errors when using it on its own, in collocations, and when “可是kěshì” is used to express regret. With “不过búguò”, students likewise make errors in collocation and when using it to add supplementary information. The causes of these errors include negative transfer from the mother tongue, a lack of awareness of collocation, and factors internal to the learners themselves. The study also offers relevant suggestions for teachers and learners, in the hope that they will serve as a useful reference for developing teaching materials, improving learning strategies, and increasing teaching effectiveness.

en

CrossRef Open Access 2025

The Transcendence of Traditional Concepts in Modern Chinese and Western Painting

Xinlu Yu

This study investigates the transcendence of traditional aesthetic concepts in modern Chinese and Western painting practices. It addresses a gap in comparative art studies by focusing specifically on how artists in both traditions move beyond established norms. The research employs a comparative analysis of selected artworks, examining the reinterpretation of traditional techniques, the influence of cross-cultural exchange, and the embrace of abstraction and conceptual art. It argues that modern Chinese painting transcends tradition through the integration of Western methods while retaining Chinese philosophical elements. Simultaneously, modern Western painting achieves transcendence by adopting abstraction and challenging representational conventions. Despite differing cultural origins, artists in both spheres share a commitment to expanding artistic boundaries and questioning established norms. The study finds that this transcendence is not a complete abandonment of the past, but rather a reimagining of traditional components to reflect contemporary realities. The research contributes a nuanced understanding of the evolving definitions of beauty and artistic value in modern art, shaped by social, political, and technological shifts. It offers insights into the shared and distinct approaches of Chinese and Western artists in their pursuit of artistic innovation.

en

arXiv Open Access 2025

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning

Haiyang Yu, Yuchuan Wu, Fan Shi et al.

Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but face challenges in digitization and understanding, i.e., traditional methods only scan images, while current Vision-Language Models (VLMs) struggle with their visual and linguistic complexity. Existing document benchmarks focus on English printed texts or simplified Chinese, leaving a gap for evaluating VLMs on ancient Chinese documents. To address this, we present AncientDoc, the first benchmark for Chinese ancient documents, designed to assess VLMs from OCR to knowledge reasoning. AncientDoc includes five tasks (page-level OCR, vernacular translation, reasoning-based QA, knowledge-based QA, linguistic variant QA) and covers 14 document types, over 100 books, and about 3,000 pages. Based on AncientDoc, we evaluate mainstream VLMs using multiple metrics, supplemented by a human-aligned large language model for scoring.

en cs.CL

arXiv Open Access 2025

Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites

Xintong Wang, Yixiao Liu, Jingheng Pan et al.

Detoxifying offensive language while preserving the speaker's original intent is a challenging yet critical goal for improving the quality of online interactions. Although large language models (LLMs) show promise in rewriting toxic content, they often default to overly polite rewrites, distorting the emotional tone and communicative intent. This problem is especially acute in Chinese, where toxicity often arises implicitly through emojis, homophones, or discourse context. We present ToxiRewriteCN, the first Chinese detoxification dataset explicitly designed to preserve sentiment polarity. The dataset comprises 1,556 carefully annotated triplets, each containing a toxic sentence, a sentiment-aligned non-toxic rewrite, and labeled toxic spans. It covers five real-world scenarios: standard expressions, emoji-induced and homophonic toxicity, as well as single-turn and multi-turn dialogues. We evaluate 17 LLMs, including commercial and open-source models with variant architectures, across four dimensions: detoxification accuracy, fluency, content preservation, and sentiment polarity. Results show that while commercial and MoE models perform best overall, all models struggle to balance safety with emotional fidelity in more subtle or context-heavy settings such as emoji, homophone, and dialogue-based inputs. We release ToxiRewriteCN to support future research on controllable, sentiment-aware detoxification for Chinese.

en cs.CL

arXiv Open Access 2025

Language Models at the Syntax-Semantics Interface: A Case Study of the Long-Distance Binding of Chinese Reflexive ziji

Xiulin Yang

This paper explores whether language models can effectively resolve the complex binding patterns of the Mandarin Chinese reflexive ziji, which are constrained by both syntactic and semantic factors. We construct a dataset of 240 synthetic sentences using templates and examples from syntactic literature, along with 320 natural sentences from the BCC corpus. Evaluating 21 language models against this dataset and comparing their performance to judgments from native Mandarin speakers, we find that none of the models consistently replicates human-like judgments. The results indicate that existing language models tend to rely heavily on sequential cues, though not always favoring the closest strings, and often overlooking subtle semantic and syntactic constraints. They tend to be more sensitive to noun-related than verb-related semantics.

en cs.CL

arXiv Open Access 2025

ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao language pair

Hong-Viet Tran, Minh-Quy Nguyen, Van-Vinh Nguyen

This paper presents the results of the VLSP 2022-2023 Machine Translation Shared Tasks, focusing on Vietnamese-Chinese and Vietnamese-Lao machine translation. The tasks were organized as part of the 9th and 10th annual workshops on Vietnamese Language and Speech Processing (VLSP 2022, VLSP 2023). The objective of the shared tasks was to build machine translation systems for Vietnamese-Chinese and Vietnamese-Lao translation (corresponding to 4 translation directions). Submissions were evaluated on 1,000 test pairs (news and general domains) using established metrics such as BLEU [11] and SacreBLEU [12]. System outputs were also evaluated by human judges who are experts in Chinese and Lao. These human assessments played a crucial role in ranking the machine translation models, ensuring a more comprehensive evaluation.

en cs.CL, cs.AI

arXiv Open Access 2025

ElliottAgents: A Natural Language-Driven Multi-Agent System for Stock Market Analysis and Prediction

Jarosław A. Chudziak, Michał Wawer

This paper presents ElliottAgents, a multi-agent system leveraging natural language processing (NLP) and large language models (LLMs) to analyze complex stock market data. The system combines AI-driven analysis with the Elliott Wave Principle to generate human-comprehensible predictions and explanations. A key feature is the natural language dialogue between agents, enabling collaborative analysis refinement. The LLM-enhanced architecture facilitates advanced language understanding, reasoning, and autonomous decision-making. Experiments demonstrate the system's effectiveness in pattern recognition and generating natural language descriptions of market trends. ElliottAgents contributes to NLP applications in specialized domains, showcasing how AI-driven dialogue systems can enhance collaborative analysis in data-intensive fields. This research bridges the gap between complex financial data and human understanding, addressing the need for interpretable and adaptive prediction systems in finance.

en cs.CE

arXiv Open Access 2025

Safer in Translation? Presupposition Robustness in Indic Languages

Aadi Palnitkar, Arjun Suresh, Rishi Rajesh et al.

More and more people are turning to large language models (LLMs) for healthcare advice and consultation, making it important to gauge the efficacy and accuracy of LLM responses to such queries. While existing medical benchmarks seek to accomplish this very task, they are almost universally in English, which has led to a notable gap in the literature on multilingual LLM evaluation. In this work, we seek to help address this gap with Cancer-Myth-Indic, an Indic language benchmark built by translating a 500-item subset of Cancer-Myth, sampled evenly across its original categories, into five under-served but widely used languages from the subcontinent (500 per language; 2,500 translated items total). Native-speaker translators followed a style guide for preserving implicit presuppositions in translation; items feature false presuppositions relating to cancer. We evaluate several popular LLMs under this presupposition stress.

en cs.CL

S2 Open Access 2022

The Role of Technology-Based Education and Teacher Professional Development in English as a Foreign Language Classes

Weihong Zhang

The swift development of technology has had a considerable effect on teaching, especially in foreign language classes, and the growing use of innovative technology to support teachers’ instruction and learning indicates the increasing dominance of technology in academic environments. In addition, teacher professional development significantly enhances teaching quality, especially the quality of educational activities within the class. Nevertheless, the shortage of professional development workshops has made educators reliant on informal education, in which they worked and learned collectively with peers in small groups to improve their use of technology. The functions of technology-based instruction in the learning process have not so far been taken into account in professional development programs in the Chinese context, and this review therefore examines the issue. In brief, this literature review offers suggestions for academics, theoreticians, and experts seeking to examine the roles of technology in teacher professional development programs.

91 citations en Medicine

DOAJ Open Access 2024

Fusing Algorithms for Intersection of Computer Science and Art: Innovations in Generative Art and Interactive Digital Installations

Jingpeng Xie, Miaomiao Yu, Guangliang Liu

This article investigates the integration of Variational Autoencoders (VAEs) and Particle Swarm Optimization (PSO) in the realm of generative art and interactive digital installations. The study focuses on how these advanced algorithms can enhance artistic expression and interactivity, providing novel approaches for generating and optimizing art. Key innovations include the application of VAEs to create diverse and complex art forms, coupled with PSO to fine-tune these generative processes. The research demonstrates that VAEs significantly improve the aesthetic quality and variety of generated artworks, achieving an average aesthetic score of 8.3 out of 10. Integrating PSO further optimizes these results, enhancing the quality of outputs with a final score of 9.0. The study also reveals that this combination improves user engagement and satisfaction, with interactive installations utilizing VAE + PSO achieving a satisfaction score of 9.0, compared to 7.0 for traditional methods. The findings highlight the transformative impact of these algorithms on art generation, showing that while computational resources and time are higher, the artistic and interactive benefits are substantial. This research underscores the potential of combining deep learning and optimization techniques to push the boundaries of digital creativity and offers new perspectives for artists and designers. The article concludes that the synergy of VAEs and PSO represents a significant advancement in generative art and interactive installations, opening new avenues for future exploration and development in the field.

Electrical engineering. Electronics. Nuclear engineering

DOAJ Open Access 2024

The application of Juliane House’s translation quality assessment model in the English translation of Yu Hua’s ‘Yanre de xiatian’

Huan He, Mansour Amini, Malini Ganapathy et al.

The quality of some Chinese literary translations has raised concerns among some translation scholars. This qualitative study presents an examination of Juliane House’s latest translation quality assessment model in evaluating the quality of ‘Sweltering Summer’, the English translation of ‘Yanre de xiatian’, a short Chinese story. House’s model reveals that the translation quality is dependent on the matching degree of the textual profile and function between the original and translation. The article concludes that despite minor discrepancies along the dimensions of field, tenor, and mode leading to minor disagreements of ideational and interpersonal functional components between the source text and the target text, the original and its overt translation are largely equivalent at the level of language/text, register, as well as genre, and the overall quality of the translation remains adequate by providing mismatch and match examples and interpretations in support of the conclusion. This study confirms the feasibility and reliability of House’s model, offering insights for translation scholars interested in assessing translation quality in the realm of short literary works and providing constructive suggestions for translators of Chinese literature. Future research could explore sufficient syntactic and textual examples using quantitative data and the mismatches between the original and the translated texts.

Fine Arts, Arts in general

DOAJ Open Access 2024

Efficacy and safety of Yishen Huashi granules combined with conventional therapy in the treatment of diabetic kidney disease: A systematic review and meta-analysis

Bo Dai, Yanxu Chen, Chaoqun Song et al.

Ethnopharmacological relevance: Diabetic kidney disease (DKD) is a complication of diabetes. If not treated in time, it leads to severe, irreversible glomerular damage and may ultimately result in uremia and even death. Yishen Huashi Granule (YSHS) is a Chinese patent medicine for treating DKD by invigorating the spleen and removing dampness, and it has shown a good curative effect. Aim of the study: This study systematically evaluated the clinical efficacy and blood biochemical improvement of YSHS combined with conventional therapy (CT) in treating DKD. Materials and methods: Up to August 2024, four English databases (PubMed, Web of Science, the Cochrane Library, and Embase) and four Chinese databases (China National Knowledge Infrastructure (CNKI), Wanfang Database (WF), China Biological Medicine Database (CBM), and China Science and Technology Journal Database (VIP)) were searched to screen literature, extract information, and evaluate quality according to inclusion and exclusion criteria. The literature was limited to Chinese and English, but not limited to published sources. Meta-analysis and bias analysis were performed using Stata 16 software and Review Manager 5.3 software. The risk-of-bias tool in the Cochrane Handbook was used to assess the quality of the literature. At a 95 % confidence interval (CI), relative risk (RR) and Cohen's d were used for the categorical and continuous variables, respectively. To evaluate heterogeneity, the Q test and I2 statistics were employed within a random-effects model framework. Results: A total of 28 randomized controlled trials (RCTs) comprising 2416 patients were included in this study. There were 1200 patients in the control group and 1216 in the treatment group. 
Compared with CT, combined YSHS therapy is more effective at improving clinical efficiency rate [RR(95 % confidence interval (CI)) = 1.26(1.21, 1.32), I2 = 15.83 %], renal function (urinary albumin excretion rate [SMD(95%CI) = -1.72(-2.27, −1.17), I2 = 95.5 %], 24-h quantitative urine protein level [SMD(95%CI) = -1.50(-2.51, −0.50), I2 = 95.52 %], blood urea nitrogen [SMD(95%CI) = -1.13(-1.46, −0.81), I2 = 89.81 %], SCr [SMD(95%CI) = -2.34(-3.36, −1.32), I2 = 97.90 %], eGFR [SMD(95%CI) = 0.50(0.18, 0.82), I2 = 51.51 %]), glucose metabolism levels (FBG [SMD(95%CI) = -0.50(-0.87, −0.14), I2 = 89.67 %], 2hPG [SMD(95%CI) = -0.83(-1.39, −0.26), I2 = 87.11 %], HbA1c [SMD(95%CI) = -0.79(-1.74, −0.15), I2 = 94.6 %]), lipid metabolism levels (TC [SMD(95%CI) = -1.53(-2.32, −0.75), I2 = 95.49 %], TG [SMD(95%CI) = -0.96(-1.08, −0.77), I2 = 86.05 %], LDL-C [SMD(95%CI) = -1.25(-1.81, −0.69), I2 = 90.56 %], HDL-C [SMD(95%CI) = 0.71(0.43, 0.98), I2 = 55.92 %]), oxidative stress indicators (SOD [SMD(95%CI) = 6.00(2.77, 9.24), I2 = 99.01 %], MDA [SMD(95%CI) = -2.81(-3.58, −2.05), I2 = 89.64 %]), ALB [SMD(95%CI) = 1.01(0.63, −1.39), I2 = 68.89 %], vWF [SMD(95%CI) = -0.84(-1.08, −0.60), I2 = 0 %], ET-1 [SMD(95%CI) = -0.89(-1.13, −0.65), I2 = 0 %], and MAP [SMD(95%CI) = −1.76(-3.26, −0.25), I2 = 94.05 %]. The incidence of adverse reactions in YSHS combination therapy was not high [SMD(95%CI) = 0.99(0.97, 1.02), I2 = 0.03 %]. Conclusion: The meta-analysis revealed that YSHS combined with CT is superior to CT alone in improving clinical outcomes, renal function, glucose metabolism, lipid metabolism, and oxidative stress. Yishen Huashi granules are more effective and safer for the treatment of diabetic kidney disease.
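The I2 values reported above measure how much of the variation between trials exceeds what chance alone would produce. The sketch below shows the standard calculation from Cochran's Q with inverse-variance weights, I2 = max(0, (Q − df) / Q) × 100. The function name and the three effect sizes and variances are hypothetical and are not taken from the included trials.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q and I2 (percent) for a set of study effects.

    effects:   per-study effect sizes (e.g. standardized mean differences)
    variances: per-study sampling variances of those effects
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical SMDs from three trials, each with sampling variance 0.25.
q, i2 = heterogeneity([-1.0, -2.0, -3.0], [0.25, 0.25, 0.25])
print(q, i2)  # Q = 8.0 on 2 degrees of freedom, I2 = 75.0
```

Values such as the 95 % figures in the abstract indicate that almost all observed variation reflects real differences between trials, which is why a random-effects model is the usual choice in that situation.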

Science (General), Social sciences (General)

arXiv Open Access 2024

Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025)

Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson et al.

The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to provide a forum for researchers to share and discuss their ongoing work on language models (LMs) focusing on low-resource languages, following the recent advancements in neural language models and their linguistic biases towards high-resource languages. LoResLM 2025 attracted notable interest from the natural language processing (NLP) community, resulting in 35 accepted papers from 52 submissions. These contributions cover a broad range of low-resource languages from eight language families and 13 diverse research areas, paving the way for future possibilities and promoting linguistic inclusivity in NLP.

en cs.CL, cs.AI

arXiv Open Access 2024

LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models

Haitao Li, You Chen, Qingyao Ai et al.

Large language models (LLMs) have made significant progress in natural language processing tasks and demonstrate considerable potential in the legal domain. However, legal applications demand high standards of accuracy, reliability, and fairness. Applying existing LLMs to legal systems without careful evaluation of their potential and limitations could pose significant risks in legal practice. To this end, we introduce a standardized comprehensive Chinese legal benchmark LexEval. This benchmark is notable in the following three aspects: (1) Ability Modeling: We propose a new taxonomy of legal cognitive abilities to organize different tasks. (2) Scale: To our knowledge, LexEval is currently the largest Chinese legal evaluation dataset, comprising 23 tasks and 14,150 questions. (3) Data: We utilize formatted existing datasets, exam datasets and newly annotated datasets by legal experts to comprehensively evaluate the various capabilities of LLMs. LexEval not only focuses on the ability of LLMs to apply fundamental legal knowledge but also dedicates efforts to examining the ethical issues involved in their application. We evaluated 38 open-source and commercial LLMs and obtained some interesting findings. The experiments and findings offer valuable insights into the challenges and potential solutions for developing Chinese legal systems and LLM evaluation pipelines. The LexEval dataset and leaderboard are publicly available at https://github.com/CSHaitao/LexEval and will be continuously updated.

en cs.CL

arXiv Open Access 2024

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

Wei Li, Ren Ma, Jiang Wu et al.

In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choice questions across common sense and K-12 educational subjects, meticulously curated to reflect the breadth and depth of everyday and academic knowledge. We present an extensive evaluation of 12 state-of-the-art LLMs using FoundaBench, employing both traditional assessment methods and our CircularEval protocol to mitigate potential biases in model responses. Our results highlight the superior performance of models pre-trained on Chinese corpora, and reveal a significant disparity between models' reasoning and memory recall capabilities. The insights gleaned from FoundaBench evaluations set a new standard for understanding the fundamental knowledge of LLMs, providing a robust framework for future advancements in the field.
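The abstract does not describe how CircularEval works. In other benchmarks that use the name, such as MMBench, the options of a multiple-choice question are shifted circularly and the model gets credit only if it answers correctly under every shift. The sketch below assumes that reading. The function names, the stub models and the sample question are hypothetical and are not FoundaBench code or data.

```python
from typing import Callable, Sequence


def circular_eval(question: str, choices: Sequence[str], answer_index: int,
                  ask_model: Callable[[str, Sequence[str]], int]) -> bool:
    """Return True only if the model is right under every circular shift."""
    n = len(choices)
    for shift in range(n):
        # Rotate the options so each one appears in each position once.
        rotated = [choices[(i + shift) % n] for i in range(n)]
        # Position of the correct option after this rotation.
        expected = (answer_index - shift) % n
        if ask_model(question, rotated) != expected:
            return False
    return True


def knows_answer(question, options):
    """Stub model that finds the right option wherever it is placed."""
    return list(options).index("Beijing")


def always_first(question, options):
    """Stub model with a position bias: it always picks the first option."""
    return 0


choices = ["Shanghai", "Beijing", "Guangzhou", "Chengdu"]
passes = circular_eval("What is the capital of China?", choices, 1, knows_answer)
fails = circular_eval("What is the capital of China?", choices, 1, always_first)
print(passes, fails)  # True False
```

Requiring consistency across all shifts means a model that favours one answer position, which would pass by luck on a single ordering, no longer scores on that question.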

en cs.CL, cs.AI
