Results for "Chinese language and literature"

S2 Open Access 2025

Retrieval augmented generation for large language models in healthcare: A systematic review

L. M. Amugongo, Pietro Mascheroni, Steve Brooks et al.

Large Language Models (LLMs) have demonstrated promising capabilities to solve complex tasks in critical sectors such as healthcare. However, LLMs are limited by their training data, which is often outdated, their tendency to generate inaccurate (“hallucinated”) content, and a lack of transparency in the content they generate. To address these limitations, retrieval augmented generation (RAG) grounds the responses of LLMs by exposing them to external knowledge sources. However, in the healthcare domain there is currently a lack of systematic understanding of which datasets, RAG methodologies and evaluation frameworks are available. This review aims to bridge this gap by assessing RAG-based approaches employed by LLMs in healthcare, focusing on the different steps of retrieval, augmentation and generation. Additionally, we identify the limitations, strengths and gaps in the existing literature. Our synthesis shows that 78.9% of studies used English datasets and 21.1% of the datasets are in Chinese. We find that a range of techniques are employed in RAG-based LLMs in healthcare, including Naive RAG, Advanced RAG, and Modular RAG. Surprisingly, proprietary models such as GPT-3.5/4 are the most used for RAG applications in healthcare. We find that there is a lack of standardised evaluation frameworks for RAG-based applications. In addition, the majority of the studies do not assess or address ethical considerations related to RAG in healthcare. It is important to account for ethical challenges that are inherent when AI systems are implemented in the clinical setting. Lastly, we highlight the need for further research and development to ensure responsible and effective adoption of RAG in the medical domain.
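
The retrieval, augmentation and generation steps named in this abstract can be illustrated with a minimal sketch. It is not taken from the review: the function names (`retrieve`, `build_prompt`), the bag-of-words cosine scoring and the sample passages are all assumptions made for illustration, and the generation step is left out because it would depend on whichever LLM is used.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems use embeddings).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    # Retrieval step: rank passages by similarity to the query, keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(documents,
                    key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    # Augmentation step: place the retrieved passages in front of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")


# Hypothetical knowledge base, for illustration only.
docs = [
    "metformin is a first-line drug for type 2 diabetes",
    "acupuncture uses thin needles",
    "the weihe basin has geothermal reservoirs",
]
prompt = build_prompt("first-line drug for diabetes",
                      retrieve("first-line drug for diabetes", docs, k=1))
```

The resulting `prompt` would then be passed to a language model for the generation step; grounding the answer in retrieved text is what distinguishes this from querying the model directly.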

106 citations en Medicine


CrossRef Open Access 2025

Error Analysis of the Use of the Words “bù”, “méi” and “méiyou” and Teaching Suggestions for Indonesian Students

Ivana Permatasari

Fǒudìng fùcí, or negative adverbs, are words frequently used in Mandarin. However, Indonesian learners often make errors when learning fǒudìng fùcí. This study therefore uses a global Chinese interlanguage corpus to analyze Indonesian learners' errors in using the fǒudìng fùcí “bù”, “méi” and “méiyǒu”. The data analysis shows that errors in using “bù”, “méi” and “méiyǒu” can be divided into swapped word use, addition errors, and incomplete word use, with swapped word use being the most frequent. The study therefore focuses on swapped-use errors involving the fǒudìng fùcí “bù”, “méi” and “méiyǒu”. The causes of Indonesian learners' errors are grouped into interference or negative transfer from the mother tongue, insufficient grammatical understanding of the second language, ineffective teaching methods, and the learners' own circumstances. In addition, this study discusses suggestions for teaching fǒudìng fùcí to Indonesian learners, including suggestions on teaching content, teaching methods, and learning suggestions for learners. These suggestions are expected to help develop teaching materials, improve teaching strategies, and increase teaching efficiency.

en


DOAJ Open Access 2025

The Evaluation of Undergraduate English Language Major Curriculum Interdisciplinarity in China

Yanping Sun, Yang Liu, Jie Hu

This study investigates the current state of interdisciplinarity within undergraduate English language major curricula in Chinese public universities, 7 years after the implementation of the National Quality Criteria for Undergraduate Teaching (Foreign Language Majors) (NQC). Grounded in integrationist theories of interdisciplinarity (Repko, Newell), this research conceptualizes interdisciplinarity as the structured integration of five academic disciplines: English language linguistics, English language literature, translation, comparative literature and transcultural studies, and international and regional studies. Drawing on responses from 1,210 students across eight universities, this study develops five empirically grounded and psychometrically validated competence scales, which are then employed in a series of multiple regression analyses to assess interrelationships among disciplinary domains. The findings indicate a generally high level of student-reported academic competence and significant positive associations among most disciplines, particularly in literature studies, suggesting substantive interdisciplinary integration. However, notable disjunctions remain, especially within the subdimensions of translation practice, phonetics, and historical cognition. These patterns underscore both the achievements and the ongoing challenges of recent curricular reforms, and point to specific areas where deeper interdisciplinary alignment may be pursued. By introducing a set of robust disciplinary competence measures and demonstrating their utility in mapping curricular integration, this study offers a methodological framework that may inform future research on interdisciplinarity in diverse educational settings.

History of scholarship and learning. The humanities, Social Sciences


DOAJ Open Access 2025

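An Analysis of the Understanding of the Logographic and Symbolic Categories of Chinese Internet Language among the 2019 Cohort of Students at the Joint Hope Language Institute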

Chelsi Fidelis, Sabinus Iden

The way a community uses the internet influences language, causing changes in how words are used online. Although online expressions may originate from Chinese, the usage of Chinese internet language differs greatly from that of everyday Chinese. Therefore, learners of Chinese must understand internet language in order to communicate effectively. This study aims to understand the level of comprehension of Chinese internet language among the 2019 cohort of students at the Joint Hope Language Institute. After summarizing the categories of Chinese internet language, data were collected through distributed questionnaires and test items. Based on the survey conducted with the 2019 cohort of students at the Joint Hope Language Institute, the results indicate that most students still have a relatively low level of understanding of the logographic and symbolic categories of Chinese internet language, particularly in the categories of old words with new meanings and homophones. In addition, students in the institute receive limited instruction on Chinese internet language, focusing primarily on traditional Chinese and linguistics.

Chinese language and literature


DOAJ Open Access 2025

Effects of acupuncture on pregnancy outcomes in infertile women with polycystic ovarian syndrome: a protocol for systematic review and meta-analysis

Xiaoyan Wang, Huanfang Xu, Yigong Fang et al.

Introduction: Polycystic ovarian syndrome (PCOS) is a common endocrine disorder that affects reproductive-age women, impairing their ability to conceive and sustain fertility. The efficacy of conventional therapies varies among individuals and is often accompanied by multiple side effects. Acupuncture has shown potential in fertility management for PCOS patients. However, the current evidence on its impact on pregnancy outcomes remains inconclusive. This study aims to synthesise the latest evidence regarding the efficacy and safety of acupuncture in infertile women with PCOS. Methods and analysis: A comprehensive literature review will be conducted by searching for randomised controlled trials of acupuncture for infertile women with PCOS in English and Chinese language databases, including the Cochrane Database of Clinical Trials, PubMed, EMBASE, CNKI, Wanfang, VIP and SinoMed. The primary outcomes will be live birth rate (LBR) and clinical pregnancy rate (CPR). The secondary outcomes will include multiple pregnancy rate (MPR), ongoing pregnancy rate (OPR), miscarriage rate (MR), ovulation rate (OR), ovarian hyperstimulation syndrome (OHSS) and adverse events. The risk of bias will be assessed using the Cochrane Risk of Bias 2.0 tool. Meta-analysis will be conducted to synthesise the evidence for each outcome, if possible. Heterogeneity will be statistically assessed using a χ² test and the I² statistic. Subgroup analyses, sensitivity analyses and publication bias assessment will be performed if the available data are sufficient. Evidence strength will be graded using the Grading of Recommendations Assessment, Development and Evaluation system. This protocol is developed following the guidelines of PRISMA-P 2015. Ethics and dissemination: Ethics approval is not required for this review. Our findings will be published in a peer-reviewed journal. PROSPERO registration number: CRD42024601226.
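
As a rough illustration of the heterogeneity assessment mentioned in this protocol, the sketch below computes Cochran's Q and the I² statistic from per-study effect estimates and their variances, using the standard inverse-variance definitions. The function name and the sample numbers are hypothetical; the protocol does not say which software or effect measure will be used.

```python
def heterogeneity(effects, variances):
    """Return (Q, df, I2 in percent) for a set of study effect estimates.

    Uses inverse-variance weights: w_i = 1 / v_i,
    Q = sum(w_i * (y_i - pooled)^2), I2 = max(0, (Q - df) / Q) * 100.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log risk ratios and variances from two studies.
q, df, i2 = heterogeneity([0.0, 2.0], [0.5, 0.5])
```

The χ² test compares Q against a chi-squared distribution with `df` degrees of freedom, while I² expresses the share of total variation attributed to between-study heterogeneity rather than chance.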

Medicine


arXiv Open Access 2025

Small Language Models Reshape Higher Education: Courses, Textbooks, and Teaching

Jian Zhang, Jia Shao

While large language models (LLMs) have introduced novel paradigms in science and education, their adoption in higher education is constrained by inherent limitations. These include a tendency to produce inaccuracies and high computational requirements, which compromise the strict demands for accurate and reliable knowledge essential in higher education. Small language models (MiniLMs), by contrast, offer distinct advantages in professional education due to their lightweight nature and precise retrieval capabilities. This research takes "Atmospheric Physics" as an example. We established a specialized corpus and image repository by gathering over 550,000 full-text PDFs from over 130 well-respected international journals in Earth and environmental science. From this collection, we extracted over 100 million high-quality sentence-level corpus entries and more than 3 million high-resolution academic images. Using MiniLMs, these resources were organized into a high-dimensional vector library for precise retrieval and efficient utilization of extensive educational content. Consequently, we systematically redesigned the courses, textbooks, and teaching strategies for "Atmospheric Physics" based on MiniLMs. The course is designed as an "interdisciplinary-frontier" system, breaking down traditional boundaries between atmospheric science, space science, hydrology, and remote sensing. Teaching materials are transformed from static, lagging text formats into a dynamic digital resource library powered by MiniLM. For teaching methods, we have designed a question-based learning pathway. This paradigm promotes a shift from passive knowledge transfer to active cognitive development. Consequently, this MiniLM-driven "Atmospheric Physics" course demonstrates a specific avenue for "AI for education".

en physics.ed-ph, cs.CL


arXiv Open Access 2025

Logits-Constrained Framework with RoBERTa for Ancient Chinese NER

Wenjie Hua, Shenghan Xu

This paper presents a Logits-Constrained (LC) framework for Ancient Chinese Named Entity Recognition (NER), evaluated on the EvaHan 2025 benchmark. Our two-stage model integrates GujiRoBERTa for contextual encoding and a differentiable decoding mechanism to enforce valid BMES label transitions. Experiments demonstrate that LC improves performance over traditional CRF and BiLSTM-based approaches, especially in high-label or large-data settings. We also propose a model selection criterion balancing label complexity and dataset size, providing practical guidance for real-world Ancient Chinese NLP tasks.
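
The idea of enforcing valid BMES label transitions can be sketched as follows. This is not the paper's method: it swaps in plain Viterbi-style decoding over per-token logits with a hard transition mask instead of the differentiable mechanism described in the abstract, and it assumes a single entity type plus an outside tag `O`. The tag set, function name and sample logits are all hypothetical.

```python
import math

TAGS = ["O", "B", "M", "E", "S"]

# Tags that may follow each tag under BMES (single entity type assumed).
ALLOWED = {
    "O": {"O", "B", "S"},
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"O", "B", "S"},
    "S": {"O", "B", "S"},
}
START_OK = {"O", "B", "S"}  # a sequence cannot start inside an entity
END_OK = {"O", "E", "S"}    # a sequence cannot end inside an entity


def constrained_decode(logits):
    """Return the highest-scoring tag sequence whose transitions are all valid."""
    if not logits:
        return []
    neg = -math.inf
    score = [logits[0][i] if t in START_OK else neg for i, t in enumerate(TAGS)]
    back = []
    for step in range(1, len(logits)):
        new_score, ptr = [], []
        for j, tag_j in enumerate(TAGS):
            best_i, best = 0, neg
            for i, tag_i in enumerate(TAGS):
                if tag_j in ALLOWED[tag_i] and score[i] > best:
                    best_i, best = i, score[i]
            new_score.append(best + logits[step][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    last = max((i for i, t in enumerate(TAGS) if t in END_OK),
               key=lambda i: score[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [TAGS[i] for i in path]


# Per-token argmax would give B, O, E here, which is not a valid sequence.
example = [[0, 2, 0, 0, 0], [1, 0, 0.5, 0, 0], [0, 0, 0, 2, 0]]
decoded = constrained_decode(example)
```

In the example, the mask steers the middle token from `O` to `M` so that the entity opened by `B` is properly closed by `E`.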

en cs.CL


arXiv Open Access 2025

Redefining technology for indigenous languages

Silvia Fernandez-Sabido, Laura Peniche-Sabido

In this paper, we offer an overview of indigenous languages, identifying the causes of their devaluation and the need for legislation on language rights. We review the technologies used to revitalize these languages, finding that when they come from outside, they often have the opposite effect to what they seek; however, when developed from within communities, they become powerful instruments of expression. We propose that the inclusion of Indigenous knowledge in large language models (LLMs) will enrich the technological landscape, but must be done in a participatory environment that encourages the exchange of knowledge.

en cs.CY, cs.AI


S2 Open Access 2024

Research on flipped classrooms in foreign language teaching in Chinese higher education

Wen Kong, Di Li, Quanjiang Guo

This review examines 233 articles published in Chinese academic journals between 2011 and 2021, documenting the state of research concerning flipped classrooms (FCs) in foreign language teaching within the context of higher education in China. Employing the methodological approach of a scoping review, the investigation is underpinned by the five-stage framework articulated by Arksey and O’Malley. The results reveal a notable surge in FC-related studies between 2013 and 2017, with a subsequent decline in scholarly attention. The majority of the reviewed studies on FCs focused on English instruction at the college level, with a conspicuous dearth of inquiry into the application of FCs in the teaching of other foreign languages. All studies were categorized as either empirical or non-empirical, and the most frequently used instruments for data collection were surveys and interviews; case studies were underrepresented in the literature. Early studies focused on the introduction of the new model, while more recent investigations focused on the impact of its implementation. The findings of the in-depth content analysis unearthed a prevailing trend of high learner satisfaction with the FC model, along with favorable direct and indirect educational outcomes. Noteworthy factors influencing the efficacy of FCs included learners’ foreign language proficiency and their self-regulation or self-discipline abilities. The paper concludes with a discussion of the challenges in FC implementation and a call for future research on this promising pedagogy.

10 citations en


S2 Open Access 2024

Instruct Large Language Models to Generate Scientific Literature Survey Step by Step

Yuxuan Lai, Yupeng Wu, Yidan Wang et al.

Automatically generating scientific literature surveys is a valuable task that can significantly enhance research efficiency. However, the diverse and complex nature of information within a literature survey poses substantial challenges for generative models. In this paper, we design a series of prompts to systematically leverage large language models (LLMs), enabling the creation of comprehensive literature surveys through a step-by-step approach. Specifically, we design prompts to guide LLMs to sequentially generate the title, abstract, hierarchical headings, and the main content of the literature survey. We argue that this design enables the generation of the headings from a high-level perspective. During the content generation process, this design effectively harnesses relevant information while minimizing costs by restricting the length of both input and output content in LLM queries. Our implementation with Qwen-long achieved third place in the NLPCC 2024 Scientific Literature Survey Generation evaluation task, with an overall score only 0.03% lower than the second-place team. Additionally, our soft heading recall is 95.84%, the second best among the submissions. Thanks to the efficient prompt design and the low cost of the Qwen-long API, our method reduces the expense for generating each literature survey to 0.1 RMB, enhancing the practical value of our method.

9 citations en Computer Science


S2 Open Access 2023

Eye movements of second language learners when reading spaced and unspaced Chinese texts

D. Shen, S. Liversedge, Jin Tian et al.

Unlike English, Chinese does not have interword spacing in written texts, which poses difficulties for Chinese-as-a-second-language (CSL) learners’ identification of word boundaries and affects their reading comprehension and vocabulary acquisition. The eye-movement literature has suggested that interword spacing is important in alphabetic languages; examining languages that lack interword spaces such as Chinese, thus, may help to inform theoretical accounts of eye-movement control and word identification during reading. Research investigating the interword spacing effect in reading Chinese showed that adding spacing facilitated CSL learners’ reading comprehension and speed as well as vocabulary learning. However, the bulk of this research mainly looked at the learning outcomes (off-line measures), with few studies focusing on L2 learners’ reading processes. Building on this background, this study seeks to provide a descriptive perspective of the eye movements of CSL learners. In this study, 24 CSL learners with intermediate Chinese proficiency were recruited as the experimental group, and 20 Chinese native speakers were recruited as the control group. The EyeLink 1000 eye tracker was used to record their reading of four segmentation conditions of Chinese texts, namely, no space condition, word-spaced condition, non-word-spaced condition, and pinyin-spaced condition. Results show that: (1) CSL learners with intermediate Chinese proficiency generally spent less time reading Chinese texts with spaces between words, and they showed more gazes and regressions when reading texts without spaces; (2) Non-word-spaced texts and Pinyin-spaced texts interfere with CSL learners’ reading process; and (3) Intermediate CSL learners show consistent eye movement patterns in the normal no-space condition and word-spaced condition. We conclude that word boundary information can effectively guide CSL learners’ eye movement behaviors and eye saccade planning, thus improving reading efficiency.

40 citations en Medicine


S2 Open Access 2024

Friend or Foe? A Mixed-Methods Study on the Impact of Digital Device Use on Chinese–Canadian Children’s Heritage Language Learning

Guofang Li, Ziwen Mei, Fubiao Zhen

Digital devices have been increasingly integrated into language learning environments, particularly since the COVID-19 pandemic. Existing literature, focusing predominantly on dominant languages like English, presents mixed findings on the effectiveness of digital resources for language learning. Few studies address heritage languages, which often have limited resources beyond the home and may depend more on digital tools for support. This longitudinal, mixed-methods study investigated the impact of digital device use on heritage language learning among Chinese–Canadian families. We examined the relationship between digital device use and Chinese receptive vocabulary among 128 first graders, 137 second graders, and 66 third graders over three years. Additionally, we conducted parental interviews with 42 focal families for three years to explore the evolving patterns of digital resource use at home. Our findings revealed a statistically significant positive impact of digital device use on Chinese receptive vocabulary development among first and second graders, while no significant effects were observed in third graders. The analyses of parental interviews uncovered increased digital use, diversity of resources, positive parental attitudes, and digital literacy among families from grades 1 to 2 but decreased digital use and parental enthusiasm in the third grade due to health and addiction concerns, reinforcing the quantitative results. Conducted during the COVID-19 pandemic, this study offers a unique perspective on how families’ digital device use for heritage languages changed before, during, and after the pandemic. The findings offer valuable insights for families and educators to better support heritage language learners with digital resources.

4 citations en


DOAJ Open Access 2024

A Systematic Review of Big Data Driven Education Evaluation

Lin Lin, Danhua Zhou, Jingying Wang et al.

The rapid development of artificial intelligence has driven the transformation of educational evaluation toward a big data-driven approach. This study used a systematic literature review method to analyze 44 empirical research articles on big data education evaluation. Firstly, research in this area has shown an increasing trend year by year, and is mainly published in thematic journals covering educational technology, science education, and language teaching. Chinese and American researchers have made the greatest contributions in this field. Secondly, the algorithmic models for big data education evaluation research are diverse, the text modality is the most popular, the evaluation subjects are mainly college students, with fewer primary and secondary school students, and science is the discipline that most commonly applies big data education evaluation. The evaluation objectives of big data education evaluation mainly focus on five aspects: high-order thinking analysis, learning performance prediction, learning emotion recognition, teaching management decision-making, and evaluation mode optimization, and the text modality is widely used for data collection in high-order thinking analysis; regardless of the evaluation objectives, higher education students are the most widely evaluated objects; the science discipline is the main field of using big data technology to empower teaching evaluation. Thirdly, the current research topics of big data education evaluation mainly focus on online learning behavior and environmental participation evaluation, process assessment of learning motivation and emotional analysis, development and optimization of subject domain big data models, cognitive diagnosis and high-order thinking skills evaluation, and design of learning analysis frameworks based on data mining.

History of scholarship and learning. The humanities, Social Sciences


DOAJ Open Access 2024

Historical earthquake records in the Weihe Basin, central China and new insights for geothermal genesis

Bing Zhou, Yancheng Zhang, Jian Kuang et al.

The Weihe Basin, located in central China, stands out for its significant earthquake activity while concurrently harboring promising geothermal reservoirs. The potential association between these two geological occurrences and the underlying mechanisms remain enigmatic. Here, we compile a catalog of historic earthquakes, total strain data, data related to crustal mantle structure, surface heat flow data, and heat production data of the rocks in the Weihe Basin. Our aim is to unveil the intricate interplay among the occurrence of earthquakes, tectonic activity, and the genesis of geothermal resources. Our findings reveal that earthquake activity in the Weihe Basin is regulated by the responses of faults or fractures intricately influenced by regional tectonics. These tectonic processes are responsible for the formation of favorable geothermal resources beneath the basin. We propose there is a weak zone beneath the basin, which is controlled by a combination of tectonic processes and the flow of the asthenosphere. We finally establish a comprehensive model to visualize the genesis of the occurrence of earthquakes and the formation of geothermal resources. These results have important guiding significance for future research endeavors in the realms of both geothermal exploration and earthquake investigations within the Weihe Basin.

Science


DOAJ Open Access 2024

Cultural friction during intercultural service encounters with Chinese tourists: perspectives from hotel employees in Australia

Wen Hao Liang

This study aims to investigate the experiences and outcomes of intercultural service encounters between hotel employees and customers, including the underlying factors attributed to these outcomes. A qualitative research approach using the critical incident technique was adopted by conducting 20 semi-structured interviews with hotel employees who frequently engaged in intercultural service encounters with Chinese tourists. The findings revealed that critical incidents were mainly attributed to cultural differences in language, customs, and preferences. These cultural differences with Chinese guests can lead to outcomes such as service failures, which can be a stressor for hotel employees in Australia and trigger emotions such as frustration and intimidation. The study found that non-cultural factors such as the characteristics of customers and service employees can impact the outcome of an intercultural service encounter. This study contributes to intercultural service encounters literature by offering a new perspective from service providers’ viewpoint on their intercultural interactions. It is imperative for hotels to comprehend and work through these cultural differences to succeed in the global hospitality market. Moreover, this study offers important practical implications for hotels regarding how to best facilitate intercultural service encounters to ensure positive outcomes for both customers and service employees.

Social Sciences


DOAJ Open Access 2024

The role of the target language culture on Arabic learners' fondness for Arabic poetry

Li Gao, Kai Wang, Qian Yang et al.

As an important carrier of culture, poetry plays a significant role in deepening language learners' understanding of the target language culture as well as enhancing their language skills; however, the effect of the target language culture on language learners' enjoyment of poetry remains unclear. The study served as an attempt to shed light on whether the target language culture has different effects on high- and low-level Chinese Arabic learners' fondness for Arabic poetry, using pictures related to Arabic culture and pictures not related to Arabic culture. In the current study, 40 Arabic learners (20 high-level and 20 low-level) scored a line of Arabic poetry based on their fondness for it after viewing the two kinds of pictures, with electroencephalogram (EEG) recording. The frontal alpha asymmetry (FAA) index, a correlate of approach- and avoidance-related motivation measured by EEG power in the alpha band (8-13 Hz), was calculated to examine whether the behavioral results of Arabic learners' fondness for poetry are in line with changes in the related EEG components. Behavioral results illustrated that low-level subjects showed significantly less liking for Arabic poetry after viewing pictures related to Arabic culture compared to those not related to Arabic culture. The high-level subjects did not show a significant difference in the level of liking for Arabic poetry between the two cases. FAA results demonstrated that low-level subjects presented significant avoidance-related responses to Arabic poetry after viewing pictures related to Arabic culture in comparison to viewing pictures not related to Arabic culture, while the FAA values did not differ significantly between the two cases in high-level subjects, which is in line with the behavioral results. The findings of this research can benefit teachers in motivating students to learn poetry in foreign language curricula and also contribute to the literature on the effect of target language culture on language learners' enjoyment of poetry.
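
For readers unfamiliar with the frontal alpha asymmetry index, a minimal sketch of one common convention follows: the natural log of right-frontal alpha power minus the natural log of left-frontal alpha power. The abstract does not state the electrode sites or the exact formula used in the study, so the convention, the function name and the sample values are assumptions.

```python
import math


def frontal_alpha_asymmetry(alpha_left, alpha_right):
    """FAA = ln(right alpha power) - ln(left alpha power).

    Inputs are alpha-band (8-13 Hz) power values from homologous left and
    right frontal electrodes (often F3 and F4; assumed here, not confirmed
    by the abstract). Both must be positive.
    """
    if alpha_left <= 0 or alpha_right <= 0:
        raise ValueError("alpha power must be positive")
    return math.log(alpha_right) - math.log(alpha_left)


# Hypothetical power values: more alpha on the left gives a negative index.
faa = frontal_alpha_asymmetry(alpha_left=4.0, alpha_right=2.0)
```

Because alpha power is usually taken to be inversely related to cortical activity, a negative index under this convention is typically read as relatively greater right-frontal activity, which is associated with avoidance-related motivation.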

Psychology


arXiv Open Access 2024

Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral

Yiming Cui, Xin Yao

Mixtral, a representative sparse mixture of experts (SMoE) language model, has received significant attention due to its unique model design and superior performance. Based on Mixtral-8x7B-v0.1, in this paper, we propose Chinese-Mixtral and Chinese-Mixtral-Instruct with improved Chinese language abilities by adopting further pre-training and instruction fine-tuning. Experimental results show that our Chinese-Mixtral and Chinese-Mixtral-Instruct successfully improve Chinese understanding and generation performance while retaining the original English abilities. Then, we discuss several key questions when performing language adaptation on large language models, including the necessity of extending the language-specific vocabulary and the choice of the initialization model (foundation model vs. instruction model), by providing empirical results and analysis. We also present the visualizations of each expert to examine their importance on downstream tasks. Our resources are publicly available through https://github.com/ymcui/Chinese-Mixtral.

en cs.CL, cs.AI


arXiv Open Access 2024

Machine Translation Evaluation Benchmark for Wu Chinese: Workflow and Analysis

Hongjian Yu, Yiming Shi, Zherui Zhou et al.

We introduce a FLORES+ dataset as an evaluation benchmark for modern Wu Chinese machine translation models and showcase its compatibility with existing Wu data. Wu Chinese is mutually unintelligible with other Sinitic languages such as Mandarin and Yue (Cantonese), but uses a set of Hanzi (Chinese characters) that profoundly overlaps with others. The population of Wu speakers is the second largest among languages in China, but the language has been suffering from a significant drop in usage, especially among the younger generations. We identify Wu Chinese as a textually low-resource language and address challenges for its machine translation models. Our contributions include: (1) an open-source, manually translated dataset, (2) full documentation of the process of dataset creation and validation experiments, (3) preliminary tools for Wu Chinese normalization and segmentation, and (4) benefits and limitations of our dataset, as well as implications for other low-resource languages.

en cs.CL


arXiv Open Access 2024

Chinese Offensive Language Detection: Current Status and Future Directions

Yunze Xiao, Houda Bouamor, Wajdi Zaghouani

Despite the considerable efforts being made to monitor and regulate user-generated content on social media platforms, the pervasiveness of offensive language, such as hate speech or cyberbullying, in the digital space remains a significant challenge. Given the importance of maintaining a civilized and respectful online environment, there is an urgent and growing need for automatic systems capable of detecting offensive speech in real time. However, developing effective systems for processing languages such as Chinese presents a significant challenge, owing to the language's complex and nuanced nature, which makes it difficult to process automatically. This paper provides a comprehensive overview of offensive language detection in Chinese, examining current benchmarks and approaches and highlighting specific models and tools for addressing the unique challenges of detecting offensive language in this complex language. The primary objective of this survey is to explore the existing techniques and identify potential avenues for further research that can address the cultural and linguistic complexities of Chinese.

en cs.CL, cs.AI


arXiv Open Access 2024

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

Yuxuan Wang, Yijun Liu, Fei Yu et al.

Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in Chinese culture. To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark dataset, where the selection of object categories and images is entirely driven by Chinese native speakers, ensuring that the source images are representative of Chinese culture. The benchmark contains four distinct VL tasks ranging from image-text retrieval to visual question answering, visual grounding and visual dialogue. We present a detailed statistical analysis of CVLUE and provide a baseline performance analysis with several open-source multilingual VLMs on CVLUE and its English counterparts to reveal their performance gap between English and Chinese. Our in-depth category-level analysis reveals a lack of Chinese cultural knowledge in existing VLMs. We also find that fine-tuning on Chinese culture-related VL datasets effectively enhances VLMs' understanding of Chinese culture.

en cs.CV, cs.CL

