Results for "Chinese language and literature"

Showing 20 of ~3,664,650 results · from DOAJ, Semantic Scholar, CrossRef, arXiv

S2 Open Access 2023
Modeling the Effect of Chinese EFL Teachers’ Self-efficacy and Resilience on Their Work Engagement: A Structural Equation Modeling Analysis

Yongliang Wang, Ziwen Pan

Teaching is deemed a taxing enterprise, especially as far as higher education is concerned. As a result, much research attention has been devoted to exploring the correlates of engagement for instructors. To contribute to this research direction, the present researchers carried out this survey to test a model of work engagement in a Chinese higher education context. Teacher self-efficacy and resilience were added as predictors of the hypothesized model. A total of 372 Chinese English as a foreign language (EFL) instructors in higher education contexts were selected through convenience sampling. The measurement models of the latent variables (i.e., self-efficacy, resilience, and work engagement) were confirmed via Confirmatory Factor Analysis (CFA). Afterward, Structural Equation Modeling (SEM) was used to test the structural model. The results obtained from the analyses showed that both teacher self-efficacy and resilience could significantly predict EFL teachers’ work engagement, with self-efficacy serving as a stronger predictor than resilience. The outcomes suggest some implications for EFL stakeholders such as teachers, teacher trainers, policy makers, and officials. Plain Language Summary: This article addresses identified gaps in the literature by examining the predictability of EFL teachers’ work engagement level in terms of their self-efficacy beliefs and resilience tendencies. Based on our observation, there is a paucity of literature examining the causal relationship between teacher self-efficacy and teacher work engagement in higher education (HE), and to our knowledge, this is the first study to consider teacher self-efficacy as a potential predictive variable of work engagement of EFL teachers in HE.

103 citations en
arXiv Open Access 2026
NeuCLIRTech: Chinese Monolingual and Cross-Language Information Retrieval Evaluation in a Challenging Domain

Dawn Lawrie, James Mayfield, Eugene Yang et al.

Measuring advances in retrieval requires test collections with relevance judgments that can faithfully distinguish systems. This paper presents NeuCLIRTech, an evaluation collection for cross-language retrieval over technical information. The collection consists of technical documents written natively in Chinese and those same documents machine translated into English. It includes 110 queries with relevance judgments. The collection supports two retrieval scenarios: monolingual retrieval in Chinese, and cross-language retrieval with English as the query language. NeuCLIRTech combines the TREC NeuCLIR track topics of 2023 and 2024. The 110 queries with 35,962 document judgments provide strong statistical discriminatory power when trying to distinguish retrieval approaches. A fusion baseline of strong neural retrieval systems is included so that developers of reranking algorithms are not reliant on BM25 as their first stage retriever. The dataset and artifacts are released on Hugging Face Datasets.

en cs.IR
S2 Open Access 2024
Application of Large Language Models in Cybersecurity: A Systematic Literature Review

I. Hasanov, Seppo Virtanen, Antti Hakkala et al.

The emergence of Large Language Models (LLMs) is currently creating a major paradigm shift in societies and businesses in the way digital technologies are used. While the disruptive effect is especially observable in the information and communication technology field, there is a clear lack of systematic studies focusing on the application and impact of LLMs in cybersecurity holistically. This article presents an exhaustive systematic literature review of 177 articles published in 2018-2024 on the application of LLMs and the use of Artificial Intelligence (AI) as a defensive measure in cybersecurity. This article contributes an analytical compendium of the recent research on the application of LLMs in offensive and defensive cybersecurity as well as in research on cyberethics, current legal frameworks, and research regarding the use of LLMs for cybersecurity governance. It also contributes a statistical summary of global research trends in the field. Of the reviewed literature, 68% was published in 2023. Nearly 30% of the articles originate from the USA and 11% from China, with other countries currently having significantly lower contributions to recent research. Most attention in recent research has been given to AI as a defensive measure, accounting for 27% of the reviewed literature. It was observed that LLMs have proven highly effective in phishing attack simulations and in managing cybersecurity administrative aspects, including defending against advanced exploits. Furthermore, LLMs show significant potential in the development of security software, further cementing their role as a powerful tool in cybersecurity innovation.

50 citations en Computer Science
S2 Open Access 2025
A Systematic Literature Review of ESL/EFL Learning Strategies and Learner Motivation

Marzia Shurovi, Mohamad Fadhili Yahaya, H. Hajimia et al.

While language learning strategies of English as a second language or English as a foreign language learners were reportedly linked to learners’ motivational beliefs by many theorists and researchers, systematic reviews of how language learning strategies were studied in association with learner motivation in the previous decades were scarce. Therefore, this review paper analysed research trends in language learning strategies in relation to learner motivation from 1960 to 2023. Employing a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Protocol, this paper selected empirical research papers studying both language learning strategies and motivation from the high-ranking journals from Web of Science and Scopus databases. This review paper employed descriptive, frequency and thematic analyses to trace the diversity of participants, and practical teaching-learning factors that were addressed in 36 empirical studies. In addition, this review summarized the research methods, highlighting key themes that emerged in the research papers. From 18 countries, most of the studies were done in the context of Iran and China. Apart from the majority of tertiary learners as participants, the quantitative method was predominant in the research. In addition to highlighting a few innovative studies that contributed to the literature and practical EFL/ESL teaching and learning, this review paper advocated more research on culturally diverse populations, technology-integrated ESL/EFL teaching and learning to enhance learner motivation and learning strategy use, and experimenting with effective intervention techniques to contribute to ESL/EFL teaching and learning in the future.

DOAJ Open Access 2025
Cross-language dissemination of Chinese classical literature using multimodal deep learning and artificial intelligence

Yulan Bai, Songhua Lei

Against the backdrop of rapid advancements in artificial intelligence (AI), multimodal deep learning (DL) technologies offer new possibilities for cross-language translation. This work proposes a multimodal DL-based translation model, the Transformer-Multimodal Neural Machine Translation (TMNMT), to promote the cross-language dissemination and comprehension of Chinese classical literature. The proposed model innovatively integrates visual features generated by conditional diffusion models and leverages knowledge distillation techniques to achieve efficient transfer learning, fully exploiting the latent information in multilingual corpora. The work designs a gated neural unit-based multimodal feature fusion mechanism and a decoder-based visual feature attention module to enhance translation performance, thus dynamically combining textual and visual information. Experimental results demonstrate that TMNMT significantly outperforms baseline models in multimodal and text-only translation tasks. It achieves a BLEU score of 39.2 on the Chinese literature dataset, a minimum improvement of 1.55% over other models, and a METEOR score of 64.8, with a minimum improvement of 8.14%. Moreover, incorporating the decoder’s visual module notably boosts performance, with BLEU and METEOR scores on the En-Ge Test2017 task improving by 2.55% and 2.33%, respectively. This work provides technical support for the multilingual dissemination of Chinese classical literature and broadens the application prospects of AI in cultural domains.

Medicine, Science
DOAJ Open Access 2025
Challenges for EFL Teachers in Designing Communication Activities: A Chinese Perspective

Min Gu

This study examined the challenges faced by 19 Chinese senior high school English as a Foreign Language (EFL) teachers in designing communication activities. By closely examining the design features of the teaching activities employed in their classrooms, it was revealed that the majority of the observed activities lacked authenticity, a crucial aspect emphasized in the extant literature on communication practices. This significant finding underscores the substantial hurdle faced by the participating teachers in creating authentic communication activities. The analysis of the semi-structured interviews suggests that the participants’ comprehension of the concept of “meaning” and its significance in language education might be contributory to the absence of authenticity in their communication activities. Furthermore, while the activities designed for moral education appeared genuinely authentic, they fell short in fostering collaboration among students, potentially leading to less negotiation of meaning. The analysis presented profound insights that can greatly enhance training in communicative language teaching.

History of scholarship and learning. The humanities, Social Sciences
DOAJ Open Access 2025
Construct comparability and the limits of post hoc modeling: insights from International Baccalaureate multi-language assessments

Louise Badham, Michelle Meadows et al.

Construct comparability was investigated across different subjects in the International Baccalaureate (IB) Diploma Programme (DP). A Rasch Partial Credit Model (PCM) was applied to historical assessment data to generate statistical measures of the relative “difficulty” of IB subjects and languages. Specifically, analysis centered on different language versions of literature assessments, where exams differ in content, but are designed to assess the same target constructs. Rasch analyses were conducted sequentially in three subsets of data. Three different conceptualizations of the linking construct were compared, with the aim of narrowing the definition to increase the validity of the comparisons. These ranged from different DP subjects being linked by “general academic ability,” to linking English, Spanish and Chinese language versions of literature with the more relevant construct of “literary analysis.” Ultimately, the Rasch analyses produced three different rank orders of “difficulty” for the assessments, illustrating the limitations of post hoc construct comparability investigations. Whilst literary analysis is the most theoretically defensible linking construct in this context, the approach relies on bilingual students taking different language versions of the assessments and therefore has limited operational applicability. There are also conceptual limitations, as bilingual examinees are not representative of all students in DP cohorts. Further research is recommended into how cohort characteristics can impact performance, as well as how constructs are defined for use across linguistic and cultural subgroups. Such investigations are crucial to avoid construct bias being introduced in the earliest stages of assessment design.

Education (General)
CrossRef Open Access 2025
Memory Reconstruction and Cultural Identity: A Study on the Formation Mechanism of Cultural Nostalgia among Descendants of Chinese Immigrants in the Philippines

Chunyang Lin

As a special group in a cross-cultural context, the cultural nostalgia of descendants of Chinese immigrants in the Philippines is not a simple replication of their ancestral culture, but a dynamic cultural construction achieved through memory reconstruction in a discrete context. Based on theories of ethnic identity, cultural adaptation, and collective memory, this study uses in-depth interviews and case studies to conduct an empirical study of 60 descendants of Chinese immigrants in Manila and Cebu, Philippines, revealing the formation mechanism of their cultural nostalgia. The study finds that: the intergenerational transmission of discrete memories constitutes the historical genes of nostalgia; the ritualistic reconstruction of cultural practices builds the expressive carriers of nostalgia; identity games in cross-cultural interactions generate the emotional dynamics of nostalgia; and the memory activation of digital media expands the presentation dimensions of nostalgia. At the core of this mechanism, descendants of immigrants construct a cultural identity within the cultural field of their adopted country that combines the characteristics of their ancestral home with local adaptation through memory selection, reorganization, and meaning-assignment, providing a new perspective for understanding the cultural continuity of overseas Chinese communities.

arXiv Open Access 2025
Parsing Through Boundaries in Chinese Word Segmentation

Yige Chen, Zelong Li, Cindy Zhang et al.

Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages like English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer understanding of how different segmentation strategies shape dependency structures in Chinese. Focusing on the Chinese GSD treebank, we analyze multiple word boundary schemes, each reflecting distinct linguistic and computational assumptions, and examine how they influence the resulting syntactic structures. To support detailed comparison, we introduce an interactive web-based visualization tool that displays parsing outcomes across segmentation methods.

en cs.CL
arXiv Open Access 2025
CPCLDETECTOR: Knowledge Enhancement and Alignment Selection for Chinese Patronizing and Condescending Language Detection

Jiaxun Yang, Yifei Han, Long Zhang et al.

Chinese Patronizing and Condescending Language (CPCL) is a form of implicitly discriminatory toxic speech targeting vulnerable groups on Chinese video platforms. The existing dataset lacks user comments, which are a direct reflection of video content. This undermines the model's understanding of video content and results in the failure to detect some CPCL videos. To make up for this loss, this research reconstructs a new dataset, PCLMMPLUS, that includes 103k comment entries and expands the dataset size. We also propose the CPCLDetector model with alignment selection and knowledge-enhanced comment content modules. Extensive experiments show the proposed CPCLDetector outperforms the SOTA on PCLMM and achieves higher performance on PCLMMPLUS. CPCL videos are detected more accurately, supporting content governance and protecting vulnerable groups. Code and dataset are available at https://github.com/jiaxunyang256/PCLD.

en cs.MM, cs.AI
arXiv Open Access 2025
Steering Language Models in Multi-Token Generation: A Case Study on Tense and Aspect

Alina Klerings, Jannik Brinkmann, Daniel Ruffinelli et al.

Large language models (LLMs) are able to generate grammatically well-formed text, but how do they encode their syntactic knowledge internally? While prior work has focused largely on binary grammatical contrasts, in this work, we study the representation and control of two multidimensional hierarchical grammar phenomena - verb tense and aspect - and for each, identify distinct, orthogonal directions in residual space using linear discriminant analysis. Next, we demonstrate causal control over both grammatical features through concept steering across three generation tasks. Then, we use these identified features in a case study to investigate factors influencing effective steering in multi-token generation. We find that steering strength, location, and duration are crucial parameters for reducing undesirable side effects such as topic shift and degeneration. Our findings suggest that models encode tense and aspect in structurally organized, human-like ways, but effective control of such features during generation is sensitive to multiple factors and requires manual tuning or automated optimization.

en cs.CL
arXiv Open Access 2025
CEC-Zero: Chinese Error Correction Solution Based on LLM

Sophie Zhang, Zhiming Lin

Recent advancements in large language models (LLMs) demonstrate exceptional Chinese text processing capabilities, particularly in Chinese Spelling Correction (CSC). While LLMs outperform traditional BERT-based models in accuracy and robustness, challenges persist in reliability and generalization. This paper proposes CEC-Zero, a novel reinforcement learning (RL) framework enabling LLMs to self-correct through autonomous error strategy learning without external supervision. By integrating RL with LLMs' generative power, the method eliminates dependency on annotated data or auxiliary models. Experiments reveal RL-enhanced LLMs achieve industry-viable accuracy and superior cross-domain generalization, offering a scalable solution for reliability optimization in Chinese NLP applications. This breakthrough facilitates LLM deployment in practical Chinese text correction scenarios while establishing a new paradigm for self-improving language models.

en cs.CL, cs.AI
S2 Open Access 2021
In the digital age: a systematic literature review of the e-health literacy and influencing factors among Chinese older adults

Yuxin Shi, Denghui Ma, Jun Zhang et al.

Aim: This study aimed to explore the current status of e-health literacy among Chinese older adults, and to summarize and analyze the related influencing factors. Subject and methods: Following the PRISMA Checklist, we searched MEDLINE, CINAHL Complete (EBSCO), PubMed, Embase, Cochrane Library, China National Knowledge Infrastructure, WanFang Data, and China Science and Technology Journal Database to identify the relevant literature published between January 2000 and December 2020. The Mixed Methods Assessment Tool (MMAT) was used to appraise the quality of the studies. Results: Five articles were included for the systematic review. The results showed that the e-health literacy of Chinese older adults was low. Based on the social-ecological model, the influencing factors at the individual level included age, gender, educational attainment, socioeconomic status, physical and psychological conditions, frequency of internet use, and credibility perception of online health resources; at the interpersonal level, the influencing factors included marital status, being the family carer and being taught how to use the internet to find health resources; at the social/community level, influencing factors included language barriers and cultural barriers. Conclusion: Current e-health literacy among Chinese older adults is low, which is affected by a number of factors. Medical staff should provide detailed health information with guaranteed accuracy and reliability for elderly people. Intervention programs tailored to the varied educational needs of the elderly with different backgrounds (i.e., age, gender, educational attainment, and socioeconomic status) need to be developed in the near future. Family members are encouraged to teach older adults how to use e-health resources in appropriate ways.

122 citations en Psychology, Medicine
DOAJ Open Access 2024
Development and application of Chinese medical ontology for diabetes mellitus

Jie Hu, Zixian Huang, Xuewen Ge et al.

Objective: To develop a Chinese Diabetes Mellitus Ontology (CDMO) and explore methods for constructing high-quality Chinese biomedical ontologies. Materials and methods: We used various data sources, including Chinese clinical practice guidelines, expert consensus, literature, and hospital information system database schema, to build the CDMO. We combined top-down and bottom-up strategies and integrated text mining and cross-lingual ontology mapping. The ontology was validated by clinical experts and ontology development tools, and its application was validated through clinical decision support and Chinese natural language medical question answering. Results: The current CDMO consists of 3,752 classes, 182 fine-grained object properties with hierarchical relationships, 108 annotation properties, and over 12,000 mappings to other well-known medical ontologies in English. Based on the CDMO and clinical practice guidelines, we developed 200 rules for diabetes diagnosis, treatment, diet, and medication recommendations using the Semantic Web Rule Language. By injecting ontology knowledge, CDMO enhances the performance of the T5 model on a real-world Chinese medical question answering dataset related to diabetes. Conclusion: CDMO has fine-grained semantic relationships and extensive annotation information, providing a foundation for medical artificial intelligence applications in Chinese contexts, including the construction of medical knowledge graphs, clinical decision support systems, and automated medical question answering. Furthermore, the development process incorporated natural language processing and cross-lingual ontology mapping to improve both the quality of the ontology and development efficiency. This workflow offers a methodological reference for the efficient development of other high-quality Chinese as well as non-English medical ontologies.

Computer applications to medicine. Medical informatics
S2 Open Access 2022
Growth of non‐English‐language literature on biodiversity conservation

S. Chowdhury, K. Gonzalez, M. Aytekin et al.

English is widely recognized as the language of science, and English‐language publications (ELPs) are rapidly increasing. It is often assumed that the number of non‐ELPs is decreasing. This assumption contributes to the underuse of non‐ELPs in conservation science, practice, and policy, especially at the international level. However, the number of conservation articles published in different languages is poorly documented. Using local and international search systems, we searched for scientific articles on biodiversity conservation published from 1980 to 2018 in English and 15 non‐English languages. We compared the growth rate in publications across languages. In 12 of the 15 non‐English languages, published conservation articles significantly increased every year over the past 39 years, at a rate similar to English‐language articles. The other three languages showed contrasting results, depending on the search system. Since the 1990s, conservation science articles in most languages increased exponentially. The variation in the number of non‐English‐language articles identified among the search systems differed markedly (e.g., for simplified Chinese, 11,148 articles returned with local search system and 803 with Scopus). Google Scholar and local literature search systems returned the most articles for 11 and 4 non‐English languages, respectively. However, the proportion of peer‐reviewed conservation articles published in non‐English languages was highest in Scopus, followed by Web of Science and local search systems, and lowest in Google Scholar. About 20% of the sampled non‐English‐language articles provided no title or abstract in English; thus, in theory, they were undiscoverable with English keywords. Possible reasons for this include language barriers and the need to disseminate research in countries where English is not widely spoken. 
Given the known biases in statistical methods and study characteristics between English‐ and non‐English‐language studies, non‐English‐language articles will continue to play an important role in improving the understanding of biodiversity and its conservation.

52 citations en Medicine
S2 Open Access 2023
Bibliometric analysis of Asian ‘language and linguistics’ research: A case of 13 countries

Danielle H. Lee

The foci of voluminous bibliometric studies on ‘language and linguistics’ research are limited to specific sub-topics with little regional context. Given the paucity of relevant literature, we are relatively uninformed about the regional trends of ‘language and linguistics’ research. This paper aims to analyze research developments in the field of ‘language and linguistics’ in 13 Asian countries: China, Hong Kong, India, Indonesia, Iran, Israel, Japan, Malaysia, Saudi Arabia, Singapore, South Korea, Taiwan, and Turkey. This study probed 30,515 articles published between 2000 and 2021, assessing each within four major bibliometric perspectives: (1) productivity, (2) authorship and collaborations, (3) top keywords, and (4) research impact. The results show that, in Asian ‘language and linguistics’ research, the relative contributions made by the 13 countries comprised 85% of the total number of articles produced in Asia. The other 28 Asian countries’ output, for the past two decades, never surpassed that of the individual 13 countries. Among the 13 countries, the most prolific were China, Japan, Hong Kong, and Taiwan; they especially published most articles in international core journals. In contrast, Indonesia, Iran, and Malaysia published more in regional journals. Traditionally, research on each country’s national language(s) and dialects were chiefly conducted throughout a period of 22 years. In addition, coping with internationalization worldwide, from 2010 onward, topics related to ‘English’ were of burgeoning interest among Asian researchers. Asian countries often collaborated with each other, and they also exerted a high degree of research influence on each other. The present study was designed to contribute to the literature on the comprehensive bibliometric analyses of Asian ‘language and linguistics’ research.

14 citations en
arXiv Open Access 2023
Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP

Wei Du, Laksh Advani, Yashmeet Gambhir et al.

Large language models (LLMs) have demonstrated significant capability to generalize across a large number of NLP tasks. For industry applications, it is imperative to assess the performance of the LLM on unlabeled production data from time to time to validate for a real-world setting. Human labeling to assess model error requires considerable expense and time delay. Here we demonstrate that ensemble disagreement scores work well as a proxy for human labeling for language models in zero-shot, few-shot, and fine-tuned settings, per our evaluation on keyphrase extraction (KPE) task. We measure fidelity of the results by comparing to true error measured from human labeled ground truth. We contrast with the alternative of using another LLM as a source of machine labels, or silver labels. Results across various languages and domains show disagreement scores provide a better estimation of model performance with mean average error (MAE) as low as 0.4% and on average 13.8% better than using silver labels.

en cs.CL
arXiv Open Access 2023
The Uncertainty-based Retrieval Framework for Ancient Chinese CWS and POS

Pengyu Wang, Zhichen Ren

Automatic analysis for modern Chinese has greatly improved the accuracy of text mining in related fields, but the study of ancient Chinese is still relatively rare. Ancient text division and lexical annotation are important parts of classical literature comprehension, and previous studies have tried to construct auxiliary dictionaries and other fused knowledge to improve performance. In this paper, we propose a framework for ancient Chinese Word Segmentation and Part-of-Speech Tagging that makes a twofold effort: on the one hand, we try to capture the wordhood semantics; on the other hand, we re-predict the uncertain samples of the baseline model by introducing external knowledge. Our architecture outperforms pre-trained BERT with CRF and existing tools such as Jiayan.

en cs.CL
arXiv Open Access 2023
A New Dataset and Empirical Study for Sentence Simplification in Chinese

Shiping Yang, Renliang Sun, Xiaojun Wan

Sentence Simplification is a valuable technique that can greatly benefit language learners and children. However, current research focuses more on English sentence simplification. The development of Chinese sentence simplification is relatively slow due to the lack of data. To alleviate this limitation, this paper introduces CSS, a new dataset for assessing sentence simplification in Chinese. We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications. Furthermore, we test several unsupervised and zero/few-shot learning methods on CSS and analyze the automatic evaluation and human evaluation results. In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.

en cs.CL
arXiv Open Access 2023
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Yuanhe Tian, Ruyi Gan, Yan Song et al.

Recently, the increasing demand for superior medical services has highlighted the discrepancies in the medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain and current large language models (LLMs) offer an advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), which efficiently empowers LLMs to understand and respond to medical instructions but is ineffective in learning domain knowledge and aligning with human preference. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, which undergoes a comprehensive training regime with pre-training, SFT, and RLHF. Evaluations on tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general domain LLMs. Furthermore, we analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients, so as to contribute to further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.

en cs.CL

Page 37 of 183233