Results for "Chinese language and literature"

Showing 20 of ~3,659,826 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

CrossRef Open Access 2025
Cross-Language Translation Evaluation: A Comparative Analysis of the Quality of English-Chinese Interpretation by AI Models and Strategies for Improvement

Tong Su

Machine translation, as one of the important functions of artificial intelligence, plays an increasingly important role in cross-linguistic communication. This paper evaluates the performance of up-and-coming domestic AI models on English-Chinese mutual translation tasks, comparing their functions and effects and exploring strategies to improve the quality of AI translation in both directions. Based on the literature review in the first section, we put forward the hypothesis that AI translation has certain deficiencies and that different types of errors are produced depending on the target language. We use both quantitative and qualitative analysis methods, combined with example analysis and the BLEU metric, to evaluate AI translation output along several dimensions. We also point out the limitations of the study, discuss the main problems of AI translation, and propose targeted enhancement strategies in the conclusion.
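As a sketch of the BLEU evaluation this abstract mentions, a smoothed sentence-level BLEU can be computed from modified n-gram precisions and a brevity penalty. This is a generic illustration, not the paper's exact setup; production evaluation would typically use a library such as sacreBLEU, and Chinese output would be segmented before scoring.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Smoothed sentence-level BLEU against a single reference.

    Tokenisation is plain whitespace splitting; for Chinese you would
    segment into characters or words first.
    """
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        overlap = sum((ngrams(cand, n) & ngrams(ref, n)).values())
        total = max(sum(ngrams(cand, n).values()), 1)
        # add-one smoothing keeps one empty n-gram order from zeroing the score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An identical candidate and reference score 1.0, while an unrelated candidate scores close to 0, which is the behaviour the comparative analysis relies on.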

DOAJ Open Access 2025
Illness Originates from "Emotions": Attribution and Emotional Management among Young Women Diagnosed with Nodules

Su MA, Tiantian WEI, Xinmiao CHEN

The increasing detection of nodules in young women has sparked widespread concern, with emotional factors often discussed as potential contributors. Young women have traced the etiology from both internal and external perspectives, employing a governance chain of "rebuilding 'knowledge'-reshaping 'self'-reconsidering 'gender'" to "contain" and "manage" their emotions. These nodule-related emotional management practices are a form of governance of the body and the social relationships it connects. This research also demonstrates women's agency in emotional management and suggests that the loosening of gender structures has weakened women's constraints. Ultimately, this study calls for closer public attention to youth health issues.

Medical philosophy. Medical ethics
DOAJ Open Access 2025
INTERACTIVE LEARNING STRATEGIES FOR CHINESE LANGUAGE AT SMP KRISTEN SHINING STAR SRAGEN

Rudiansyah Rudiansyah, Ulfah Yanuar Lianisyah, Tati Sugiarti et al.

This research reviews interactive learning strategies for the Chinese language at SMP Kristen Shining Star Sragen. The aim is to identify and analyze the Chinese interactive learning strategies implemented by the teachers in this school. Chinese language learning at every level of education is becoming increasingly important, considering the development and needs of globalization. An interactive approach to Chinese language learning is expected to improve students' motivation, comprehension and language skills. This research uses a descriptive qualitative method, with data collected through observation, interviews, documentation, and Focus Group Discussion (FGD), as well as from books, journals, and relevant literature sources. The results show that the interactive learning strategies applied include drills and role-playing techniques. These strategies are believed to create a fun learning atmosphere while activating students' participation. In addition, the factors that influence the effectiveness of these strategies include teacher readiness, supporting facilities and infrastructure, and student involvement in the teaching-learning process.

Education (General)
DOAJ Open Access 2025
Effects of aromatherapy on sleep quality in patients: Protocol for an umbrella review.

Hongrui Shi, Mengqi Liu, Xinxin Fan et al.

Introduction: Poor sleep quality affects disease recovery and overall quality of life. Aromatherapy has been shown to improve sleep quality by alleviating anxiety, promoting relaxation, and enhancing blood circulation. Several meta-analyses have reported the effects of aromatherapy on sleep quality; however, the reported efficacy varies widely, with some studies showing that the effects are negligible or inconsistent. A comprehensive review synthesizing the available evidence on the impact of aromatherapy on sleep quality is still needed. Therefore, this study aims to provide an umbrella review summarizing all available systematic reviews and meta-analyses investigating the effects of aromatherapy on sleep quality. Methods and analysis: A thorough literature search will be conducted across six English-language databases and four Chinese-language databases, following the Joanna Briggs Institute (JBI) methodology for umbrella reviews. Two reviewers will independently screen all articles to determine their eligibility based on pre-established inclusion criteria. Additionally, a manual search will be performed by reviewing the reference lists of the included studies. Data extraction will be conducted from all eligible studies, and quality will be assessed using the Assessment of Multiple Systematic Reviews-2 (AMSTAR 2) tool. The study selection process will be presented using a PRISMA flow diagram. Statistical analyses will be performed using appropriate methods to summarize and describe the findings. Results: This study aims to provide comprehensive evidence by systematically evaluating the effects of aromatherapy on sleep quality in patients. It seeks to uncover the impact of aromatherapy on sleep quality across different patient populations and to explore variations in its effectiveness based on disease type and the form of aromatherapy used. We will conduct a systematic evaluation of all eligible systematic reviews and meta-analyses. Subgroup analysis will also be performed, where applicable, grouping patients by aromatherapy attributes (e.g., population type, dosage, route of administration) and disease type. PROSPERO registration number: CRD42024580250.

Medicine, Science
arXiv Open Access 2025
PL-Guard: Benchmarking Language Model Safety for Polish

Aleksandra Krasnodębska, Karolina Seweryn, Szymon Łukasik et al.

Despite increasing efforts to ensure the safety of large language models (LLMs), most existing safety assessments and moderation tools remain heavily biased toward English and other high-resource languages, leaving the majority of the world's languages underexamined. To address this gap, we introduce a manually annotated benchmark dataset for language model safety classification in Polish. We also create adversarially perturbed variants of these samples designed to challenge model robustness. We conduct a series of experiments to evaluate LLM-based and classifier-based models of varying sizes and architectures. Specifically, we fine-tune three models: Llama-Guard-3-8B, a HerBERT-based classifier (a Polish BERT derivative), and PLLuM, a Polish-adapted Llama-8B model. We train these models using different combinations of annotated data and evaluate their performance, comparing it against publicly available guard models. Results demonstrate that the HerBERT-based classifier achieves the highest overall performance, particularly under adversarial conditions.

en cs.CL
arXiv Open Access 2025
An Efficient Approach for Machine Translation on Low-resource Languages: A Case Study in Vietnamese-Chinese

Tran Ngoc Son, Nguyen Anh Tu, Nguyen Minh Tri

Despite the rise of recent neural networks in machine translation, these networks do not work well when training data is insufficient. In this paper, we propose an approach to machine translation for low-resource language pairs such as Vietnamese-Chinese. Our method leverages the power of a multilingual pre-trained language model (mBART) together with Vietnamese and Chinese monolingual corpora. First, we built an early-bird machine translation model using the bilingual training dataset. Second, we used the TF-IDF technique to select the sentences from the monolingual corpora that are most related to the domains of the parallel dataset. Finally, the first model was used to synthesize augmented training data from the selected monolingual sentences for the translation model. Our proposed scheme outperformed the transformer model by 8%. The augmented dataset also improved model performance.
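The TF-IDF selection step described in this abstract can be sketched as follows; the function names and toy sentences are illustrative, not taken from the paper, and real pipelines would segment Chinese or Vietnamese text before tokenising.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors for whitespace-tokenised documents."""
    tokenised = [d.split() for d in docs]
    df = Counter()                      # document frequency per term
    for toks in tokenised:
        df.update(set(toks))
    n = len(docs)
    return [{t: (c / len(toks)) * math.log(n / df[t])
             for t, c in Counter(toks).items()}
            for toks in tokenised]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_related(monolingual, domain_text, k):
    """Pick the k monolingual sentences most similar to the in-domain text."""
    vecs = tfidf_vectors(monolingual + [domain_text])
    query = vecs[-1]
    ranked = sorted(zip(monolingual, vecs[:-1]),
                    key=lambda sv: cosine(query, sv[1]), reverse=True)
    return [s for s, _ in ranked[:k]]
```

Sentences sharing domain vocabulary with the parallel data rank highest and are kept for back-translation-style augmentation, while off-domain sentences are discarded.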

en cs.CL
arXiv Open Access 2024
TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text

Sayantan Adak, Daivik Agrawal, Animesh Mukherjee et al.

We investigate the knowledge of object affordances in pre-trained language models (PTLMs) and pre-trained vision-language models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. To take a first step toward quantifying the effect of grounding (or lack thereof), we curate a novel and comprehensive dataset of object affordances -- Text2Afford, characterized by 15 affordance classes. Unlike affordance datasets collected in vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improvement in affordance knowledge in PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks, and presents insights into LM capabilities, advancing the understanding of object affordances. Codes and data are available at https://github.com/sayantan11995/Text2Afford

arXiv Open Access 2024
An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios

Zongjie Li, Wenying Qiu, Pingchuan Ma et al.

Recent years have witnessed the rapid development of large language models (LLMs) in various domains. To better serve the large number of Chinese users, many commercial vendors in China have adopted localization strategies, training and providing local LLMs specifically customized for Chinese users. Furthermore, looking ahead, one of the key future applications of LLMs will be practical deployment in industrial production by enterprises and users in those sectors. However, the accuracy and robustness of LLMs in industrial scenarios have not been well studied. In this paper, we present a comprehensive empirical study on the accuracy and robustness of LLMs in the context of the Chinese industrial production area. We manually collected 1,200 domain-specific problems from 8 different industrial sectors to evaluate LLM accuracy. Furthermore, we designed a metamorphic testing framework containing four industrial-specific stability categories with eight abilities, totaling 13,631 questions with variants to evaluate LLM robustness. In total, we evaluated 9 different LLMs developed by Chinese vendors, as well as four different LLMs developed by global vendors. Our major findings include: (1) Current LLMs exhibit low accuracy in Chinese industrial contexts, with all LLMs scoring less than 0.6. (2) The robustness scores vary across industrial sectors, and local LLMs overall perform worse than global ones. (3) LLM robustness differs significantly across abilities. Global LLMs are more robust under logical-related variants, while advanced local LLMs perform better on problems related to understanding Chinese industrial terminology. Our study results provide valuable guidance for understanding and promoting the industrial domain capabilities of LLMs from both development and industrial enterprise perspectives. The results further motivate possible research directions and tooling support.
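The metamorphic-testing idea this abstract describes, checking that a semantics-preserving perturbation of a question leaves the answer unchanged, can be sketched with a stub model. The perturbations and the keyword model below are hypothetical stand-ins, not the paper's actual framework or data.

```python
def metamorphic_check(model, question, perturb):
    """A semantics-preserving variant of a question should get the same answer."""
    return model(question) == model(perturb(question))

def add_politeness(question):
    # one perturbation style: polite reframing should not change the answer
    return "Could you please tell me: " + question

def paraphrase(question):
    # another style: terminology paraphrase, where brittle models often break
    return question.replace("rated voltage", "nominal voltage")

def keyword_model(question):
    """Hypothetical stand-in for an LLM: answers from a keyword lookup."""
    return "380 V" if "rated voltage" in question else "unknown"

q = "What is the rated voltage of this motor?"
survives_politeness = metamorphic_check(keyword_model, q, add_politeness)  # True
survives_paraphrase = metamorphic_check(keyword_model, q, paraphrase)      # False
```

A robustness score is then simply the fraction of perturbed variants on which the model's answer is preserved, aggregated per ability category.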

en cs.CL, cs.AI
arXiv Open Access 2024
ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction

Victor Junqiu Wei, Weicheng Wang, Di Jiang et al.

Automatic Speech Recognition (ASR) is a fundamental and important task in the field of speech and natural language processing. It is an inherent building block in many applications such as voice assistants, speech translation, etc. Despite the advancement of ASR technologies in recent years, modern ASR systems still inevitably produce a substantial number of recognition errors due to environmental noise, ambiguity, etc. Therefore, error correction in ASR is crucial. Motivated by this, this paper studies ASR error correction in the Chinese language, which is one of the most popular languages and has a large number of users worldwide. We first create a benchmark dataset named ASR-EC that contains a wide spectrum of ASR errors generated by industry-grade ASR systems. To the best of our knowledge, it is the first Chinese ASR error correction benchmark. Then, inspired by the recent advances in large language models (LLMs), we investigate how to harness the power of LLMs to correct ASR errors. We apply LLMs to ASR error correction in three paradigms. The first paradigm is prompting, which is further categorized as zero-shot, few-shot, and multi-step. The second paradigm is finetuning, which finetunes LLMs with ASR error correction data. The third paradigm is multi-modal augmentation, which collectively utilizes the audio and ASR transcripts for error correction. Extensive experiments reveal that prompting is not effective for ASR error correction. Finetuning is effective only for a portion of LLMs. Multi-modal augmentation is the most effective method for error correction and achieves state-of-the-art performance.
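The prompting paradigm (zero-shot vs. few-shot) amounts to building a correction prompt with optional in-context examples before calling an LLM. The instruction wording and examples below are hypothetical, not taken from the ASR-EC benchmark.

```python
def build_prompt(transcript, examples=()):
    """Build a zero-shot (no examples) or few-shot ASR error-correction prompt.

    examples: pairs of (erroneous ASR transcript, corrected sentence).
    """
    lines = ["Correct the speech recognition errors in the sentence."]
    for wrong, right in examples:          # in-context demonstrations, if any
        lines.append(f"ASR: {wrong}")
        lines.append(f"Corrected: {right}")
    lines.append(f"ASR: {transcript}")     # the sentence to correct
    lines.append("Corrected:")             # the LLM completes from here
    return "\n".join(lines)
```

Zero-shot is `build_prompt(t)`, few-shot is `build_prompt(t, demos)`; multi-step prompting would chain several such calls, e.g. first detecting error spans and then rewriting them.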

en cs.CL, cs.SD
arXiv Open Access 2024
Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking

Ming Dong, Yujing Chen, Miao Zhang et al.

Chinese Spell Checking (CSC) is a widely used technology, which plays a vital role in speech-to-text (STT) and optical character recognition (OCR). Most existing CSC approaches relying on the BERT architecture achieve excellent performance. However, limited by the scale of the foundation model, BERT-based methods do not work well in few-shot scenarios, showing certain limitations in practical applications. In this paper, we explore an in-context learning method named RS-LLM (Rich Semantic based LLMs) that introduces large language models (LLMs) as the foundation model. Besides, we study the impact of introducing various kinds of Chinese rich semantic information into our framework. We found that by introducing a small number of specific Chinese rich semantic structures, LLMs achieve better performance than the BERT-based model on the few-shot CSC task. Furthermore, we conduct experiments on multiple datasets, and the results verify the superiority of our proposed framework.

en cs.CL
arXiv Open Access 2024
InternLM-Law: An Open Source Chinese Legal Large Language Model

Zhiwei Fei, Songyang Zhang, Xiaoyu Shen et al.

While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., legal exercises in textbooks) to analyzing complex real-world legal situations. We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries, and implement a data filtering and processing pipeline to ensure its diversity and quality. Our training approach involves a novel two-stage process: initially fine-tuning LLMs on both legal-specific and general-purpose content to equip the models with broad knowledge, followed by exclusive fine-tuning on high-quality legal data to enhance structured output generation. InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks. We make InternLM-Law and our dataset publicly available to facilitate future research in applying LLMs within the legal domain.

en cs.CL
DOAJ Open Access 2023
Editorial

Shin Yi Chew

In this first issue of Volume 33, we delve into the rich tapestry of linguistic diversity and academic discourse, showcasing a range of studies that shed light on various facets of language use, maintenance, and communication. The collection of articles in this issue offers valuable insights into the ever-evolving world of language. The first article in our lineup, Language Shift and Maintenance: A Case Study of the Telugu Community in Bagan Datoh, Perak (Malaysia), takes us to Bagan Datoh, Perak, where the Telugu language, despite being a minority language in Malaysia, continues to thrive in specific domains. This case study illuminates the dynamics of language choice among different generations and provides hope for the revitalization of Telugu among the younger generation. The second article, Metadiscourse Markers in Abstracts of Linguistics and Literature Research Articles from Scopus-Indexed Journals, shifts our focus to the world of academic writing, specifically the use of metadiscourse markers in abstracts. It highlights the crucial role these markers play in structuring and presenting research arguments. The comparative analysis between linguistics and literature abstracts provides valuable insights into disciplinary differences in the use of these markers. Our third article, An Exploratory Analysis of Linking Adverbials Used by Filipino, Pakistani, and Thai Writers of English, undertakes a contrastive interlanguage analysis, shedding light on how students from the Philippines, Pakistan, and Thailand use linking adverbials in their English academic writing. The importance of understanding the distinct production tendencies of various English varieties is emphasised in this article. 
Turning to a sensitive topic in the Malaysian context, the fourth article, Female Circumcision in Malaysia: Challenges and Lessons Learned in Using Focus Groups through an NGO-Academia Collaboration, explores female circumcision and the challenges faced in conducting research on this subject. It highlights the collaborative efforts between academia and a local NGO, offering valuable insights into data collection via focus group discussions. The fifth article, Prosodic Marking of New and Given Information in English and Mandarin by Chinese Speakers, ventures into the realm of prosody and its impact on language comprehension. Focusing on Chinese English as a Foreign Language learners, it investigates how Mandarin influences the prosodic marking of new and given information in English, shedding light on potential areas of misunderstanding. Our final article, Privacy Policy Pop-up: A Genre Analysis of Journal Websites’ HTTP Cookies, takes a dive into the world of online privacy and transparency. It analyses the communication of transparency through HTTP cookies on academic journal websites, uncovering the rhetorical strategies employed to inform users about data privacy. In this diverse collection of articles, we invite readers to explore the multifaceted world of language and academic discourse. Each study offers unique insights into the complexities of communication and the richness of linguistic diversity. We hope this issue serves as a valuable resource for scholars, researchers, and language enthusiasts alike, encouraging further exploration and understanding of these vital aspects of our academic and cultural landscape. Last but not least, I would like to express my heartfelt thanks to all the contributors, reviewers and readers of this Journal. My special thanks also go to all the members of the Editorial Board and Advisory Board for their significant contributions.
Editorial Board: Prof. Dr. Stefanie Pillai; A.P. Dr. Paolo Coluzzi; A.P. Kim Keum Hyun; Dr. Azlin Zaiti Zainal; Dr. Charity Lee Chin Ai; Dr. Ng Lee Luan; Dr. Noor Aqsa Nabila binti Mat Isa; Dr. Soh Siak Bie (journal manager); Dr. Thanalachime Perumal. International Advisory Board: Prof. Dr. Richard Fitzgerald (University of Macau, China); Prof. Dr. Stephen Hall (Sunway University, Malaysia); Prof. Dr. Jan Hardman (University of York, United Kingdom); Prof. Dr. Jason Miin-Hwa Lim (Universiti Malaysia Sabah, Malaysia); Prof. Dr. Dennis Tay (The Hong Kong Polytechnic University); Assoc. Prof. Dr. Shirley Dita (De La Salle University, Philippines); Assoc. Prof. Dr. Michelle M. Lazar (National University of Singapore, Singapore); Assoc. Prof. Dr. Jonathan Newton (Victoria University of Wellington, New Zealand); Dr. Mário Pinharanda-Nunes (University of Macau, China). The future issues of this Journal will be in the capable hands of Prof. Dr. Stefanie Pillai, whom I believe will be able to lead the Journal to greater heights. P.S.: An Appendix with a compilation of highlighted articles from the past five years is also included here for your reference.

Philology. Linguistics
DOAJ Open Access 2023
Establishment of basic principles and methods of acupuncture standardization in traditional Chinese medicine

Guo Yi, L.I. Zhenji, Liu Baoyan et al.

Standardization is the universal language of the world, and standardization of traditional Chinese medicine (TCM) is essential for its communication in China and globally. However, the principles and methods of TCM acupuncture standardization have been unclear and inadequate in the early stages. Based on an investigative approach to understanding the current status, identifying problems, and finding solutions, our team has established basic principles of TCM acupuncture that embody Chinese wisdom, evaluated the international strategic environment systematically, proposed the principle of “importance of harmony and exercise of impartiality”, and established basic working principles. A series of methods for TCM acupuncture standard development and evaluation have been constructed, including general standards for the revision of TCM acupuncture standards, the first TCM acupuncture clinical research management specification, a shared full chain technology platform, a data center, and an evaluation research base for TCM acupuncture clinical research. Evaluation criteria for ancient literature and expert experience, a recommendation method for the “three main and three auxiliaries” TCM guideline for prevention were established, and quantifiable assessment methods of TCM standard applicability were proposed. These findings provide methodological guidance for TCM acupuncture standardization.

Medicine, Other systems of medicine
arXiv Open Access 2023
A two-way translation system of Chinese sign language based on computer vision

Shengzhuo Wei, Yan Lan

As the main means of communication for deaf people, sign language has a special grammatical order, so it is meaningful and valuable to develop a real-time translation system for sign language. In the research process, we added a TSM module to a lightweight neural network model for the large Chinese continuous sign language dataset. It effectively improves network performance, with high accuracy and fast recognition speed. At the same time, we improve the Bert-Base-Chinese model to segment Chinese sentences into words and map the natural word order to the sign language word order, and finally use the corresponding word videos in the isolated sign language dataset to generate the sentence video, so as to achieve text-to-sign-language translation. Finally, we built a system with sign language recognition and translation functions and conducted performance tests on the complete dataset. The sign language video recognition accuracy reached about 99.3% with a recognition time of about 0.05 seconds, and sign language video generation took about 1.3 seconds. The system performs well and is feasible.

en cs.CV
arXiv Open Access 2023
Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models

Yinyu Lan, Yanru Wu, Wang Xu et al.

Entity-level fine-grained sentiment analysis in the financial domain is a crucial subtask of sentiment analysis and currently faces numerous challenges. The primary challenge stems from the lack of high-quality and large-scale annotated corpora specifically designed for financial text sentiment analysis, which in turn limits the availability of data necessary for developing effective text processing techniques. Recent advancements in large language models (LLMs) have yielded remarkable performance in natural language processing tasks, primarily centered around language pattern matching. In this paper, we propose a novel and extensive Chinese fine-grained financial sentiment analysis dataset, FinChina SA, for enterprise early warning. We thoroughly evaluate and experiment with well-known existing open-source LLMs using our dataset. We firmly believe that our dataset will serve as a valuable resource to advance the exploration of real-world financial sentiment analysis tasks, which should be the focus of future research. The FinChina SA dataset is publicly available at https://github.com/YerayL/FinChina-SA

en cs.CL, cs.AI
CrossRef Open Access 2021
Portraying the ‘Chinese international students’: a review of English-language and Chinese-language literature on Chinese international students (2015–2020)

Cora Lingling Xu

Chinese international students are often portrayed in a monolithic manner in popular discourse. To offer a more comprehensive and critical representation of Chinese international students, this paper conducts a thematic narrative review of 128 English-language and 74 Chinese-language peer-reviewed articles published between 2015 and 2020. Drawing on post-colonial theories, this review identifies four subject positions in portrayals of Chinese international students: the (1) neoliberal, (2) political, (3) pedagogic and (4) racialised subjects. This paper celebrates heartening developments in the literature which affirm Chinese international students’ epistemic contributions, legitimate pedagogic needs, notable heterogeneity and wide-ranging political, cultural and pedagogic agencies. It also highlights how aspects of these subject positions have exercised epistemic injustice on Chinese international students. Meanwhile, it pinpoints Chinese international students’ acquiescence in exacerbating global education inequalities. Among the first to bring the dominant English-language literature and the ‘local’ perspectives of the Chinese-language literature into dialogue, this article notes their divergent focuses and indicates the unique contributions made by the latter to historicising research on Chinese international students. This article challenges popular perceptions of Chinese international students, questions the production of knowledge, and pinpoints future research directions.

45 citations · en
arXiv Open Access 2022
Prompting Is Programming: A Query Language for Large Language Models

Luca Beurer-Kellner, Marc Fischer, Martin Vechev

Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. At a high level, given an input, a language model can be used to automatically complete the sequence in a statistically likely way. Based on this, users prompt these models with language instructions or examples to implement a variety of downstream tasks. Advanced prompting methods can even involve interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaptation to many tasks while abstracting language model internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (26-85% cost savings).
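The LMP idea, interleaving text prompting with scripted constraints on model output, can be caricatured in a few lines of Python. The stub model, the `[HOLE]` template syntax, and the constraint below are illustrative only; LMQL's real syntax and its integration with the decoder differ substantially.

```python
import re

def stub_model(prompt):
    """Hypothetical stand-in for an expensive LLM completion call."""
    if "capital of France" in prompt:
        return "Paris is the capital, of course"
    return "unknown"

def first_word(completion):
    # an LMQL-style output constraint: keep a single alphabetic token
    return re.match(r"[A-Za-z]+", completion).group(0)

def query(template, constraints, model=stub_model):
    """Fill each [HOLE] left to right, post-processing completions with constraints."""
    while "[HOLE]" in template:
        prefix = template.split("[HOLE]", 1)[0]   # prompt is everything before the hole
        completion = model(prefix)
        for constrain in constraints:
            completion = constrain(completion)
        template = template.replace("[HOLE]", completion, 1)
    return template

answer = query("Q: What is the capital of France? A: [HOLE]", [first_word])
```

Real LMQL enforces constraints during decoding rather than after it, which is what lets it prune token sequences early and cut the number of model calls.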

en cs.CL, cs.AI

Page 16 of 182,992