Voices from the Margins: A Critical Analysis of Peranakan Chinese Literary Contributions in Doenia Baroe (1930)
Risa Junita Sari
This study critically examines the literary contributions of Peranakan Chinese communities in colonial Sumatra through the lens of Doenia Baroe, a pioneering Malay-language magazine published in Padang from January to November 1930. Despite the significant role of Peranakan Chinese in Indonesia's literary and journalistic development, their contributions have remained marginalized in mainstream literary historiography. Utilizing a historical research methodology that encompasses source heuristics, source criticism, interpretation, and historiography, this article analyzes how Doenia Baroe functioned as a cultural and intellectual platform for Peranakan Chinese during the Dutch colonial period. The magazine's liberal and cosmopolitan orientation, its strategic positioning within the colonial media landscape, and its role in fostering a new literary genre that blended Chinese, Malay, and Western influences are highlighted. Findings reveal that Doenia Baroe successfully navigated the restrictive colonial press regulations while providing a space for Peranakan Chinese to articulate their identity, critique social realities, and contribute to the broader literary and intellectual discourse of the era. This study fills a significant gap in the historiography of Indonesian literature by demonstrating how Peranakan Chinese literary production was not merely a marginal phenomenon but an integral part of the colonial intellectual landscape, challenging the narrow canon that has historically excluded their contributions from mainstream literary narratives.
Philology. Linguistics, History (General) and history of Europe
Algorithmic Drive, Aesthetic Transformation and the Production Logic of Chinese Online Literature within the context of the Platform Economy: The Case of Qidian
Lu Wang, Xuan Wang
The rapid expansion of the internet has made online literature a key component of China’s cultural sector. With the increasing influence of platform algorithms, literary production has evolved beyond a purely individual creative endeavour, becoming increasingly shaped by reader preferences and commercial imperatives. While much existing scholarship has focused on the exploitation of writers within the platform economy, this paper shifts attention to how algorithms contribute to the emergence of new aesthetic values and forms of creativity. The study investigates the dynamic relationship between platform economies, algorithmic writing, and aesthetic transformation in Chinese online literature, with a specific focus on Qidian, one of the leading platforms in this domain. Employing a mixed-methods approach, this research combines a literature review, case studies, surveys, and interviews with both writers and readers to provide a comprehensive analysis of the evolving literary landscape. The paper critically evaluates how the production logic of online literature platforms encourages new creative and aesthetic practices, providing insights into the future development of online literature.
Philology. Linguistics, Chinese language and literature
AutoSign: Direct Pose-to-Text Translation for Continuous Sign Language Recognition
Samuel Ebimobowei Johnny, Blessed Guda, Andrew Blayama Stephen
et al.
Continuously recognizing sign gestures and converting them to glosses plays a key role in bridging the gap between the hearing and hearing-impaired communities. This involves recognizing and interpreting the hand, face, and body gestures of the signer, which is challenging because it requires combining all these features. Continuous Sign Language Recognition (CSLR) methods rely on multi-stage pipelines that first extract visual features, then align variable-length sequences with target glosses using CTC or HMM-based approaches. However, these alignment-based methods suffer from error propagation across stages and overfitting, and they struggle with vocabulary scalability due to the intermediate gloss representation bottleneck. To address these limitations, we propose AutoSign, an autoregressive decoder-only transformer that directly translates pose sequences to natural language text, bypassing traditional alignment mechanisms entirely. This decoder-only approach allows the model to map directly between the features and the glosses without the need for CTC loss while also learning the textual dependencies in the glosses. Our approach incorporates a temporal compression module using 1D CNNs to efficiently process pose sequences, followed by AraGPT2, a pre-trained Arabic decoder, to generate text (glosses). Through comprehensive ablation studies, we demonstrate that hand and body gestures provide the most discriminative features for signer-independent CSLR. By eliminating the multi-stage pipeline, AutoSign achieves substantial improvements on the Isharah-1000 dataset, with a gain of up to 6.1% in WER score over the best existing method.
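For readers unfamiliar with the WER (word error rate) score reported above, the following is a minimal sketch of the standard definition: the word-level edit distance between a reference and a hypothesis, divided by the reference length. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only (hypothetical name, whitespace tokenization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1       # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better, and the score can exceed 1.0 when the hypothesis contains many inserted tokens.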
Adding Alignment Control to Language Models
Wenhong Zhu, Weinan Zhang, Rui Wang
Post-training alignment has increasingly become a crucial factor in enhancing the usability of language models (LMs). However, the strength of alignment varies depending on individual preferences. This paper proposes a method to incorporate alignment control into a single model, referred to as CLM. This approach adds one identity layer preceding the initial layers and performs preference learning only on this layer to map unaligned input token embeddings into the aligned space. Experimental results demonstrate that this efficient fine-tuning method performs comparably to full fine-tuning. During inference, the input embeddings are processed through the aligned and unaligned layers, which are then merged through the interpolation coefficient. By controlling this parameter, the alignment exhibits a clear interpolation and extrapolation phenomenon.
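The merging step described above can be pictured with a small sketch. It assumes the merge is a plain linear mix of the two embedding vectors, which the abstract does not spell out; the function name `merge_embeddings` and the parameter `coeff` are hypothetical and are not taken from the CLM paper.

```python
def merge_embeddings(unaligned, aligned, coeff):
    """Linearly mix two embedding vectors (assumed form, not the paper's code).

    coeff = 0 returns the unaligned vector, coeff = 1 the aligned one;
    values between 0 and 1 interpolate, values above 1 extrapolate
    past the aligned vector.
    """
    if len(unaligned) != len(aligned):
        raise ValueError("embedding vectors must have the same length")
    return [(1 - coeff) * u + coeff * a for u, a in zip(unaligned, aligned)]


# Interpolation halfway between the two vectors.
halfway = merge_embeddings([0.0, 2.0], [1.0, 4.0], 0.5)
# Extrapolation beyond the aligned vector.
beyond = merge_embeddings([0.0, 2.0], [1.0, 4.0], 2.0)
```

Under this assumed form, a single scalar controls alignment strength at inference time without retraining.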
Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish
Lujun Li, Yewei Song, Lama Sleem
et al.
Grammar refers to the system of rules that governs the structural organization and the semantic relations among linguistic units such as sentences, phrases, and words within a given language. In natural language processing, there remains a notable scarcity of grammar-focused evaluation protocols, a gap that is even more pronounced for low-resource languages. Moreover, the extent to which large language models genuinely comprehend grammatical structure, especially the mapping between syntactic structures and meanings, remains under debate. To investigate this issue, we propose a Grammar Book Guided evaluation pipeline, consisting of four key stages, intended to provide a systematic and generalizable framework for grammar evaluation; in this work we take Luxembourgish as a case study. The results show a weak positive correlation between translation performance and grammatical understanding, indicating that strong translations do not necessarily imply deep grammatical competence. Larger models perform well overall due to their semantic strength but remain weak in morphology and syntax, struggling particularly with Minimal Pair tasks, while strong reasoning ability offers a promising way to enhance their grammatical understanding.
Effects of Teacher Engagement on Students’ Achievement in an Online English as a Foreign Language Classroom: The Mediating Role of Autonomous Motivation and Positive Emotions
Jianhua Wang, Xi Zhang, L. Zhang
As an important factor promoting students’ learning behavior and achievement, teacher engagement has been largely neglected in the research literature on English as a foreign language (EFL) and applied linguistics. Moreover, the few existing studies have focused more on conventional classrooms than on online learning contexts and have failed to reveal how teacher engagement in the online foreign language classroom affects students’ achievement. The present study assessed 546 university students in China using self-report questionnaires to examine the relationship between teacher engagement and students’ achievement in an online EFL course over an 18-week semester, taking into account the possible mediating effects of autonomous motivation and positive academic emotions. The results showed that teacher engagement exerted a direct and positive impact on students’ English achievement. Students’ autonomous motivation and enjoyment mediated the association between teacher engagement and English achievement, but the mediating effects of relief were not significant. Additionally, teacher engagement affected students’ English achievement through the chain mediation of autonomous motivation and positive academic emotions (enjoyment and relief). Relief displayed a smaller effect on students’ English achievement than enjoyment did. These findings elucidate the impact of teacher engagement on students’ English achievement in the online environment and support the utility of self-determination theory and control-value theory in explaining foreign language learning. Directions for future research and implications for education are also presented.
Mapping the literature on the application of artificial intelligence in libraries (AAIL): a scientometric analysis
Dhrubajyoti Borgohain, R. Bhardwaj, M. Verma
Purpose: Artificial Intelligence (AI) is an emerging technology that has turned into a field of knowledge and has been consistently displacing other technologies to change human life. It is applied in all spheres of life, as reflected in the review of the literature section here. As AI is applicable in the field of libraries too, this study scientifically mapped the papers on AAIL and analyzed their growth, collaboration network, and trending topics or research hot spots to highlight the challenges and opportunities in adopting AI-based advancements in library systems and processes. Design/methodology/approach: The study was developed with a bibliometric approach, considering a decade, 2012 to 2021, for data extraction from a premier database, Scopus. The steps followed are (1) identifying and selecting keywords and forming the search strategy with the approval of a panel of computer scientists and librarians; (2) designing and developing a perfect algorithm to verify these selected keywords in the title-abstract-keywords fields of Scopus; (3) performing data processing in state-of-the-art bibliometric visualization tools, Biblioshiny R and VOSviewer; and (4) discussing the findings for practical implications of the study and its limitations. Findings: As evident from several papers, not much research has been conducted on AI applications in libraries in comparison to topics like AI applications in cancer, health, medicine, education, and agriculture. As per Price's law, the growth pattern is exponential. The total number of papers relevant to the subject is 1462 (single and multi-authored), contributed by 5400 authors, with 0.271 documents per author and around 4 authors per document. Papers occurred mostly in open-access journals. The most productive journal is the Journal of Chemical Information and Modelling (NP = 63), while the most consistent and impactful is the Journal of Machine Learning Research (z-index = 63.58 and CPP = 56.17). Among authors, J Chen (z-index = 28.86 and CPP = 43.75) is the most consistent and impactful. At the country level, the USA has recorded the highest number of papers and is positioned at the center of the co-authorship network, but at the institutional level, China takes the first position. The trending topics of research are machine learning, large dataset, deep learning, high-level languages, etc. The present information system has a high potential to improve if integrated with AI technologies. Practical implications: The number of scientific papers has increased over time. The evolution of themes like machine learning implies that AI is a broad field of knowledge that converges with other disciplines. Themes like large datasets imply that AI may be applied to analyze and interpret these data and support decision-making in public sector enterprises. The theme named high-level language emerged as a research hotspot, which indicates that extensive research has been going on in this area to improve computer systems for facilitating the processing of data with high momentum. These implications are of high strategic worth for policymakers, library stakeholders, researchers and the government as a whole for decision-making. Originality/value: The analysis of collaboration and of prolific authors/journals using the consistency factor and CPP, the testing of the relationship between consistency (z-index) and impact (h-index), and the use of state-of-the-art network visualization and cluster analysis techniques make this study novel and differentiate it from traditional bibliometric analysis. To the best of the authors' knowledge, this work is the first attempt to comprehend the research streams and provide a holistic view of research on the application of AI in libraries. The insights obtained from this analysis are instrumental for both academics and practitioners.
70 citations
en
Computer Science
Research on the Generation and Automatic Detection of Chinese Academic Writing
Shushan Zhu, Limin MA, Xingyuan Chen
With the advancement of text-generation technology, misuse has increasingly challenged academic research sustainability. The Chinese academic community, vast and active with millions of researchers and extensive literature, inevitably faces generator misuse. However, research on the automatic detection of Chinese academic texts remains scarce, necessitating the thorough exploration of detection methods to support ongoing academic development. This study explored automatic detection technology for Chinese academic texts, focusing on TK2A dataset construction, generation models, and detection methods to assess their practical impact. TK2A covers papers across disciplines, such as computer science, engineering, and medicine, ensuring broad applicability and forming a solid foundation for model training and evaluation. Using advanced natural language processing, models trained on TK2A showed strong performance across disciplines. Rigorous manual evaluation verified their reliability in terms of grammar, semantics, and logic. The study employed the widely adopted BERT model for detection, achieving high accuracy in distinguishing human-written content from AI-generated content on TK2A. This research underscores TK2A’s practical value: with an accuracy exceeding 84%, it offers crucial support to journals, institutions, and educators in swiftly detecting AI-generated content, preventing misconduct, and enhancing academic publication quality.
Electrical engineering. Electronics. Nuclear engineering
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
David Wadden, Kejian Shi, Jacob Morrison
et al.
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks. These tasks span five core scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF is unique in being an entirely expert-written, high-quality instruction-following dataset for extracting and synthesizing information from research literature across diverse scientific fields. It features complex instructions with long input contexts, detailed task descriptions, and structured outputs. To demonstrate its utility, we finetune a series of large language models (LLMs) using a mix of general-domain and SciRIFF instructions. On nine out-of-distribution held-out tasks (referred to as SciRIFF-Eval), LLMs finetuned on SciRIFF achieve a 70.6% average improvement over baselines trained only on general-domain instructions. SciRIFF facilitates the development and evaluation of LLMs to help researchers navigate the rapidly growing body of scientific literature.
CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare
Jingwei Zhu, Minghuan Tan, Min Yang
et al.
The rapid progress in Large Language Models (LLMs) has prompted the creation of numerous benchmarks to evaluate their capabilities. This study focuses on the Comprehensive Medical Benchmark in Chinese (CMB), showcasing how dataset diversity and distribution in supervised fine-tuning (SFT) may enhance LLM performance. Remarkably, we successfully trained a smaller base model to achieve scores comparable to larger models, indicating that a diverse and well-distributed dataset can optimize performance regardless of model size. This study suggests that even smaller models may reach high performance levels with carefully curated and varied datasets. By integrating a wide range of instructional content, our approach addresses potential issues such as data quality inconsistencies. Our results imply that a broader spectrum of training data may enhance a model's ability to generalize and perform effectively across different medical scenarios, highlighting the importance of dataset quality and diversity in fine-tuning processes. We open-source the model for future research at https://github.com/CAS-SIAT-XinHai/CollectiveSFT
DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain
Yanis Labrak, Adrien Bazoge, Oumaima El Khettari
et al.
The biomedical domain has sparked significant interest in the field of Natural Language Processing (NLP), which has seen substantial advancements with pre-trained language models (PLMs). However, comparing these models has proven challenging due to variations in evaluation protocols across different models. A fair solution is to aggregate diverse downstream tasks into a benchmark, allowing for the assessment of intrinsic PLM qualities from various perspectives. Although still limited to a few languages, this initiative has been undertaken in the biomedical field, notably for English and Chinese. This limitation hampers the evaluation of the latest French biomedical models, as they are either assessed on a minimal number of tasks with non-standardized protocols or evaluated using general downstream tasks. To bridge this research gap and account for the unique sensitivities of French, we present the first-ever publicly available French biomedical language understanding benchmark called DrBenchmark. It encompasses 20 diversified tasks, including named-entity recognition, part-of-speech tagging, question-answering, semantic textual similarity, and classification. We evaluate 8 state-of-the-art pre-trained masked language models (MLMs) on general and biomedical-specific data, as well as English-specific MLMs to assess their cross-lingual capabilities. Our experiments reveal that no single model excels across all tasks, while generalist models are sometimes still competitive.
Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases
Yuqi Liu, Guanyi Chen, Kees van Deemter
Theoretical linguists have suggested that some languages (e.g., Chinese and Japanese) are "cooler" than other languages based on the observation that the intended meaning of phrases in these languages depends more on their contexts. As a result, many expressions in these languages are shortened, and their meaning is inferred from the context. In this paper, we focus on the omission of the plurality and definiteness markers in Chinese noun phrases (NPs) to investigate the predictability of their intended meaning given the contexts. To this end, we built a corpus of Chinese NPs, each of which is accompanied by its corresponding context, and by labels indicating its singularity/plurality and definiteness/indefiniteness. We carried out corpus assessments and analyses. The results suggest that Chinese speakers indeed drop plurality and definiteness markers very frequently. Building on the corpus, we train a bank of computational models using both classic machine learning models and state-of-the-art pre-trained language models to predict the plurality and definiteness of each NP. We report on the performance of these models and analyse their behaviours.
Efficacy of Large Language Models in Systematic Reviews
Aaditya Shah, Shridhar Mehendale, Siddha Kanthi
This study investigates the effectiveness of Large Language Models (LLMs) in interpreting existing literature through a systematic review of the relationship between Environmental, Social, and Governance (ESG) factors and financial performance. The primary objective is to assess how LLMs can replicate a systematic review on a corpus of ESG-focused papers. We compiled and hand-coded a database of 88 relevant papers published from March 2020 to May 2024. Additionally, we used a set of 238 papers from a previous systematic review of ESG literature from January 2015 to February 2020. We evaluated two current state-of-the-art LLMs, Meta AI's Llama 3 8B and OpenAI's GPT-4o, on the accuracy of their interpretations relative to human-made classifications on both sets of papers. We then compared these results to a "Custom GPT" and a fine-tuned GPT-4o Mini model using the corpus of 238 papers as training data. The fine-tuned GPT-4o Mini model outperformed the base LLMs by 28.3% on average in overall accuracy on prompt 1. At the same time, the "Custom GPT" showed a 3.0% and 15.7% improvement on average in overall accuracy on prompts 2 and 3, respectively. Our findings reveal promising results for investors and agencies to leverage LLMs to summarize complex evidence related to ESG investing, thereby enabling quicker decision-making and a more efficient market.
Massively Multilingual Text Translation For Low-Resource Languages
Zhong Zhou
Translation into severely low-resource languages has both the cultural goal of saving and reviving those languages and the humanitarian goal of assisting the everyday needs of local communities, needs that have been accelerated by the recent COVID-19 pandemic. In many humanitarian efforts, translation into severely low-resource languages often does not require a universal translation engine, but a dedicated text-specific translation engine. For example, healthcare records, hygienic procedures, government communication, emergency procedures and religious texts are all limited texts. While generic translation engines for all languages do not exist, translation of multilingually known limited texts into new, low-resource languages may be possible and may reduce human translation effort. We attempt to leverage translation resources from rich-resource languages to efficiently produce the best possible translation quality for well-known texts, which are available in multiple languages, in a new, low-resource language. To reach this goal, we argue that in translating a closed text into low-resource languages, generalization to out-of-domain texts is not necessary, but generalization to new languages is. Performance gain comes from massive source parallelism by careful choice of close-by language families, style-consistent corpus-level paraphrases within the same language and strategic adaptation of existing large pretrained multilingual models to the domain first and then to the language. Such performance gain makes it possible for machine translation systems to collaborate with human translators to expedite the translation process into new, low-resource languages.
Information extraction and knowledge graph construction from geoscience literature
Chengbin Wang, Xiaogang Ma, Jianguo Chen
et al.
194 citations
en
Computer Science
Ethnic Diversity, Trust and Corporate Social Responsibility: The Moderating Effects of Marketization and Language
Gaowen Kong, T. Kong, N. Qin
et al.
Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
Q. Cheng, Tim M. H. Li, Chi-Leung Kwok
et al.
Background Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor undergo professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. Objective The aim of this study was to explore whether computerized language analysis methods can be utilized to assess one’s suicide risk and emotional distress in Chinese social media. Methods A Web-based survey of Chinese social media (i.e., Weibo) users was conducted to measure their suicide risk factors including suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress levels. Participants’ Weibo posts published in the public domain were also downloaded with their consent. The Weibo posts were parsed and fitted into Simplified Chinese-Linguistic Inquiry and Word Count (SC-LIWC) categories. The associations between SC-LIWC features and the 5 suicide risk factors were examined by logistic regression. Furthermore, the support vector machine (SVM) model was applied based on the language features to automatically classify whether a Weibo user exhibited any of the 5 risk factors. Results A total of 974 Weibo users participated in the survey. Those with high suicide probability were marked by a higher usage of pronoun (odds ratio, OR=1.18, P=.001), prepend words (OR=1.49, P=.02), multifunction words (OR=1.12, P=.04), a lower usage of verb (OR=0.78, P<.001), and a greater total word count (OR=1.007, P=.008). Second-person plural was positively associated with severe depression (OR=8.36, P=.01) and stress (OR=11, P=.005), whereas work-related words were negatively associated with WSC (OR=0.71, P=.008), severe depression (OR=0.56, P=.005), and anxiety (OR=0.77, P=.02). Inconsistently, third-person plural was found to be negatively associated with WSC (OR=0.02, P=.047) but positively with severe stress (OR=41.3, P=.04). Achievement-related words were positively associated with depression (OR=1.68, P=.003), whereas health- (OR=2.36, P=.004) and death-related (OR=2.60, P=.01) words were positively associated with stress. The machine classifiers did not achieve satisfying performance in the full sample set but could classify high suicide probability (area under the curve, AUC=0.61, P=.04) and severe anxiety (AUC=0.75, P<.001) among those who have exhibited WSC. Conclusions SC-LIWC is useful to examine language markers of suicide risk and emotional distress in Chinese social media and can identify characteristics different from previous findings in the English literature. Some findings are leading to new hypotheses for future verification. Machine classifiers based on SC-LIWC features are promising but still require further optimization for application in real life.
203 citations
en
Medicine, Psychology
Towards a Theory of Morphology as Syntax
Chris Collins, Richard S. Kayne
Phenomena traditionally thought of as morphological can be accounted for in terms of syntactic operations and principles, hence bringing forth questions that traditional morphology fails to ask (for instance, concerning the licensing of empty morphemes). The language faculty contains no specific morphological component, nor any post-syntactic morphological operations.
Chinese language and literature
Job burnout among teachers handling English as a foreign language in China: review and prospects
Qiangfu Yu, Xiaofeng Yu
In recent years, job burnout among English as a foreign language (EFL) teachers in China has become a prominent topic in the fields of education and psychology, with the number of related research articles generally on the rise. Using the Web of Science (WOS) database and the Chinese Social Sciences Citation Index (CSSCI) sub-database of the China National Knowledge Infrastructure (CNKI) database, this paper comprehensively reviews the current state of research on job burnout among EFL teachers in China between 2020 and 2023 in terms of research methods, research focuses and research findings. The results of the literature review show that, on the whole, research on job burnout among EFL teachers in China is still in its infancy and that the research level is still relatively low. Based on the systematic review of the collected studies, we conclude that although there is no consensus on the relationship between demographic variables and the severity of job burnout among EFL teachers in China, interventions at both the teacher and school levels can alleviate it. This review analyzes the main problems in the current research, for example, the lack of theoretical construction and guidance, excessive concentration on a few research topics, the lack of diversified and interdisciplinary research methods, and the lack of longitudinal research. Potential directions for future research are also discussed.
A Systemic Crisis: Political Consequences of the 1989 Tiananmen Square Events
I. Yu. Zuenko
The article attempts to revise the understanding of the nature of China’s 1989 Tiananmen Square crisis and its consequences by comparing it with Soviet/Russian history of the 1980s and 1990s. Several dozen studies and op-ed pieces have focused on the problem earlier (especially in Russian-language literature), but these papers lacked Chinese sources, sought to liken the Tiananmen Square crisis to the Russian agenda, and oversimplified the complicated cause-and-effect relations of the Chinese historical process. As a result, their conclusions are limited to the question of the necessity of a decisive crackdown on any turmoil or political manifestation as the sole tool to protect a nation’s stability and prosperity. China’s stability and prosperity after the Tiananmen Square crisis should not be seen as a result of crackdowns. Paradoxically, despite the conservatives’ victory in 1989, China returned to the path of reforms several years after the Tiananmen Square events. This can be attributed to a complex set of internal and external factors, of which the freezing of China’s political reforms was just one. Moreover, the process was facilitated not only by overcoming its political crisis but also by observing the revolutions of 1989 in Eastern Europe. Faced with the threat of being ousted from power, China’s elites consolidated, a factor that became crucial for the implementation of systemic market reforms in the 1990s. The author concludes that a similar situation was observed in post-Soviet Russia in 1998-2000: the Russian financial crisis and several events, including the Second Chechen Campaign, became the ‘backbone crisis’ of Vladimir Putin’s presidency.
Political science (General)