Results for "Language and Literature"

Showing 20 of ~3,362,561 results · from DOAJ, Semantic Scholar, arXiv, CrossRef

S2 Open Access 2023
20 years of the default mode network: A review and synthesis.

Vinod Menon

The discovery of the default mode network (DMN) has revolutionized our understanding of the workings of the human brain. Here, I review developments that led to the discovery of the DMN, offer a personal reflection, and consider how our ideas of DMN function have evolved over the past two decades. I summarize literature examining the role of the DMN in self-reference, social cognition, episodic and autobiographical memory, language and semantic memory, and mind wandering. I identify unifying themes and propose new perspectives on the DMN's role in human cognition. I argue that the DMN integrates and broadcasts memory, language, and semantic representations to create a coherent "internal narrative" reflecting our individual experiences. This narrative is central to the construction of a sense of self, shapes how we perceive ourselves and interact with others, may have ontogenetic origins in self-directed speech during childhood, and forms a vital component of human consciousness.

585 citations en Medicine
DOAJ Open Access 2025
مظاهر السبك النحوي في خطاب الامام الحسين (عليه السلام) الى معاوية

جاسم محمد العمري

This study aims to draw on the findings of text linguistics to clarify the role that cohesion plays in constructing a text and organizing the information within it. It also seeks to identify the means adopted to achieve this end, foremost among them the devices of grammatical cohesion. The study focuses on analyzing the discourse of Imam Husayn (peace be upon him) addressed to Muawiya, offering an in-depth textual study befitting the standing of the text's author. It highlights a notable aesthetic effect in the cohesion of the text in both form and content, charging it with concepts of confrontation and resistance that urge the Islamic community to resist all forms of misguidance, falsehood, and distortion of principles practiced by the Umayyad party. The mechanisms of grammatical cohesion formed the basis for reading the noble Husayni text in search of its aesthetic gems and elements of creative coherence; textual features contributed to the continuity of the discourse, producing clear textual and semantic cohesion. The study relies on cohesion, the most prominent of the textuality standards identified by text linguists, to demonstrate textual coherence. In this context, reference is treated as a principal element of grammatical cohesion, capable of bridging gaps between parts of the text, creating a relation of fusion among grammatical units and reinforcing the cohesive expression that allows the recipient to move between parts of the text and understand its symbols. Substitution is also examined as an effective device within the text, giving the reader a mental focus on the structural changes in the construction of the text. The study then turns to ellipsis as a linguistic device that activates the recipient's mind to search for the hidden elements and re-imagine them in the text. These elements are completed by conjunction, an effective device for the cohesion of the textual sequence, given the variety of its markers and their frequency in the text and their role in urging the recipient to follow and understand the discourse. Overall, the study is an in-depth analytical examination of the Husayni discourse, focusing on the role of grammatical cohesion in constructing the text and organizing its information. It examines how Imam Husayn (peace be upon him) employed the devices of grammatical cohesion to create textual and semantic coherence, enabling the recipient to understand the message more deeply. The study clearly shows how these elements contribute to creating a tightly woven and influential text that addresses noble causes and urges the Islamic community to resist injustice. The Husayni discourse stands out as an intellectual value of high order, displaying moral and faith-based defiance of corruption and misguidance. In this way, the text seeks to highlight the aesthetics of language and its interactions as an effective tool for spreading awareness and effecting change, in keeping with the spirit of the Husayni message, which seeks to achieve justice and lift injustice.

Language and Literature, Social Sciences
arXiv Open Access 2025
Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish

Lujun Li, Yewei Song, Lama Sleem et al.

Grammar refers to the system of rules that governs the structural organization and the semantic relations among linguistic units such as sentences, phrases, and words within a given language. In natural language processing, there remains a notable scarcity of grammar-focused evaluation protocols, a gap that is even more pronounced for low-resource languages. Moreover, the extent to which large language models genuinely comprehend grammatical structure, especially the mapping between syntactic structures and meanings, remains under debate. To investigate this issue, we propose a Grammar-Book-Guided evaluation pipeline intended to provide a systematic and generalizable framework for grammar evaluation, consisting of four key stages, and in this work we take Luxembourgish as a case study. The results show a weak positive correlation between translation performance and grammatical understanding, indicating that strong translations do not necessarily imply deep grammatical competence. Larger models perform well overall due to their semantic strength but remain weak in morphology and syntax, struggling particularly with Minimal Pair tasks, while strong reasoning ability offers a promising way to enhance their grammatical understanding.
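A minimal sketch of the kind of analysis behind the abstract's "weak positive correlation" claim: per-model translation scores against grammar-probe accuracy, compared with a rank correlation. The model names and scores below are placeholders, not results from the paper.

```python
# Hypothetical illustration only: correlating translation quality with
# grammar-book probe accuracy across models (values are made up).
from scipy.stats import spearmanr

models = ["model_a", "model_b", "model_c", "model_d"]
translation_score = [31.2, 28.4, 35.0, 22.7]   # e.g., BLEU on Luxembourgish translation
grammar_accuracy = [0.61, 0.58, 0.66, 0.49]    # accuracy on grammar-book-guided probes

rho, p_value = spearmanr(translation_score, grammar_accuracy)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A small positive rho would correspond to the "weak positive correlation" reported.
```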

en cs.CL
arXiv Open Access 2025
ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models

Dongqi Zheng

Large Reasoning Language Models (LRLMs or LRMs) demonstrate remarkable capabilities in complex reasoning tasks, but suffer from significant computational inefficiencies due to overthinking phenomena. Existing efficient reasoning methods face the challenge of balancing reasoning quality with inference cost reduction. We propose Adaptive Reasoning Suppression (ARS), a novel training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy through adaptive certainty monitoring. ARS introduces a multi-checkpoint certainty estimation mechanism with progressive suppression thresholds, achieving superior efficiency compared to static suppression methods. Our extensive evaluation across mathematical reasoning benchmarks using multiple model architectures demonstrates that ARS achieves reductions of up to 53% in tokens, 46.1% in latency, and 57.9% in energy, while maintaining or improving accuracy.
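A minimal sketch of one plausible reading of the checkpoint mechanism described above, not the paper's implementation: certainty is estimated at several token-count checkpoints, and further reasoning is suppressed once it clears a schedule of thresholds. `generate_step` and `estimate_certainty` are hypothetical stand-ins, and the schedule shown (easier to stop as the chain grows) is an assumption.

```python
# Hedged sketch of adaptive reasoning suppression via multi-checkpoint
# certainty monitoring; all names and the threshold schedule are illustrative.

def adaptive_reasoning_suppression(prompt, generate_step, estimate_certainty,
                                   checkpoints=(64, 128, 256, 512),
                                   thresholds=(0.95, 0.90, 0.85, 0.80)):
    """Generate reasoning tokens, stopping early once the model looks certain."""
    tokens = []
    for step_limit, threshold in zip(checkpoints, thresholds):
        # Generate up to the next checkpoint.
        while len(tokens) < step_limit:
            token, done = generate_step(prompt, tokens)
            tokens.append(token)
            if done:
                return tokens
        # Checkpoint reached: estimate certainty about the current answer.
        if estimate_certainty(prompt, tokens) >= threshold:
            break  # suppress the remaining reasoning and emit the answer now
    return tokens
```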

en cs.AI, cs.CL
arXiv Open Access 2025
GDLLM: A Global Distance-aware Modeling Approach Based on Large Language Models for Event Temporal Relation Extraction

Jie Zhao, Wanting Ning, Yuxiao Fei et al.

In Natural Language Processing (NLP), Event Temporal Relation Extraction (ETRE) aims to recognize the temporal relation between two events. Prior studies have noted the importance of language models for ETRE. However, the restricted pre-trained knowledge of Small Language Models (SLMs) limits their capability to handle minority class relations in imbalanced classification datasets. For Large Language Models (LLMs), researchers adopt manually designed prompts or instructions, which may introduce extra noise and interfere with the model's judgment of the long-distance dependencies between events. To address these issues, we propose GDLLM, a Global Distance-aware modeling approach based on LLMs. We first present a distance-aware graph structure utilizing a Graph Attention Network (GAT) to assist the LLMs in capturing long-distance dependency features. Additionally, we design a temporal feature learning paradigm based on soft inference to augment the identification of relations within a short-distance proximity band, which feeds the probabilistic information generated by the LLMs into the multi-head attention mechanism. Since global features can be captured effectively, our framework substantially enhances the performance of minority relation classes and improves overall learning ability. Experiments on two publicly available datasets, TB-Dense and MATRES, demonstrate that our approach achieves state-of-the-art (SOTA) performance.
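A generic sketch, not the authors' code, of the kind of graph-attention component the abstract describes: event-mention embeddings from a language model become graph nodes, and attention over edges linking event pairs lets the model pool information across long distances. Layer sizes and the fully connected edge list are assumptions.

```python
# Illustrative GAT over event mentions for long-distance dependency modeling.
import torch
from torch_geometric.nn import GATConv

class EventGAT(torch.nn.Module):
    def __init__(self, hidden_dim=768, heads=4):
        super().__init__()
        self.gat1 = GATConv(hidden_dim, hidden_dim // heads, heads=heads)
        self.gat2 = GATConv(hidden_dim, hidden_dim, heads=1)

    def forward(self, event_embeddings, edge_index):
        # event_embeddings: [num_events, hidden_dim] taken from the language model
        # edge_index: [2, num_edges] linking event pairs, near or far apart in the text
        h = torch.relu(self.gat1(event_embeddings, edge_index))
        return self.gat2(h, edge_index)
```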

en cs.CL, cs.IR
CrossRef Open Access 2025
A review of the Total Physical Response—a Foreign Language Teaching Methodology

Jie Zeng

The Total Physical Response (TPR) method, pioneered by psychologist James Asher in the 1960s, has revolutionized the way we approach language learning, especially for young learners. This innovative method focuses on integrating physical actions with linguistic input, fostering a natural and intuitive approach to language acquisition. This thesis delves into the theoretical foundations and practical implications of TPR and draws insights from various scholars who have contributed significantly to its development and application.

arXiv Open Access 2024
Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language

Anastasia Zhukova, Christian E. Matt, Bela Gipp

Domain-specific languages that use a lot of specific terminology often fall into the category of low-resource languages. Collecting test datasets in a narrow domain is time-consuming and requires skilled human resources with domain knowledge and training for the annotation task. This study addresses the challenge of automatically collecting test datasets to evaluate semantic search in the low-resource, domain-specific German language of the process industry. Our approach proposes an end-to-end annotation pipeline, from automated query generation to the score reassessment of query-document pairs. To overcome the lack of text encoders trained in the German chemistry domain, we explore the principle of an ensemble of "weak" text encoders trained on common knowledge datasets. We combine individual relevance scores from diverse models, used to retrieve document candidates, with relevance scores generated by an LLM, aiming to achieve consensus on query-document alignment. Evaluation results demonstrate that the ensemble method significantly improves alignment with human-assigned relevance scores, outperforming individual models in both inter-coder agreement and accuracy metrics. These findings suggest that ensemble learning can effectively adapt semantic search systems for specialized, low-resource languages, offering a practical solution to resource limitations in domain-specific contexts.
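A minimal sketch of the ensemble idea described above, not the paper's pipeline: each "weak" encoder (or an LLM judge) scores the same query-document pairs, scores are normalized per model, and the consensus is their average. The fusion rule and the example numbers are assumptions.

```python
# Illustrative score fusion for an ensemble of weak text encoders.
import numpy as np

def ensemble_relevance(score_matrix):
    """score_matrix: [n_models, n_documents] raw query-document scores."""
    scores = np.asarray(score_matrix, dtype=float)
    # Put every model on a comparable scale before averaging.
    z = (scores - scores.mean(axis=1, keepdims=True)) / (scores.std(axis=1, keepdims=True) + 1e-9)
    return z.mean(axis=0)  # consensus score per document

# Example: three scorers ranking four candidate documents for one query.
per_model_scores = [
    [0.62, 0.40, 0.71, 0.35],   # general-purpose encoder 1
    [0.55, 0.48, 0.80, 0.30],   # general-purpose encoder 2
    [0.70, 0.33, 0.66, 0.41],   # encoder 3 or an LLM-assigned relevance score
]
consensus = ensemble_relevance(per_model_scores)
ranking = np.argsort(-consensus)  # document indices, best first
```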

en cs.CL
arXiv Open Access 2024
ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models

Benjamin Newman, Yoonjoo Lee, Aakanksha Naik et al.

When conducting literature reviews, scientists often create literature review tables - tables whose rows are publications and whose columns constitute a schema, a set of aspects used to compare and contrast the papers. Can we automatically generate these tables using language models (LMs)? In this work, we introduce a framework that leverages LMs to perform this task by decomposing it into separate schema and value generation steps. To enable experimentation, we address two main challenges: First, we overcome a lack of high-quality datasets to benchmark table generation by curating and releasing arxivDIGESTables, a new dataset of 2,228 literature review tables extracted from ArXiv papers that synthesize a total of 7,542 research papers. Second, to support scalable evaluation of model generations against human-authored reference tables, we develop DecontextEval, an automatic evaluation method that aligns elements of tables with the same underlying aspects despite differing surface forms. Given these tools, we evaluate LMs' abilities to reconstruct reference tables, finding this task benefits from additional context to ground the generation (e.g. table captions, in-text references). Finally, through a human evaluation study we find that even when LMs fail to fully reconstruct a reference table, their generated novel aspects can still be useful.
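A hedged sketch of the schema-then-values decomposition the abstract describes; `lm` is a hypothetical text-completion callable, and the prompts are placeholders rather than the authors' actual prompts.

```python
# Illustrative two-step table generation: propose a schema, then fill cells per paper.

def generate_review_table(papers, lm, caption=None):
    # Step 1: schema generation -- propose aspects on which the papers can be compared.
    context = "\n\n".join(p["abstract"] for p in papers)
    if caption:  # extra grounding context, which the paper reports is helpful
        context = f"Table caption: {caption}\n\n{context}"
    schema = [a.strip() for a in
              lm(f"List the aspects for comparing these papers:\n{context}").splitlines()
              if a.strip()]

    # Step 2: value generation -- fill one cell per (paper, aspect).
    table = {}
    for paper in papers:
        table[paper["title"]] = {
            aspect: lm(f"For the paper below, state its '{aspect}':\n{paper['abstract']}")
            for aspect in schema
        }
    return schema, table
```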

en cs.CL
DOAJ Open Access 2023
D’où viennent les relatives ?

Pierre Le Goffic

Relative clauses (of the type … le livre qui est sur la table…, la lettre que tu as écrite…) appeared in Latin by a shift from the correlative structure quas litteras scripsisti, eae… ‘quelle lettre tu as écrite, elle…’ to litterae quas scripsisti… ‘la lettre que tu as écrite…’: the qu- word quas, whose function was to express a variable (quas litteras = a letter x), and which was initially a determiner of N, became pronominalized and anaphoric of the N that became its antecedent. This structure underwent a great development in Latin and passed into French, in spite of the morphosyntactic transformations that occurred in late Latin: the relative pronouns (having an antecedent N) separated from the other qu- words to constitute a heterogeneous paradigm built on several logical principles: the case opposition between subject qui and direct-object que, reinforcement by the adverbs où ‘where’ and dont ‘from where’, the Romance invention of lequel, and the late (17th c.) and partial return of the +/-H distinction after a preposition. This heterogeneity does not prevent the extensive use of relative clauses in contemporary French.

Philology. Linguistics
DOAJ Open Access 2023
Namen im südlichen Ermland. Beobachtungen zur Tätigkeit von Komisja Ustalania Nazw Miejscowości am Beispiel von Toponymen der Gemeinden Gietrzwałd und Stawiguda

Magdalena Lidia Lobert

In this article, selected place names of southern Warmia are discussed and subjected to linguistic analysis. On this basis, these toponyms are divided according to the linguistic affiliation of their morphemes, which have their sources in Polish, German and Prussian. The article also presents the history of the Commission for the Determination of Place Names (Komisja Ustalania Nazw Miejscowości), which, after World War II and immediately after the incorporation of Warmia into the Polish state, began intensive work on giving German names Polish equivalents. The collected material makes it possible to reconstruct the methods most likely used by the Commission in its work, as well as to draw conclusions about the impact of national identity and state policy on changes in the naming of places in Warmia.

Language. Linguistic theory. Comparative grammar
arXiv Open Access 2023
LAraBench: Benchmarking Arabic AI with Large Language Models

Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury et al.

Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tagging and content classification across different domains. We utilized models such as GPT-3.5-turbo, GPT-4, BLOOMZ, Jais-13b-chat, Whisper, and USM, employing zero and few-shot learning techniques to tackle 33 distinct tasks across 61 publicly available datasets. This involved 98 experimental setups, encompassing ~296K data points, ~46 hours of speech, and 30 sentences for Text-to-Speech (TTS). This effort resulted in 330+ sets of experiments. Our analysis focused on measuring the performance gap between SOTA models and LLMs. The overarching trend observed was that SOTA models generally outperformed LLMs in zero-shot learning, with a few exceptions. Notably, larger computational models with few-shot learning techniques managed to reduce these performance gaps. Our findings provide valuable insights into the applicability of LLMs for Arabic NLP and speech processing tasks.

en cs.CL, cs.AI
arXiv Open Access 2023
Using Large Language Models to Provide Explanatory Feedback to Human Tutors

Jionghao Lin, Danielle R. Thomas, Feifei Han et al.

Research demonstrates that learners engaging in the process of producing explanations to support their reasoning can have a positive impact on learning. However, providing learners real-time explanatory feedback often presents challenges related to classification accuracy, particularly in domain-specific environments containing situationally complex and nuanced responses. We present two approaches for supplying tutors real-time feedback within an online lesson on how to give students effective praise. This work-in-progress demonstrates considerable accuracy in binary classification of corrective feedback for effective, or effort-based (F1 score = 0.811), and ineffective, or outcome-based (F1 score = 0.350), praise responses. More notably, we introduce progress towards an enhanced approach to providing explanatory feedback using large language model-facilitated named entity recognition, which can provide tutors feedback not only while they engage in lessons but can also potentially suggest real-time tutor moves. Future work involves leveraging large language models for data augmentation to improve accuracy, while also developing an explanatory feedback interface.

en cs.CL, cs.AI
arXiv Open Access 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Zhen Yang, Yingxue Zhang, Fandong Meng et al.

Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they still struggle to efficiently model the interactions among multi-modal inputs and generation in non-textual modalities. In this work, we propose TEAL (Tokenize and Embed ALl), an approach that treats the input from any modality as a token sequence and learns a joint embedding space for all modalities. Specifically, for the input from any modality, TEAL first discretizes it into a token sequence with an off-the-shelf tokenizer and embeds the token sequence into a joint embedding space with a learnable embedding matrix. MM-LLMs then just need to predict the multi-modal tokens autoregressively as textual LLMs do. Finally, the corresponding de-tokenizer is applied to generate the output in each modality based on the predicted token sequence. With the joint embedding space, TEAL enables frozen LLMs to perform both understanding and generation tasks involving non-textual modalities, such as image and audio. Thus, the textual LLM can work simply as an interface and maintain its high performance in textual understanding and generation. Experiments show that TEAL achieves substantial improvements in multi-modal understanding and implements a simple scheme for multi-modal generation.
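A minimal sketch of the tokenize-and-embed idea summarized above, not the authors' implementation: each modality's off-the-shelf tokenizer produces integer tokens, and a learnable embedding table per modality maps them into one shared space for the frozen LLM to consume. Vocabulary sizes and dimensions are illustrative.

```python
# Illustrative joint embedding of tokens from multiple modalities.
import torch
import torch.nn as nn

class JointTokenEmbedder(nn.Module):
    def __init__(self, vocab_sizes, embed_dim=4096):
        super().__init__()
        # One learnable embedding table per modality, all mapping into one space.
        self.tables = nn.ModuleDict({
            modality: nn.Embedding(size, embed_dim) for modality, size in vocab_sizes.items()
        })

    def forward(self, token_ids, modality):
        return self.tables[modality](token_ids)  # [seq_len, embed_dim]

embedder = JointTokenEmbedder({"text": 32000, "image": 8192, "audio": 1024})
image_tokens = torch.randint(0, 8192, (256,))       # from an off-the-shelf image tokenizer
image_embeddings = embedder(image_tokens, "image")   # ready to feed to the (frozen) LLM
```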

en cs.CL, cs.AI
DOAJ Open Access 2022
The effectiveness of the structure of the Quranic text and its semantic layers in translation; a case study of Surah Fatir [In Arabic]

Siamak Asgharpour, Atefeh Asghari

The audience's perception of a text's meaning and author is influenced by its structure. The impact of a work is multiplied if it has a straightforward, logical structure combined with linguistic and moral aesthetics (eloquence and rhetoric). Every text has a unique structure that gives it individuality and influences how literary the text is. Religious writings can be categorized along a number of structural directions. This essay examines the grammatical and rhetorical structures of Surah Fatir and how they manifest in its Persian translation, using the descriptive-analytical method. One of the causes of parallelism between source and target texts is attention to grammatical and rhetorical tendencies. The translator has attempted to render the verbal and conceptual complexities of the source text into the structure of the target text using the simplest expressions, in an artistic manner and with respect for fidelity, engaging throughout with the grammatical and rhetorical structures used in the Surah. Furthermore, the translator's performance is more closely tied to the transfer of grammatical components than of rhetorical ones.

Language and Literature
arXiv Open Access 2022
Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks

Barack Wanjawa, Lilian Wanzare, Florence Indede et al.

Indigenous African languages are categorized as under-served in Natural Language Processing. They therefore experience poor digital inclusivity and information access. The processing challenge with such languages has been how to use machine learning and deep learning models without the requisite data. The Kencorpus project intends to bridge this gap by collecting and storing text and speech data that is good enough for data-driven solutions in applications such as machine translation, question answering and transcription in multilingual communities. The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya. Data collection was done by researchers from communities, schools, media, and publishers. The Kencorpus dataset has a collection of 5,594 items - 4,442 texts (5.6M words) and 1,152 speech files (177 hrs). Based on this data, Part of Speech tagging sets for Dholuo and Luhya (50,000 and 93,000 words respectively) were developed. We developed 7,537 Question-Answer pairs for Swahili and created a text translation set of 13,400 sentences from Dholuo and Luhya into Swahili. The datasets are useful for downstream machine learning tasks such as model training and translation. We also developed two proof-of-concept systems: a Kiswahili speech-to-text system and a machine learning system for the Question Answering task, with results of an 18.87% word error rate and 80% Exact Match (EM), respectively. These initial results show great promise for the usability of Kencorpus in the machine learning community. Kencorpus is one of the few public-domain corpora for these three low-resource languages and forms a basis for learning and sharing experiences in similar work, especially for low-resource languages.
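For readers unfamiliar with the two metrics reported above, here is how they are typically computed: word error rate is word-level edit distance normalized by the reference length, and Exact Match is the share of predictions identical to the reference answer. The example strings are made up, not drawn from Kencorpus.

```python
# Illustrative implementations of WER and Exact Match.

def word_error_rate(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (substitutions, insertions, deletions).
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(r)][len(h)] / max(len(r), 1)

def exact_match(references, predictions):
    return sum(p.strip() == r.strip() for r, p in zip(references, predictions)) / len(references)

print(word_error_rate("habari ya asubuhi", "habari za asubuhi"))  # 1 substitution / 3 words ≈ 0.33
```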

arXiv Open Access 2022
Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi

Devansh Mehta, Harshita Diddee, Ananya Saxena et al.

The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi across two different dimensions: (a) the creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources and an Interactive Voice Response (IVR) based mass awareness platform; (b) enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly 4 times to enable its deployment on low-resource edge devices and in areas of little to no internet connectivity. We also present preliminary evaluations of utilizing the developed machine translation model to provide assistance to volunteers who are involved in collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used for building the translation model but also engaged nearly 850 community members who can help take Gondi onto the internet.
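The abstract does not say how the roughly 4x compression was achieved; one common way to get about that ratio is post-training dynamic quantization (float32 to int8 weights for the linear layers). The model identifier below is a placeholder, not the project's actual checkpoint.

```python
# Hedged sketch of int8 dynamic quantization for edge deployment of a translation model.
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("hi-gondi-mt")   # placeholder identifier
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "hi_gondi_mt_int8.pt")       # roughly 4x smaller on disk
```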

en cs.CL, cs.CY

Page 21 of 168,129