Large language models and linguistic intentionality
Jumbly Grindrod
Do large language models like ChatGPT or LLaMA meaningfully use the words they produce? Or are they merely clever prediction machines, simulating language use by producing statistically plausible text? There have already been some initial attempts to answer this question by showing that these models meet the criteria for entering meaningful states according to metasemantic theories of mental content. In this paper, I will argue for a different approach: that we should instead consider whether language models meet the criteria given by our best metasemantic theories of linguistic content. In that vein, I will illustrate how this can be done by applying two such theories to the case of language models: Gareth Evans' (1982) account of naming practices and Ruth Millikan's (1984, 2004, 2005) teleosemantics. In doing so, I will argue that it is a mistake to think that the failure of LLMs to meet plausible conditions for mental intentionality thereby renders their outputs meaningless, and that a distinguishing feature of linguistic intentionality, namely dependency on a pre-existing linguistic system, allows for the plausible result that LLM outputs are meaningful.
A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications
Naomi Baes, Nick Haslam, Ekaterina Vylomova
Historical linguists have identified multiple forms of lexical semantic change. We present a three-dimensional framework for integrating these forms and a unified computational methodology for evaluating them concurrently. The dimensions represent increases or decreases in semantic 1) sentiment, 2) breadth, and 3) intensity. These dimensions can be complemented by evaluating shifts in the frequency of the target words and in the thematic content of their collocates. This framework enables lexical semantic change to be mapped economically and systematically and has applications in computational social science. We present an illustrative analysis of semantic shifts in “mental health” and “mental illness” in two corpora, demonstrating patterns of semantic change that illuminate contemporary concerns about pathologization, stigma, and concept creep.
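Of the complementary measures listed, the frequency shift is the simplest to operationalize. The sketch below is not the authors' methodology; it assumes the corpus has already been split into two tokenized time slices and compares a target word's relative frequency across them as a log2 ratio. The function names and the additive `smoothing` constant are illustrative assumptions.

```python
import math
from collections import Counter


def per_million(counts: Counter, word: str) -> float:
    """Frequency of `word` per million tokens in one corpus slice (a common reporting unit)."""
    total = sum(counts.values())
    return 1_000_000 * counts[word] / total if total else 0.0


def log_frequency_shift(
    early_tokens: list[str],
    late_tokens: list[str],
    word: str,
    smoothing: float = 0.5,
) -> float:
    """Log2 ratio of the word's relative frequency in the late slice vs the early slice.

    Positive values mean the word became relatively more frequent. `smoothing` is an
    assumed additive constant that keeps the ratio finite when the word is absent from
    one slice; with smoothing=0.0 the word must occur in both slices.
    """
    early, late = Counter(early_tokens), Counter(late_tokens)
    n_early, n_late = sum(early.values()), sum(late.values())
    if n_early == 0 or n_late == 0:
        raise ValueError("both corpus slices must contain tokens")
    rate_early = (early[word] + smoothing) / n_early
    rate_late = (late[word] + smoothing) / n_late
    return math.log2(rate_late / rate_early)


# Toy usage: "stigma" doubles its relative frequency between the two slices.
early = "stigma a b c".split()
late = "stigma stigma a b".split()
print(log_frequency_shift(early, late, "stigma", smoothing=0.0))  # 1.0
```

A log ratio is used so that a doubling and a halving are symmetric (+1 and -1); in practice such a shift would be reported alongside the sentiment, breadth, and intensity measures rather than in place of them.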
Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions
Kexun Zhang, Yee Man Choi, Zhenqiao Song, et al.
How can large language models (LLMs) process and translate endangered languages? Many languages lack a corpus large enough to train a decent LLM; as a result, existing LLMs rarely perform well on unseen, endangered languages. However, we observe that 2,000 endangered languages, though lacking a large corpus, have a grammar book or a dictionary. We propose LINGOLLM, a training-free approach that enables an LLM to process unseen languages that hardly occur in its pre-training data. Our key insight is to present linguistic knowledge of an unseen language in the LLM's prompt, including a dictionary, a grammar book, and morphologically analyzed input text. We implement LINGOLLM on top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks across 8 endangered or low-resource languages. Our results show that LINGOLLM raises GPT-4's translation performance from 0 to 10.5 BLEU for 10 language directions. Our findings demonstrate the tremendous value of linguistic knowledge for endangered languages in the age of LLMs. Our data, code, and model generations can be found at https://github.com/LLiLab/llm4endangeredlang.
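To make the idea of putting linguistic knowledge into the prompt concrete, here is a minimal sketch of assembling such a prompt from grammar notes, a morphological analysis of each input word, and per-morpheme dictionary glosses. The layout, the `build_prompt` function, and the toy "language" are hypothetical illustrations, not LINGOLLM's actual template; `analyze` stands in for a real morphological analyzer.

```python
from typing import Callable


def build_prompt(
    source_sentence: str,
    dictionary: dict[str, str],
    grammar_notes: str,
    analyze: Callable[[str], list[str]],
    target_language: str = "English",
) -> str:
    """Assemble a translation prompt carrying grammar notes, a morphological
    analysis of each input word, and dictionary glosses for each morpheme.

    Hypothetical layout for illustration only. `analyze` maps a word to its morphemes.
    """
    gloss_lines = []
    for word in source_sentence.split():
        morphemes = analyze(word)
        # Unknown morphemes are marked "?" so the model sees the gap explicitly.
        glosses = [dictionary.get(m, "?") for m in morphemes]
        gloss_lines.append(f"{word} = {'-'.join(morphemes)} : {'-'.join(glosses)}")
    return "\n".join(
        [
            "Grammar notes:",
            grammar_notes,
            "",
            "Morphological analysis with dictionary glosses:",
            *gloss_lines,
            "",
            f"Translate into {target_language}: {source_sentence}",
        ]
    )


# Toy usage with an invented mini-language; "+" marks morpheme boundaries here.
toy_dictionary = {"nu": "house", "ka": "PL", "ti": "big"}
prompt = build_prompt(
    "nu+ka ti",
    toy_dictionary,
    "Adjectives follow the noun they modify.",
    analyze=lambda word: word.split("+"),
)
print(prompt)
```

Glossing at the morpheme level rather than the word level matters for morphologically rich languages, where inflected word forms are unlikely to appear verbatim in a dictionary.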
Society and State in the Balé Lowlands: Interplay of Divergent Interests in Centre-Periphery Interrelations in South-eastern Ethiopia, 1891–1991
Kefyalew Tessema Semu
Dissertation abstract.
Ethnology. Social and cultural anthropology, Philology. Linguistics
Speculative Black Feminist Epistemologies of Worldbuilding for XR
Clareese Hill
Speculative Black Feminist Epistemologies of Worldbuilding for XR is a methodology that attempts to address space, the production of space, permission of space, the economy of space, and evading the confines of space by activating possible imaginaries in the development of XR (Extended Reality) environments. Through a praxis straddling academic and artist writing, the argument explores an experimental approach to Worldbuilding for XR by upending the role of Cartesian coordinates as the default measurement of 3D space. The possibilities afforded by XR technologies allow for experimenting with unrestricted navigation of Black women’s cartographic movements, which is impossible in real-world geography. The core proposition of the Black Feminist Episteme of Worldbuilding is the praxis of de-mapping. This praxis foregrounds fugitive movements and spaces by using XR Worldbuilding affordances as a speculative container for reimagining navigation for identities under conditions of subjugation. Researching and speculating about the affordances of XR is a critical intervention that attempts to counter the mainstream development and deployment of immersive media technology for the pedagogical tasks of gaming, militarization, and other real-world training applications. The first move toward the praxis of de-mapping, an arrival, is acknowledging the material composition and operation of XR technology. The second move, the first departure, explores intentional disorientation. The third move interrupts the linearity of departures and arrivals to establish mobility as a counter-cartographic methodology by referencing the female protagonists of the works by Octavia E. Butler examined in the study.
Communication. Mass media
THE INFLUENCE OF RUSSIAN ON VERB GOVERNMENT IN KARELIAN
Jaan Õispuu
COMPARATIVE TYPOLOGY OF THE ENGLISH AND UZBEK LANGUAGES
Khusenova Mekhriniso Uktamovna
Comparative linguistics, or comparative-historical linguistics (formerly comparative philology), is a branch of historical linguistics concerned with comparing languages to establish their historical relatedness. This article focuses on the comparative typology of English and Uzbek and discusses the formation of comparative typology as a science, its methods of analysis, and its relations with other linguistic disciplines. Keywords: comparative typology, confrontative linguistics, contrastive linguistics, linguistic characterology, comparativists, notions of a type of a language and a type in a language, linguistic universals, recessives and uncials
The Notion of “Fascination” and “Fascinativity” in Linguistic Poetics Discourse
Олена Скоробогатова, Антоніна Золотько
The study of phenomena that have the potential for persuasive verbal influence on the recipient, together with the identification and development of ways to counteract this influence, is among the challenges of modern linguistics. Poetry is considered one of the most ancient forms of verbal communication. The analysis of the nature of poetic text units provides the key not only to a deeper interpretation of the idea of a particular work but also to the explanation of general linguistic phenomena. Poetic discourse, as a linguistic and creative environment, regularly activates the expressive and figurative potential of language units at different levels. The terms “fascination” and “fascinativity” are often used to describe processes at the grammatical level of poetic language. In this paper, the authors refer to the theoretical contributions of representatives of the Kharkiv School of Philology and of scholars in other linguistic branches whose works address perception, apperception, and the fascinativity of texts. Fascination is considered an aspect of social communication complementary to information, and fascinativity an ontological feature of a poetic text that is closely connected with the realisation of the author's supertask. Fascinativity is a basic notion, considered a specific quality of the text: the ability to “enchant” the reader. It is created by the author with the help of linguistic and discursive (in this case, poetic) means and affects the recipient's perception of poetry. The research findings emphasise that the degree of fascinativity is strongly affected by all the immanent elements of a particular poetic text, the existing discursive practice, and tradition. The study of fascinativity features in a poetic text, and of fascination as an important aspect of social communication, becomes convincing when scholars draw on recent research findings in linguistic poetics. Combining the communicative and linguistic-poetics approaches makes it possible to explain how the mechanism of fascination works in terms of communication and how the fascinativity of a poetic text and discourse is formed.
METAPHOR AND MENTAL IMAGE. WHY WE DO UNDERSTAND METAPHORS
В.А. Борисова, Евгения Анатольевна Пигаркина
The article discusses the evolution of views on metaphor as a phenomenon of human thinking and, consequently, a phenomenon of language and speech. The range of sciences studying metaphor is currently expanding, from philology and linguistics through to neuropsychology. All of these approaches, however, share a common basic problem: what metaphor is, how it is understood, what laws govern its interpretation, and why it generates a multiplicity of meanings.
A part outside the whole? (To Anton Zimmerling's article “Really: syntactics without semiotics?”)
Sergey Viktorovich Chebanov
Before delving into the connections between linguistics and semiotics, it is essential to establish a clear demarcation between these fields, which requires a precise definition of each. However, the approach taken by Anton Zimmerling in this regard is open to debate. In the discussion of semiotics, the focus tends to lean towards interpretations that recognize the dual understanding of signs, while unilateral conceptions of signs are often overlooked. Linguistics is typically confined to the study of language itself, and the linguistic treatment of speech (text) is often seen as a concealed branch of philology. Moreover, it remains unclear whether the distinction between language and speech pertains to linguistics or to philology. This ambiguity extends to the status of linguistic pragmatics. To address this issue constructively, it is useful to differentiate between five concepts encompassing language and speech: hermeneutics, philology, linguistics, semiotics, and pragmalinguistics. Each of these concepts delineates a specific ontology and a corresponding methodological approach. By treating them as orthogonal axes of a fan matrix, one can identify 25 (5 × 5) possible approaches to studying speech, including those currently employed and potential ones. Within this framework, philological linguistics, as discussed by Zimmerling, finds its place, and the transitions of scholars such as Witzany from biohermeneutics to biopragmalinguistics, and Ongstad's shift from philology, become more comprehensible.
INTERLINGUAL AND INTERCULTURAL COMMUNICATION IN THE EDUCATION OF MODERN GERMANISTS
I. Khurtak, M. Tayyem
The article is devoted to interlingual and intercultural communication in the training of specialists in Germanic philology. The relevance of the research stems from the search for effective ways of training modern professionals in Germanic Studies. The aim of the article is to substantiate the need to revise the content of foreign language education with a focus on linguistics and intercultural communication. Intercultural communication is interpreted as complete mutual understanding between two participants in a communicative act who belong to different national cultures. The authors point out that in intercultural communication it is not cultures that come into contact but persons, who act as representatives and mediators of intercultural interaction; that is, intercultural communication is the relationship that develops between people from different cultures in situations of language interaction. Cultural competence is considered the most important element of the communicative competence necessary for cultural dialogue, which implies tolerance in all communicative situations, the ability to choose the right tone when communicating with representatives of other cultures, and appropriate communication strategies and forms of self-presentation. The authors consider the conditions of effective intercultural communication from linguistic and extralinguistic points of view. The article concludes that it is necessary to study jointly language, as a system of symbols and a tool of communication, and culture, as a system of values, worldview, and behavior of native speakers. The authors suggest introducing such disciplines as «Linguocultural studies» and «Ethnolinguistics» into the curriculum of Germanist training, and propose using macrosociological, microsociological, anthropological, tourism, and educational approaches in teaching them.
Scientific Traditions and Current Trends of Linguistic Research at the Department of English Phonetics and Lexicology Named After Vladimir D. Arakin (To the 75th Anniversary of the Institute of Foreign Languages of Moscow Pedagogical State University)
E. Nikulina, E. Freydina
The paper analyses the topics of dissertations completed at the V.D. Arakin Department of English Phonetics and Lexicology over the last decade. Special focus is given to the development of the traditions of the department's scientific schools in light of current priorities in linguistics and pedagogical education. The authors give an overview of the fundamental principles and approaches underlying the linguistic research conducted at the department and analyse the peculiarities of how scientific paradigms form within one department. Special attention is paid to the continuity of research in the English Phonetics and Lexicology Department. On the basis of the study, the authors outline relevant research areas within the framework of the scientific specialty 5.9.6. Languages of foreign countries (Germanic languages) (philology).
Lexical aspect in language and culture communication
Indira Baissydyk, Kuralay Kuderinova, R. Shakhanova, et al.
This study is devoted to the vocabulary of English business communication. The role of vocabulary in natural semasiological systems has been repeatedly emphasized by leading scholars in domestic and foreign linguistics. To date, linguistics and, more broadly, philology have accumulated a wealth of experience in describing and systematizing the vocabulary of natural human languages. The most significant directions in the study of vocabulary include: the separateness and integrity of the word; meaning and usage; the philological foundations of lexical semantics and the dynamics of relationships between different types of lexical meaning of the word; the dialectics of lexicology and lexicography; synchrony and diachrony; the scientific development of comparative semasiology and etymology; the consistent differentiation of mono- and polylexemic units; the doctrine of the phrase and the justification of various types of idiomatic phraseology; and the identification of subsystems in the lexical domain of the language (homonymy, synonymy, antonymy, paronymy). This article examines authentic examples of modern English-language business discourse and describes some of the processes occurring in the vocabulary of business English: the emergence of new polylexemic business terms created by analogy with terminological units that have become widespread; the emergence of consubstantial terms as an indicator of the interaction of different lexical strata in the vocabulary of English business communication; and the development of phrasal verbs with terminating values.
THE IMPACT OF ANCIENT QURANIC MANUSCRIPTS ON CONTEMPORARY LINGUISTIC AND EXEGETICAL STUDIES: AN INTERDISCIPLINARY ANALYSIS
A. Ardiansyah, Marhamah Annazah Tambunan
This study investigates the influence of ancient Quranic manuscripts on linguistic studies and the understanding of linguistic nuances in the Quranic text. By employing a comprehensive methodological framework that integrates philology, paleography, historical linguistics, hermeneutics, and comparative studies, this research aims to uncover the linguistic evolution and contextual adaptations of the Quran over time. The study analyzes various ancient manuscripts from the Meccan and Medinan periods, comparing them with modern Quranic texts to identify significant linguistic variations. Advanced digital tools and statistical methods are utilized to ensure precise analysis and validation of findings. Key findings include the identification of vocabulary changes, syntactic developments, and orthographic variations that highlight the dynamic nature of the Arabic language. The incorporation of insights from these ancient manuscripts into contemporary Quranic studies provides a richer, more nuanced understanding of the Quran’s message and historical context. This interdisciplinary approach not only bridges traditional and modern interpretations but also enhances the depth and relevance of Quranic exegesis in contemporary scholarship.
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
Zixiao Wang, Hongtao Xie, Yuxin Wang, et al.
In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR) and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP. Unlike previous CLIP-based methods, which mainly consider feature generalization in visual encoding, we propose a symmetrical distillation strategy (SDS) that further captures the linguistic knowledge in the CLIP text encoder. By cascading the CLIP image encoder with the reversed CLIP text encoder, a symmetrical structure is built with an image-to-text feature flow that covers not only visual but also linguistic information for distillation. Benefiting from the natural alignment in CLIP, this guidance flow provides a progressive optimization objective from vision to language, which can supervise the STR feature forwarding process layer by layer. In addition, a new Linguistic Consistency Loss (LCL) is proposed to enhance linguistic capability by considering second-order statistics during optimization. Overall, CLIP-OCR is the first to design a smooth transition between image and text for the STR task. Extensive experiments demonstrate the effectiveness of CLIP-OCR, with 93.8% average accuracy on six popular STR benchmarks. Code will be available at https://github.com/wzx99/CLIPOCR.
Unveiling A Core Linguistic Region in Large Language Models
Jun Zhao, Zhihao Zhang, Yide Ma, et al.
Brain localization, which describes the association between specific regions of the brain and their corresponding functions, is widely accepted in cognitive science as an objective fact. Today's large language models (LLMs) possess human-level linguistic competence and can execute complex tasks requiring abstract knowledge and reasoning. To better understand the mechanisms by which intelligence emerges in LLMs, this paper conducts analogical research using brain localization as a prototype. We have discovered a core region in LLMs that corresponds to linguistic competence, accounting for approximately 1% of the total model parameters. This core region exhibits significant dimension dependency: perturbing even a single parameter on specific dimensions can lead to a loss of linguistic competence. Furthermore, we observe that an improvement in linguistic competence does not necessarily accompany an increase in the model's knowledge level, which might imply the existence of domain-knowledge regions that are dissociated from the linguistic region. Overall, exploring LLMs' functional regions provides insights into the foundation of their intelligence. In future work, we will continue to investigate knowledge regions within LLMs and the interactions between them.
Generative linguistic representation for spoken language identification
Peng Shen, Xuguang Lu, Hisashi Kawai
Effective extraction and application of linguistic features are central to improving spoken Language IDentification (LID) performance. With the success of recent large models such as GPT and Whisper, leveraging such pre-trained models to extract linguistic features for LID tasks has become a promising area of research. In this paper, we explore using the decoder-based network of the Whisper model to extract linguistic features through its generative mechanism, in order to improve classification accuracy in LID tasks. We devised two strategies: one based on the language embedding method, and the other focusing on direct optimization of LID outputs while simultaneously improving speech recognition. We conducted experiments on the large-scale multilingual datasets MLS, VoxLingua107, and CommonVoice to test our approach. The experimental results demonstrated the effectiveness of the proposed method on both in-domain and out-of-domain datasets for LID tasks.
ChiSCor: A Corpus of Freely Told Fantasy Stories by Dutch Children for Computational Linguistics and Cognitive Science
Bram M. A. van Dijk, Max J. van Duijn, Suzan Verberne, et al.
In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives, and unravelling language and cognition in development, with computational tools. Unlike existing resources, ChiSCor's stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity. Additional metadata (e.g. education of caregivers) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work with three brief case studies: i) we show that the syntactic complexity of stories is strikingly stable across children's ages; ii) we extend work on Zipfian distributions in free speech and show that ChiSCor obeys Zipf's law closely, reflecting its social context; iii) we show that even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors that allow us to analyse children's language use. We end with a reflection on the value of narrative datasets in computational linguistics.
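As a rough illustration of what it means for a corpus to obey Zipf's law, the sketch below fits a least-squares line to log frequency against log rank; an ideal Zipfian distribution gives a slope near -1. This is a generic diagnostic assumed for illustration, not the procedure reported for ChiSCor, and it expects already-tokenized input.

```python
import math
from collections import Counter


def zipf_slope(tokens: list[str]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    Word types are ranked by descending frequency (rank 1 = most frequent).
    Under an ideal Zipf distribution, frequency is proportional to 1/rank,
    so the slope is close to -1.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)  # nonzero: ranks are distinct
    return cov / var


# Toy usage: frequencies 60, 30, 20, 15, 12 follow frequency = 60 / rank exactly.
tokens = [f"w{rank}" for rank in range(1, 6) for _ in range(60 // rank)]
print(zipf_slope(tokens))  # ≈ -1.0 for this idealized distribution
```

A log-log least-squares fit is only a quick check; for careful power-law fitting, maximum-likelihood estimators are generally preferred because least squares over-weights the sparse tail of rare words.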
METAPHOR AS A FACTOR OF DISCURSIVE CREATION (USING THE EXAMPLE OF THE LEXEME ВКУСНЫЙ – TASTY)
Olena V. Kardashova, Tetiana F. Filchuk
This article attempts a comprehensive discourse analysis of metaphor, using the lexeme “вкусный” (tasty) as an example.
Considering that metaphor can be viewed from the perspective of its ability to create social reality, and that images of reality are discursively conditioned, the authors explore metaphor as a discursive agent that implies information about the basic parameters of discursive instances: metasubject, metaobject, and meta-addressee. The tasks of the discourse analysis of metaphor in the article include: 1) consecutive explication of the metaphorically conditioned components of meaning; 2) reconstruction, based on these components, of the constitutive parameters of the discursive instances of subject, object, and addressee; 3) reconstruction of the worldview and articulatory possibilities of the speaker who occupies the position of discursive subject. The study was conducted on material from the National Corpus of the Russian Language using descriptive, contextual, and interpretative methods and the method of component analysis.
Discourse analysis of the linguistic material allows the reconstruction of two possible types of discursive subject. The first (nominally designated DS1) represents the subject as a prepared listener, viewer, appreciator, or expert. Its discursive orientation realizes such intentions as: recognition of the complexity and intrinsic value of objects in the surrounding world; readiness to expend one's own resources to interact with them; the existential need to act as a subject of love, care, and knowledge; and transfer of the value center from one's own “self” to the surrounding world. With such a focus, the source of positive emotions (“satisfaction”) becomes the discursive subject itself, which is characterized by the ability to valorize objects and endow them with meaning.
These constitutive parameters of discursive subject DS1 are implicated in such metaphorical constructions as “tasty music”, “tasty picture”, “tasty space”, “tasty design solution”, “tasty movie”, “tasty goal”, “tasty opponent”, and so on.
The second type of discursive subject (DS2) can be reconstructed from metaphors such as “tasty assets”, “tasty prices”, “tasty discounts”, “tasty offer”, “tasty text”, “tasty position”, “tasty option”, and “tasty life”. Unlike that of DS1, its attitude towards things, phenomena, and events in the surrounding world is determined by the ratio of “resources spent” to “satisfaction received”, which characterizes this type of subject as a consumer. The main intentional characteristics of DS2 are: a primary desire to satisfy its own needs and desires (receiving positive emotions, material benefits, achieving an attractive social status); the devaluation of the sovereign value of objects and an unwillingness to make an effort to interact with them; the devaluation of all qualities and properties inherent in an object except its consumer ones (those capable of bringing satisfaction to the speaker); and a fundamental unwillingness to expend its own resources, avoidance of novelty, and a desire to maintain the stability of its own internal and external space.
These two configurations of discursive subject allow the speaker to articulate almost diametrically opposed attitudes towards the surrounding world. In general terms, they correspond to two worldviews: modernist and postmodernist.
What do we mean by "data"? A proposed classification of data types in the arts and humanities
B. Gualandi, L. Pareschi, S. Peroni
Purpose: This article describes the interviews the authors conducted in late 2021 with 19 researchers at the Department of Classical Philology and Italian Studies at the University of Bologna. The main purpose was to shed light on the definition of the word “data” in the humanities domain, as far as FAIR data management practices are concerned, and on what researchers think of the term. Design/methodology/approach: The authors invited one researcher from each of the official disciplinary areas represented within the department, and all 19 agreed to participate in the study. Participants were then divided into five main research areas: philology and literary criticism, language and linguistics, history of art, computer science, and archival studies. The interviews were transcribed and analysed using a grounded theory approach. Findings: A list of 13 research data types was compiled from the information collected from participants. The term “data” does not emerge as especially problematic, although a good deal of confusion remains. Looking at current research management practices, methodologies and teamwork appear more central than previously reported. Originality/value: Our findings confirm that “data” within the FAIR framework should include all types of inputs and outputs that humanities research works with, including publications. Also, the participants of this study appear ready for a discussion about making their research data FAIR: they do not find the terminology particularly problematic, and they rely on precise and recognised methodologies, as well as on sharing and collaboration with colleagues.
Computer Science