Results for "Ural-Altaic languages"

Showing 20 of ~220,431 results · from DOAJ, arXiv, CrossRef, Semantic Scholar

arXiv Open Access 2026
Some Remarks on Marginal Code Languages

Stavros Konstantinidis

A prefix code L satisfies the condition that no word of L is a proper prefix of another word of L. Recently, Ko, Han and Salomaa relaxed this condition by allowing a word of L to be a proper prefix of at most k words of L, for some 'margin' k, thus introducing the class of k-prefix-free languages, as well as the similar classes of k-suffix-free and k-infix-free languages. Here we unify the definitions of these three classes of languages into one uniform definition in two ways: via the method of partial orders and via the method of transducers. Thus, for any known class of code-related languages definable via the transducer method, one gets a marginal version of that class. Building on the techniques of Ko, Han and Salomaa, we discuss the uniform satisfaction and maximality problems for marginal classes of languages.

en cs.FL, cs.CC
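For the Konstantinidis abstract above, the k-prefix-free, k-suffix-free and k-infix-free conditions can be illustrated for a finite language by passing the relevant "proper prefix / suffix / infix" relation as a parameter, loosely echoing the partial-order view. This is a minimal brute-force sketch for finite sets of strings only; the function names are hypothetical, and it does not implement the paper's transducer method or its decision procedures for infinite languages.

```python
from typing import Callable, Iterable

def proper_prefix(u: str, w: str) -> bool:
    # u is a prefix of w and u != w
    return len(u) < len(w) and w.startswith(u)

def proper_suffix(u: str, w: str) -> bool:
    # u is a suffix of w and u != w
    return len(u) < len(w) and w.endswith(u)

def proper_infix(u: str, w: str) -> bool:
    # u is a factor (contiguous substring) of w and u != w
    return len(u) < len(w) and u in w

def is_k_free(language: Iterable[str], k: int,
              relation: Callable[[str, str], bool] = proper_prefix) -> bool:
    """True if every word of the finite language is related (e.g. is a proper
    prefix) to at most k words of the language. k = 0 gives the classical
    prefix-, suffix- or infix-free condition."""
    words = set(language)
    return all(sum(1 for w in words if relation(u, w)) <= k for u in words)
```

For example, in {"a", "ab", "abc"} the word "a" is a proper prefix of two words, so the language is 2-prefix-free but not 1-prefix-free.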
S2 Open Access 2025
An Efficient Gloss-Free Sign Language Translation Using Spatial Configurations and Motion Dynamics with LLMs

Eui Jun Hwang, Sukmin Cho, Junmyeong Lee et al.

Gloss-free Sign Language Translation (SLT) converts sign videos into spoken language sentences without relying on glosses, which are the written representations of signs. Recently, Large Language Models (LLMs) have shown remarkable translation performance in gloss-free methods by harnessing their powerful natural language generation capabilities. However, these methods often rely on domain-specific fine-tuning of visual encoders to achieve optimal results. By contrast, we emphasize the importance of capturing the spatial configurations and motion dynamics in sign language. With this in mind, we introduce Spatial and Motion-based Sign Language Translation (SpaMo), a novel LLM-based SLT framework. The core idea of SpaMo is simple yet effective: instead of domain-specific tuning, we use off-the-shelf visual encoders to extract spatial and motion features, which are then input into an LLM along with a language prompt. Additionally, we employ a visual-text alignment process as a lightweight warm-up step before applying SLT supervision. Our experiments demonstrate that SpaMo achieves state-of-the-art performance on three popular datasets (PHOENIX14T, CSL-Daily, and How2Sign) without visual fine-tuning.

11 citations en Computer Science
S2 Open Access 2025
ITALIC: An Italian Culture-Aware Natural Language Benchmark

Andrea Seveso, Daniele Potertì, Edoardo Federici et al.

We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate the natural language understanding of the Italian language and culture. ITALIC spans 12 domains, exploiting public tests to score domain experts in real-world scenarios. We detail our data collection process, stratification techniques, and selection strategies. ITALIC provides a comprehensive assessment suite that captures commonsense reasoning and linguistic proficiency in a morphologically rich language. We establish baseline performances using 17 state-of-the-art LLMs, revealing current limitations in Italian language understanding and highlighting significant linguistic complexity and cultural specificity challenges. ITALIC serves as a benchmark for evaluating existing models and as a roadmap for future research, encouraging the development of more sophisticated and culturally aware natural language systems.

8 citations en Computer Science
S2 Open Access 2025
From Complexity to Clarity: AI/NLP's Role in Regulatory Compliance

J. Jain, Nivedhitha Dhanasekaran, Mona T. Diab

Regulatory data compliance is a cornerstone of trust and accountability in critical sectors like finance, healthcare, and technology, yet its complexity poses significant challenges for organizations worldwide. Recent advances in natural language processing, particularly large language models, have demonstrated remarkable capabilities in text analysis and reasoning, offering promising solutions for automating compliance processes. This survey examines the current state of automated data compliance, analyzing key challenges and approaches across problem areas. We identify critical limitations in current datasets and techniques, including issues of adaptability, completeness, and trust. Looking ahead, we propose research directions to address these challenges, emphasizing standardized evaluation frameworks and balanced human-AI collaboration.

4 citations en Computer Science
DOAJ Open Access 2025
The Linguistic Identity of Bulgarian Turks: Historical Policies and Current Challenges (Bulgaristan Türklerinin Dilsel Kimliği: Tarihsel Politikalar ve Güncel Zorluklar)

Harun Bekir

This article examines the struggle of the Turkish minority in Bulgaria to preserve its linguistic identity in the light of historical and current developments. Bulgaria is an ethnically and culturally diverse country, and Turks form its largest minority group. Although Turkish is the second most widely spoken language after Bulgarian, it is not sufficiently supported in education and in the public sphere. While the assimilation policies of the socialist period severely restricted the use of Turkish, the democratization process after 1989 brought some improvements in minority rights. Nevertheless, problems concerning the preservation of linguistic identity persist. Bulgarian Turks are generally bilingual and try to strike a balance between Turkish and Bulgarian in daily life. However, the younger generations' shift toward Bulgarian is narrowing the domains in which Turkish is used. Preserving Turkish requires giving it a greater place in education, increasing cultural activities, and making effective use of the media. The limited use of Turkish in official institutions makes it harder to sustain this identity. The article evaluates the historical background of language policies in Bulgaria, the dynamics of bilingualism, and their effects on Turkish identity, and offers proposals for solutions. It emphasizes that language is not only a means of communication but also an element of identity and cultural continuity.

Ural-Altaic languages
DOAJ Open Access 2025
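On the border of noun phrases and quantifier phrases: osa ‘some’, enamik ‘most’, and enamus ‘most; majority’ as quantifiers (Nimisõnafraasi ja hulgafraasi piirimail: "osa", "enamik" ja "enamus" hulgasõnadena)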

Maarja-Liisa Pilvik, Liina Lindström, Helen Plado et al.

In this article, we examine the structural and semantic variation of phrases which are formed with the quantifiers osa ‘some’, enamik ‘most’, or enamus ‘most; majority’, and plural set nouns in contemporary Estonian. We focus specifically on the analysis of factors influencing the number agreement of the quantifier (e.g., enamikule inimestele, enamikele inimestele ‘to most people’), the case form of the set noun (e.g., enamikku inimesi ‘most people,’ enamikku inimestest ‘most (of the) people’), and the verbal number agreement (e.g., enamus inimesi läheb ~ lähevad ‘most people go’). 
Both the choice of quantifier and all three aspects of variation have previously been linked to an ongoing linguistic change, in which the quantifier as the head and set noun as the modifier in quantifier phrases are being replaced by noun phrases with quantifier as the modifier and the set noun as the head. The motivation behind this change might stem from the need to distinguish the quantifying and specifying functions of the quantifier. This study’s results confirm many previous observations, made either on a smaller data set or based on intuition, but additionally highlight the importance of the syntactic role of phrases: the potential source of number agreement within phrases is likely adverbials, while the specifying function of the quantifier is increasingly solidifying in phrases where it serves as the subject. Among other things, we also see a clear effect of genre and the degree of text editing in the expansion of this linguistic change.

Philology. Linguistics, Finnic. Baltic-Finnic
DOAJ Open Access 2025
On the feminine suffix -ik of old written Estonian in the mirror of dialect dictionaries (Vana kirjakeele feminiinsest ik-sufiksist murdesõnastike peeglis)

Loviisa Mänd, Szilárd Tibor Tóth

The old written Estonian feminine suffix -ik and its reflection in dialect dictionaries. This article examines the feminine function of the polysemous Estonian suffix -ik. The feminine function of this suffix is documented in numerous sources of old written Estonian and, due to its Proto-Finnic origins (diminutive suffix *-(i)kkoi̯), in studies of other Finnic languages. As a marker of feminine gender, the suffix was primarily used in ethnonyms (e.g., saksik ‘German woman’), but also appeared in broader contexts, such as hõimik ‘female relative’. These derivatives often carried a pejorative connotation, as seen in the following example from Hornung’s grammar: Saksik ein teutsch Weib / Rootsik ein Schwedisch Weib per contemptum dicuntur ‘Saksik a German woman / Rootsik a Swedish woman, said with contempt’. Although the suffix no longer functions as a feminine marker in contemporary Estonian, traces of its earlier usage persist in the word noorik (‘newly married woman’ < noor ‘young’) and in cow names (Mustik < must ‘black’). In Estonian linguistics, the feminine suffix -ik has primarily been regarded as a distinctive feature of South Estonian. However, an analysis of 12 examples from the dialect dictionaries of the Institute of the Estonian Language reveals that this suffix was used to mark feminine gender throughout the entire Estonian language area. According to our data, two of the examined words appear exclusively in non-southern dialects: kepsik ‘girl with loose morals’ < keps ‘leg’, kepslema ‘to prance’ and naisik ‘immoral woman, mother of an illegitimate child’ < naine ‘woman’. Derivatives found in both southern and non-southern dialects are lehmik ‘promiscuous girl’ < lehm ‘cow’; väitsik ‘little girl’; pordik ‘immoral woman’ < pordu-, a root denoting sexual immorality; noorik ‘newly married young woman’ < noor ‘young’; and kaasik ‘female wedding singer, companion of the bride’ < kaas- ‘co-’. 
Derivatives found exclusively in southern dialects are latsik < lats ‘child’; välgik < välk, väle ‘swift’; edvik ‘flirtatious girl’ < edeve ‘vain’; lupsik ‘disparaging term for a woman’; and hatik ‘flirtatious girl’ < hatt ‘bitch, female dog’. It is evident that the use of the suffix -ik as a feminine marker is not limited to South Estonian but spans across dialects, suggesting that the suffix was a widespread linguistic feature. Moreover, the frequent association of femininity with pejoration in the analyzed derivatives indicates that pejoration is a recurring feature of the feminine suffix -ik.

Other Finnic languages and dialects
arXiv Open Access 2025
Dynamic Membership for Regular Tree Languages

Antoine Amarilli, Corentin Barloy, Louis Jachiet et al.

We study the dynamic membership problem for regular tree languages under relabeling updates: we fix an alphabet $Σ$ and a regular tree language $L$ over $Σ$ (expressed, e.g., as a tree automaton), we are given a tree $T$ with labels in $Σ$, and we must maintain the information of whether the tree $T$ belongs to $L$ while handling relabeling updates that change the labels of individual nodes in $T$. Our first contribution is to show that this problem admits an $O(\log n / \log \log n)$ algorithm for any fixed regular tree language, improving over known $O(\log n)$ algorithms. This generalizes the known $O(\log n / \log \log n)$ upper bound over words, and it matches the lower bound of $Ω(\log n / \log \log n)$ from dynamic membership to some word languages and from the existential marked ancestor problem. Our second contribution is to introduce a class of regular languages, dubbed almost-commutative tree languages, and show that dynamic membership to such languages under relabeling updates can be decided in constant time per update. Almost-commutative languages generalize both commutative languages and finite languages: they are the analogue for trees of the ZG languages enjoying constant-time dynamic membership over words. Our main technical contribution is to show that this class is conditionally optimal when we assume that the alphabet features a neutral letter, i.e., a letter that has no effect on membership to the language. More precisely, we show that any regular tree language with a neutral letter which is not almost-commutative cannot be maintained in constant time under the assumption that the prefix-U1 problem from (Amarilli, Jachiet, Paperman, ICALP'21) also does not admit a constant-time algorithm.

en cs.FL, cs.DS
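The Amarilli et al. abstract above builds on the word case of dynamic membership. As a baseline illustration only, the classical O(log n)-per-update approach for words (not trees, and not the paper's O(log n / log log n) or constant-time algorithms) keeps a balanced tree whose nodes store the DFA state map of their segment. This is a minimal sketch assuming a complete DFA given as a dict `delta[(state, letter)]`; the class and method names are hypothetical.

```python
class DynamicWordMembership:
    """Maintain whether a word is accepted by a fixed complete DFA under
    relabeling updates, in O(log n) time per update (segment-tree baseline)."""

    def __init__(self, word, num_states, delta, initial, accepting):
        self.n = len(word)
        self.num_states = num_states
        self.delta = delta
        self.initial = initial
        self.accepting = set(accepting)
        identity = tuple(range(num_states))
        size = 1
        while size < self.n:
            size *= 2
        self.size = size
        # Leaves hold the state map of each letter; padding leaves hold identity.
        self.tree = [identity] * (2 * size)
        for i, a in enumerate(word):
            self.tree[size + i] = self._letter_map(a)
        for v in range(size - 1, 0, -1):
            self.tree[v] = self._compose(self.tree[2 * v], self.tree[2 * v + 1])

    def _letter_map(self, a):
        # State map q -> delta(q, a) for a single letter.
        return tuple(self.delta[(q, a)] for q in range(self.num_states))

    @staticmethod
    def _compose(f, g):
        # Run f, then g: the map of a left segment followed by a right segment.
        return tuple(g[s] for s in f)

    def relabel(self, i, a):
        """Change the letter at position i to a and repair the path to the root."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        v = self.size + i
        self.tree[v] = self._letter_map(a)
        v //= 2
        while v >= 1:
            self.tree[v] = self._compose(self.tree[2 * v], self.tree[2 * v + 1])
            v //= 2

    def accepts(self):
        # The root stores the state map of the whole word.
        return self.tree[1][self.initial] in self.accepting
```

For instance, with a two-state DFA for "even number of a's", the word "abab" is accepted; relabeling position 1 to "a" gives "aaab", which is rejected.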
arXiv Open Access 2025
Measure-Theoretic Aspects of Star-Free and Group Languages

Ryoma Sin'ya, Takao Yuyama

A language $L$ is said to be ${\cal C}$-measurable, where ${\cal C}$ is a class of languages, if there is an infinite sequence of languages in ${\cal C}$ that "converges" to $L$. We investigate the properties of ${\cal C}$-measurability in the cases where ${\cal C}$ is SF, the class of all star-free languages, and G, the class of all group languages. It is shown that a language $L$ is SF-measurable if and only if $L$ is GD-measurable, where GD is the class of all generalised definite languages (a more restricted subclass of star-free languages). This means that GD and SF have the same "measuring power", whereas GD is a very restricted proper subclass of SF. Moreover, we give a purely algebraic characterisation of SF-measurable regular languages, which is a natural extension of Schützenberger's theorem stating the correspondence between star-free languages and aperiodic monoids. We also show the probabilistic independence of star-free and group languages, which is an important application of the former result. Finally, while the measuring powers of star-free and generalised definite languages are equal, we show that the situation is quite the opposite for subclasses of group languages, as follows. For any two local subvarieties ${\cal C} \subsetneq {\cal D}$ of group languages, we have $\{L \mid L \text{ is } {\cal C}\text{-measurable}\} \subsetneq \{ L \mid L \text{ is } {\cal D}\text{-measurable}\}$.

en cs.FL
S2 Open Access 2024
Eurasian Ideologies at Odds. Assessing the Opposing Nature of Eurasianism and Turanism

Paolo Pizzolo

Eurasianism and Turanism epitomize two antithetical ideologies driven by the ambition of politically and culturally integrating Eurasia. While Eurasianism has never developed an exclusivist nationalist sentiment based on ethnolinguistic foundations, Turanism falls into the category of pan-nationalist ideologies that tend to exclude the appearance and affirmation of other nations within their spatial scope. The mutual incompatibility of the two ideologies rests on the fact that Eurasianism is based on the principle of inclusiveness of the different Eurasian populations, while Turanism rests on the principle of Ural-Altaic exclusivism and the rejection of a symbiosis with the Slavic element. This article aims to compare the classic variant of Russian Eurasianism with Turanism from an ideological and cultural perspective, through the evaluation of the respective intellectual fathers’ works. While Eurasianism builds its political-ideological project on Russian-Eurasian history, the imperial idea, the primacy of geography and the rejection of the West as a philosophical model, Turanism grounds its raison d’être on ethnocentric and pan-nationalist postulates designed for the political and cultural union of the Turanian peoples and the exclusion of others. In this frame, the two ideologies embody geographically overlapping, mutually exclusive paradigms and idiosyncratic Weltanschauungen.

S2 Open Access 2024
Impact of ChatGPT on the writing style of condensed matter physicists

Shaojun Xu, Xiao Ye, Mengqi Zhang et al.

We apply a state-of-the-art difference-in-differences approach to estimate the impact of ChatGPT's release on the writing style of condensed matter papers on arXiv. Our analysis reveals a statistically significant improvement in the English quality of abstracts written by non-native English speakers. Importantly, this improvement remains robust even after accounting for other potential factors, confirming that it can be attributed to the release of ChatGPT. This indicates widespread adoption of the tool. Following the release of ChatGPT, there is a significant increase in the use of unique words, while the frequency of rare words decreases. Across language families, the changes in writing style are significant for authors from the Latin and Ural-Altaic groups, but not for those from the Germanic or other Indo-European groups.

1 citation en Computer Science, Physics
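The Xu et al. abstract above relies on a difference-in-differences design. The abstract calls its approach state-of-the-art and does not specify the estimator, so the sketch below shows only the textbook two-group, two-period estimator, as an assumption about the general idea rather than the paper's method. The group means used in the example are hypothetical, not data from the paper.

```python
from statistics import mean

def did_estimate(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Classic 2x2 difference-in-differences: the change in the treated
    group minus the change in the control group over the same period."""
    return (treated_post - treated_pre) - (control_post - control_pre)

def did_from_samples(treated_pre, treated_post, control_pre, control_post) -> float:
    # Same estimator computed from per-paper scores via group means.
    return did_estimate(mean(treated_pre), mean(treated_post),
                        mean(control_pre), mean(control_post))
```

With hypothetical group means of 2.0 and 5.0 (treated, before and after) and 1.0 and 2.0 (control), the estimate is (5 - 2) - (2 - 1) = 2, i.e. the part of the treated group's change not explained by the common trend.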
S2 Open Access 2024
The etymology and semantic field of the zoonym “cow” in the mythology and religion of Indo-European, Altai and Uralic linguocultures

Natalia Vitalievna Nikolaeva, Guzaliya Sayfullovna Gilazova, G. N. Semenova et al.

The research aims to explore the etymology and semantic field of the zoonym “cow” in the linguocultural contexts of the Indo-European, Uralic and Altaic language families. The article analyzes the origin of the word “cow” in Slavic, Turkic, Romance and Germanic languages, identifying its core and additional meanings, commonalities and specific features. This analysis makes it possible to determine the national characteristics in the expression of knowledge about people, animals, and the world in the mythological and religious beliefs of native speakers. The scientific novelty of the research lies in considering the zoonym “cow” not only as a linguistic object, but also as a cultural phenomenon intertwined with history, mythology, religion, and human lifestyle. This zoonym is studied for the first time in a comparative aspect using data from Slavic, Turkic, Romance and Germanic languages, with the aim of identifying, analyzing, and describing national cultural features, similarities, and differences. As a result, the study proves that the etymology and semantics of the word “cow” in the contexts of the Indo-European, Uralic and Altaic language families display shared roots, dating back to a distant past, and indicate profound connections between the peoples. Despite some differences among the peoples who speak Slavic and Turkic languages, such as national specificity of worldview, varying living conditions, traditions, history, and religion, the words designating the cow in various linguocultures are part of a unified history, connected to the use of this animal in human life.

1 citation en
S2 Open Access 2024
The Sir Daria and Fergana Valley – crucial component of the rich historical and cultural heritage of nomadic cultures and ancient civilizations

Djaparov Nooman

This paper will present and comment upon a number of ancient Turkic names, e.g. those following the toponym ↔ ethnonym model, and discuss the nature and natural linguistic evolution of their ancient forms. Our examination of ancient Turkic names will bring the subject up to date and suggest that a small, select lexicon of Old Turkic historical-onomastic terms can help researchers who investigate the diachronic evolution and meaning of ancient Turkic names. Further, this investigation covers ancient place names and/or ethnonyms of a major cultural-geographical region and country whose ancient names, in their Altaic form, remained unknown to Westerners for thousands of years, because they were written in a script known to very few, apart from dialectal forms preserved in several modern Turkic languages in rural regions of the Eurasian continent.

1 citation en
DOAJ Open Access 2024
The Lord’s Prayer In Finnish by Georg Bruno from 16th century

Ernesta Kazakėnaitė, Petri Kallio

This article briefly presents a new handwritten version of the Lord’s Prayer in Finnish that is currently stored in the National Library of Sweden. It is found in a manuscript attributed to Georg Bruno and dated to the late 16th century. Here we discuss its status and identify its sources. We also question some of the ideas of an earlier researcher of this manuscript. Summary. Ernesta Kazakėnaitė, Petri Kallio: Georg Bruno’s Finnish Lord’s Prayer from the 16th century. In 1955, the Latvian-born Swedish theologian Haralds Biezais found, in the archive of the National Library of Sweden, a 16th-century manuscript containing one of the earliest Latvian-language versions of the Lord’s Prayer. The manuscript, probably written by Georg Bruno, also contains 19 versions of the Lord’s Prayer in other languages, which have not been studied so far. The article presents a philological overview of the manuscript’s Finnish Lord’s Prayer, which turns out to be a copy from the 1561 printing of Sebastian Münster’s book Cosmographei.

Philology. Linguistics, Finnic. Baltic-Finnic
arXiv Open Access 2024
Algebraic Language Theory with Effects

Fabian Lenke, Stefan Milius, Henning Urbat et al.

Regular languages -- the languages accepted by deterministic finite automata -- are known to be precisely the languages recognized by finite monoids. This characterization is the origin of algebraic language theory. In this paper, we generalize the correspondence between automata and monoids to automata with generic computational effects given by a monad, providing the foundations of an effectful algebraic language theory. We show that, under suitable conditions on the monad, a language is computable by an effectful automaton precisely when it is recognizable by (1) an effectful monoid morphism into an effect-free finite monoid, and (2) a monoid morphism into a monad-monoid bialgebra whose carrier is a finitely generated algebra for the monad, the former mode of recognition being conceptually completely new. Our prime application is a novel algebraic approach to languages computed by probabilistic finite automata. Additionally, we derive new algebraic characterizations for nondeterministic probabilistic finite automata and for weighted finite automata over unrestricted semirings, generalizing previous results on weighted algebraic recognition over commutative rings.

en cs.FL
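The Lenke et al. abstract above starts from the classical fact that a regular language is recognized by a finite monoid. The effect-free case can be sketched by computing the transition monoid of a DFA and recognizing the language through the morphism that maps a word to its state map. This is a minimal sketch of that classical baseline only; it does not model monads or computational effects, and the function names are hypothetical.

```python
def compose(f, g):
    # Apply f first, then g (matches left-to-right reading of words).
    return tuple(g[s] for s in f)

def letter_maps(num_states, alphabet, delta):
    """Map each letter to its state transformation q -> delta(q, a)."""
    return {a: tuple(delta[(q, a)] for q in range(num_states)) for a in alphabet}

def transition_monoid(num_states, gens):
    """All state maps induced by words: closure of the identity under
    right multiplication by letter maps (a finite monoid)."""
    identity = tuple(range(num_states))
    elements = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in gens.values():
            h = compose(f, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def phi(word, num_states, gens):
    # Monoid morphism from words to the transition monoid.
    f = tuple(range(num_states))
    for a in word:
        f = compose(f, gens[a])
    return f

def accepts_via_monoid(word, num_states, gens, initial, accepting):
    # The language is the preimage of {f : f(initial) is accepting}.
    return phi(word, num_states, gens)[initial] in accepting
```

For a two-state DFA accepting words that end in "a", the transition monoid has three elements: the identity and the two constant maps.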
arXiv Open Access 2024
Directed Regular and Context-Free Languages

Moses Ganardi, Irmak Saglam, Georg Zetzsche

We study the problem of deciding whether a given language is directed. A language $L$ is directed if every pair of words in $L$ has a common (scattered) superword in $L$. Deciding directedness is a fundamental problem in connection with ideal decompositions of downward closed sets. Another motivation is that whether two directed context-free languages have the same downward closure can be decided in polynomial time, whereas for general context-free languages this problem is known to be coNEXP-complete. We show that the directedness problem for regular languages, given as NFAs, belongs to $AC^1$, and thus to polynomial time. Moreover, it is NL-complete for fixed alphabet sizes. Furthermore, we show that for context-free languages, the directedness problem is PSPACE-complete.

en cs.FL, cs.CL
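The definition of directedness in the Ganardi, Saglam and Zetzsche abstract above can be checked directly for a finite language by brute force: for every pair of words, look for a word of the language that contains both as scattered subwords (subsequences). This is a minimal sketch for finite sets of strings only; it is not the paper's NFA-based $AC^1$ procedure, and the function names are hypothetical. The empty language is treated as vacuously directed here, which may differ from conventions that require non-emptiness.

```python
from itertools import combinations_with_replacement

def is_subsequence(u: str, w: str) -> bool:
    """True if u is a scattered subword of w; `ch in it` consumes the
    iterator up to and including the first match."""
    it = iter(w)
    return all(ch in it for ch in u)

def is_directed_finite(language) -> bool:
    """Every pair of words must have a common scattered superword
    that itself belongs to the (finite) language."""
    words = list(set(language))
    for u, v in combinations_with_replacement(words, 2):
        if not any(is_subsequence(u, w) and is_subsequence(v, w) for w in words):
            return False
    return True
```

For example, {"a", "b"} is not directed, while adding "ab" makes it directed, since "ab" is a scattered superword of both "a" and "b".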
S2 Open Access 2024
Tungus and „Palaeo-Siberian” studies in contemporary China

Michael Knüppel

This paper examines the trajectory of Tungusological and „Palaeo-Siberian“ studies in contemporary China. While China boasts a rich tradition of research in Altaic studies, with a focus on Turkish, Mongolian, and Tungusic, this study sheds light on the less-explored domains of Tungusological studies and „Palaeo-Siberian“ research. Tracing the historical backdrop, it discusses early engagements with Altaic languages during periods of Altaic dynastic rule and the contemporary significance of Altaic and „Palaeo-Siberian“ peoples, especially against the backdrop of China’s identification as a „Near Arctic country.“ The paper further explores the evolution of Tungusological research in China, highlighting key scholars, and delves into recent developments, emphasizing the shift from linguistic studies to broader interdisciplinary inquiries encompassing folklore, cultural, and ecological dimensions. The establishment of the Arctic Studies Center at Liaocheng University signals a new phase, demonstrating China’s increasing interest in Arctic affairs and its potential as a research hub for Tungusological and „Palaeo-Siberian“ studies.

S2 Open Access 2024
Korean language attitude in the Altai language system

N. Nikolaeva, M. Song

The relevance of the study is determined by the growing interest of the scientific community in constructing typologies of linguistic features of the Korean language in its relation to the Altai language system. A review of research shows that to date, discussions regarding the place of the Korean language in the Altai language system continue, since there are no uniform criteria by which to describe the relationship between the two. The purpose of the article is to systematize the features that allow us to determine the place of the Korean language in the Altai language system. Objectives: provide an overview of studies of the Altai language system; consider arguments about the relationship between the Korean and Altai languages; consider criticism of the theory of kinship between the Korean and Altaic languages. Research methods: systematization, generalization, description, comparison, critical analysis. It has been established that even if we completely deny a genetic connection between the Altaic language family and the Korean language, it is still possible to establish a friendly connection between them, which can be taken as a research hypothesis. This is made possible by the fact that the common elements that have been clearly outlined so far exist, albeit in meager quantities. If these elements were not borrowed, it would be difficult to conclude that this is a coincidence. However, it does not seem convincing that the scant data is sufficient to prove the kinship of Korean and the Altaic languages. In this situation, just as it is impossible to draw an unambiguous conclusion that the Korean language has a genetic relationship with the Altai language or the Altai language family, it is also impossible to conclude that there are no such relationships at all. It is true that the origins of the Korean language are still unclear. 
Since the existing theory of the Altaic language family and the theory of kinship with Korean were accepted almost without criticism, the counter-arguments against them can be equally strong. However, there are significant differences between these views. Some say this completely refutes conventional wisdom, while others say it is a hypothesis that has not yet been proven. The prospects for the study are seen in the systematization of linguistic typological features of the Korean language in its relation to the Altai language system.

Page 13 of 11022