Some Language Problems in Kyrgyz Arising from the Alphabet and Orthography
Mayrambek Orozobayev
This study examines in detail some problems in written Kyrgyz that stem from the alphabet and its orthographic rules. The topic is treated under three subheadings: the Cyrillic alphabet used in Kyrgyz and its general features; the orthography of the Kyrgyz written language and its history; and some important problems concerning the alphabet and orthographic rules in Kyrgyz. The main reason for addressing Kyrgyz orthography in this way is that the alphabet and orthography debates that have long been on the agenda in Kyrgyzstan have not reached a definitive conclusion. The aim of the study is to identify the sources of some problems in written Kyrgyz in as much detail as possible and, by evaluating them thoroughly, to make at least a small contribution to their solution. Drawing the attention of other academic circles to these problems is also among the study's goals. I believe that such a study will help resolve similar problems, and reach the desired outcome, not only in Kyrgyz but in all contemporary Turkic written languages.
Secular Songs as Sources of Information and Educational Tools during the Peasant Enlightenment
Māra Grudule
The written and oral culture of the Baltic indigenous peoples underwent gradual changes in the late 18th and 19th centuries. According to Wolfgang Welsch, vision is linked with knowledge and science, while hearing relates to faith and religion (Welsch 1996: 248) – this distinction shaped the interaction between oral and written culture. Among Baltic peasants, oral culture remained dominant until the mid-19th century, with the German clergy continuing to control the information space despite ongoing social change. During the Enlightenment, secular Latvian literature began to emerge. Gotthard Friedrich Stender (1714–1796), a German pastor from Kurzeme, laid the foundation for Latvian secular prose, poetry, and popular science literature. However, his songs, the so-called ziņģes, proved more influential than his prose. The songs combine entertainment with moral instruction on drinking, social harmony, and education. Around the turn of the 19th century, major transformations occurred: the territory of present-day Latvia was incorporated into the Russian Empire, Napoleon’s campaigns threatened the region, serfdom was abolished, and a Latvian school network was created. The public demanded information, which was shared through church sermons and, from the 1820s onward, through Latvian newspapers. Supported by Baltic German pastors, the first generation of Latvian intellectuals emerged. By the 1830s, they actively sought to merge oral and written traditions, adapting elements of the Baltic Germans’ peasant Enlightenment project for the purposes of the Latvian national awakening. This paper examines how three key events of the early 19th century – Napoleon’s campaigns and Latvian recruitment into the Russian army, the abolition of serfdom, and the rise of Latvian schools – were reflected in Latvian songs. It analyzes songs published in Latvian newspapers, in books, and on flyers, and it explores the differing perspectives of Baltic Germans and Latvians.
Other Finnic languages and dialects
Neural Network Verification is a Programming Language Challenge
Lucas C. Cordeiro, Matthew L. Daggitt, Julien Girard-Satabin
et al.
Neural network verification is a new and rapidly developing field of research. So far, the main priority has been establishing efficient verification algorithms and tools, while proper support from the programming language perspective has been considered secondary or unimportant. Yet, there is mounting evidence that insights from the programming language community may make a difference in the future development of this domain. In this paper, we formulate neural network verification challenges as programming language challenges and suggest possible future solutions.
Language Generation: Complexity Barriers and Implications for Learning
Marcelo Arenas, Pablo Barceló, Luis Cofré
et al.
Kleinberg and Mullainathan showed that language generation in the limit is always possible at the level of computability: given enough positive examples, a learner can eventually generate data indistinguishable from a target language. However, such existence results do not address feasibility. We study the sample complexity of language generation in the limit for several canonical classes of formal languages. Our results show that infeasibility already appears for context-free and regular languages, and persists even for strict subclasses such as locally threshold testable languages, as well as for incomparable classes such as non-erasing pattern languages, a well-studied class in the theory of language identification. Overall, our results establish a clear gap between the theoretical possibility of language generation in the limit and its computational feasibility.
Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language
Jesus Alvarez C, Daua D. Karajeanes, Ashley Celeste Prado
et al.
The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. Our findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward inclusion. By establishing a foundation for Comanche in NLP, we advocate for computational approaches that prioritize accessibility, cultural sensitivity, and community engagement.
Comparative Aspect of the Dagur-Samoyed Languages of Eastern Transbaikalia
R. Zhamsaranova
This article describes the results of an onomasiological analysis of the toponomastic vocabulary of Eastern Transbaikalia. Its novelty lies in the lack of comparative studies of the toponomastic vocabulary of Dagur (one of the Mongolian languages) and the Samoyedic languages. Its relevance stems from introducing a comparative description of Dagur vocabulary into scholarly circulation, which gives this research regional significance. The purpose of the article is to describe the results of a comparative analysis of the toponomastic vocabulary of the Eastern Trans-Baikal region, with the prospect of defining the studied territory as a region that is functionally significant for investigating the thesis of a diachronic Ural-Altaic linguistic union. The specific objectives of the study are to describe the Daurian toponyms as a Mongolian toponymic substrate, to describe the comparative analysis of appellative vocabulary as the basis of the onomasiological strategy of analysis, and to describe elements of a comparative-historical nature in comparing the toponomastic vocabulary of the Eastern Trans-Baikal region. Toponyms, especially substrate toponyms, adapt to the phonology of a foreign-language superstrate and are thus forced to change somewhat lexically in order to fulfill their fundamental tasks, deictic (indicative) and functional. Distinguishing three types of principles of nomination in toponymy (by the qualities of an object, by the connection of an object with a person, and by the connection of an object with other objects), the article proposes a description of substrate toponymy according to the first type: nomination by the qualities and the physical and geographical characteristics of an object.
The article uses an onomasiological approach, and the main research methods are the onomastic method (method of geographical terminology), the descriptive method, the comparative method, and the comparative-historical method. A partial result of the research is the hypothesis that Dagur is a mediator language that developed during the period of Ural-Altaic diachronic language contacts. The article’s subject matter is original in describing the problems of transference and convergent phenomena, which are verified on onomastic material. The results contribute to the practice of teaching comparative linguistics and arealogy (areal linguistics). The preliminary conclusions can also be used by specialists in onomastics and in comparative linguistics in general.
Anti-Context-Free languages
Carles Cardó
Context-free languages can be characterized in several ways. This article studies projective linearisations of languages of simple dependency trees, i.e., dependency trees in which a node can govern at most one node with a given syntactic function. We prove that the projective linearisations of local languages of simple dependency trees coincide with the context-free languages. Simple dependency trees suggest alternative dual notions of locality and projectivity, which permits defining a dual language for each context-free language. We call this new class of languages anti-context-free. These languages are related to some linguistic constructions exhibiting the so-called cross-serial dependencies that were historically important for the development of computational linguistics. We propose that this duality could be a relevant linguistic phenomenon.
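The two notions the abstract relies on, a simple dependency tree and a projective linearisation, can be illustrated with a minimal Python sketch. It assumes the standard reading of projectivity (every subtree occupies a contiguous block of positions) and a hypothetical encoding of trees as `heads` and `labels` dictionaries; neither the encoding nor the function names come from the paper.

```python
from collections import defaultdict


def is_simple(heads, labels):
    """True if no node governs two dependents with the same syntactic function.

    heads maps each node to its governor (None for the root);
    labels maps each non-root node to its syntactic function.
    """
    seen = set()
    for node, head in heads.items():
        if head is None:
            continue  # the root has no governor and no function label
        key = (head, labels[node])
        if key in seen:
            return False
        seen.add(key)
    return True


def is_projective(heads, order):
    """True if every subtree occupies a contiguous block of positions in order."""
    position = {node: i for i, node in enumerate(order)}
    children = defaultdict(list)
    for node, head in heads.items():
        if head is not None:
            children[head].append(node)

    def span(node):
        # Leftmost position, rightmost position, and size of the subtree at node.
        lo = hi = position[node]
        size = 1
        for child in children[node]:
            c_lo, c_hi, c_size = span(child)
            lo, hi, size = min(lo, c_lo), max(hi, c_hi), size + c_size
        return lo, hi, size

    # A subtree is contiguous exactly when its width equals its node count.
    return all(hi - lo + 1 == size for lo, hi, size in map(span, heads))


# Toy tree (hypothetical): a governs b and d, and b governs c.
heads = {"a": None, "b": "a", "c": "b", "d": "a"}
labels = {"b": "subj", "c": "mod", "d": "obj"}
print(is_simple(heads, labels))                      # each function used once per governor
print(is_projective(heads, ["c", "b", "a", "d"]))    # subtrees are contiguous
print(is_projective(heads, ["b", "d", "c", "a"]))    # d splits the subtree of b
```

In the last ordering the subtree of `b` is interrupted by `d`. Interleaved patterns of that kind are what cross-serial dependencies produce.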
The Unity and Diversity of Altaic
J. Janhunen
In popular conception, Altaic is often assumed to constitute a language family, or perhaps a phylum, but in reality, it involves a historical, areal, and typological complex of five separate language families of different origins—Turkic, Mongolic, Tungusic, Koreanic, and Japonic—to which Uralic also adheres in the transcontinental context of Ural-Altaic. The similarities between the individual Altaic language families are due to prolonged contacts that have resulted in both lexical borrowing and structural interaction in a number of binary patterns. The historical homelands of the Altaic language families were located in continental Northeast Asia, but secondary expansions have subsequently brought these languages to most parts of northern and central Eurasia, including Anatolia and eastern Europe. The present review summarizes the basic facts concerning the Altaic language families, their common features, their patterns of interaction with each other and with other languages, and their historical and prehistorical context.
Common punctuation problems in teaching Turkish to speakers of other Turkic languages and solutions
Gökcan ÇELİK, Özkan ÇELİK
Punctuation marks play an important role in preserving the structure and semantic features of a language and in ensuring that it functions. In other words, punctuation marks make a text easier to write and clearer to understand. Punctuation therefore deserves attention in both mother-tongue teaching and foreign-language teaching.
The aim of this study is to investigate the punctuation errors frequently made in written expression by students of Turkic origin learning Turkish in the preparatory classes of Kyrgyzstan-Turkey Manas University, to examine the causes of these errors, and to offer solutions for eliminating them. The participants were 96 B1-level students of Turkic origin whose mother tongues are Kyrgyz and Russian. The data consist of texts that these students produced in an activity in which they wrote the continuation of a predetermined tale whose introductory paragraph was provided. Document analysis, one of the qualitative research methods, was used to identify the punctuation mistakes in these texts. The errors were then classified under subheadings and their causes were interpreted. The results show that the errors stemmed mostly from the influence of the students' mother tongues, Kyrgyz and Russian, and from a lack of knowledge about the use of punctuation marks. The evaluations and suggestions based on the findings are important because they show which points deserve particular attention when teaching punctuation in Turkish to students of Turkic origin and because they clarify the instructors' responsibilities in this regard.
Language and Literature, Ural-Altaic languages
"Imaginary" Finno-Ugric peoples of the USSR in the projects of the American Center for Uralic–Altaic Studies (1940–1950s)
V. Sharapov, A. Zagrebin, T. Lucina
The paper deals with the history of the creation and functioning of the Center for Uralic–Altaic Studies at Indiana University, which in the 1940s and 1950s became the leading institution in the United States engaged in applied and fundamental research on the political history, languages, and traditional culture of the Finno-Ugric peoples of the USSR. In the context of the growing confrontation between the North Atlantic bloc and the socialist-oriented countries led by the Soviet Union, ethnological knowledge, including that produced by the Center for Uralic–Altaic Studies, gained a special role in the ideological struggle. Émigré scholars, who arrived in the USA under differing circumstances that often predetermined the content and purpose of their texts, occupied the leading positions in this specialized university division.
Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages
Paul Soulos, Sudha Rao, Caitlin Smith
et al.
Machine translation has seen rapid progress with the advent of Transformer-based models. These models have no explicit linguistic structure built into them, yet they may still implicitly learn structured relationships by attending to relevant tokens. We hypothesize that this structural learning could be made more robust by explicitly endowing Transformers with a structural bias, and we investigate two methods for building in such a bias. One method, the TP-Transformer, augments the traditional Transformer architecture to include an additional component to represent structure. The second method imbues structure at the data level by segmenting the data with morphological tokenization. We test these methods on translating from English into morphologically rich languages, Turkish and Inuktitut, and consider both automatic metrics and human evaluations. We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset. In sum, structural encoding methods make Transformers more sample-efficient, enabling them to perform better from smaller amounts of data.
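As a rough illustration of what segmenting data with morphological tokenization can look like, the Python sketch below strips suffixes from the right of a word. It is a toy under stated assumptions: the suffix list, the `segment` function, and the `min_stem` parameter are hypothetical, and the abstract does not say which morphological analyser the authors used.

```python
def segment(word, suffixes, min_stem=2):
    """Split word into a stem followed by suffixes, stripping from the right.

    Toy longest-match stripping over a hand-written suffix list; a real
    system would use a proper morphological analyser.
    """
    ordered = sorted(suffixes, key=len, reverse=True)  # try longer suffixes first
    pieces = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in ordered:
            # Only strip if at least min_stem characters remain as the stem.
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                pieces.append(suffix)
                word = word[: -len(suffix)]
                stripped = True
                break
    return [word] + pieces[::-1]


# Hypothetical three-suffix inventory for a single Turkish example.
TOY_SUFFIXES = ["ler", "iniz", "den"]
print(segment("evlerinizden", TOY_SUFFIXES))  # ['ev', 'ler', 'iniz', 'den']
```

Feeding a model `ev ler iniz den` instead of one opaque token exposes the recurring suffixes directly, which is the kind of data-level structural bias the abstract describes.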
Jeopardy: An Invertible Functional Programming Language
Joachim Tilsted Kristensen, Robin Kaarsgaard, Michael Kirkedal Thomsen
Algorithms are ways of mapping problems to solutions. An algorithm is invertible precisely when this mapping is injective, such that the initial problem can be uniquely inferred from its solution. While invertible algorithms can be described in general-purpose languages, no guarantees are generally made by such languages as regards invertibility, so ensuring invertibility requires additional (and often non-trivial) proof. On the other hand, while reversible programming languages guarantee that their programs are invertible by restricting the permissible operations to those which are locally invertible, writing programs in the reversible style can be cumbersome, and may differ significantly from conventional implementations even when the implemented algorithm is, in fact, invertible. In this paper we introduce Jeopardy, a functional programming language that guarantees program invertibility without imposing local reversibility. In particular, Jeopardy allows the limited use of uninvertible -- and even nondeterministic! -- operations, provided that they are used in a way that can be statically determined to be invertible. To this end, we outline an implicitly available arguments analysis and three further approaches that can give a partial static guarantee to the (generally difficult) problem of guaranteeing invertibility.
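The link between invertibility and injectivity can be made concrete with a small Python sketch. It is not Jeopardy code and does not reproduce the paper's analyses; `invert_on`, `add`, and `add_keep_left` are hypothetical names chosen for illustration. The sketch checks injectivity by brute force on a finite domain and shows how an uninvertible operation (addition) can sit inside a mapping that is invertible as a whole.

```python
from itertools import product


def invert_on(f, domain):
    """Tabulate the inverse of f on a finite domain of distinct inputs.

    Raises ValueError if two inputs share an output, i.e. f is not injective.
    """
    inverse = {}
    for x in domain:
        y = f(x)
        if y in inverse:
            raise ValueError(
                f"not injective: {inverse[y]!r} and {x!r} both map to {y!r}"
            )
        inverse[y] = x
    return inverse


def add(pair):
    # Addition alone discards information: (0, 1) and (1, 0) both give 1.
    return pair[0] + pair[1]


def add_keep_left(pair):
    # Keeping the left operand next to the sum makes the whole mapping injective.
    return (pair[0], pair[0] + pair[1])


pairs = list(product(range(3), repeat=2))
inverse = invert_on(add_keep_left, pairs)
print(inverse[(1, 3)])  # (1, 2): the input is recovered from the output

try:
    invert_on(add, pairs)
except ValueError as err:
    print(err)  # reports two inputs that collide
```

A brute-force check like this only works for small finite domains. The abstract's point is that Jeopardy aims to establish the same property statically.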
Social differentiation of language by profession
Anar Faracov
One of the main directions of the social differentiation of language is the linguistic peculiarities of people in different professions. Every profession has its own specific vocabulary. Professional vocabulary does not develop in the same way in all languages of the world; it is richer in developed societies.
The main purpose of the research is to identify the basic features of language differentiation by profession, which rests on professional vocabulary. There are many contentious issues here. The main moot point concerns how professional vocabulary is compared with, and distinguished from, neighbouring categories. The points worthy of attention lie in the following directions: professional vocabulary and jargon, professional vocabulary and dialect words, and professional vocabulary and terms.
The socio-professional differentiation of language differs from other kinds of language differentiation in many features. Its main distinctive features are the intensive development of professional languages and the formation of new vocabulary as new professions emerge. As for language levels, besides the lexical level there are also differences at the syntactic level, in how the components of a text are formed.
Language and Literature, Ural-Altaic languages
Eesti keele kui teise keele õpetaja tööriistad Eesti Keele Instituudi keeleportaalis Sõnaveeb
Jelena Kallas, Kristina Koppel, Raili Pool
et al.
The article introduces the resources supporting the teaching of Estonian as a second language that are being developed at the Institute of the Estonian Language (EKI) and are gathered in the web application Teacher’s Tools (Õpetaja Tööriistad) of the language portal Sõnaveeb. Its modules (vocabulary, grammar, language use situations, and text evaluation) form a whole, helping language teachers and specialists involved in language teaching to plan courses and to compile teaching materials, exercises, and tests.
The methodological framework and the assignment of proficiency levels are based on the Common European Framework of Reference for Languages (CEFR 2001; Estonian version Raamdokument 2007), its companion volumes (CEFR/CV 2018, CEFR/CV 2020), the Council of Europe’s level descriptors for young learners aged 7–10 (Szabo 2018a) and 11–15 (Szabo 2018b), and Estonian legislation concerning language proficiency levels. The linguistic data come mainly from the corpora of Estonian as a second language coursebooks and of learner language compiled at EKI in 2018–2020.
The article describes the principles behind the creation and presentation of the tool modules and points out the problems that have arisen and possible further developments.
***
Estonian as a Second Language Teacher’s Tools in the Institute of Estonian Language’s Language Portal Sõnaveeb
The paper presents the interim results of the project Teacher’s Tools (Õpetaja tööriistad), published as a subpage of the language portal Sõnaveeb. The toolbox includes four modules: vocabulary, grammar, language use situations, and text evaluation. The tools aim to help second language teachers and specialists plan courses and create new educational materials, exercises, and tests.
The methodological framework and CEFR level evaluation for Teacher’s Tools are based on the Common European Framework of Reference for Languages: Learning, teaching, assessment (2001), its Companion Volume with New Descriptors (2018), Collated Representative Samples of Descriptors of Language Competences Developed for Young Learners for Ages 7–10 (Szabo 2018a) and 11–15 (Szabo 2018b), and Estonian legislation on the topic. The methodology is adapted from similar projects for other languages (e.g. Capel 2010, 2012, O’Keeffe, Geraldine 2017, Alfter et al. 2019).
To gather linguistic data, the Institute of Estonian Language compiled corpora of Estonian language coursebooks and of learner language in 2018–2020. First, the textbooks were studied to create wordlists and to analyse explicit grammar teaching. Second, the results were validated by experts and compared to the wordlists created on the basis of learners’ texts.
The vocabulary and grammar modules represent CEFR-based lexical and grammar profiles for learners of Estonian as a Second Language. The lexical profile covers both young (pre-A1–B2) and adult (A1–C1) learners; the grammar profile covers young learners (pre-A1–B2). The text evaluation module runs on the morphological analyser EstNLTK v1.6 and marks lemmas in texts according to their CEFR assignment in the vocabulary profile. The language use situation module is mainly based on the descriptors for young learners (Szabo 2018a, 2018b) and offers information about the typical situations in which the learner should be able to communicate.
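The lemma-marking step of the text evaluation module can be pictured with the minimal Python sketch below. It assumes the text has already been lemmatised (the real module relies on a morphological analyser for that) and uses a hypothetical profile table: `TOY_PROFILE`, `mark_lemmas`, `level_counts`, and the level assignments are all invented for illustration and are not taken from Sõnaveeb.

```python
from collections import Counter

# Hypothetical lemma-to-level table; the levels are invented, not from Sõnaveeb.
TOY_PROFILE = {
    "mina": "A1",
    "olema": "A1",
    "kool": "A1",
    "õpetaja": "A2",
}


def mark_lemmas(lemmas, profile, unknown="unlisted"):
    """Pair each lemma with its CEFR level, or with unknown if it is not profiled."""
    return [(lemma, profile.get(lemma.lower(), unknown)) for lemma in lemmas]


def level_counts(marked):
    """Count how many lemmas fall under each level label."""
    return Counter(level for _, level in marked)


# Input is assumed to be already lemmatised by an upstream analyser.
marked = mark_lemmas(["mina", "olema", "õpetaja", "keeleportaal"], TOY_PROFILE)
print(marked)
print(level_counts(marked))
```

A per-level count of this kind is one simple way a teacher could see at a glance whether a text sits mostly within a target level.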
Philology. Linguistics, Finnic. Baltic-Finnic
The hydronyms of the Labau, Lämäδ and similar isoglosses in the Ural-Altaic and other languages
S. Nafikov
The article studies the origin of Bashkir hydronyms with anlaut L- (Labau, Lämäz, and a number of others) against the background of extensive comparisons of similar hydronyms and appellatives from the Turkic, Altaic, and other Eurasian languages. The author considers several versions of the origin of these hydronyms, viz. a possible origin from the Turkic, Altaic, Uralic, or Euroasiatic languages at large. The stem and/or root of Bashkir hydronyms of the Lämäz type may be cognate with such hydronyms as Laba in Poland > the Elbe in East Germany, with -lej ‘a small river’ in the Volga Finnic languages, and with a fair number of similar names of water bodies in Europe, Asia, and beyond. Convergence with many same-root names of water bodies from several dozen languages and/or dialects is therefore proposed. A large amount of material from the dialects and subdialects of the Bashkir language is drawn on. The article concludes that the hydronyms containing the anlaut L- in bases of the LVC phonomorphological type are of very great antiquity. The answer to the question posed in the article’s title can hardly be definitive, as much further research is needed to clarify many points.
Museum and Historical Culture: How is Jewish History Included in the Museum Narrative of Lithuania?
Rūta Šermukšnytė
The aim of this article is to reveal the dynamics of the museum narrative of Jewish history in the contexts of Lithuanian historical culture in the period 1990–2020. Given the tensions between ethnocentric and polycentric, civic, civilizational models of identity in historical culture, the topic of Jewish history representation was chosen because of its complicated integration into the scheme of the ethnocentric national narrative. The study shows that museum representations of Jewish history are increasing in number and becoming more varied in themes and forms. This was predetermined by changes in the political conjuncture and public memory and by individual initiatives.
Finnic. Baltic-Finnic, Social Sciences
On the Evolution of Programming Languages
K. R. Chowdhary
This paper attempts to connect the evolution of computer languages with the evolution of life, where the latter has been dictated by the theory of evolution of species, and tries to give supporting evidence that new languages are more robust than their predecessors and carry over a mix of features from older languages, such that strong features are added and weak features of older languages are removed. In addition, an analysis of the most prominent programming languages is presented, emphasizing how the features of existing languages have influenced the development of new programming languages. Finally, it suggests a set of experimental languages that may come to dominate programming in the era of new multi-core architectures. Index terms: evolution of programming languages, classification of languages, future languages, scripting languages.
From Things' Modeling Language (ThingML) to Things' Machine Learning (ThingML2)
Armin Moin, Stephan Rössler, Marouane Sayih
et al.
In this paper, we illustrate how to enhance an existing state-of-the-art modeling language and tool for the Internet of Things (IoT), called ThingML, to support machine learning on the modeling level. To this end, we extend the Domain-Specific Language (DSL) of ThingML, as well as its code generation framework. Our DSL allows one to define things that are in charge of carrying out data analytics. Further, our code generators can automatically produce the complete implementation in Java and Python. The generated Python code is responsible for data analytics and employs APIs of machine learning libraries such as Keras, TensorFlow, and scikit-learn. Our prototype is available as open-source software on GitHub.
Predicative possession in the Novgorod Birch Bark documents in the Ural-Altaic context
C. Yurayong
This paper discusses predicative possessive constructions in the East Slavic languages, with a particular focus on the Old Novgorod Slavic dialect, in connection with the neighbouring Ural-Altaic languages. An areal-typological investigation shows that the East Slavic languages prefer the use of a locational possessive (mihi est), while the rest of Slavic and Europe’s Indo-European languages primarily use a have-possessive (habeo). Serving as primary data for this study, the dialect written in the Novgorod Birch Bark documents confirms a preference for the locational possessive over the have-possessive. The current study also evaluates three hypotheses on the origin of the East Slavic locational possessive proposed in earlier studies: 1) a Uralic substrate, 2) a Slavic archaism, and 3) a Northern Eurasian areal pattern. Given the typological survey as well as the empirical and historical comparative investigation, the locational possessive can be considered a preferred areal pattern across Northern Eurasia. As the East Slavic languages are part of the macro contact zone of Northern Eurasia, their choice of the locational possessive is reinforced by areal diffusion, especially from the closely neighbouring Uralic and Turkic languages.