Art Leete, Laur Vallikivi
Results for "Other Finnic languages and dialects"
Showing 20 of ~783,994 results · from DOAJ, arXiv, CrossRef
Mihir Panchal, Deeksha Varshney, Mamta et al.
Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English-centric representation spaces, making cross-lingual interpretability a pressing concern. We introduce Indic-TunedLens, a novel interpretability framework specifically for Indian languages that learns shared affine transformations. Unlike the standard Logit Lens, which directly decodes intermediate activations, Indic-TunedLens adjusts hidden states for each target language, aligning them with the target output distributions to enable more faithful decoding of model representations. We evaluate our framework on 10 Indian languages using the MMLU benchmark and find that it significantly improves over state-of-the-art interpretability methods, especially for morphologically rich, low-resource languages. Our results provide crucial insights into the layer-wise semantic encoding of multilingual transformers. Our model is available at https://huggingface.co/spaces/MihirRajeshPanchal/IndicTunedLens. Our code is available at https://github.com/MihirRajeshPanchal/IndicTunedLens.
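The difference between the two decoding strategies can be sketched as follows. This is a minimal pure-Python illustration, not the Indic-TunedLens code: the function names, the toy matrices, and the omission of the model's final layer normalization are assumptions made for brevity. The Logit Lens multiplies an intermediate hidden state directly by the unembedding matrix, whereas a tuned lens first applies a learned affine translator A h + b. In the tuned-lens approach such translators are typically trained to minimize the KL divergence between the lens output and the model's final-layer distribution, which is why a KL helper is included.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # a matrix is a list of rows


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(h: Vector, unembed: Matrix) -> Vector:
    """Logit Lens: decode an intermediate hidden state directly with the unembedding."""
    return matvec(unembed, h)


def tuned_lens(h: Vector, a: Matrix, b: Vector, unembed: Matrix) -> Vector:
    """Tuned-lens-style decoding: apply an affine translator A h + b, then unembed."""
    translated = [x + bias for x, bias in zip(matvec(a, h), b)]
    return matvec(unembed, translated)


def kl_divergence(p_logits: Vector, q_logits: Vector) -> float:
    """KL(p || q) between the softmax distributions of two logit vectors."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    h = [1.0, -0.5]                                   # toy intermediate hidden state
    unembed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # toy 3-token vocabulary
    identity, zeros = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    print(logit_lens(h, unembed))                     # [1.0, -0.5, 0.25]
    print(tuned_lens(h, identity, zeros, unembed))    # identical to the Logit Lens
```

With A set to the identity and b to zero, `tuned_lens` reduces exactly to `logit_lens`, which makes that a natural starting point before the translator is trained.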
Víctor García, Santiago Escobar, Catherine Meadows et al.
Formal patterns are formally specified solutions to frequently occurring distributed system problems that are generic, executable, and come with strong qualitative and/or quantitative formal guarantees. A formal pattern is a generic system transformation which transforms a usually infinite class of systems in need of the pattern's solution into enhanced versions of such systems that solve the problem in question. In this paper we demonstrate the application of formal patterns to protocol dialects. Dialects are methods for hardening protocols so as to endow them with light-weight security, especially against easy attacks that can lead to more serious ones. A lingo is a dialect's key security component, because attackers are unable to "speak" the lingo. A lingo's "talk" changes all the time, becoming a moving target for attackers. In this paper we present several formal patterns for both lingos and dialects. Lingo formal patterns can make lingos stronger both by transforming them and by composing several lingos into a stronger lingo. Dialects themselves can be obtained by the application of a single dialect formal pattern, generic on both the chosen lingo and the chosen protocol.
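To make the lingo idea concrete, the sketch below models a lingo as a keyed encode/decode pair whose output depends on a per-message counter, so the "talk" changes with every message, and composes two lingos by nesting them. Everything here is hypothetical: the paper's patterns are formally specified rather than written in Python, and `HmacLingo`, `ComposedLingo`, the HMAC-SHA256 tag, and `TAG_LEN` are illustrative choices, not the authors' lingos. The sketch shows only the shape of lingo composition and makes no security claim.

```python
import hashlib
import hmac
from typing import Optional, Protocol

TAG_LEN = 8  # truncated tag length in bytes (an arbitrary choice for this sketch)


class Lingo(Protocol):
    """Interface assumed for this sketch: a lingo can encode and decode messages."""

    def encode(self, message: bytes, counter: int) -> bytes: ...

    def decode(self, talk: bytes, counter: int) -> Optional[bytes]: ...


class HmacLingo:
    """Hypothetical lingo: each message carries a tag derived from a shared secret and a counter."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret

    def _tag(self, message: bytes, counter: int) -> bytes:
        # The counter makes the tag differ from message to message (a moving target).
        data = counter.to_bytes(8, "big") + message
        return hmac.new(self.secret, data, hashlib.sha256).digest()[:TAG_LEN]

    def encode(self, message: bytes, counter: int) -> bytes:
        return message + self._tag(message, counter)

    def decode(self, talk: bytes, counter: int) -> Optional[bytes]:
        # Return the message only if the talk "speaks" the lingo for this counter value.
        if len(talk) < TAG_LEN:
            return None
        message, tag = talk[:-TAG_LEN], talk[-TAG_LEN:]
        if hmac.compare_digest(tag, self._tag(message, counter)):
            return message
        return None


class ComposedLingo:
    """Compose two lingos: the inner one is applied first, the outer one last."""

    def __init__(self, inner: Lingo, outer: Lingo) -> None:
        self.inner = inner
        self.outer = outer

    def encode(self, message: bytes, counter: int) -> bytes:
        return self.outer.encode(self.inner.encode(message, counter), counter)

    def decode(self, talk: bytes, counter: int) -> Optional[bytes]:
        # Peel the outer layer first; fail as soon as either layer rejects the talk.
        peeled = self.outer.decode(talk, counter)
        return None if peeled is None else self.inner.decode(peeled, counter)
```

A decoder that lacks either secret, or that uses the wrong counter, gets `None` back, which is the sense in which an attacker "cannot speak" the composed lingo in this toy model.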
Liina Pärismaa
Christoph Blume as a language innovator in the second half of the 17th century. This article provides insight into the language usage in the mid-17th century ecclesiastical texts by the Northern Estonian author Christoph Blume: Das Kleine Corpus Doctrinæ (1662), Geistliche Wochen-Arbeit (1666), and Geistliche Hohe Fäst-Tahgs Freude (1667). The focus is on various morphosyntactic phenomena, such as the expression of negation and future tense, the translation of the definite article, and the relationship between analytical and synthetic locative constructions. The article also explores the extent to which, in his works, Blume followed the example of authoritative writers or the established norms of the written language of his time. While changes at the individual level of language usage may not be as prominent in linguistic history as broader trends encompassing the shared linguistic attitudes of multiple authors, they help to pinpoint the beginnings of linguistic shifts and offer a more detailed view of the variation and development of linguistic features by highlighting differences between authors’ texts. This study reveals that, on an individual level, the changes had begun to emerge in Christoph Blume’s ecclesiastical texts in the mid-17th century, although they became more widespread only in the final decades of the century. The language usage in Blume’s ecclesiastical texts reflects some characteristics typical of his time (e.g., ep-negation or the saama-future). However, his works also introduced several innovative linguistic features that his predecessors, such as Heinrich Stahl, had not yet adopted in their texts or had used differently. These include ei-negation, the negative form pole, and the omission of a translation for the German definite article.
Blume likely adopted many of these innovative forms, which were atypical for contemporary Northern Estonian literary language, either to achieve suitable rhyming and rhythmic forms for his translations of ecclesiastical songs or as a result of his observations of the local population’s language usage.
Kristiina Ross
From the second half of the 17th century onward, parish registers served as administrative documents in which local pastors recorded the births, marriages, and deaths of congregation members. The matrix language of the registers was German. However, the names of peasants were entered following the Estonian model (byname + given name). In some registers, Estonian was also used systematically for additional information concerning individuals from lower social strata. This article investigates how the use of Estonian in parish registers evolved over time, whether there were specific geographic areas where Estonian was used more consistently, and to what extent language choice in the entries can be linked to the ideological views of pastors. To this end, preserved marriage records from four time points were analyzed: the earliest surviving 17th-century register and registers from the years 1720, 1750, and 1780. Of the 25 surviving 17th-century registers, only two can be classified as Estonian-language records. By the 1720s, however, approximately one-third of the records were in Estonian, and this proportion remained relatively stable throughout the 18th century. Geographically, two regions emerged where Estonian-language records were more common. By contrast, in some regions no Estonian-language registers were kept at all. In cases where Estonian was used consistently, this choice cannot be attributed to random code-switching in a bilingual context. Pastors who consistently used Estonian were often pietist in outlook, more familiar with local circumstances, or sympathetic to the Moravian Brethren movement. A deeper understanding of the factors behind language choice would require data from additional time points, comprehensive biographical information on pastors, and a multidimensional prosopographical analysis linking linguistic and biographical data.
Nonetheless, based on the present analysis, it can be concluded that Estonian held a significant role in 18th-century parish registers.
Tiina-Erika Friedenthal
Estonian Bible research has predominantly focused on the development of biblical and literary language, with little attention given to paratextual elements such as prefaces and reading aids, despite their considerable presence and volume in early modern Bibles. The Pietistic renewal, initiated by Spener’s Pia desideria (1675), placed the Bible at the very centre of Christian life. Independent engagement with the Bible became soteriologically indispensable for everyone. Spener’s associate Johann Fischer, who had arrived in Livonia in 1673, began organizing a school network and Bible publication in four local languages, parallel to Spener’s work in Frankfurt. The first, a German Bible (Riga, 1677), is considered the earliest Pietistic Bible, featuring a Lutheran text augmented with Pietistic supplementary materials. The first Latvian and South Estonian biblical books soon followed (between 1685 and 1694). Although the Swedish state, committed to Lutheran orthodoxy, banned Pietism and put pressure on Fischer, the “Pietistic Bible project” continued after his departure (1699) and the end of the Great Northern War (1710 in Estonia and Livonia). Now under Russian rule, the task of disseminating the Bible and basic literacy in Estonia and Livonia was enthusiastically taken up by men who had absorbed Pietistic views and methods during Pietism’s heyday, directly under Francke in Halle or at other German universities. Despite economic constraints, Bible translations appeared in both Estonian and Latvian. The period culminated with the North Estonian New Testament (2nd ed., 1729), the complete North Estonian Bible, and the complete Latvian Bible (2nd ed., both 1739). All three were, to varying degrees, equipped with Pietistic supplementary materials. This article argues that Bible publishing for the local populations in Estonia and Livonia was a major Pietistic undertaking, unfolding in exact temporal alignment with historical Pietism (1675–c.1740).
Reet Bender
This article explores the everyday written use of Estonian by Baltic Germans in the 18th–20th centuries, drawing on a wide range of illustrative examples. It focuses on surviving chance fragments that reflect the multilingual cultural sphere of the period. Approaching the topic from a cultural-historical perspective, the article describes the Baltic Germans’ use of Estonian (and, correspondingly, Latvian) as situational and functional, shaped by their living environment, social position, and the dynamics of historical circumstances. From the examples, four principal contexts of use emerge: (a) expression of local colour and nostalgia, (b) domestic interpersonal communication, (c) secret or coded language and a sign of protest, and (d) a unifying element across different national groups. Examples drawn from varied and sometimes unexpected sources point to the natural presence of Estonian in Baltic German culture. Estonian, acquired in childhood through interaction with servants, often became the first language, later supplanted by German, French, or Russian, yet leaving lasting traces in domestic speech. Estonian acquired particular significance as a secret language outside the home or in difficult circumstances, most notably after the Second World War in Germany and beyond, where, in addition to its practical value, it also took on an identity-forming or identity-affirming role, serving as a cultural bridge between different national groups. Although the extant material is fragmentary, it nevertheless reveals a linguistic interweaving and cultural interaction that call for more systematic investigation. The aim of this article is to encourage scholars to bring similar chance findings into academic circulation, thereby contributing to a fuller understanding of the coexistence of Baltic Germans and local languages, as well as the multilayered nature of cultural memory.
Liina Saarlo
Written traces in the Estonian runosong corpus. Authenticity, antiquity, and orality have traditionally been regarded as hallmarks of the Estonian runosong (regilaul). Yet these songs were collected during a period of modernization in Estonian society, when, among other changes, a transition from oral to written culture was taking place. Oral and written cultures have often been viewed as fundamentally different, even oppositional. Written culture is thought to transform oral modes of thought irreversibly. For this reason, folklorists have viewed the rise of written culture as a key factor in the decline of oral traditions and archaic genres: as society modernized, the runosong was replaced by the rhymed stanzaic song, which was perceived as foreign and inauthentic by the elite. The idealized image of a runo singer has been associated with social marginality, exceptional memory, a readiness to improvise, and an affective communicative style, whereas literacy was considered irrelevant. Alongside the collection process, runosongs were continually published in print – both in scholarly publications and in school textbooks or popular song booklets. Consequently, printed versions of runosongs re-entered oral tradition: they were read, memorized, and reinterpreted in performance. The identification of written sources for archived songs and the verification of the authenticity of contributors’ submissions have long been central to Estonian folklorists’ philological work, which has relied on extensive reading and informed intuition. Contemporary corpus-based research now offers new opportunities for such analysis. This article employs the computational similarity-based user interface Runoregi (runoregi.fi) of the Finno-Ugric runosong joint database FILTER to explore the traces that written sources and literacy have left in the Estonian runosong corpus. Today, it is no longer appropriate to label contributors who copied songs from books as “forgers”.
Instead, the focus should be on understanding why they copied and which books they used. Songs originating from printed sources do not simply indicate imitation but reveal the multifaceted processes and mechanisms of re-folklorization within folklore.
Annika Viht
This article is a first attempt to analyze the Estonian language of the writings of the Moravian (Herrnhutian) movement, which brought about profound shifts in worldview and social life in Estonia. To this end, I compared the language of Moravian hymnals with that of non-Moravian ones. I juxtaposed five South Estonian and two North Estonian Moravian hymnals from the period 1741–1810 with the official Lutheran hymnals of the same era. From each book, a 5,000-word excerpt was examined for variation in the use of the most frequent linguistic elements. Most of the phonological, inflectional, lexical, and morphosyntactic variation observed can be attributed to dialectal background – that is, to the differing conventions of the South and North Estonian standard languages. Nevertheless, some deviations from this general pattern emerged. For example, in their expression of futurity, the South Estonian Moravian hymnals aligned more closely with the North Estonian tradition than with the South Estonian one, reducing the use of the saama-future construction and introducing the võtma-construction. In the first South Estonian Moravian hymnal, authored by Johann Christian Quandt, additional rare instances of North Estonian linguistic patterns were found. The hymnals with the most distinctive language were that of Matthias Friedrich Hasse (1747), compiled during the sifting period, and that of Christoph Michael Königseer (1759), who partially drew on Hasse’s material. These hymnals featured several times more diminutives than the others, as well as significantly less South Estonian vowel harmony. Hasse was also the only author to use a postmodifier in the elative case – an element that had already been rejected in the 17th century as a German influence. The only feature distinguishing Moravian books from other works was the plural form wellitse ‘brethren’. In earlier South Estonian sources, other forms had been used, the most recent being welle.
Kristiina Ross, Annika Viht
Verena Blaschke, Miriam Winkler, Constantin Förster et al.
Although Germany has a diverse landscape of dialects, they are underrepresented in current automatic speech recognition (ASR) research. To enable studies of how robust models are towards dialectal variation, we present Betthupferl, an evaluation dataset containing four hours of read speech in three dialect groups spoken in Southeast Germany (Franconian, Bavarian, Alemannic), and half an hour of Standard German speech. We provide both dialectal and Standard German transcriptions, and analyze the linguistic differences between them. We benchmark several multilingual state-of-the-art ASR models on speech translation into Standard German, and find differences in how closely the output resembles the dialectal vs. the standardized transcriptions. Qualitative error analyses of the best ASR model reveal that it sometimes normalizes grammatical differences, but often stays closer to the dialectal constructions.
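Comparing ASR output against both transcription layers can be done with word error rate (WER), the word-level edit distance divided by the reference length. The following is a minimal sketch, assuming whitespace tokenization and no case or punctuation normalization; `compare_references` is a hypothetical helper, and the paper's actual evaluation setup may differ.

```python
from typing import Dict, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of a hypothesis against one reference transcription."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def compare_references(hypothesis: str, dialectal: str, standard: str) -> Dict[str, float]:
    """Score one ASR output against both the dialectal and the Standard German layer."""
    return {"dialectal": wer(dialectal, hypothesis), "standard": wer(standard, hypothesis)}
```

A lower score on one layer than the other indicates which transcription the output resembles more closely for that utterance.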
Anastasiia Ryko
The dialects discussed in this article were considered Belarusian in the early 20th century, and later, as a result of the transfer of the administrative (state) border, they became part of the Russian territory and were considered Russian. The changes occurring in these dialects as a result of the influence of the standard Russian language are interesting from various perspectives. Firstly, the linguistic self-identification of dialect speakers changes, and they come to perceive their dialect as less prestigious than the standard language. Secondly, linguistic features that dialectologists previously defined as characteristic of the Belarusian language are being replaced by standard Russian ones. By analyzing the linguistic data obtained from the dialect speakers of different generations, we can trace the emergence of variation and then its loss. Observing which linguistic features are subject to change first, and which remain more stable, allows us to examine linguistic changes through the lens of the “hierarchy of borrowings” theory. Additionally, given the linguistic inequality between the dialect and the standard language, we can observe the gradual transformation of the dialect under the influence of the prestigious standard idiom. Therefore, the loss of Belarusian–Russian variation can be viewed as a process of dedialectization, bringing the dialect closer to the standard language.
Montserrat Recalde, Mauro Fernández
The gheada and the seseo are the two pronunciations most stigmatised by the top-down standardising tradition of Galician from the mid-19th century. Social stereotypes of peasantry, ignorance, and vulgarity were built on them. Nowadays, those stereotypes are the basis for indexical pointing. These pronunciations were outlawed from schools in the past. Today, despite having been considered standard by the Royal Galician Academy since 1982, they are almost absent from the classrooms, including those of Galician language and literature. This situation is detrimental to the linguistic capital of their users as compared to that of standard speakers. Nonetheless, since the end of the 20th century, there has been a social resignification of the gheada and seseo, symbolically used to express authenticity, ethnolinguistic adherence, and/or socio-political and cultural resistance. Currently, the emergence of vernacular language ideologies (VLIs) counterbalances the weight of standard language ideologies (SLIs) on these phenomena. This article analyses the linguistic attitudes of a sample of young people towards these two dialectal varieties as opposed to the standard pronunciations. It also identifies the indexical associations of contrasting varieties and their evolution over time. For this purpose, the matched-guise technique in combination with semantic differential scales (SDSs) has been applied. The results show that whereas standard pronunciations index social success, dialectal pronunciations index solidarity. However, while the standard indexical values are very stable, a rise in dialectal ratings is observed over fifteen years, indicating an improvement in attitudes towards them. As in other European minority languages, this phenomenon indicates a process of value levelling of the linguistic varieties and the growing weight of the VLIs in late modernity in Galicia.
Danila Rygovskiy
This article explores ways in which women navigate their agency within the conservative religious context of Russian Old Belief. Specifically, it examines four closely situated congregations in Kasepää, Suur-Kolkja, Varnja (Pomortsy), and Väike-Kolkja (Fedoseevtsy) in Peipsimaa, based on fieldwork conducted between 2020 and 2021. In the Old Belief tradition, women are barred from leadership roles or preaching; however, they often assume duties traditionally reserved for men. Furthermore, Old Believer communities in Estonia, which tend to have a higher proportion of women than men, rely heavily on women to uphold religious practices. Women’s agency within Old Believer communities does not primarily involve gaining more religious knowledge or higher status. The demographic composition of a religious community is shaped by external economic, political, and social factors. Women, who often lead congregations due to their familiarity with religious tradition and service capabilities, face additional challenges in navigating their religious practices, such as restrictions on reading the Gospel at services or baptizing children. Importantly, the Old Believers’ “culture of exceptions” does not entail flouting essential religious rules; rather, it seeks solutions that are consistent with ritual semantics and acceptable within their religious framework.
Märt Väljataga
This essay explores the emergence and evolution of a literary and artistic trend in Soviet Estonia from the late 1970s to the early 1990s. During this period, young philologists, poets, artists and essayists rediscovered the decadence of the fin-de-siècle and its Estonian expressions as a significant source of inspiration. Generally, in the official Soviet jargon, ‘decadence’ was a highly derogatory term, used during Stalin’s rule to stigmatize all of Western bourgeois culture. Consequently, patriotic scholars, even in the face of easing circumstances, were hesitant to associate early 20th century artists with decadence, as that would have meant condemning them. By the late 1970s, the atmosphere had liberalized enough to make engaging with the motifs and attitudes of decadence less perilous. This shift also provided a means to counter the activism of the 1960s generation, whether loyal to the authorities or dissident. In 1978, Germanist Linnar Priimägi marked the initial steps of the neo-decadence trend with the theoretical manifesto “Decadence as a Cognitive Constant” and the generational manifesto “Tartu Autumn”, co-written with art historian Ants Juske. The former text associated decadence with the appreciation of dispassionate beauty, while the latter expressed refined indolence as the main characteristic of the young generation. References to the decadents of the early 20th century became common among the younger generation of poets, including Doris Kareva, Aado Lintrop, Indrek Hirv, Ilmar Trull, and Hasso Krull. This was accompanied by the rehabilitation of Estonian and Russian decadence in academic literary studies. The emergence of the neo-decadence trend may be attributed to late-Soviet social fatigue and stagnation, the generational desire to distinguish itself from the dominant 1960s generation, and the growing influence of postmodernism as a departure from the international constructivist and austere style of high modernism.
Contemporary criticism occasionally discussed signs of Stoicism, Skepticism, and Epicureanism in culture, sometimes drawing parallels between the emerging postmodernism and Hellenistic imperial culture.
Mari Sarv, Kati Kallio, Maciej Michał Janicki
The article introduces the joint Finnic runosong database and associated web environments and applications developed collaboratively by computer scientists and folklorists from Finland and Estonia. These tools facilitate new approaches to analyzing the extensive dataset. Within the research framework, various computational solutions have been devised to identify and associate with one another similar verses and texts that differ in orthography, language, and content. These methods have also been implemented in the web environment Runoregi (runoregi.rahtiapp.fi), allowing researchers and enthusiasts interested in traditional oral poetry to easily navigate the network of variant verses, motifs and texts, and to compare various texts and their elements. Additionally, there is a web application for maps and other visualizations integrated with the database and Runoregi environment. While Runoregi serves as a valuable tool for the close reading and comparison of texts, obtaining an overview of large amounts of texts (the database currently contains over 280,000 texts) remains a challenge. We address this issue through an examination of the frequently contaminated song types “Searching for the Comb” and “Sword from the Sea”. Given that not all texts in the database are consistently typologized by folklorists, our sample includes texts identified by means of similarity calculations as similar to those sorted under the types under consideration. We computed adjacency scores for verse clusters obtained by clustering verses by their similarity scores using the Chinese whispers method, presenting the results as a network graph (with verse clusters as nodes and adjacency scores as edge weights). The groups appearing in the network reflect regional plot developments and elements. Despite sharing plotlines and even poetic formulas, a clear divide emerged between Northern and Southern Finnic texts.
As our verse similarity calculations may not capture linguistically distant variants, we manually consolidated variants of the same verse (with identical root composition of content words) across different dialects and languages. By applying adjacency computation and network visualisation, the graph now represents the general Finnic plot with the main alternative developments. The graph also highlights the stabler cross-Finnic verse types associated with significant plot turns.
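The clustering and network steps described above can be sketched as follows. `chinese_whispers` follows the general Chinese whispers scheme (every node starts in its own class and repeatedly adopts the label with the highest summed edge weight among its neighbours); `adjacency_scores` assumes, hypothetically, that the adjacency score of two clusters is the number of consecutive verse pairs in the texts that link them. The function names, the iteration cap, and that scoring definition are illustrative assumptions, not the project's implementation.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (verse, verse, similarity weight)


def chinese_whispers(nodes: List[Hashable], edges: List[Edge],
                     iterations: int = 20, seed: int = 0) -> Dict[Hashable, int]:
    """Cluster a weighted similarity graph with the Chinese whispers method."""
    rng = random.Random(seed)
    neighbours: Dict[Hashable, List[Tuple[Hashable, float]]] = defaultdict(list)
    for u, v, w in edges:
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))
    label = {node: i for i, node in enumerate(nodes)}  # every node starts in its own class
    order = list(nodes)
    for _ in range(iterations):
        rng.shuffle(order)  # visit nodes in a random order each round
        changed = False
        for node in order:
            scores: Dict[int, float] = defaultdict(float)
            for other, w in neighbours[node]:
                scores[label[other]] += w
            if not scores:
                continue  # an isolated node keeps its own label
            best = max(scores, key=scores.get)
            if best != label[node]:
                label[node] = best
                changed = True
        if not changed:
            break  # converged: no node switched label in this round
    return label


def adjacency_scores(texts: List[List[Hashable]],
                     label: Dict[Hashable, int]) -> Dict[Tuple[int, int], int]:
    """Hypothetical adjacency score: count consecutive verse pairs linking two clusters."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for verses in texts:
        for a, b in zip(verses, verses[1:]):
            ca, cb = label[a], label[b]
            if ca != cb:
                counts[(min(ca, cb), max(ca, cb))] += 1  # undirected cluster pair
    return dict(counts)
```

The returned dictionary maps cluster pairs to edge weights, so it can be passed to any graph library to draw a network like the one described above.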
Kaia Sisask
In the early 20th century, the influx of foreign literary movements into Estonia was largely facilitated through print media. Periodicals predominantly favoured concise and captivating stories. Between 1900 and 1939, a significant number of translations of Guy de Maupassant’s short stories were published because they met these criteria. These translated stories later found their way into collections. Maupassant’s short stories can be interpreted as both realistic and decadent, with both perspectives finding representation in the Estonian reception. Nevertheless, the prevailing view depicts Maupassant as an eccentric writer whose mental illness is reflected in his writings. Themes such as neurosis, hysteria, hypnotism, intoxicants and misogyny associate Maupassant with decadent culture. The press does not shy away from portraying his loose lifestyle, battle with syphilis, suicide attempt, and other scandalous episodes. Since the turn of the century, a multitude of Maupassant’s stories about extramarital affairs as well as horror tales with clear ties to decadence have been translated. However, even characters in Maupassant’s war and peasant stories can be interpreted as degenerate. Maupassant’s reception provides insight into the era’s perception of artists and writers in a broader sense: genius is often associated with mental illness and substance abuse. In prefaces to short story collections and reviews published in newspapers, Maupassant is sometimes hailed as a great realist, a discerning student of human psychology, and a master of style. Johannes Aavik introduces linguistic innovations in his translations of Maupassant, while Tuglas views Maupassant as an exemplar on the path towards more refined realism.
Ott Heinapuu
"Conflict and violence in narratives about sacred natural sites". This article discusses the role and function of violent motifs and folktales found in the place-lore of Estonian sacred natural sites, such as holy groves, and sacred stones and bodies of water, from an ecosemiotic point of view. Drawing comparisons with Estonian archival material, the study also considers Ancient Greek and Saami place-related narratives as examples of premodern discourse on supernatural sites. Building on the theories of Philippe Descola (2022 [2005]), Eduardo Kohn (2013), Bruno Latour (2014 [1991]), and Yuri Lotman (1999), sacred natural sites are viewed as blurry and porous border zones between nature and culture, the natural and the supernatural domains, which can thus function as key points of communication between these spheres. Narratives of violent conflict between supernatural creatures that are said to have taken place in a sacred natural site reinforce the sanctity of these places, highlighting their significance as crucial nodes in the complex network of relationships between humans and supernatural creatures. In cautionary tales explicating the taboos and prohibitions related to sacred natural sites, violent motifs often serve as consequences for violating these interdictions. Thus, these tales instruct the audience on the nature of the rules governing relationships between humans and supernatural creatures, helping humans to prevent overt conflict. The article suggests that structural analysis, following the approach of Alan Dundes’s “The Morphology of North American Indian Folktales” (1964), may help make sense of the fragmentary narratives and motifs often found in recorded archaic place-lore and, in combination with the comparative method, may enhance the understanding of the context surrounding the recorded fragments.
Basel Mousi, Nadir Durrani, Fatema Ahmad et al.
Arabic, with its rich diversity of dialects, remains significantly underrepresented in Large Language Models, particularly in dialectal variations. We address this gap by introducing seven synthetic datasets in dialects alongside Modern Standard Arabic (MSA), created using Machine Translation (MT) combined with human post-editing. We present AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation. We evaluate LLMs on dialect comprehension and generation, focusing specifically on low-resource Arabic dialects. Additionally, we introduce the first-ever fine-grained benchmark designed to evaluate cultural awareness across the Gulf, Egypt, and Levant regions, providing a novel dimension to LLM evaluation. Our findings demonstrate that while Arabic-specific models like Jais and AceGPT outperform multilingual models on dialectal tasks, significant challenges persist in dialect identification, generation, and translation. This work contributes approximately 45K post-edited samples and a cultural benchmark, and highlights the importance of tailored training to improve LLM performance in capturing the nuances of diverse Arabic dialects and cultural contexts. We have released the dialectal translation models and benchmarks developed in this study (https://huggingface.co/datasets/QCRI/AraDiCE).
Samantha Link
Recent work found a correspondence between the probability of consonant clustering in monosyllabic lexemes and the three vowel types (short monophthongs, long monophthongs, and diphthongs) in German dialects. Furthermore, that correspondence was found to be bound to a North–South divide. This paper explores the consonant-clustering preferences of particular vowels by analyzing the PhonD2-Corpus, a large database of phonotactic and morphological information. The clustering probability of the diphthongs is positively correlated with frequency, while the other vowels show particular preferences that are not positively correlated with frequency. However, all of them follow a threefold pattern: short monophthongs prefer coda clusters, diphthongs prefer onset clusters, and long monophthongs are balanced. It was also found that this threefold pattern seems to have evolved from an originally twofold pattern in Middle High and Low German, in which short monophthongs prefer coda clusters while long monophthongs and diphthongs prefer onset clusters. This result is then considered in terms of syllable-weight compensation and moraicity. Finally, some interesting parallels with the syllable vs. word-language typology framework are noted.
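The kind of tally behind the threefold pattern can be sketched as a relative-frequency estimate of onset and coda clusters per vowel type. This is a minimal illustration assuming a hypothetical record format (one tuple per monosyllable) rather than the actual structure of the PhonD2-Corpus; the `tolerance` used to call a vowel type "balanced" is an arbitrary choice, and the paper's own probability estimates may be computed differently.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (vowel_type, has_onset_cluster, has_coda_cluster) per monosyllable.
Record = Tuple[str, bool, bool]


def cluster_probabilities(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Estimate P(onset cluster) and P(coda cluster) for each vowel type."""
    totals: Counter = Counter()
    onsets: Counter = Counter()
    codas: Counter = Counter()
    for vowel, onset, coda in records:
        totals[vowel] += 1
        onsets[vowel] += onset  # True counts as 1, False as 0
        codas[vowel] += coda
    return {v: (onsets[v] / n, codas[v] / n) for v, n in totals.items()}


def preference(p_onset: float, p_coda: float, tolerance: float = 0.05) -> str:
    """Label a vowel type's preference; the 'balanced' tolerance is arbitrary."""
    if abs(p_onset - p_coda) <= tolerance:
        return "balanced"
    return "onset" if p_onset > p_coda else "coda"
```

Applying `preference` to each entry of `cluster_probabilities` yields one label per vowel type, which is the form in which a "short: coda, diphthong: onset, long: balanced" pattern would show up.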