Results for "Lexicography"

Showing 20 of ~38,710 results · from DOAJ, CrossRef, Semantic Scholar

DOAJ Open Access 2025
Datasets for South African Languages: Bilingual Aligned and Monolingual Data for Machine Translation

Tanja Gaustad, Cindy A. McKellar, Martin J. Puttkammer

This data paper describes machine translation datasets built for the Autshumato project. The datasets contain both bilingual aligned data between English and all other official written languages of South Africa, namely Afrikaans (ISO 639-3: afr), isiNdebele (nbl), isiXhosa (xho), isiZulu (zul), Sepedi (nso), Sesotho (sot), Setswana (tsn), Siswati (ssw), Tshivenḓa (ven) and Xitsonga (tso), as well as monolingual data for all 11 languages. The content was sourced from existing and commissioned translations, various publications, and web-crawling of government sites. The present article describes the collection, alignment and cleanup processes that were used to create these resources. It also gives a detailed overview of the amount and provenance of the data included in the final datasets for all languages. Although the datasets were created primarily for the training of statistical and neural machine translation systems, they can also be used for other natural language processing tasks or linguistic research, such as term extraction or lexicography.

History of scholarship and learning. The humanities, Language and Literature
DOAJ Open Access 2025
C-BERT: A Mongolian reverse dictionary based on fused lexical semantic clustering and BERT

Amuguleng Wang, Yilagui Qi, Dahu Baiyila

A reverse dictionary is an electronic dictionary that accepts natural language descriptions from users and returns semantically matching words. Despite substantial research achievements in Mongolian lexicography, Mongolian reverse dictionaries have not yet been discussed. To address this gap, we propose an innovative model, C-BERT, which combines advanced lexical semantic clustering with BERT classification technology. First, the K-means algorithm was used to cluster preprocessed entries from well-known Mongolian dictionaries into 5000 clusters, forming a comprehensive training set. We then optimized the data distribution of this training set through random negative sampling and fine-tuned the CINO-large model, yielding the C-BERT model. When a user submits a description, C-BERT matches it against the central words of the 5000 clusters and selects the top 125 clusters. It then matches the target words within these clusters to recommend the top 100 semantically relevant candidates. Compared with seven baseline models, C-BERT performs better, particularly on datasets with human-generated descriptions, where its synonym accuracy@10 and accuracy@100 reach 16.5% and 71%, respectively. Thanks to clustering, C-BERT speeds up inference more than tenfold, which significantly enhances its practical utility. Accordingly, we have developed a user-friendly online application platform based on C-BERT for a broad range of users, available at http://mrdp.net/.
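The two-stage lookup described in this abstract (score clusters by their central words, keep the best clusters, then rank only the words inside them) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: words are represented by ready-made vectors, cosine similarity stands in for the fine-tuned CINO-large/BERT matcher that C-BERT actually uses, and the function names, the toy lexicon, and the rule "central word = member most similar to its centroid" are all hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per vector)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def build_index(lexicon, k):
    """Cluster the lexicon; hypothetical rule: a cluster's central word is the
    member most similar to its centroid."""
    words = list(lexicon)
    centroids, assign = kmeans([lexicon[w] for w in words], k)
    clusters = {c: [] for c in range(k)}
    for w, c in zip(words, assign):
        clusters[c].append(w)
    central = {
        c: max(members, key=lambda w: cosine(lexicon[w], centroids[c]))
        for c, members in clusters.items()
        if members
    }
    return clusters, central


def reverse_lookup(query_vec, lexicon, clusters, central, top_clusters, top_words):
    """Stage 1: rank clusters via their central words; stage 2: rank words in the shortlist."""
    ranked = sorted(central, key=lambda c: cosine(query_vec, lexicon[central[c]]), reverse=True)
    candidates = [w for c in ranked[:top_clusters] for w in clusters[c]]
    candidates.sort(key=lambda w: cosine(query_vec, lexicon[w]), reverse=True)
    return candidates[:top_words]


def accuracy_at_k(results, gold_sets, k):
    """Share of queries whose top-k list contains at least one gold word."""
    hits = sum(1 for res, gold in zip(results, gold_sets) if gold & set(res[:k]))
    return hits / len(results)


# Toy 2-D "embeddings" (hypothetical): three animal words and three water words.
lexicon = {
    "cat": [1.0, 0.1], "kitten": [0.9, 0.2], "dog": [0.8, -0.1],
    "river": [0.0, 1.0], "lake": [0.1, 0.9], "sea": [-0.1, 1.0],
}
clusters, central = build_index(lexicon, k=2)
print(reverse_lookup([1.0, 0.0], lexicon, clusters, central, top_clusters=1, top_words=3))
# → ['cat', 'dog', 'kitten']
```

With real data, `top_clusters` and `top_words` would play the role of the 125 clusters and 100 candidates reported above. The speed-up comes from stage 2 scoring only the words in the shortlisted clusters rather than the whole lexicon.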

Engineering (General). Civil engineering (General)
DOAJ Open Access 2024
Enhancing conceptualisations of information behaviour contexts through insights from research on e-dictionaries and e-lexicography

Theo JD Bothma, Ina Fourie

Introduction. Extensive conceptualisations of context in information behaviour research do not extend to all the contexts revealed in the use of electronic dictionaries (e-dictionaries) and similar e-sources. Information behaviour research emphasises users' contexts and situations. Examples of e-dictionary use, in which the dictionary acts as an intermediary, reveal additional contexts, and e-dictionary research and lexicographical insight add further conceptualisations of information behaviour contexts. Method. Conceptual paper drawing on literature reviews of research on e-dictionaries and of conceptualisations of information behaviour contexts, together with an exemplar approach to e-dictionary use. Analysis. The literature and the dictionary use examples are analysed through an information behaviour lens with added lexicographic insight. Results. Conceptualisations of context in information behaviour research focus strongly on the user (e.g. the need or problem) and on specific situations in such contexts, sometimes extending to temporality and spatiality. Information retrieval literature also notes the context of the person who created the information and that of an intermediary (person or system). Three contexts are evident from e-dictionary use and lexicography: the user, the information intermediary (the dictionary), and the word, phrase or text (the information source). These contexts might influence information behaviour. Conclusion. The use of e-dictionaries and similar intermediaries, observed with lexicographic insight, can enhance conceptualisations of context in information behaviour, which is of value for the use of information sources and for information evaluation.

Bibliography. Library science. Information resources
DOAJ Open Access 2023
The multifaceted lexis of Kerstner’s Gruntovčani

Josip Lasić, Željka Macan, Marina Marinković

A particular community is, among other things, also determined by its speech (Pon 2009), and the most conspicuous indicator of the distinctiveness of a given speech, as well as an important linguistic means of national integration and differentiation, is its lexis (Pranjković 2007, 488). The lexis of certain Croatian speeches was shaped by many external influences, namely through the »entry« of lexemes from other, mostly neighbouring, languages and linguistic systems (Lisac 2003, 31). This paper analyses the lexis of the cult Croatian TV series Gruntovčani, created by Mladen Kerstner, based on the scripts of the first five episodes: Božja vola (God's Will), Jelen (Deer), Babica su nakanili hmreti (Granny Has Decided to Die), Zlatna jajca (Golden Eggs), and Ovce idu (There Go the Sheep). The speech in question is Kajkavian, primarily the author's native speech of Ludbreg, interspersed with numerous elements from other dialects. The aim of this study is to present the multifaceted nature of the lexis in Gruntovčani, which is analysed from several perspectives: dialectological, sociocultural, and semantic-pragmatic. The main focus is on three lexical layers of Kerstner's text: the inherited Slavic layer, loanwords, and internationalisms. The three main findings are as follows: 1. lexemes of Slavic origin largely correspond to those of Chakavian and Shtokavian as well as Slovenian, and a large number of neologisms and characteristically Kajkavian words are also observed; 2. there are many loanwords, especially Germanisms; 3. internationalisms, words that entered Kajkavian primarily because of the social and political circumstances of the time, occur to a lesser extent.
All of this suggests that the lexis of Gruntovčani is multifaceted, just like today's local speeches, and that the sources used reflect a desire for the language of this television series, in keeping with the time and place of the plot, to represent reality faithfully and authentically.

DOAJ Open Access 2023
Category membership and category potential: The case of vague because

Martin Konvička

I analyse the English causal connector because, as used in the so-called because X constructions, in terms of its word class membership. I show that three types of because can be distinguished on syntactic grounds: depending on its complement, because can be used as a conjunction, as a preposition, or as a member of a third category. I also show that prior to use, we can speak only of an abstract category potential of the connector, not of its concrete category membership; category membership becomes apparent only through the context of use. More generally, this means that linguistic categories should be understood as emerging from language use, not essentialistically, as existing prior to and independently of language use.

DOAJ Open Access 2023
Compendiums of knowledge associated with linguistic terminology and linguists: history, classification features, and genre polyfunctionality

Mykola Stepanenko

The author explores the historical development of Ukrainian linguistic terminology within the framework of national terminography, lexicography and encyclopediography. The article reviews and analyses reference works encompassing 1) nationally specific and borrowed terms and concepts in traditional and emerging branches of linguistics (e.g., the Dictionary of Linguistic Terms by Ye. Krotevych and N. Rodzevytch; Dictionary of Linguistic Terms by D. Hanych and I. Oliinyk; Dictionary of Modern Linguistics: Concepts and Terms; Modern Linguistic Dictionary by A. Zahnitko; Ukrainian. Concise Dictionary of Linguistic Terms by S. Yermolenko, S. Bybyk, and O. Todor; Modern Linguistics: Terminological Encyclopedia by O. Selivanova; The Ukrainian Language. Encyclopaedia; etc.), and 2) the personalities of Ukrainian linguists (e.g., Ukrainian Grammar in Names by A. Zahnitko, M. Balko; Nizhyn Linguistics by N. Boiko, S. Zinchenko, A. Kaidash; etc.). The author systematises and classifies encyclopedic works according to different criteria, both in the classical way (by the nature of the information: domain, subject-specific, biographical, and personal works; by target audience: professional linguists, philology students, applicants, and pupils; by the regional focus of linguistic conceptions: the Nizhyn region, the Poltava region; by article structure: alphabetical, alphabetical-and-clustered) and in a new way (by the syncretism of linguistic and encyclopedic genres: subject-specific linguistic-and-encyclopedic-and-biographic works, domain regional-and-biographic works, and subject-specific regional-and-biographic works). The universal and specific principles for forming the definitional part of both linguistic-encyclopedic and encyclopedic articles include the author's interpretation, macro- and mini-discursive cross-references, hyperlinks, scholarly intertexts, novelty, debatable issues, personal and bibliographic remarks, and global linguistic experience.

DOAJ Open Access 2021
Dahl – a Man and a Dictionary (for the 220th Anniversary of His Birth)

The author reflects on the fate of the most popular dictionary in Russia, written by the lexicographer, ethnographer and folklorist V. I. Dahl, analyzing the principles behind this “encyclopedia of Russian folk life of the first half of the 19th century” (V. V. Vinogradov). Contemporaries criticised the dictionary: they found the word-family method of organizing the material inconvenient, searched for factual errors, and were outraged by its use of native Russian words, some of them coined by Dahl himself, instead of borrowings. While some drawbacks must be admitted, it should be emphasized that the organization of the material is not accidental: it stems from a polemic with the normative and dialect (regional) dictionaries published at the time, which presented words in isolation from their realization in language and speech. For Dahl, the key word in characterizing language was “alive”, and the principles of his dictionary compilation all “work” toward representing the living connections of words in the language: the word-family method of presentation, a visual demonstration of the paradigmatic and syntagmatic connections of a word within a single dictionary entry, and the inclusion in one dictionary of the vocabulary of all forms in which the national language exists. Modern lexicographers, as a rule, avoid implementing these principles, focusing instead on reference dictionaries in which words are presented in alphabetical order with a scientific definition and grammatical and stylistic labels. Compared with 19th-century dictionaries, the only addition to the content of the modern dictionary entry is examples illustrating word meanings. Dahl has practically no followers in lexicography.
Steps toward following Dahl include Solzhenitsyn’s “Russian Dictionary of Language Extension” and the Dictionary of Dialects of the Russian First Settlers of Siberia near Lake Baikal by G. V. Afanasyeva-Medvedeva, in which illustrative examples successfully demonstrate the life of the word both in speech and in the language system. At the same time, a return to the principles of this talented lexicographer could be very productive.

Philology. Linguistics
DOAJ Open Access 2021
Prokhor Kolomiatin’s Turkic Dictionary among the Narrative Monuments from the Fourteenth to Seventeenth Centuries Related to Crimea

M. A. Kozintcev, N. V. Savelieva

Research objectives: To analyze the genre-typological and stylistic peculiarities of the narrative parts that accompany the actual dictionary entries of the Turkic-Russian dictionary, and thus to add a new source to the group of narrative monuments from the fourteenth to seventeenth centuries that pertain to Crimea. Research materials: The Turkic-Russian dictionary (“Kniga Elihv”) included in the manuscript miscellany (“Tsvetnik”) compiled by the hieromonk Prokhor Kolomiatin in 1668. The manuscript is kept in the collection of the State Historical Museum (Muzeyskoe sobr., No. 2803). Results and novelty of the research: The Turkic-Russian dictionary included in Prokhor Kolomiatin’s miscellany is one of the earliest examples of Turkic lexicography in the Cyrillic tradition. Along with records of lexemes and collocations, it contains lengthy narratives concerning the religion, geography, and ethnography of Crimea. The nature of the information provided suggests that the author of the dictionary lived in Crimea for some time, most likely as a prisoner, albeit one with a certain privileged status. Having little opportunity to travel outside the peninsula, he acquired his knowledge, including information about other countries, through oral communication with local inhabitants belonging to different national and social groups. Analysis of the narrative material allows us to state that the text has a degree of originality of its own, although it naturally finds thematic and genre parallels in the well-known medieval narratives concerning Crimea.

Auxiliary sciences of history, History of Civilization
DOAJ Open Access 2020
The Actualisation of the L'viv Dialect in the Works of Yurii Vynnychuk

Liudmyla Pidkuĭmukha

The article offers an analysis of the actualisation of Ukrainian vocabulary, primarily in the Galician dialect. Yurii Vynnychuk's novels Tsenzor sniv (Eng. Dream Censor) and Tango smerti (Eng. The Tango of Death) have been chosen as sources for this research. Particular attention is paid to vocabulary naming abstract notions, to some polite words, and to vocabulary relating to domestic life in L'viv. Furthermore, the semantic structure and stylistic features of the actualised L'viv dialect are identified. The results indicate that the vocabulary in Yurii Vynnychuk's novels is used to describe life in L'viv during the interwar period. In addition, an analysis of modern dictionaries demonstrates that Galician vocabulary, and the L'viv dialect in particular, are returning to the core of the lexical system.
The Actualisation of the L'viv Koine in the Novels of Yurii Vynnychuk. The article is devoted to the actualisation of Galician vocabulary. The research material consists of Yurii Vynnychuk's novels Tsenzor sniv (2013) and Tango smerti (2016), on the basis of which the author characterises the main groups of the L'viv koine, distinguishing abstract and domestic vocabulary, individual etiquette formulas, and much more. The article identifies the functions of the actualised vocabulary in literary texts and analyses how these lexical units are represented in contemporary dictionaries of Ukrainian. It also analyses the peculiarities of the dictionary abbreviations used to describe the lexemes studied.

Computational linguistics. Natural language processing, Semantics
DOAJ Open Access 2020
In Memoriam: Elżbieta Kędelska (16 March 1949 – 10 November 2014)

Arleta Łuczak

Throughout her professional life, Elżbieta Kędelska (1949–2014) was associated with the Institute of Slavic Studies, Polish Academy of Sciences. She made a substantial contribution to research on the history of Polish dictionary-making. Her work focused on Polish and Slavic linguistics, mainly the history of Latin-Polish and Latin-Czech lexicography, comparative lexicography, and issues of sixteenth-century Polish lexis. She also transcribed the 1532 and 1544 Latin-Polish manuscripts by Bartholomeus de Bydgostia, the most outstanding Polish lexicographer of the first half of the sixteenth century, and co-authored the current reversed, Polish-Latin edition of Bartholomeus' dictionary.

Philology. Linguistics, Slavic languages. Baltic languages. Albanian languages
DOAJ Open Access 2019
Languages and cultures in contact. The place of new speakers in the education system in Upper Lusatia

Nicole Dołowy-Rybińska, Cordula Ratajczak

Upper Sorbs are a Slavic minority group living in eastern Germany. The number of Upper Sorbian speakers is diminishing. Upper Sorbs, the majority of whom are Catholics, have a strong ethnic identity based on language, faith, and tradition, and they form a rather closed community in relation to the surrounding German population. To counteract language loss, the Sorbs established an educational project called ‘Witaj’, which was continued by the ‘2 plus’ program of bilingual education in schools, implemented by the federal state of Saxony. The idea behind these initiatives is to bring together native Upper Sorbian speakers and learners in order to help the learners achieve language competence and to break down existing ethnic boundaries. The realisation of this concept has encountered numerous problems: the German-speaking pupils involved often feel unmotivated to learn Sorbian and are often rejected by the Sorbian-speaking community as (potential) members. This article presents the results of a research project examining how young people from German-speaking homes who attend one of the Upper Sorbian middle schools acquire Sorbian language competence and how they construct an identity in relation or opposition to their Sorbian-speaking peers. The analysis is based on sociolinguistic observations of language practices conducted in the school in 2017 and on interviews with both native speakers and learners of Upper Sorbian. The article focuses on the following issues: relations between language practices, the conditions necessary for learners to actively use a minority language, language and interpersonal contact, the acceptance of new speakers, and the creation of ‘communities of practice’.

Computational linguistics. Natural language processing, Semantics
DOAJ Open Access 2019
Theoretical and Practical Reflections on Specialized Lexicography in African Languages

Dion Nkomo

This article reflects on a number of specialized lexicographical/terminographical resources being produced in African languages in order to contribute towards the intellectualization of those languages for expanded functional usage. The article focuses on lemma selection, the provision of data/information for the included lemmata, and the structural aspects of the surveyed resources. With regard to the first area of focus, the article identifies the lack of a systematic approach to lemma selection, which undermines the potential of the resources as communicative and cognitive tools in specialized subject fields and disciplines. Secondly, regarding the provision of data categories, it identifies instances of insufficient information as well as the inclusion of irrelevant information, both of which reduce the functional value of the resources within specialized domains. Finally, reflections on dictionary structure reveal sub-standard structural designs that affect the user-friendliness of the resources, but also some innovative ones. Overall, the article argues for a stronger lexicographic orientation in the theoretical underpinnings guiding the production of specialized lexicographical/terminographical resources in African languages.

Philology. Linguistics, Languages and literature of Eastern Asia, Africa, Oceania
DOAJ Open Access 2018
«Cultural Tables» in Current Italian-Spanish Dictionaries (original title: Los «cuadros culturales» en los diccionarios italiano-español actuales)

Cesáreo Calvo Rigual

Bilingual dictionaries pay increasing attention to elements that are intimately linked to the culture of one of their two languages. Because such elements often lack an immediate equivalent in the target language, they pose a challenge for the lexicographer, and bilingual translation dictionaries find it difficult to offer possible equivalents. Bilingual dictionaries that also aim to help users decode such lexical units offer more extensive information, either through paraphrases within the microstructure of the entries or through what we have called ‘cultural tables’: encyclopaedic texts that complement certain entries and that have appeared in bilingual dictionaries only recently. We study the cultural tables in the three Italian-Spanish bilingual dictionaries that include them in their microstructure (Garzanti, Herder and Zanichelli). The analysis is both quantitative (total number, number per field) and qualitative (which tables each dictionary favours, what type of information is offered), and several of the cultural tables are analysed in detail. We conclude that the criteria for selecting culture-bound elements, as well as the information offered on them, are too heterogeneous and often unclear.

Language and Literature
DOAJ Open Access 2017
Please Meet Poetonymology

Valeriy Mikhaylovich Kalinkin

The article considers postulates on the specific meaning of the poetonym, which establish the principal difference between the semantics of a proper name in a work of fiction and that of a proper name in the language system. It examines how poetonyms correlate with virtual, verbal (as opposed to material) referents. It is established that in the “semantic triangle” of the poetonym, the referent (denotation) in the general sense is replaced by a virtual referent (denotation). Moreover, this virtual denotation applies not to a class of objects but to a single object singled out from a class by a unique feature, which is why it is properly called a referent. The integrity of the poetonymosphere is shown to define the most important features of each individual poetonym: each poetonym is permeated by the lines of force of the poetonymosphere, is influenced by it, and in turn influences the features of the poetonyms as a whole. The notion of “Name Philology” is substantiated. Attention is paid to the use of this notion, which emerged in the 20th century as a symbiosis of onomastics and semiotics in the work of groups of scholars at major humanities centres, chiefly the Tartu-Moscow Semiotic School in the USSR. Finally, the article develops poetonymography, a special branch of lexicography, by shaping its theory and improving the practice of compiling dictionaries of proper names in works of fiction.

Philology. Linguistics
S2 Open Access 2014
An Approximation Algorithm for the Facility Location Problem with Lexicographic Minimax Objective

L. Buzna, M. Koháni, J. Janáček

We present a new approximation algorithm for the discrete facility location problem that provides solutions close to the lexicographic minimax optimum. The lexicographic minimax optimum is a concept that makes it possible to find equitable locations for facilities serving a large number of customers. The algorithm is independent of general-purpose solvers and instead uses algorithms originally designed to solve the p-median problem. Numerical experiments demonstrate that our algorithm increases the size of solvable problems and provides high-quality solutions. The algorithm found an optimal solution for all tested instances for which we could compare its results with those of the exact algorithm.
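The lexicographic minimax criterion itself can be made concrete with a small brute-force sketch. This is not the authors' approximation algorithm (which, per the abstract, reuses p-median algorithms and scales to large instances); it only illustrates the ordering being optimised, assuming that each customer is served by its nearest open facility: sort every customer's distance from worst to best and compare these vectors lexicographically. The function names and the instance are hypothetical, and exhaustive search is feasible only for tiny inputs.

```python
from itertools import combinations


def worst_first_costs(open_sites, dist):
    """Each customer's distance to its nearest open facility, sorted from worst to best."""
    return sorted((min(row[j] for j in open_sites) for row in dist), reverse=True)


def lexicographic_minimax(dist, p):
    """Exhaustive search for the p-site set whose worst-first cost vector is
    lexicographically smallest. dist[i][j] is the distance from customer i to site j."""
    best_sites, best_costs = None, None
    for sites in combinations(range(len(dist[0])), p):
        costs = worst_first_costs(sites, dist)
        # Python compares lists element by element, i.e. lexicographically:
        # first minimise the worst cost, then the second worst, and so on.
        if best_costs is None or costs < best_costs:
            best_sites, best_costs = sites, costs
    return best_sites, best_costs


# Hypothetical instance: 3 customers, 2 candidate sites, open p = 1 facility.
dist = [
    [4, 4],
    [1, 3],
    [2, 3],
]
print(lexicographic_minimax(dist, 1))  # → ((0,), [4, 2, 1])
```

Both sites give the same worst-case distance (4), so a plain minimax objective cannot tell them apart; the lexicographic comparison breaks the tie on the second-worst customer (2 versus 3) and selects site 0.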

18 citations · en · Mathematics, Computer Science
DOAJ Open Access 2014
Rare, obscure and marginal affixes in English

Laurie Bauer

English is full of words with recurrent sequences of segments which may or may not be morphologically interpretable. In this paper I look at some which fall on either side of the cut-off point between morphology and non-morphology, considering what makes something an affix and how we might decide in awkward cases.
