Results for "Lexicography"

Showing 20 of ~30511 results · from DOAJ, Semantic Scholar

S2 Open Access 2010
Quantitative Analysis of Culture Using Millions of Digitized Books

Erez Lieberman Aiden, Jean-Baptiste Michel

Linguistic and cultural changes are revealed through the analyses of words appearing in books. We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.

2827 citations · en · History, Art
DOAJ Open Access 2025
Keelekorpus kui leksikograafi abiline kõnekeelsuse tuvastamisel

Lydia Risberg, Maria Tuulik, Margit Langemets et al.

Using corpus data to support lexicographers in identifying informal language. This study examines how new corpus analysis tools can assist lexicographers in determining whether to assign a word an informal register label in a dictionary. Labelling words in dictionaries is necessary for language users seeking register information. Moreover, there have been calls for the upcoming Dictionary of Standard Estonian (DSE, 2025) to clearly distinguish standard language from other linguistic varieties. Informal language was chosen for analysis because it is more difficult to define than other marked registers. In DSE 2018, some words were labelled as informal based on language planning decisions rather than empirical analysis. As register labels should be data-driven and based on corpus evidence, a systematic review of these words is necessary for the revised edition. Our study investigates how corpus genre data can support lexicographers in deciding whether to add or remove the informal label. We found that corpus data provided useful insights in 82.1% of cases. Based on our experiment, we developed a guideline to assist in labelling word meanings as informal. Namely, if a word occurs in blogs and forums in 36% or more of its total corpus occurrences, it may be considered as tending towards informal usage. This guideline is not a rigid rule but a supportive tool, as additional factors should be considered based on the lexicographer’s linguistic expertise. Users value reliable linguistic information in dictionaries. Our proposed guideline helps lexicographers make more systematic decisions while maintaining expert judgment as the ultimate determinant.

Other Finnic languages and dialects
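
The 36% guideline reported in the abstract above can be illustrated with a small sketch. This is not the authors' tooling: the genre labels, function names, and example counts are hypothetical, and only the 36% threshold comes from the abstract.

```python
# Assumption: genre labels and helper names are illustrative, not taken from the study.
INFORMAL_GENRES = {"blog", "forum"}
THRESHOLD = 0.36  # share of blog/forum occurrences reported in the abstract


def informal_share(genre_counts):
    """Return the fraction of a word's corpus occurrences found in blogs and forums."""
    total = sum(genre_counts.values())
    if total == 0:
        return 0.0
    informal = sum(n for genre, n in genre_counts.items() if genre in INFORMAL_GENRES)
    return informal / total


def tends_informal(genre_counts, threshold=THRESHOLD):
    """Flag a word as a candidate for the informal label.

    The result is only a hint: per the abstract, the lexicographer's
    judgment remains the final determinant.
    """
    return informal_share(genre_counts) >= threshold


# Hypothetical per-genre counts for one word, made up for illustration.
example = {"blog": 30, "forum": 15, "news": 40, "fiction": 15}
print(informal_share(example))  # 45 of 100 occurrences are in blogs and forums
print(tends_informal(example))
```

A word with, say, 35 blog occurrences out of 100 would fall below the threshold and would not be flagged by this sketch.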
DOAJ Open Access 2025
Dynamics of Russian language internet discourse: peculiar representation of lexis in dictionaries

Ekaterina S. Astapkina, Alexander A. Barkovich

The study presents a linguistic description of the dynamics of Internet discourse. Nowadays, Internet discourse is not only individual speech practice, but also global communication. Linguists are naturally interested in the issues of the Internet, and studies devoted to the emergence and consolidation of the corresponding linguistic means, including those used in the Russian language, are certainly relevant. In this regard, practically relevant tasks are to consider the sufficiency of modern Russian vocabulary in the context of Internet communication and to identify its corresponding potential for further development of Russian lexicography. The aim of this study is to analyze the lexicographic reflection of linguistic units associated with the Internet and the trends in their dictionary representation under modern conditions. The empirical possibilities of both this study and linguistic papers devoted to IT issues in general are provided by statistically significant data of modern discourse, in particular, data available through corpus methods and Internet search. To achieve the goal of the study, the representative material available in the Russian National Corpus was used. In accordance with the topic of the study, the analysis focused on dictionaries of the Russian language: those interpreting it, for example, the “Modern Explanatory Dictionary of the Russian Language” edited by T.F. Efremova, and those regulating real speech practice, for example, the “Russian Spelling Dictionary” edited by V.V. Lopatin and O.E. Ivanova. The study used a set of analytical methods, including discourse analysis, component analysis, as well as synthesis, modeling, comparison, and other general scientific and linguistic methods. In general, the authors conclude that the Russian Internet discourse is highly dynamic. At the same time, the model of the ongoing changes shows that lexicographic sources reflect the dynamics of the language incompletely and selectively. The results obtained can be used in organizing scientific support for modern Russian communication, optimizing lexicographic work, and studying a wide range of theoretical and practical issues of linguistics.

Slavic languages. Baltic languages. Albanian languages
DOAJ Open Access 2025
The spatial representation of the Croatian Encyclopedia of Technology: from idea to fruition

Jasmina Tolj Smolčić

Highly informative content, objectivity, credibility, and the organisation of knowledge are just some of the characteristics that make encyclopedias and online encyclopedic projects reliable works that provide users with efficient access to information related to their area of interest. As valuable sources for acquiring new knowledge, it is essential that they keep pace with today’s websites not only in terms of quality, but also in presentation. One form of presentation is spatial representation, which should be further explored in the encyclopedic context. This practice is already present in various projects, and such a new way of presenting online encyclopedic content would allow for more effective access to information and improved navigation when retrieving content. In encyclopedistics, spatial indicators, as the basis of spatial representation, are not in systematic use, and their full searching and browsing options are not fully enabled. Associating spatial indicators with encyclopedic content would make the creation of searchable and browsable virtual cartographic representations of encyclopedic knowledge possible. To this end, the content of the Croatian Technical Encyclopedia was used for a qualitative analysis of its articles, through which the types of data with spatial attributes typical for each article category were determined, followed by the identification of which types of data should be recorded for each category. The research established that article categories are indeed the key to creating a future model for spatial tagging and representation of encyclopedic knowledge. Based on the creation of unified spatial indicators (geotags) for individual article categories, a set of metadata was proposed, and a model for the spatial representation of encyclopedic knowledge was developed. Based on this model, the Atlas of Croatian Technology Heritage was created as a spatial representation of the knowledge found in the Croatian Encyclopedia of Technology.

DOAJ Open Access 2023
Words of the school: sex, viciniore (and viciniorità)

Yorick Gomez Gane

This paper examines a set of words used in the Italian school environment: sex (Latin numeral often used in school reports as a substitute for sei ‘six’ so that the evaluation cannot be altered), viciniore (indicating an institution or municipality falling within the same province) and its derivative viciniorità (with the anomalous variant viciniorietà). A presentation of the data concerning the chronology and diffusion of these words is followed by an investigation into their origins, semantic implications, relationships with Italian lexicography and finally, in the case of competing variants (such as viciniorità and viciniorietà), the degree of compliance with the rules of word formation in Italian.

Theory and practice of education, Romanic languages
DOAJ Open Access 2021
The Cross-Cultural Understanding of Metaphors in the Information Technology Sphere

Natalia Mykhalchuk, Svitozara Bihunova, Alla Fridrikh et al.

This paper analyses recent changes in cross-cultural communication concerning metaphor use and its functioning on the Internet, specifically in the information technology sphere. The paper outlines the academic literature and proposes a study that aims to evaluate users’ perception of IT metaphors. The study analyses reports and articles for IT users. The articles were profiled according to country and language, with a detailed analysis of English and Ukrainian examples. The paper reviews the relation between IT metaphors and their cognition, introducing a new conceptualization “A Computer as a Human Being”. The research seeks to provide evidence for the claim that understanding metaphors facilitates cross-cultural communication, whether universal or culturally bound. The results show the growing scale of the creation of new metaphors due to cross-cultural communication, especially in the IT sphere, and the importance of the cognitive functions of metaphors in a culturally and linguistically diverse environment.

Computational linguistics. Natural language processing, Semantics
DOAJ Open Access 2021
Describing Subjective Infinitive Constructions in Lexicographic Sources (on the Example of the Udmurt)

Sergey A. Maksimov

The work is devoted to a topical problem of modern Udmurt lexicography: the lexicographic treatment of combinations of a subject with an infinitive. The research material consists of dictionaries of the Udmurt language. The work is based on a descriptive method. The purpose of the work is to study how Udmurt dictionaries render verb combinations expressing emotional and mental states and physiological phenomena in the form of infinitive combinations «subject + infinitive» and to suggest acceptable ways of presenting them. Udmurt lexicography has come a long way since its inception and achieved certain success. However, due to the lack of continuous work in this area, many problems remain unresolved. One of these problems is the registration in dictionaries of combinations associated with the expression of emotional and mental states and physiological phenomena. In living speech, such constructions often consist of a grammatical subject (yyr ‘head’, kӧt ‘belly’, lul ‘soul; breath’, vir ‘blood’, etc.) and a conjugated form of the verb, for example: yyr kur lue ‘I am angry’, kӧt kurekte ‘I’m sad’, vir pote ‘blood flows, oozes’. In the Soviet period, the tradition of presenting verbs in dictionary entries in the form of an infinitive (affix -ny) was established in Udmurt lexicography. As new dictionaries were published, the number of structures of the «subject + infinitive» type gradually began to increase, although the subject cannot enter into a syntactic connection with the infinitive, and such structures are not found in living speech. The paper describes possible ways to solve the problem under study, proposing slightly different solution models for different groups of structures.

DOAJ Open Access 2020
Literary Journalism of Shanul Haq Haqqi

Irfan Shah

Shanul Haq Haqqee (15 September 1917 - 11 October 2005) was a versatile genius with multi-dimensional vision and knowledge. He was at once a great linguist, lexicographer, researcher, scholar, critic, translator, biographer, fiction writer, an acknowledged great poet, a story writer for children, humorist, copywriter and publicist. His respected father, Moulvi Ahteshamuddin, was also a great lexicographer, poet, researcher and writer of great eminence. Shanul Haq Haqqee's creative work is found in various forms of Urdu literature, in which he showed his creativity and drew on new experiences. His approach and vision in these works had various angles and innovations. With respect to credibility and correctness of language, his writings attained the status of a certificate of perfection. His special interest in lexicography reflects his command of the subject. During his honorary appointment at the Urdu Development Board, Karachi, he rendered many noble services, including his lexicographic work on the Urdu dictionary and the publication of the quarterly "Urdu Nama". He was also associated, as Chief Editor, with the Urdu monthly "Mah-e-Nau", Karachi.

Language. Linguistic theory. Comparative grammar, Computational linguistics. Natural language processing
DOAJ Open Access 2019
Extracting features from text to improve statistical machine translation

Александр Павлович Молчанов

In this paper we investigate the technique of extending the default feature set of the Moses Statistical Machine Translation (SMT) system using shallow linguistic information from source and target phrases. Although a typical SMT system uses a phrase table with 5 default features, most systems are scalable and support any number of additional features. We assume that linguistic information extracted from the source and target phrases can improve the overall translation quality, i.e., make the system more robust and reduce the number of instances of incorrect word choice, punctuation mistakes and other problems SMT systems are prone to. First, we build a baseline SMT system. Then we extract shallow linguistic features directly from source and target phrases of the baseline system’s phrase table. The features are precomputed and stored in the phrase table, so they can be regarded as stateless dense features. We develop and examine 19 features incorporating information from source and target phrases. We explore features commonly used in monolingual and parallel data filtering techniques. The features we investigate include source and target phrase lengths; word, number and punctuation symbol counts; word frequencies according to large monolingual corpora; etc. For each feature, we build and evaluate a separate SMT system. We conduct a series of experiments on the English-Russian language pair and obtain statistically significant improvements of up to 0.4 BLEU compared to the baseline configuration.

Philology. Linguistics
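
To make the kind of shallow phrase-pair features named in the abstract above more concrete, here is a minimal sketch. It is not the author's feature set or the Moses API: the function name, the whitespace tokenization, and the example phrase pair are assumptions chosen for illustration, and only three of the feature types mentioned (phrase length, number count, punctuation count) are shown.

```python
import string


def shallow_features(source, target):
    """Compute a few shallow features for one phrase pair.

    Assumption: phrases are already tokenized and space-separated,
    as is typical for phrase-table entries.
    """
    def stats(phrase):
        tokens = phrase.split()
        return {
            "length": len(tokens),                                     # tokens in the phrase
            "numbers": sum(tok.isdigit() for tok in tokens),           # purely numeric tokens
            "punct": sum(ch in string.punctuation for ch in phrase),   # ASCII punctuation characters
        }

    src, tgt = stats(source), stats(target)
    # One dense value per feature, in a fixed order.
    return [
        src["length"], tgt["length"],
        src["numbers"], tgt["numbers"],
        src["punct"], tgt["punct"],
    ]


# Hypothetical English-Russian phrase pair, made up for illustration.
features = shallow_features("in 2019 , we", "в 2019 году , мы")
print(features)
```

Because each value depends only on the phrase pair itself, such features can be computed once and stored alongside the existing phrase-table scores, which matches the abstract's description of them as precomputed, stateless dense features.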
DOAJ Open Access 2019
Ukrainian, Polish and Russian trilingualism among Ukrainians of non-Polish origin living in Poland

Pavlo Levchuk

This article examines the socio-linguistic situation of Ukrainian migrants who live in Poland but who do not have Polish origins. After presenting the issue and describing the group in question, the article describes the locations where the languages are used, the personae of the interlocutors, and the respondents' emotional attitudes towards each of the languages they know.

Computational linguistics. Natural language processing, Semantics
DOAJ Open Access 2018
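Pregled općeg političkog stanja u Kraljevini Srba, Hrvata i Slovenaca, kasnije Kraljevini Jugoslaviji [Overview of the general political situation in the Kingdom of Serbs, Croats and Slovenes, later the Kingdom of Yugoslavia]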

Ivana Žebec Šilj

On the occasion of the hundredth anniversary of the unification of the State of Slovenes, Serbs and Croats with the Kingdom of Serbia, the paper presents the political situation and relations during the interwar period in the Kingdom of Serbs, Croats and Slovenes and the Kingdom of Yugoslavia. The paper gives a panoramic view of Yugoslav interwar history, and since the period under consideration is "dense" with important events, attention is focused on several so-called focal points. Chronologically, the paper begins with the First World War and ends with the collapse of the Kingdom of Yugoslavia.

Lexicography
DOAJ Open Access 2017
Mahmud Kashgari - the founder of areal linguistics

Uldar Keldibekovna Isabekova

Mahmud Kashgari’s work “The dictionary of Turkic languages” (“Divan lugat at-tyurk”) is a complex work of comparative-historical linguistics, lexicography, anthropological linguistics, areal linguistics, linguistic culture and dialectology. The article considers historical questions and the work's connection with “The map of the Turkic world of the XI century”, and reveals the content of “Divanu lugat at-tyurk” in its close correlation with the “Map”, which is the annex to the dictionary. The author of the article notes that Mahmud Kashgari revealed the interrelation between the dialectal features of Turkic tribes and their geographical habitat, as well as the linguistic similarity of the Turkic tribes. The conclusion states that, in the history of world linguistics, “Divanu lugat at-tyurk” is the first work on areal linguistics.

Language. Linguistic theory. Comparative grammar, Semantics
DOAJ Open Access 2015
From Print to Digital: Implications for Dictionary Policy and Lexicographic Conventions

Michael Rundell

Editorial policies and lexicographic conventions have evolved over hundreds of years. They developed at a time when dictionaries were printed books of finite dimensions, as they have been for almost the whole of their history. In many cases, styles which we take for granted as "natural" features of dictionaries are in reality expedients designed to compress maximum information into the limited space available. A simple example is the kind of "recursive" definition found in many English dictionaries where a nominalization (such as assimilation) is defined in terms of the related verb ("the act of assimilating or state of being assimilated"), and the user is required to make a second look-up (to the base word). Is this an ideal solution, or was it favoured simply as a less space-intensive alternative to a self-sufficient explanation? As dictionaries gradually migrate from print to digital media, space constraints disappear. Some problems simply evaporate. To give a trivial example, the need for abbreviations, tildes and the like no longer exists (though a surprising number of dictionaries maintain these conventions even in their digital versions). So the question arises whether we need to revisit, and re-evaluate, the entire range of editorial policies and conventions in the light of changed circumstances. This paper looks at some familiar editorial and presentational conventions, and considers which are no longer appropriate in the digital medium, and what new policies might replace them.

Philology. Linguistics, Languages and literature of Eastern Asia, Africa, Oceania

Page 4 of 1526