Johannes Friedrich Diehl, Desiree Zecha, David Bindrim
Results for "Greek philology and language"
Showing 20 of ~1926 results · from DOAJ, arXiv, Semantic Scholar
Adisai Na-Thalang, Chanakan Wittayasakpan, Kritsadha Phatcharoen et al.
This paper describes the development of the first open conversational speech dataset for the Isan language, the most widely spoken regional dialect in Thailand. Unlike existing speech corpora, which are primarily based on read or scripted speech, this dataset consists of natural speech and therefore captures authentic linguistic phenomena such as colloquialisms, spontaneous prosody, disfluencies, and frequent code-switching with Central Thai. A key challenge in building this resource is the lack of a standardized orthography for Isan. Current writing practices vary considerably because Thai and Isan have different lexical tones. This variability complicates the design of transcription guidelines and raises questions about consistency, usability, and linguistic authenticity. To address these issues, we establish practical transcription protocols that balance representational accuracy against the requirements of computational processing. By releasing this dataset as an open resource, we aim to contribute to inclusive AI development, support research on underrepresented languages, and provide a basis for addressing the linguistic and technical challenges of modeling conversational speech.
M. Burkovskyi
The article provides a philosophical review of the mythological concepts of German antiquarians, in particular G. F. Creuzer, K. O. Müller, J. H. Voss, and other scholars of myth of the late 18th and early 19th centuries, with an emphasis on their contribution to the formation of European mythological thought and of the methodology of the science of myth in the 19th century. The analysis rests primarily on a direct reading of the primary sources of these authors and of representatives of the Romantic school, whose works reflect the Romantic worldview. In that worldview, myth appears as a principle for interpreting culture, as symbolism, and as the basis of cultural genesis (with the aim of revealing the "prehistoric beginning" of culture and the common sources of myth). In this vein, Creuzer formulates the theory of the symbol as an "embodied idea" and tries to reconstruct myth as a primary symbolic language that is transmitted between cultures through artifacts. Müller and Voss oppose this view from different perspectives: the former defends the local character of myths and the autochthonous nature of Greek mythology as evidence of historical and national development, rejecting any straightforward parallels with the East; the latter openly criticizes Creuzer's symbolism, advocates a rational-historical approach, and denies that myths can be transferred between cultures at all. The article analyzes the affinities and differences between these conceptions of myth as a holistic cultural phenomenon: for Creuzer, myth is a symbolic form of humanity's wisdom that must be decoded through the images and materials of ancient artifacts; for Müller, it is the local mythology of peoples, which reveals early forms of culture and histories of migration; Voss's position amounts to an anti-symbolic critique of Romanticism that rejects uniting myth and symbol in a single theory.
Hegel criticizes the Romantic approach but recognizes the value of myth as a stage in the development of spirit, necessary for further cognition. The article emphasizes the contradictions between the two approaches, the philosophical-Romantic and the historical-scientific, which determined the directions of early mythography and shaped the methodological criteria for studying ancient mythology in European scholarship. It argues that the debates among Müller, Creuzer, and Voss, together with Hegel's positions, were decisive for establishing mythology as a scholarly discipline: they drew the boundary between symbolism and historical-critical analysis, identified two main schools (Romantic-philosophical and concrete-scientific), and influenced the subsequent evolution of philology, interpretive models of myth, and the overall orientation of European mythological thought in the 19th century. These conclusions underscore the significance of Ukrainian research on German classical studies and its influence on the world mythological tradition, and they point to the need for further study of the legacy of these authors and their contemporaries in order to understand modern methodology in the study of mythology and intercultural interactions in European philosophy.
Елена Владимировна Приходько
In 1972, Cl. Brixhe discovered and copied an inscription containing an alphabetical oracle at the sanctuary of the Meter Theon Veginos and the river god Eurymedon at Zindan Mağarası. This sanctuary lay in northern Pisidia, within the territory of the city of Timbriada. A detailed account of the history of its study and an analysis of the inscriptions found there were presented in the first part of this work. The inscription with the alphabetical oracle was published in 1988, but as early as 1982, S. Mitchell and D. Kaya, who examined the ruins of the sanctuary, did not see it. Nor is it mentioned in any of the later publications presenting the results of the archaeological excavations conducted at the sanctuary in 2002–2005. The article suggests that the inscribed block was destroyed during the construction of a water tunnel in 1977–1982. The text of the alphabetical oracle from Timbriada differed completely from the main group of alphabetical oracles. In the second part of the work, the author conducts a detailed lexical and syntactic analysis of the 12 sayings of this oracle. Each verse was composed of common, everyday words, but in many cases their syntactic combination violated the norms of Ancient Greek syntax. This indicates that the poet from Timbriada was not a native speaker of Ancient Greek. These syntactic neologisms were not created deliberately; they arose from unfamiliarity with certain rules of word combination and from calques of Pisidian constructions. This proves that alphabetical oracles were composed in an environment where Ancient Greek had not yet fully displaced the local Anatolian language; combined with the findspots of inscriptions bearing alphabetical oracles, this points to Pisidia as the homeland of this form of divination.
Rudolf Henneböhl
Zhihan Cao, Hiroaki Yamada, Simone Teufel et al.
Recently, much work has addressed the question of what exactly pretrained language models (PLMs) learn about different aspects of language, and how they learn it. One stream of this research investigates the knowledge that PLMs have about semantic relations. However, many aspects of semantic relations have been left unexplored. Typically, only one relation has been considered, namely hypernymy. Furthermore, previous work did not measure human performance on the same task as that performed by the PLMs. As a result, there is currently only an incomplete view of the extent of these models' semantic relation knowledge. To address this gap, we introduce a comprehensive evaluation framework covering five relations beyond hypernymy, namely hyponymy, holonymy, meronymy, antonymy, and synonymy. We use five metrics (two newly introduced here) for previously unaddressed aspects of semantic relation knowledge, namely soundness, completeness, symmetry, prototypicality, and distinguishability. Using these, we can fairly compare humans and models on the same task. Our extensive experiments involve six PLMs: four masked and two causal language models. The results reveal a significant knowledge gap between humans and models for all semantic relations. In general, causal language models, despite their wide use, do not always perform significantly better than masked language models. Antonymy is the outlier relation on which all models perform reasonably well. The evaluation materials can be found at https://github.com/hancules/ProbeResponses.
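The abstract names soundness, completeness, and symmetry without defining them. A minimal sketch, assuming (our reading, not necessarily the paper's definitions) that soundness is the share of a model's responses found in a gold set of related words, completeness is the share of the gold set the model produces, and symmetry is how often a predicted pair (a, b) is mirrored by (b, a). All function names and inputs are hypothetical.

```python
from typing import Dict, Set


def soundness(predicted: Set[str], gold: Set[str]) -> float:
    """Share of the model's responses that are valid relation targets."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


def completeness(predicted: Set[str], gold: Set[str]) -> float:
    """Share of the gold relation targets that the model produced."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


def symmetry(responses: Dict[str, Set[str]]) -> float:
    """Fraction of predicted pairs (a, b) whose mirror (b, a) was also predicted.

    Only meaningful for symmetric relations such as synonymy or antonymy.
    """
    pairs = [(a, b) for a, targets in responses.items() for b in targets]
    if not pairs:
        return 0.0
    mirrored = sum(1 for a, b in pairs if a in responses.get(b, set()))
    return mirrored / len(pairs)
```

For gold antonyms {"cold", "cool"} of "hot" and model responses {"cold", "warm"}, soundness and completeness both come out to 0.5. Prototypicality and distinguishability would need ranked responses and are left out of this sketch.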
Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen
Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered Reddit data comprising 4.5 million comments from 31 thousand posts in six different languages (Dutch, English, German, Arabic, Turkish and Spanish). We target fifteen major social and political world events that occurred between 2020 and 2023. We observe significant variations in toxicity, negative sentiment, and emotion expressions across different events and language communities, showing that toxicity is a complex phenomenon in which many different factors interact and still need to be investigated. We will release the data for further research along with our code.
Aishwarya Mirashi, Srushti Sonavane, Purva Lingayat et al.
In this work, we introduce L3Cube-IndicNews, a multilingual text classification corpus aimed at curating a high-quality dataset for Indian regional languages, with a specific focus on news headlines and articles. We have centered our work on 10 prominent Indic languages: Hindi, Bengali, Marathi, Telugu, Tamil, Gujarati, Kannada, Odia, Malayalam, and Punjabi. Each of these news datasets comprises 10 or more classes of news articles. L3Cube-IndicNews offers 3 distinct datasets tailored to different document lengths: the Short Headlines Classification (SHC) dataset, containing the news headline and news category; the Long Document Classification (LDC) dataset, containing the whole news article and the news category; and the Long Paragraph Classification (LPC) dataset, containing sub-articles of the news and the news category. We maintain consistent labeling across all 3 datasets to enable in-depth length-based analysis. We evaluate each of these Indic language datasets using 4 different models, including monolingual BERT, multilingual Indic Sentence BERT (IndicSBERT), and IndicBERT. This research expands the pool of available text classification datasets and makes it possible to develop topic classification models for Indian regional languages. It also serves as a useful resource for cross-lingual analysis owing to the high overlap of labels among languages. The datasets and models are shared publicly at https://github.com/l3cube-pune/indic-nlp
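To illustrate how consistent labeling across SHC, LDC, and LPC can work, here is a minimal sketch that derives all three views from one labeled article, so every example inherits the article's category. The NewsArticle type and the choice to split sub-articles on blank lines are hypothetical; the abstract does not say how the LPC sub-articles were segmented.

```python
from dataclasses import dataclass
from typing import List, Tuple

Example = Tuple[str, str]  # (text, category)


@dataclass
class NewsArticle:  # hypothetical record type
    headline: str
    body: str
    category: str


def build_views(
    articles: List[NewsArticle],
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Return (SHC, LDC, LPC) example lists that share each article's label."""
    shc: List[Example] = []
    ldc: List[Example] = []
    lpc: List[Example] = []
    for art in articles:
        shc.append((art.headline, art.category))  # headline only
        ldc.append((art.body, art.category))      # whole article
        # Assumption: sub-articles are blank-line-separated paragraphs.
        for para in art.body.split("\n\n"):
            if para.strip():
                lpc.append((para.strip(), art.category))
    return shc, ldc, lpc
```

Because the label is copied rather than re-annotated, a length-based comparison across the three views always compares the same categories.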
Sil Hamilton
No two authors write alike. Personal flourishes in written narratives, from lexicon to rhetorical devices, imply a particular author: what literary theorists call the implied or virtual author, distinct from the real author or the narrator of a text. Early large language models trained on unfiltered data drawn from a variety of discordant sources yielded incoherent personalities, which was problematic for conversational tasks but useful for sampling literature from multiple perspectives. Successes in alignment research in recent years have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show that successive versions of GPT-3 suffer from increasing degrees of "mode collapse", whereby overfitting the model during alignment prevents it from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.
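The abstract does not describe how mode collapse was measured. As a swapped-in illustration only, one common proxy for the diversity of a set of generations is distinct-n: the number of unique n-grams divided by the total number of n-grams across all samples, which falls toward 0 as samples repeat one another. Whitespace tokenization and n = 2 are assumptions.

```python
from typing import List, Set, Tuple


def distinct_n(texts: List[str], n: int = 2) -> float:
    """Unique n-grams / total n-grams across a set of generated texts."""
    total = 0
    unique: Set[Tuple[str, ...]] = set()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0
```

Comparing this score across stories sampled from successive model versions under the same prompts would be one rough way to see whether outputs converge, though lexical diversity is only a loose stand-in for diversity of implied authors.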
Liam Lonergan, Ibon Saratxaga, John Sloan et al.
This paper presents Fotheidil, the first web-based transcription system for the Irish language, which uses speech-related AI technologies as part of the ABAIR initiative. The system includes off-the-shelf pre-trained models for voice activity detection and speaker diarisation, as well as models trained specifically for Irish automatic speech recognition (ASR) and for capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements on out-of-domain test sets and on dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration based on sequence-to-sequence models is compared with the conventional approach using a classification model; here, too, experimental results show substantial improvements in performance. The system will be made freely available for public use and represents an important resource for researchers and others who transcribe Irish-language materials. As the system is used, human-corrected transcriptions will be collected and added to the training dataset, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.
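The conventional classification approach to capitalisation and punctuation restoration assigns a label to each token of lowercase, unpunctuated ASR output, whereas a sequence-to-sequence model generates the restored text directly. The sketch below shows only the last step of the classification approach: applying predicted per-token labels. The label scheme (a capitalise flag plus punctuation to append) is a hypothetical simplification, not Fotheidil's actual tag set.

```python
from typing import List, Tuple

# Hypothetical per-token label: (capitalise first letter?, punctuation to append)
Label = Tuple[bool, str]


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Rebuild cased, punctuated text from lowercase tokens and their labels."""
    if len(tokens) != len(labels):
        raise ValueError("one label per token is required")
    out = []
    for tok, (cap, punct) in zip(tokens, labels):
        word = tok[:1].upper() + tok[1:] if cap else tok
        out.append(word + punct)
    return " ".join(out)
```

For the ASR output "tá sé fuar inniu" ("it is cold today"), the labels [(True, ""), (False, ""), (False, ""), (False, ".")] yield "Tá sé fuar inniu."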
Sina Bagheri Nezhad, Ameeta Agrawal, Rhitabrat Pokharel
Multilingual language models (MLLMs) are crucial for handling text across various languages, yet they often show performance disparities due to differences in resource availability and linguistic characteristics. While the impact of the share of pre-training data and of model size on performance is well known, our study reveals additional critical factors that significantly influence MLLM effectiveness. Analyzing a wide range of features, including geographical, linguistic, and resource-related aspects, we focus on the SIB-200 dataset for classification and the Flores-200 dataset for machine translation, using regression models and SHAP values across 204 languages. Our findings identify token similarity and country similarity as pivotal factors in model performance, alongside pre-training data and model size. Token similarity facilitates cross-lingual transfer, while country similarity highlights the importance of shared cultural and linguistic contexts. These insights offer guidance for developing more equitable and effective multilingual language models, particularly for underrepresented languages.
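The abstract does not define token similarity. One simple reading, assumed here purely for illustration, is the Jaccard overlap between the sets of subword tokens two languages produce under a shared tokenizer. The tokens in the example are hand-written, not output from any real tokenizer.

```python
from typing import Iterable


def token_similarity(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> float:
    """Jaccard overlap of two token vocabularies (hypothetical definition)."""
    vocab_a, vocab_b = set(tokens_a), set(tokens_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return len(vocab_a & vocab_b) / len(union)
```

For example, ["▁the", "▁house"] and ["▁the", "▁haus"] share one of three distinct tokens, giving 1/3. A per-language-pair feature like this could then enter a regression model alongside pre-training data share and model size, with SHAP values attributing performance to each feature.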
F. Pompeo
This paper examines two Greek inscriptions written in Babylon during the Parthian period, focusing on the sections containing double dating formulas. Unfortunately, both inscriptions, particularly the first, are damaged. This paper shows that strong evidence in favor of one of the text reconstructions proposed by scholars can be obtained through an analysis combining the methods and tools of philology and historical (socio)linguistics. Evidence from other languages is also considered.
Jörg Rüpke, Sofia Bianchi Mancini
Epics are the oldest long written texts in many languages. They claim to recount fundamental events and seek to lend authority to their version through formal composition. This volume provides an overview of Greek and Latin epic poetry from Homer to Late Antiquity. Above all, it asks: How were these poems made audible? Who wanted to read or listen to them? And how did this change the texts? The professionalisation of such great works led not only to competition and outbidding but also to parody or condensation of the texts. This book is the first to provide a broad and coherent overview from such a perspective. Jörg Rüpke was Professor of Classical Philology at the University of Potsdam from 1995 to 1999 and Professor of Comparative Religious Studies at the University of Erfurt from 1999 to 2008. Since 2008 he has been Fellow of Religious Studies at the Max Weber Centre in Erfurt. Sofia Bianchi Mancini studied Classics at the University of Wales Trinity St David and completed her doctorate at the University of Erfurt in 2021. Since 2021 she has been a postdoctoral researcher working on 'Divine Property: Late Antiquity and Medieval Solutions' at the Max Weber Centre in Erfurt.
L. Vorobiova
Objective. The objective of the article is to establish a periodization of the formation of English with respect to the borrowing process, and to analyze the historical and lexical aspects of the integration of borrowings as English took shape. Methods. The main results were obtained using the historical method of theoretical generalization, which makes it possible to determine the nature of the periodization of borrowings, and the comparative method, which makes it possible to compare historical phenomena, events, and facts of socio-cultural life and to establish similarities and differences in how borrowings were adapted at different stages of their integration into English. Results. The theoretical analysis of the nature of borrowings makes it possible to propose a periodization that supports intercultural studies in linguistics, philology, and terminology. Interpreting and analyzing the genesis of these periods will support the effective management of the educational process for philology and history students. Four periods are identified. The first period begins with the migration of the Germanic tribes of the Angles, Saxons, and Jutes to Britain; during this time, Latin and the Scandinavian languages exerted a strong influence as a result of conquests. The second period is marked by significant changes in grammar, the simplification of the case system, and the loss of many endings. The third period is marked by the Great Vowel Shift, a massive phonetic restructuring of English sounds; it is associated with the growing influence of the Renaissance and of printing, which contributed to the standardization of English. The fourth period is characterized by the stabilization of the language's structure and vocabulary.
In addition to the process of borrowing, modern English is characterised by another wave of vocabulary enrichment caused by three main factors: the unprecedented growth of scientific vocabulary and the emergence of the American version of English.
Madaminova Rayhon Maratovna
This article discusses phytonyms (from Greek phytón, 'plant'), one of the components of onomastics, and their use in Hofiz Khorezmi's "Devon." The study analyzes the poet's skillful use of plant names for artistic purposes and for creating artistic devices. The phytonyms in this work have not previously been studied. The plant names used in Hofiz Khorezmi's "Devon" are examined and grouped into semantic categories, and their Arabic, Persian, and Turkic variants are identified. Furthermore, the meanings these words convey in "Devon" are compared with their use in the works of Beruni, Mahmud Kashgari's "Devonu lug'otit turk," Alisher Navoi, and Babur. The forms of plant names used in the Uzbek literary language and their definitions in the Uzbek explanatory dictionary are also analyzed, and examples are given of phytonyms whose meanings have since expanded. The article compares how these phytonyms are explained in B.V. Miller's "Persian-Russian Dictionary," the "Tajik-Russian Dictionary," and the "Explanatory Dictionary of the Uzbek Language," and provides an etymological analysis. The poetic devices created through these phytonyms are also explained with examples. Each phytonym is given a specific definition. For example, "isiriq" is a perennial wild plant of the family Zygophyllaceae. It contains alkaloids and is used in folk medicine for its healing properties, including being burned as incense. In modern Uzbek, the word "isiriq" is the root of words such as "isiriqdon" (incense burner) and "isiriqchi" (incense seller). The word was borrowed from Persian into Old Uzbek; in "Devon," it appears seven times in the senses of "isiriq" and "isiriq seed." The study is of considerable value for examining the lexicon of the work.
The materials researched in the article will be useful in creating the “Dictionary of the Language of Hofiz Khorezmi’s Devon," the "Dictionary of Borrowed Words in Uzbek," and the "Dictionary of Written Monuments," as well as in teaching specialized courses to students in the philology faculties of higher educational institutions.
Andrea Trisciuoglio
An analysis of selected passages of Cicero's Pro Plancio, which refer to the specific procedures for selecting judges under the lex Licinia of 55 BC on the crimen sodaliciorum, confirms Fascione's thesis that jurists of the classical period drew on the theoretical reflection on fraus legi developed alongside the public-law legislation of the Republican period.
Adrian de Wynter, Xun Wang, Alex Sokolov et al.
We present an empirical evaluation of various outputs generated by nine of the most widely available large language models (LLMs). Our analysis uses off-the-shelf, readily available tools. We find a correlation between the percentage of memorized text, the percentage of unique text, and overall output quality, when measured with respect to output pathologies such as counterfactual and logically flawed statements, and general failures such as not staying on topic. Overall, 80.0% of the outputs evaluated contained memorized data, but the outputs containing the most memorized content were also more likely to be considered of high quality. We discuss and evaluate mitigation strategies, showing that they reduce the rate at which the evaluated models output memorized text. We conclude with a discussion of the potential implications for what it means to learn, to memorize, and to evaluate the quality of text.
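The abstract does not name the off-the-shelf tools used to detect memorized text. A minimal sketch of one common approach, assumed here: index every word n-gram of a reference corpus, then report the share of an output's n-grams found in that index. The value of n and whitespace tokenization are assumptions; practical detectors work against far larger corpora with more compact indexes.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """All contiguous n-grams of a token list (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(corpus: Iterable[str], n: int) -> Set[NGram]:
    """Set of every n-gram appearing anywhere in the reference corpus."""
    index: Set[NGram] = set()
    for doc in corpus:
        index.update(ngrams(doc.split(), n))
    return index


def memorized_fraction(output: str, index: Set[NGram], n: int) -> float:
    """Share of the output's n-grams that also occur in the reference corpus."""
    grams = ngrams(output.split(), n)
    if not grams:
        return 0.0
    return sum(g in index for g in grams) / len(grams)
```

With the corpus "to be or not to be that is the question" and n = 3, the output "to be or not to run" scores 0.75: three of its four trigrams appear in the corpus.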
Federica Lazzerini
Review
Vallarino, Giulio
The paper provides the first edition of three vase inscriptions unearthed in 1976 at the so-called 'Santuario della Sorgente' at the Greek site of Saturo. The texts, dating from the 6th to the 5th century BC, all relate to local cults: two are dedications to the Basilìs, a local goddess attested by other dedications, while the third is devoted to the cult of an anonymous goddess. The latter inscription also contains the verb ἀποδίδωμι in a formula rarely attested elsewhere in this period. The cult practice attested by these new documents shows some similarities between the site of Saturo and inland Messapian sanctuaries.
Feilong Chen, Duzhen Zhang, Minglun Han et al.
In the past few years, the emergence of pre-trained models has ushered uni-modal fields such as computer vision (CV) and natural language processing (NLP) into a new era. A substantial body of work has shown that such models benefit downstream uni-modal tasks and remove the need to train a new model from scratch. Can such pre-trained models also be applied to multi-modal tasks? Researchers have explored this question and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. We then summarize specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey focused on VLP. We hope that this survey can shed light on future research in the VLP field.
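Among the pre-training objectives such a survey covers, image-text contrastive learning (popularized by CLIP) is compact enough to sketch. The plain-Python version below computes a symmetric InfoNCE loss over a batch in which image i is paired with text i. It is an illustration of that one objective, not code from the survey; the temperature of 0.07 is an assumed common default, and the embeddings in the example are toy vectors rather than encoder outputs.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy(rows: List[Vector]) -> float:
    """Mean cross-entropy where row i's correct class is index i."""
    total = 0.0
    for i, row in enumerate(rows):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(rows)


def contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: image-to-text and text-to-image, averaged."""
    if len(image_emb) != len(text_emb) or not image_emb:
        raise ValueError("need equally sized, non-empty batches")
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, scaled by 1/temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]  # text-to-image view
    return (_cross_entropy(logits) + _cross_entropy(columns)) / 2
```

Matched pairs with orthogonal negatives give a loss near 0, while swapping the texts gives a loss of roughly 1/0.07 ≈ 14.3, which is the signal that pulls paired embeddings together during pre-training.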