In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads
Charlotte Pouw, Hosein Mohebbi, Afra Alishahi
et al.
In-Context Learning (ICL) has been extensively studied in text-only Language Models, but remains largely unexplored in the speech domain. Here, we investigate how linguistic and acoustic features affect ICL in Speech Language Models. We focus on the Text-to-Speech (TTS) task, which allows us to analyze ICL from two angles: (1) how accurately the model infers the task from the demonstrations (i.e., generating the correct spoken content), and (2) to what extent the model mimics the acoustic characteristics of the demonstration speech in its output. We find that speaking rate strongly affects ICL performance and is also mimicked in the output, whereas pitch range and intensity have little impact on performance and are not consistently reproduced. Finally, we investigate the role of induction heads in speech-based ICL and show that these heads play a causal role: ablating the top-k induction heads completely removes the model's ICL ability, mirroring findings from text-based ICL.
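The ablation finding lends itself to a short illustration. Below is a minimal, hypothetical sketch of attention-head ablation in a toy multi-head self-attention layer; the shapes, weights, and head indices are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of attention-head ablation (hypothetical, not the paper's code).
# Zeroing a head's output before the output projection removes its contribution,
# which is a standard way to test whether (induction) heads are causally needed.
import torch
import torch.nn.functional as F

def attention_with_ablation(x, w_qkv, w_out, n_heads, ablate_heads=()):
    """x: (batch, seq, d_model); w_qkv: (d_model, 3*d_model); w_out: (d_model, d_model)."""
    b, t, d = x.shape
    d_head = d // n_heads
    q, k, v = (x @ w_qkv).split(d, dim=-1)
    # reshape to (batch, heads, seq, d_head)
    q, k, v = (z.view(b, t, n_heads, d_head).transpose(1, 2) for z in (q, k, v))
    att = F.softmax(q @ k.transpose(-2, -1) / d_head**0.5, dim=-1)
    out = att @ v                      # (batch, heads, seq, d_head)
    for h in ablate_heads:             # causal intervention: silence selected heads
        out[:, h] = 0.0
    return out.transpose(1, 2).reshape(b, t, d) @ w_out

x = torch.randn(1, 8, 64)
y = attention_with_ablation(x, torch.randn(64, 192), torch.randn(64, 64),
                            n_heads=4, ablate_heads=(0, 2))
```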
Benchmarking Vision Language Models on German Factual Data
René Peinl, Vincent Tischler
Similar to LLMs, the development of vision language models (VLMs) is driven mainly by English datasets and by models trained on English and Chinese, whereas support for other languages, even high-resource ones such as German, remains significantly weaker. In this work we present an analysis of open-weight VLMs on factual knowledge in German and English. We disentangle the image-related aspects from the textual ones by analyzing accuracy with jury-as-a-judge in both prompt languages, using images from German and international contexts. We found that for celebrities and sights, VLMs struggle because they lack visual knowledge of German image contents. For animals and plants, the tested models can often correctly identify the image contents by the scientific name or English common name but fail in German. Cars and supermarket products were identified equally well in English and German images across both prompt languages.
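The jury-as-a-judge evaluation can be sketched briefly. The snippet below is a hedged illustration: several judge models each rate whether an answer matches the reference and the majority verdict counts; the judge callables are stand-ins, not the paper's actual setup.

```python
# Hypothetical sketch of "jury-as-a-judge" scoring: several judges each vote on
# whether a VLM answer matches the reference, and the majority verdict counts.
from collections import Counter
from typing import Callable, List

def jury_verdict(question: str, answer: str, reference: str,
                 judges: List[Callable[[str, str, str], bool]]) -> bool:
    votes = [judge(question, answer, reference) for judge in judges]
    return Counter(votes).most_common(1)[0][0]

# Stand-in judges (a real setup would prompt a separate LLM for each vote):
judges = [lambda q, a, r: r.lower() in a.lower()] * 3
print(jury_verdict("Who is shown?", "That is Angela Merkel.", "Angela Merkel", judges))
```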
Which Plants Make People Ill? An Attempt at a Contrastive Analysis of the Linguistic Worldview in Polish and German Disease Names Containing a Plant Element
Piotr Aleksander Owsiński
The article presents the results of a linguistic-cognitive analysis of selected names of diseases and ailments containing a plant element. The aim of the exploration is to answer the question of whether there is isomorphism between Polish and German medical terms with respect to the linguistic worldview entrenched in the two languages, which is characteristic of a particular cultural sphere. On the basis of the study it can be concluded that most of the analyzed terms denoting a disease or ailment show full or partial equivalence, which emerges as an important supporting factor not only in foreign-language learning but also in doctor-patient contact, aiding both the teaching or learning of a foreign language and the understanding of one's communication partner and the reception of the conveyed content in a specific situational context.
Language. Linguistic theory. Comparative grammar
Developing Process Writing Ability in Virtual Learning Environment via (Reinforced) Metalinguistic Corrective Feedback
Maryam Naderi Farsani, Parviz Alavinia, Mehdi Sarkhosh
The current study investigated the impact of metalinguistic oral and written corrective feedback on learners' process writing ability in a virtual learning environment. To this aim, a total of 66 Iranian EFL students at Shahrekord University participated in the study. A sample IELTS expository writing task (Writing Task 1) was first administered to all participants for homogeneity purposes. Then, each of the two classes was divided into two parts, and each part was randomly assigned to one of four comparison groups (oral metalinguistic feedback, written metalinguistic feedback, oral metalinguistic + error logs, and written metalinguistic + error logs). Next, a writing pretest (a process writing task) was given to participants prior to instruction. The treatment lasted for eight weeks, after which a process writing posttest was administered. The results revealed that all groups made progress from pretest to posttest; however, no significant difference was found among the four types of metalinguistic corrective feedback. The implications of the findings are discussed throughout the paper.
Language and Literature, Language. Linguistic theory. Comparative grammar
A Critical Review of Causal Reasoning Benchmarks for Large Language Models
Linying Yang, Vik Shirvaikar, Oscar Clivio
et al.
Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved through the retrieval of domain knowledge, which calls into question whether they achieve their purpose. In this review, we present a comprehensive overview of LLM benchmarks for causality. We highlight how recent benchmarks move towards a more thorough definition of causal reasoning by incorporating interventional or counterfactual reasoning. We derive a set of criteria that a useful benchmark or set of benchmarks should aim to satisfy. We hope this work will pave the way towards a general framework for the assessment of causal understanding in LLMs and the design of novel benchmarks.
Methods of Automatic Matrix Language Determination for Code-Switched Speech
Olga Iakovenko, Thomas Hain
Code-switching (CS), the practice of speakers alternating between two or more languages, is increasingly common in the modern world. To better describe CS speech, the Matrix Language Frame (MLF) theory introduces the concept of a Matrix Language (ML), the language that provides the grammatical structure for a CS utterance. In this work, the MLF theory was used to develop systems for Matrix Language Identity (MLID) determination. The MLID of English/Mandarin and English/Spanish CS text and speech was compared to acoustic language identity (LID), a typical way of identifying a language in monolingual utterances. MLID predictors from audio show higher correlation with the textual principles than LID in all cases, while also outperforming LID in an MLID recognition task in terms of F1 macro (60%) and correlation score (0.38). This novel approach shows that the non-English languages (Mandarin and Spanish) are preferred over English as the ML, contrary to the monolingual choice made by LID.
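The two reported evaluation measures, macro F1 and a correlation score, can be computed with standard libraries. The sketch below uses toy labels; the real experiments use English/Mandarin and English/Spanish CS corpora, not this dummy data.

```python
# Sketch of the reported evaluation measures on invented toy data.
from sklearn.metrics import f1_score
from scipy.stats import pearsonr

y_true = ["en", "zh", "zh", "en", "zh"]   # matrix-language labels per utterance
y_pred = ["en", "zh", "en", "en", "zh"]   # hypothetical MLID system output

print("F1 macro:", f1_score(y_true, y_pred, average="macro"))

# Correlation between a continuous MLID score and a text-derived ML score:
pred_scores = [0.9, 0.2, 0.4, 0.8, 0.1]
text_scores = [1.0, 0.0, 0.0, 1.0, 0.0]
print("Pearson r:", pearsonr(pred_scores, text_scores)[0])
```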
Quantifying Memorization and Detecting Training Data of Pre-trained Language Models using Japanese Newspaper
Shotaro Ishihara, Hiromu Takahashi
Dominant pre-trained language models (PLMs) have demonstrated the potential risk of memorizing and outputting their training data. While this concern has been discussed mainly for English, it is also practically important to focus on domain-specific PLMs. In this study, we pre-trained domain-specific GPT-2 models on a limited corpus of Japanese newspaper articles and evaluated their behavior. Experiments replicated the empirical finding that memorization in PLMs is related to duplication in the training data, model size, and prompt length, in Japanese just as in previous English studies. Furthermore, we attempted membership inference attacks, demonstrating that training data can be detected in Japanese as well, following the same trend as in English. The study warns that domain-specific PLMs, sometimes trained on valuable private data, can "copy and paste" it on a large scale.
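A common form of membership inference on causal LMs can be sketched in a few lines: samples whose per-token loss falls below a threshold are flagged as likely training members. The model name and threshold below are illustrative assumptions, not the study's configuration.

```python
# Hedged sketch of a loss-based membership inference attack on a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sample_loss(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()  # mean cross-entropy per token

THRESHOLD = 3.0  # would be calibrated on held-out non-member data
text = "The quick brown fox jumps over the lazy dog."
print("likely member?", sample_loss(text) < THRESHOLD)
```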
Speech Analysis of Language Varieties in Italy
Moreno La Quatra, Alkis Koudounas, Elena Baralis
et al.
Italy exhibits rich linguistic diversity across its territory due to the distinct regional languages spoken in different areas. Recent advances in self-supervised learning provide new opportunities to analyze Italy's linguistic varieties using speech data alone. This includes the potential to leverage representations learned from large amounts of data to better examine nuances between closely related linguistic varieties. In this study, we focus on automatically identifying the geographic region of origin of speech samples drawn from Italy's diverse language varieties. We leverage self-supervised learning models to tackle this task and analyze differences and similarities between Italy's regional languages. In doing so, we also seek to uncover new insights into the relationships among these diverse yet closely related varieties, which may help linguists understand their interconnected evolution and regional development over time and space. To improve the discriminative ability of learned representations, we evaluate several supervised contrastive learning objectives, both as pre-training steps and as additional fine-tuning objectives. Experimental evidence shows that pre-trained self-supervised models can effectively identify regions from speech recordings. Additionally, incorporating contrastive objectives during fine-tuning improves classification accuracy and yields embeddings that distinctly separate regional varieties, demonstrating the value of combining self-supervised pre-training and contrastive learning for this task.
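One plausible form of the "supervised contrastive objectives" the abstract mentions is a SupCon-style loss, sketched below; the shapes and temperature are assumptions, not the authors' exact configuration.

```python
# A minimal supervised contrastive (SupCon-style) loss sketch.
import torch
import torch.nn.functional as F

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """z: (batch, dim) embeddings; labels: (batch,) region IDs."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau                               # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim.masked_fill_(self_mask, -1e9)                 # exclude i == a pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # negative mean log-probability of same-region ("positive") pairs
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

z = torch.randn(8, 32)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(supcon_loss(z, labels).item())
```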
JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models
Hitomi Yanaka, Namgi Han, Ryoma Kumon
et al.
With the development of large language models (LLMs), social biases in these LLMs have become a pressing issue. Although there are various benchmarks for social biases across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated. In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ, and analyze social biases in Japanese LLMs. The results show that while current open Japanese LLMs with more parameters achieve improved accuracy on JBBQ, their bias scores also increase. In addition, prompts with a warning about social biases and chain-of-thought prompting reduce the effect of biases in model outputs, but there is room for improvement in extracting the correct evidence from contexts in Japanese. Our dataset is available at https://github.com/ynklab/JBBQ_data.
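For readers unfamiliar with BBQ-style bias scores, the sketch below follows the original BBQ paper's formulation (which JBBQ builds on), with made-up counts: in disambiguated contexts the score measures how often non-UNKNOWN answers align with the stereotype, and in ambiguous contexts it is scaled by the error rate.

```python
# Hedged sketch of a BBQ-style bias score; counts are invented.
def bias_score_disambig(n_biased: int, n_non_unknown: int) -> float:
    return 2 * (n_biased / n_non_unknown) - 1   # in [-1, 1]; 0 = no bias

def bias_score_ambig(accuracy: float, s_dis: float) -> float:
    return (1 - accuracy) * s_dis               # correct "UNKNOWN"s shrink the bias

s_dis = bias_score_disambig(n_biased=70, n_non_unknown=100)
print(s_dis, bias_score_ambig(accuracy=0.6, s_dis=s_dis))
```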
A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification
Aryan Singhal, Veronica Shao, Gary Sun
et al.
The rise of digital misinformation has heightened interest in using multilingual Large Language Models (LLMs) for fact-checking. This study systematically evaluates translation bias and the effectiveness of LLMs for cross-lingual claim verification across 15 languages from five language families: Romance, Slavic, Turkic, Indo-Aryan, and Kartvelian. We investigate two distinct translation methods, pre-translation and self-translation, using the XFACT dataset to assess their impact on accuracy and bias. We use mBERT's performance on the English dataset as a baseline to compare language-specific accuracies. Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation in the training data. Furthermore, larger models demonstrate superior performance in self-translation, improving translation accuracy and reducing bias. These results highlight the need for balanced multilingual training, especially in low-resource languages, to promote equitable access to reliable fact-checking tools and minimize the risk of spreading misinformation in different linguistic contexts.
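The contrast between the two strategies can be illustrated with a hypothetical `llm(prompt)` completion function standing in for a real model; the prompts and stub below are assumptions, not the study's exact pipeline.

```python
# Illustrative contrast between pre-translation and self-translation.
def llm(prompt: str) -> str:
    return "half-true"  # stub; a real call would query an LLM API

def pre_translation(claim: str, translate) -> str:
    # Translate first with an external MT system, then verify in English.
    english = translate(claim)
    return llm(f"Verify this claim: {english}\nLabel:")

def self_translation(claim: str) -> str:
    # Let the LLM translate and verify in a single prompt (no external MT step).
    return llm(f"Translate this claim to English, then verify it: {claim}\nLabel:")

print(self_translation("La tierra es plana."))
```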
[Marazopoulos, Petros. The "Balkans" in Modern Greek Culture: Aspects of the Management of a Term]
Konstantinos Tsivos
History of Greece, Translating and interpreting
Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP
Wei Du, Laksh Advani, Yashmeet Gambhir
et al.
Large language models (LLMs) have demonstrated a significant capability to generalize across a large number of NLP tasks. For industry applications, it is imperative to assess the performance of an LLM on unlabeled production data from time to time to validate it for the real-world setting. Human labeling to assess model error requires considerable expense and time delay. Here we demonstrate that ensemble disagreement scores work well as a proxy for human labeling of language-model outputs in zero-shot, few-shot, and fine-tuned settings, based on our evaluation on the keyphrase extraction (KPE) task. We measure the fidelity of the results by comparing to true error measured from human-labeled ground truth. We contrast this with the alternative of using another LLM as a source of machine labels, or silver labels. Results across various languages and domains show that disagreement scores provide a better estimation of model performance, with mean absolute error (MAE) as low as 0.4% and on average 13.8% better than using silver labels.
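One simple way to realize an ensemble disagreement score is the share of ensemble members that disagree with the majority prediction per example; the sketch below uses invented outputs, whereas the paper evaluates on KPE data.

```python
# Sketch of an ensemble disagreement score as a proxy for model error.
from collections import Counter

def disagreement_score(predictions: list) -> float:
    """predictions: one inner list of model outputs per example."""
    scores = []
    for preds in predictions:
        top = Counter(preds).most_common(1)[0][1]
        scores.append(1 - top / len(preds))     # 0 = full agreement
    return sum(scores) / len(scores)

ensemble_preds = [["kpe-a", "kpe-a", "kpe-b"], ["kpe-c"] * 3]
est_error = disagreement_score(ensemble_preds)
print(f"estimated error: {est_error:.3f}")      # compare to human-labeled error for MAE
```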
Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis
Md. Arid Hasan, Shudipta Das, Afiyat Anjum
et al.
The rapid expansion of the digital world has propelled sentiment analysis into a critical tool across diverse sectors such as marketing, politics, customer service, and healthcare. While there have been significant advancements in sentiment analysis for widely spoken languages, low-resource languages such as Bangla remain largely under-researched due to resource constraints. Furthermore, the recent unprecedented performance of Large Language Models (LLMs) in various applications highlights the need to evaluate them in the context of low-resource languages. In this study, we present a sizeable manually annotated dataset encompassing 33,606 Bangla news tweets and Facebook comments. We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and Bloomz, offering a comparative analysis against fine-tuned models. Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero- and few-shot scenarios. To foster continued exploration, we intend to make this dataset and our research tools publicly available to the broader research community.
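Zero- versus few-shot prompting differs only in whether demonstrations precede the query. The sketch below is a minimal, hypothetical prompt builder; the demonstration examples and label set are placeholders, not items from the paper's dataset.

```python
# Minimal sketch of zero- vs few-shot prompt construction for sentiment labeling.
def build_prompt(text: str, demos=()):
    lines = ["Classify the sentiment as positive, negative, or neutral.", ""]
    for demo_text, demo_label in demos:        # empty demos => zero-shot
        lines += [f"Text: {demo_text}", f"Sentiment: {demo_label}", ""]
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)

print(build_prompt("The service was excellent.",
                   demos=[("Awful delays today.", "negative")]))
```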
Challenges in Developing LRs for Non-Scheduled Languages: A Case of Magahi
Ritesh Kumar
Magahi is an Indo-Aryan language spoken mainly in the eastern parts of India. Despite having a significant number of speakers, virtually no language resources (LRs) or language technology (LT) have been developed for the language, mainly because of its status as a non-scheduled language. The present paper describes an attempt to develop an annotated corpus of Magahi. The data are drawn mainly from a couple of Magahi blogs, a collection of Magahi stories, and recordings of Magahi conversation, and are annotated at the POS level using the BIS tagset.
A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models
Firoj Alam, Arid Hasan, Tanvirul Alam
et al.
Bangla -- ranked as the 6th most widely spoken language across the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers -- is still considered a low-resource language in the natural language processing (NLP) community. Despite three decades of research, Bangla NLP (BNLP) still lags behind, mainly due to the scarcity of resources and the challenges that come with it. Work is sparse across the different areas of BNLP; however, a thorough survey reporting previous work and recent advances has yet to be done. In this study, we first provide a review of Bangla NLP tasks, resources, and tools available to the research community; we benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms (i.e., transformer-based models). We provide comparative results for the studied NLP tasks by comparing monolingual vs. multilingual models of varying sizes. We report our results using both individual and consolidated datasets and provide data splits for future research. We reviewed a total of 108 papers and conducted 175 sets of experiments. Our results show promising performance using transformer-based models while highlighting the trade-off with computational costs. We hope that such a comprehensive survey will motivate the community to build on and further advance the research on Bangla NLP.
Doing Natural Language Processing in A Natural Way: An NLP toolkit based on object-oriented knowledge base and multi-level grammar base
Yu Guo
We introduce an NLP toolkit based on an object-oriented knowledge base and a multi-level grammar base. The toolkit focuses on semantic parsing; it can also discover new knowledge and grammar automatically. Newly discovered knowledge and grammar are verified by humans and then used to update the knowledge base and grammar base. This process can be iterated many times to improve the toolkit continuously.
Guatemalan Spanish in contact: Prosody and intonation
Yolanda Congosto Martín
This study forms part of the Geoprosodic project: the Geoprosodic and Sociodialectal Study of North American Spanish. The main objective of this project is to describe and compare the prosody of three geographical areas, Los Angeles, Mexico and Guatemala, that are closely related historically, socially and linguistically due to the contact established over time and the coexistence of peoples and cultures. The aim is to examine the prosodic differences between the Spanish spoken by Latin American people living in Los Angeles (mainly Mexicans, Salvadorans and Guatemalans) and the Spanish spoken by those who have never left their native countries, and to determine whether the spatio-temporal distance of the former and their immersion in a different sociolinguistic sphere, in contact with the English language and other varieties of Spanish, have brought about differences of a geoprosodic nature. In this case, the study focuses on the Guatemalan Spanish of various speakers. From a methodological point of view, the research is linked to the international project AMPER. The analytical methods of AMPER are followed, and the research is limited to the study of female intonation and sentences with a subject-verb-object (SVO) structure from corpus 1 (declaratives and absolute interrogatives). The results of this research corroborate the initial hypothesis and establish the melodic differences between the two groups of speakers, particularly regarding declaratives.
Language. Linguistic theory. Comparative grammar
Predicate formation and verb-stranding ellipsis in Uzbek
Vera Gribanova
This paper investigates the interaction between head movement of the verb and ellipsis of vP (verb-stranding ellipsis, VSE) in Uzbek — an understudied Turkic language of Central Asia. I argue that Uzbek verbal predicates are formed by head movement, while non-verbal predicates are formed by a species of Local Dislocation (Embick & Noyer 2001; Embick 2003). Uzbek has two distinct ellipsis strategies that yield similar strings: argument ellipsis (AE) and VSE. VSE occurs only with (head-moved) verbs, and can elide non-verbal predicates, while AE cannot. Uzbek VSE imposes a strict identity requirement on the heads extracted from the ellipsis site (the Verbal Identity Condition (Goldberg 2005b)). Both the genuine existence of this condition, and its source, have recently come under scrutiny; this paper presents Uzbek evidence in support of the claim that the Verbal Identity Condition is genuinely present in a subset of typologically diverse languages with VSE (see Gribanova 2018b). Variable crosslinguistic behavior with respect to the Verbal Identity Condition is predicted by an independently supported view of head movement (Harizanov & Gribanova 2019) in which certain types of head movement are syntactic — yielding the potential for mismatches of extracted material, by analogy with phrasal movement (Merchant 2001) — while others are postsyntactic (yielding the Uzbek-type VSE pattern). The Uzbek investigation therefore provides crucial evidence in favor of a particular view of the crosslinguistic landscape of VSE, and moves us a step closer to explaining why head movement out of ellipsis domains varies systematically in its behavior across languages.
Language. Linguistic theory. Comparative grammar
Examining Students' Perceptions of the Ethical Characteristics of Language Instructors with Respect to Students' Socioeconomic Status
Masoud Yazdani Moghadam, Mohammad Javad Moafi, Alireza Ameri
The aim of this study was to examine language students' perceptions of the ethical characteristics of language instructors with respect to the students' socioeconomic status. An explanatory mixed-methods design was used. First, a socioeconomic status questionnaire was adapted to the research context so as to classify students into higher and lower socioeconomic groups. Of the 81 socioeconomic status questionnaires distributed across 5 different language classes at 2 universities in the city of Qom, 53 were completed. Subsequently, 22 of these students took part in semi-structured interviews. After careful coding and categorization of the interviews, one main theme (seriousness) was identified and then examined in relation to the students' socioeconomic status using Fisher's exact test. Finally, a model of the ethical characteristics of language instructors, based on students' perceptions, was designed for use in developing codes of ethics tailored specifically to language teachers and in raising ethical awareness among them.
Language. Linguistic theory. Comparative grammar, Indo-Iranian languages and literature