Galactica: A Large Language Model for Science
Ross Taylor, Marcin Kardas, Guillem Cucurull
et al.
Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.
983 citations
Computer Science, Mathematics
Ethical and social risks of harm from Language Models
Laura Weidinger, John F. J. Mellor, Maribeth Rauh
et al.
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities. In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
1414 citations
Computer Science
Taxonomy of Risks posed by Language Models
Laura Weidinger, Jonathan Uesato, Maribeth Rauh
et al.
Responsible innovation on large-scale Language Models (LMs) requires foresight into and in-depth understanding of the risks these models may pose. This paper develops a comprehensive taxonomy of ethical and social risks associated with LMs. We identify twenty-one risks, drawing on expertise and literature from computer science, linguistics, and the social sciences. We situate these risks in our taxonomy of six risk areas: I. Discrimination, Hate speech and Exclusion, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, and VI. Environmental and Socioeconomic harms. For risks that have already been observed in LMs, the causal mechanism leading to harm, evidence of the risk, and approaches to risk mitigation are discussed. We further describe and analyse risks that have not yet been observed but are anticipated based on assessments of other language technologies, and situate these in the same taxonomy. We underscore that it is the responsibility of organizations to engage with the mitigations we discuss throughout the paper. We close by highlighting challenges and directions for further research on risk evaluation and mitigation with the goal of ensuring that language models are developed responsibly.
884 citations
Computer Science
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang, Linfeng Dong, Xiaoya Li
et al.
This article surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique for enhancing the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains, and applications, along with an analysis of aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset). We also review the potential pitfalls of IT and criticism against it, point out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
825 citations
Computer Science
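The survey above describes instruction tuning as supervised training on (instruction, output) pairs. As a minimal sketch of how such a pair is typically flattened into a single training sequence (the prompt template below is a common community convention, not one prescribed by the survey):

```python
# Illustrative sketch: turning an (instruction, output) pair into one
# supervised training string, so the model's next-word objective also
# learns to follow instructions. The "### ..." template is an assumption.

def format_example(instruction: str, output: str) -> str:
    """Join an (instruction, output) pair into a single training sequence."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n{output}"

pair = {
    "instruction": "Translate 'bonjour' to English.",
    "output": "hello",
}
print(format_example(pair["instruction"], pair["output"]))
```

During fine-tuning, the loss is usually computed only on the response portion of such strings, while the instruction serves as conditioning context.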
Comparing Vision Transformers and Convolutional Neural Networks for Image Classification: A Literature Review
J. Maurício, Inês Domingues, Jorge Bernardino
Transformers are models that implement a mechanism of self-attention, individually weighting the importance of each part of the input data. Their use in image classification tasks is still somewhat limited since researchers have so far chosen Convolutional Neural Networks for image classification and transformers were more targeted to Natural Language Processing (NLP) tasks. Therefore, this paper presents a literature review that shows the differences between Vision Transformers (ViT) and Convolutional Neural Networks. The state of the art that used the two architectures for image classification was reviewed and an attempt was made to understand what factors may influence the performance of the two deep learning architectures based on the datasets used, image size, number of target classes (for the classification problems), hardware, and evaluated architectures and top results. The objective of this work is to identify which of the architectures is the best for image classification and under what conditions. This paper also describes the importance of the Multi-Head Attention mechanism for improving the performance of ViT in image classification.
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong
et al.
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes of inefficient LLM inference, i.e., the large model size, the quadratic-complexity attention operation, and the auto-regressive decoding approach. Then, we introduce a comprehensive taxonomy that organizes the current literature into data-level, model-level, and system-level optimizations. Moreover, the paper includes comparative experiments on representative methods within critical sub-fields to provide quantitative insights. Finally, we summarize the key findings and discuss future research directions.
207 citations
Computer Science
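One of the inefficiency causes named above, auto-regressive decoding, can be illustrated with a toy operation count (our illustration, not from the survey): without a key/value cache, every decoding step re-encodes the whole prefix, while a KV cache, a standard system-level optimization, encodes each token exactly once.

```python
# Toy cost model (illustrative assumption, not the survey's experiments):
# count key/value projections per decoding strategy.

def decode_without_cache(prompt_len: int, new_tokens: int) -> int:
    """Each step reprocesses the entire sequence seen so far."""
    ops = 0
    for step in range(new_tokens):
        ops += prompt_len + step + 1  # whole prefix re-encoded every step
    return ops

def decode_with_cache(prompt_len: int, new_tokens: int) -> int:
    """Past keys/values are cached, so each token is encoded once."""
    return prompt_len + new_tokens

print(decode_without_cache(100, 10))  # -> 1055 (quadratic-style growth)
print(decode_with_cache(100, 10))     # -> 110  (linear in total length)
```

The gap widens with sequence length, which is why caching is near-universal in deployed LLM serving systems.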
Learners' Listening Comprehension Difficulties in English Language Learning: A Literature Review.
Abbas Pourhosein Gilakjani, Narjes Banou Sabouri
Listening is one of the most important skills in English language learning. When students listen to English language, they face a lot of listening difficulties. Students have critical difficulties in listening comprehension because universities and schools pay more attention to writing, reading, and vocabulary. Listening is not an important part of many course books and most teachers do not pay attention to this important skill in their classes. In this paper, the researchers reviewed the terms listening, listening comprehension, listening comprehension strategies, and listening difficulties. The review of literature indicated that when teachers are aware of students’ learning difficulties they can help them develop effective listening strategies and finally solve their difficulties in listening and improve their listening comprehension abilities.
The Essay Competition (2025)
Jacqueline Dyballa, Johanna Himmer, Vian Kisyova
et al.
On the occasion of Rainer Maria Rilke's 150th birthday, this year's essay competition of the Austria Library Sofia, in cooperation with the OeAD and the DAAD, was devoted to the work of one of the most important German-language poets of the 20th century. Pupils and students from all over Bulgaria were invited, under the motto "Rilke heute – Worte, die bleiben" ("Rilke today – words that remain"), to choose a poem by Rilke and write a personal essay based on it.
Natural Language Generation
Emiel van Miltenburg, Chenghua Lin
This article provides a brief overview of the field of Natural Language Generation. The term Natural Language Generation (NLG), in its broadest definition, refers to the study of systems that verbalize some form of information through natural language. That information could be stored in a large database or knowledge graph (in data-to-text applications), but NLG researchers may also study summarisation (text-to-text) or image captioning (image-to-text), for example. As a subfield of Natural Language Processing, NLG is closely related to other sub-disciplines such as Machine Translation (MT) and Dialog Systems. Some NLG researchers exclude MT from their definition of the field, since there is no content selection involved where the system has to determine what to say. Conversely, dialog systems do not typically fall under the header of Natural Language Generation since NLG is just one component of dialog systems (the others being Natural Language Understanding and Dialog Management). However, with the rise of Large Language Models (LLMs), different subfields of Natural Language Processing have converged on similar methodologies for the production of natural language and the evaluation of automatically generated text.
Model Merging to Maintain Language-Only Performance in Developmentally Plausible Multimodal Models
Ece Takmaz, Lisa Bylinina, Jakub Dotlacil
State-of-the-art vision-and-language models consist of many parameters and learn from enormous datasets, surpassing the amounts of linguistic data that children are exposed to as they acquire a language. This paper presents our approach to the multimodal track of the BabyLM challenge addressing this discrepancy. We develop language-only and multimodal models in low-resource settings using developmentally plausible datasets, with our multimodal models outperforming previous BabyLM baselines. One finding in the multimodal language model literature is that these models tend to underperform in language-only tasks. Therefore, we focus on maintaining language-only abilities in multimodal models. To this end, we experiment with model merging, where we fuse the parameters of multimodal models with those of language-only models using weighted linear interpolation. Our results corroborate the findings that multimodal models underperform in language-only benchmarks that focus on grammar, and model merging with text-only models can help alleviate this problem to some extent, while maintaining multimodal performance.
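The weighted linear interpolation described in this abstract can be sketched in a few lines (a minimal illustration on toy parameter lists, not the authors' implementation; in practice the two models' state dicts must share matching shapes, and the weight alpha here is an assumed example value):

```python
# Sketch of model merging by weighted linear interpolation:
# merged = alpha * model_a + (1 - alpha) * model_b, parameter by parameter.
# Toy parameter dicts stand in for real model state dicts.

def merge(params_a: dict, params_b: dict, alpha: float) -> dict:
    """Interpolate two parameter sets with weight alpha on params_a."""
    return {
        k: [alpha * a + (1 - alpha) * b
            for a, b in zip(params_a[k], params_b[k])]
        for k in params_a
    }

multimodal = {"w": [1.0, 2.0]}  # hypothetical multimodal weights
text_only = {"w": [3.0, 4.0]}   # hypothetical language-only weights
print(merge(multimodal, text_only, alpha=0.5)["w"])  # -> [2.0, 3.0]
```

Sweeping alpha trades off the two models' strengths: alpha near 1 preserves multimodal behavior, while lower values pull the merged model toward the text-only model's grammar performance.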
Small Language Models Reshape Higher Education: Courses, Textbooks, and Teaching
Jian Zhang, Jia Shao
While large language models (LLMs) have introduced novel paradigms in science and education, their adoption in higher education is constrained by inherent limitations. These include a tendency to produce inaccuracies and high computational requirements, which compromise the strict demands for accurate and reliable knowledge essential in higher education. Small language models (MiniLMs), by contrast, offer distinct advantages in professional education due to their lightweight nature and precise retrieval capabilities. This research takes "Atmospheric Physics" as an example. We established a specialized corpus and image repository by gathering over 550,000 full-text PDFs from over 130 well-respected international journals in Earth and environmental science. From this collection, we extracted a corpus of over 100 million high-quality sentences and more than 3 million high-resolution academic images. Using MiniLMs, these resources were organized into a high-dimensional vector library for precise retrieval and efficient utilization of extensive educational content. Consequently, we systematically redesigned the courses, textbooks, and teaching strategies for "Atmospheric Physics" based on MiniLMs. The course is designed as an "interdisciplinary-frontier" system, breaking down traditional boundaries between atmospheric science, space science, hydrology, and remote sensing. Teaching materials are transformed from static, lagging text formats into a dynamic digital resource library powered by MiniLM. For teaching methods, we have designed a question-based learning pathway. This paradigm promotes a shift from passive knowledge transfer to active cognitive development. As a result, this MiniLM-driven "Atmospheric Physics" course demonstrates a specific avenue for "AI for education".
Searching for the Most Human-like Emergent Language
Brendon Boldt, David Mortensen
In this paper, we design a signalling game-based emergent communication environment to generate state-of-the-art emergent languages in terms of similarity to human language. This is done with hyperparameter optimization, using XferBench as the objective function. XferBench quantifies the statistical similarity of emergent language to human language by measuring its suitability for deep transfer learning to human language. Additionally, we demonstrate the predictive power of entropy on the transfer learning performance of emergent language as well as corroborate previous results on the entropy-minimization properties of emergent communication systems. Finally, we report generalizations regarding what hyperparameters produce more realistic emergent languages, that is, ones which transfer better to human language.
Between Pain and Text: Biblical Traces in Horto de Incêndio
Łukasz Kraj
The aim of this article is to analyse the role of biblical references in Al Berto’s (Alberto Raposo Pidwell Tavares’) last poetry volume, Horto de Incêndio, published in 1997. Previous research on this poetry has identified intertextuality, an interest in corporeality and the problem of the relationship between experience and text as dominant features of this work. Building upon these insights, I demonstrate that the numerous allusions to the Bible, especially evocations of the Apocalypse, in Horto de Incêndio are related to the author’s attempt to textualise the experience of illness and allow us to partially reconstruct his view of the ontology of the literary text.
From Spectatorship to Participation: A Study on Digital Natives' New Culture of Seeing in Interactive Cinema in Turkey
Ferhat Zengin, Uğur Baloğlu, Yıldız Derya Birincioğlu
The technological transformations of the digital age are significantly changing visual culture practices and viewer experiences. This change is especially evident in the media consumption habits of digital natives. This study examines how the visual and auditory techniques offered by interactive cinema structure viewers' culture of seeing and politics of seeing. It focuses on how cinema's narrative possibilities, expanded by digitalization, give viewers the ability to steer and manage the story, and on the resulting changes in viewing practices. The theoretical framework addresses the transformations of the cinema-viewer relationship in the contexts of interactive and cyber drama and non-linear narrative. Built on a phenomenological design, a qualitative research method, the study had 10 participants watch the interactive film Late Shift, conducted in-depth interviews with them, and analyzed the resulting data using descriptive and thematic analysis techniques. According to the findings, in the interactive film experience viewers enter a new mode of identification, a new culture of seeing, and a new viewing experience within the illusion of creating, directing, and representing. The study shows that interactive cinema goes beyond the single-character identification of traditional cinema, creating a participatory experience that offers viewers dynamic shifts between the roles of character, director, and screenwriter, and that it therefore represents a cultural paradigm shift rather than merely a technological novelty.
Communication. Mass media
ON THE INTERVENTION OF NON-STATE ACTORS IN THE SCHOOLING OF INTERNALLY DISPLACED PUPILS (EDI)
Issiaka OUEDRAOGO, Goama NAKOULMA & Sylvie KOROGO
Abstract: The security crisis facing Burkina Faso has affected its education system since 2016. It has led to the closure of several thousand schools and the displacement of pupils from their usual places of residence to relatively safer areas. In these host areas, their educational needs are not met, which jeopardizes their chances of continuing their studies. To support the Burkinabè state in addressing these emergency needs, non-state actors are implementing actions in favor of the schooling of internally displaced pupils (EDI). As part of research on the schooling mechanisms for these pupils, surveys were conducted in the cities of Kaya and Fada N'Gourma. The results reveal a multiplicity and diversity of these actors. Some have experience intervening in the humanitarian or education sector, others have experience in both, while some actors have experience in neither. Their actions contribute to the schooling of many displaced pupils. However, they have limitations related to intervention practices and to aid that falls short of demand. Their interventions take little account of some of these pupils' priority needs. It is therefore essential to improve governance in the field of Education in Emergencies (ESU) in order to increase the effectiveness of these interventions.
Keywords: intervention, non-state actor, schooling, EDI, host area
Arts in general, Computational linguistics. Natural language processing
Morphological evaluation of subwords vocabulary used by BETO language model
Óscar García-Sierra, Ana Fernández-Pampillón Cesteros, Miguel Ortega-Martín
Subword tokenization algorithms used by Large Language Models are significantly more efficient and can independently build the necessary vocabulary of words and subwords without human intervention. However, those subwords do not always align with real morphemes, potentially impacting the models' performance, though it remains uncertain when this might occur. In previous research, we proposed a method to assess the morphological quality of vocabularies, focusing on the overlap between these vocabularies and the morphemes of a given language. Our evaluation method was built on three quality measures, relevance, cohesion, and morphological accuracy, and a procedure for their assessment. By applying this method to vocabularies created by three subword tokenization algorithms, BPE, Wordpiece, and Unigram, we concluded that these vocabularies generally exhibit very low morphological quality. In this article, we apply this evaluation to the tokenizer of BETO, a BERT language model trained on large Spanish corpora. This evaluation, along with our previous results, helped us conclude that its vocabulary has a low morphological quality, and we also found that training the tokenizer in a larger corpus does not improve the morphological quality of the generated vocabulary. Additionally, this evaluation helps clarify the algorithm used by the tokenizer, that is, Wordpiece, given the inconsistencies between the authors' claims and the model's configuration.
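The core idea behind the morphological quality evaluation described above, measuring overlap between a subword vocabulary and a language's morphemes, can be illustrated with a toy metric (our construction for illustration only, not the authors' exact measures of relevance, cohesion, and accuracy; both word lists are made up):

```python
# Toy illustration: what fraction of a morpheme inventory appears in a
# subword vocabulary. Vocabulary and morpheme sets below are invented
# Spanish-flavored examples, not taken from BETO's actual tokenizer.

def morpheme_overlap(vocab: set, morphemes: set) -> float:
    """Fraction of real morphemes contained in the subword vocabulary."""
    return len(vocab & morphemes) / len(morphemes)

vocab = {"##s", "##ando", "habl", "corr", "##xyz"}
morphemes = {"habl", "corr", "##ando", "##ar"}
print(round(morpheme_overlap(vocab, morphemes), 2))  # -> 0.75
```

A low score of this kind would suggest the tokenizer is splitting words along statistically frequent but morphologically meaningless boundaries.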
Manipulating language models' training data to study syntactic constraint learning: the case of English passivization
Cara Su-Yi Leong, Tal Linzen
Grammatical rules in natural languages are often characterized by exceptions. How do language learners learn these exceptions to otherwise general patterns? Here, we study this question through the case study of English passivization. While passivization is in general quite productive, there are cases where it cannot apply (cf. the following sentence is ungrammatical: *One hour was lasted by the meeting). Using neural network language models as theories of language acquisition, we explore the sources of indirect evidence that a learner can leverage to learn whether a verb can be passivized. We first characterize English speakers' judgments of exceptions to the passive, and confirm that speakers find some verbs more passivizable than others. We then show that a neural network language model's verb passivizability judgments are largely similar to those displayed by humans, suggesting that evidence for these exceptions is available in the linguistic input. Finally, we test two hypotheses as to the source of evidence that language models use to learn these restrictions: frequency (entrenchment) and semantics (affectedness). We do so by training models on versions of the corpus that have had sentences of the types implicated by each hypothesis removed, altered, or introduced. We find support for both hypotheses: entrenchment and affectedness make independent contributions to a verb's passivizability. From a methodological point of view, this study highlights the utility of altering a language model's training data for answering questions where complete control over a learner's input is vital.
A Survey of Large Language Models for Arabic Language and its Dialects
Malak Mashaabi, Shahad Al-Khalifa, Hend Al-Khalifa
This survey offers a comprehensive overview of Large Language Models (LLMs) designed for the Arabic language and its dialects. It covers key architectures, including encoder-only, decoder-only, and encoder-decoder models, along with the datasets used for pre-training, spanning Classical Arabic, Modern Standard Arabic, and Dialectal Arabic. The study also explores monolingual, bilingual, and multilingual LLMs, analyzing their architectures and performance across downstream tasks, such as sentiment analysis, named entity recognition, and question answering. Furthermore, it assesses the openness of Arabic LLMs based on factors such as source code availability, training data, model weights, and documentation. The survey highlights the need for more diverse dialectal datasets and emphasizes the importance of openness for research reproducibility and transparency. It concludes by identifying key challenges and opportunities for future research and stressing the need for more inclusive and representative models.
Model-based Large Language Model Customization as Service
Zhaomin Wu, Jizhou Guo, Junyi Hou
et al.
Prominent Large Language Model (LLM) services from providers like OpenAI and Google excel at general tasks but often underperform on domain-specific applications. Current customization services for these LLMs typically require users to upload data for fine-tuning, posing significant privacy risks. While differentially private (DP) data synthesis presents a potential alternative, its application commonly results in low effectiveness due to the introduction of excessive noise on data for DP. To overcome this, we introduce Llamdex, a novel framework that facilitates LLM customization as a service, where the client uploads pre-trained domain-specific models rather than data. This client-uploaded model, optionally protected by DP with much lower noise, is inserted into the base LLM via connection modules. Significantly, these connecting modules are trained without requiring sensitive domain data, enabling clients to customize LLM services while preserving data privacy. Experiments demonstrate that Llamdex improves domain-specific accuracy by up to 26% over state-of-the-art private data synthesis methods under identical privacy constraints and, by obviating the need for users to provide domain context within queries, maintains inference efficiency comparable to the original LLM service.
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
Yinpei Dai, Jayjun Lee, Nima Fazeli
et al.
Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework, which combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy as an actor to predict the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLBench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks and zero-shot unseen tasks, achieving superior performance in both simulated and real-world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io.