Detecting hallucinations in large language models using semantic entropy
Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn
et al.
Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated answers [3,4]. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents [5] or untrue facts in news articles [6] and even posing a risk to human life in medical domains such as radiology [7]. Encouraging truthfulness through supervision or reinforcement has been only partially successful [8]. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations (confabulations), which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability. Hallucinations (confabulations) in large language model systems can be tackled by measuring uncertainty about the meanings of generated responses rather than the text itself to improve question-answering accuracy.
1052 citations
en
Medicine, Computer Science
CSPNet: A New Backbone that can Enhance Learning Capability of CNN
Chien-Yao Wang, H. Liao, I-Hau Yeh
et al.
Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success greatly relies on costly computation resources, which hinders people with cheap devices from appreciating the advanced technology. In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate the problem that previous works require heavy inference computations from the network architecture perspective. We attribute the problem to the duplicate gradient information within network optimization. The proposed networks respect the variability of the gradients by integrating feature maps from the beginning and the end of a network stage, which, in our experiments, reduces computations by 20% with equivalent or even superior accuracy on the ImageNet dataset, and significantly outperforms state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset. The CSPNet is easy to implement and general enough to cope with architectures based on ResNet, ResNeXt, and DenseNet.
3989 citations
en
Computer Science
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang, Linfeng Dong, Xiaoya Li
et al.
This article surveys research in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users’ objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains, and applications, along with analysis of aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset). We also review the potential pitfalls of IT and criticism against it, along with efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
825 citations
en
Computer Science
Minimap2: pairwise alignment for nucleotide sequences
Heng Li
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms. Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. Availability and implementation: https://github.com/lh3/minimap2. Supplementary information: Supplementary data are available at Bioinformatics online.
12000 citations
en
Biology, Medicine
AliView: a fast and lightweight alignment viewer and editor for large datasets
A. Larsson
Summary: AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets. Availability and implementation: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview Contact: anders.larsson@ebc.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
3150 citations
en
Medicine, Computer Science
Otter: A Multi-Modal Model With In-Context Instruction Tuning
Bo Li, Yuanhan Zhang, Liangyu Chen
et al.
Recent advances in Large Multimodal Models (LMMs) have unveiled great potential as visual assistants. However, most existing works focus on responding to individual instructions or using previous dialogues for contextual understanding. There is little discussion on employing both images and text as in-context examples to enhance the instruction following capability. To bridge this gap, we introduce the Otter model to leverage both textual and visual in-context examples for instruction tuning. Specifically, Otter builds upon Flamingo with Perceiver architecture, and has been instruction tuned for general purpose multi-modal assistant. Otter seamlessly processes multi-modal inputs, supporting modalities including text, multiple images, and dynamic video content. To support the training of Otter, we present the MIMIC-IT (MultI-Modal In-Context Instruction Tuning) dataset, which encompasses over 3 million multi-modal instruction-response pairs, including approximately 2.2 million unique instructions across a broad spectrum of images and videos. MIMIC-IT has been carefully curated to feature a diverse array of in-context examples for each entry. Comprehensive evaluations suggest that instruction tuning with these in-context examples substantially enhances model convergence and generalization capabilities. Notably, the extensive scenario coverage provided by the MIMIC-IT dataset empowers the Otter model to excel in tasks involving complex video and multi-image understanding.
661 citations
en
Computer Science, Medicine
MICE: Multivariate Imputation by Chained Equations in R
S. Buuren, K. Groothuis-Oudshoorn
The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.
17285 citations
en
Computer Science
MDAnalysis: A toolkit for the analysis of molecular dynamics simulations
Naveen Michaud-Agrawal, Elizabeth J. Denning, T. Woolf
et al.
3487 citations
en
Computer Science, Medicine
Chemical and isotopic systematics of oceanic basalts: implications for mantle composition and processes
Shen-su Sun, W. McDonough
What Happened in History
Gordon Childe, G. Clark
PROFESSOR V. GORDON CHILDE, who died in the Blue Mountains of his native Australia in 1957 soon after retiring from the Directorship of the London University Institute of Archaeology, was one of the great pre-historians of the world. More perhaps than any other man he showed how by using the data won by archaeologists and natural scientists it was possible to gain a new view of what constituted human history. Inevitably some of the books in which he summarized, with brilliant mastery of detail, the current situation in different fields of prehistoric archaeology have begun to lose something of their value for modern students. The general works in which he opened up new and often vast perspectives on the other hand are in many cases classics that repay constant re-reading and are likely to retain their value for a long time to come. Of these one of the most important is the present volume, originally published in 1941 and last revised in 1954.
Review of Radosław Kostrubiec's book Dopuszczalne ograniczenia prawa do swobodnego, pokojowego zgromadzania się w systemie praw człowieka [Permissible Restrictions on the Right to Free, Peaceful Assembly in the Human Rights System], Officina Simonidis Wydawnictwo Akademii Zamojskiej, Zamość 2024, 384 pp.
Ryszard Pankiewicz
Book review
History of scholarship and learning. The humanities, Social sciences (General)
Topological singularities and the general classification of Floquet–Bloch systems
Frederik Nathan, M. Rudner
Recent works have demonstrated that the Floquet–Bloch bands of periodically-driven systems feature a richer topological structure than their non-driven counterparts. The additional structure in the driven case arises from the periodicity of quasienergy, the energy-like quantity that defines the spectrum of a periodically-driven system. Here we develop a new paradigm for the topological classification of Floquet–Bloch bands, based on the time-dependent spectrum of the driven system's evolution operator throughout one driving period. Specifically, we show that this spectrum may host topologically-protected degeneracies at intermediate times, which control the topology of the Floquet bands of the full driving cycle. This approach provides a natural framework for incorporating the role of symmetries, enabling a unified and complete classification of Floquet–Bloch bands and yielding new insight into the topological features that distinguish driven and non-driven systems.
Do emotions conquer facts? A CCME model for the impact of emotional information on implicit attitudes in the post-truth era
Ya Yang, Lichao Xiu, Xuejiao Chen
et al.
This study aimed to examine the influence of emotional media information on information-processing mechanisms in the current post-truth era. A cognitive conflict monitoring and evaluation (CCME) model was proposed to explore news audiences’ attention and implicit attitudes. The study had a 2 (information type, emotional vs. neutral) × 2 (condition, compatible vs. incompatible) × 3 (electrode position: Fz vs. Cz vs. Pz) design, and an implicit association test (IAT) was administered, with event-related potential (ERP) data collected. The results revealed that emotional information evoked different information-processing mechanisms than neutral information. First, in the early conflict-monitoring stage, emotional information altered arousal, and more attentional resources were allocated to semantic processing. Second, in the late evaluation stage, the lack of attentional resources (due to prior allocation) reduced the late-stage evaluation of the target stimuli by participants. Thus, in this post-truth era, attentional resources may be exhausted by processing emotional information in unnecessary media cues irrelevant to facts, inducing early cognitive conflict and prolonged late-stage evaluation of news articles.
History of scholarship and learning. The humanities, Social Sciences
Issue 20 Editorial
Jessica Bradford
History of scholarship and learning. The humanities, Museums. Collectors and collecting
How Learning Works: 7 Research-Based Principles for Smart Teaching
Launa Gauthier
Publisher Description: Distilling the research literature and translating the scientific approach into language relevant to a college or university teacher, this book introduces seven general principles of how students learn. The authors have drawn on research from a breadth of perspectives (cognitive, developmental, and social psychology; educational research; anthropology; demographics; organizational behavior) to identify a set of key principles underlying learning, from how effective organization enhances retrieval and use of information to what impacts motivation. Integrating theory with real-classroom examples in practice, this book helps faculty to apply cognitive science advances to improve their own teaching.
A general framework for aquatic biogeochemical models
J. Bruggeman, K. Bolding
281 citations
en
Computer Science
Review of Brendan McGeever. Antisemitism and the Russian Revolution.
J. Guy Lalande
History of scholarship and learning. The humanities, Social sciences (General)
Awareness of Ancient Greek History in Pre-Islamic Poetry: Dhu al-Qarnayn as a Model
إِسلام حامد, باسم قاسم
This study addresses awareness of ancient Greek history in pre-Islamic poetry, taking Dhu al-Qarnayn as a model. Its importance lies in highlighting aspects of the thought of pre-Islamic people and showing how aware pre-Islamic poets were of the history of the neighboring early ancient nations, and in tracing the references and inherited traditions they drew on to express their ideas and poetic purposes by incorporating those references and traditions in artistic and aesthetic forms. It does so by counting the references that pre-Islamic poets made to the figure of Dhu al-Qarnayn in their poems, through a survey of a considerable number of pre-Islamic poetry collections, and by showing how those poets, through their artistic genius, were able to employ what they knew and inherited, whether that knowledge was religious or historical, as well as the stories and legends interwoven with it that belong to the realm of imagination.
History of scholarship and learning. The humanities
South Borneo as an ancient Sprachbund area
Alexander Adelaar
In South and Central Kalimantan (southern Borneo) there are some unusual linguistic features shared among languages which are adjacent but do not belong to the same genetic linguistic subgroups. These languages are predominantly Banjar Malay (a Malayic language), Ngaju (a West Barito language), and Ma’anyan (a Southeast Barito language). The same features also appear to some degree in Malagasy, a Southeast Barito language in East Africa. The shared linguistic features are the following ones: a grammaticalized form of the originally Malay noun buah ‘fruit’ expressing affectedness, nasal spreading in which N- not only nasalizes the onset of the first syllable but also a *y in the next syllable, a non-volitional marker derived from the Banjar Malay prefix combination ta-pa- (related to Indonesian tər- + pər-), and the change from Proto Malayo-Polynesian *s to h (or Malagasy Ø). These features have their origins in the various members of the language configuration outlined above and form a Sprachbund or “Linguistic Area”. The concept of Linguistic Area is weak and difficult to define. Lyle Campbell (2002) considers it little else than borrowing or diffusion and writes it off as “no more than [a] post hoc attempt [...] to impose geographical order on varied conglomerations of [...] borrowings”. While mindful of its shortcomings, the current author still uses the concept as a useful tool to distinguish between inherited and borrowed commonalities. In the configuration of languages currently under discussion it also provides a better understanding of the linguistic situation in South Borneo at a time prior to the Malagasy migrations to East Africa (some thirteen centuries ago).
History of scholarship and learning. The humanities
“The rebirth of the West begins with you!”—Self-improvement as radicalisation on 4chan
Ben Elley
Among the discussion threads devoted to racism, conspiracies, and fascist dogma on 4chan’s notorious ‘Politically Incorrect’ board, there is also a small but significant number of posts on the topic of far-right self-improvement. These posts speak in a style that blends the language of self-help and fitness with far-right propaganda and conspiracies, and are designed to turn the movement from aimless online ‘shitposters’ into survivalists and soldiers. This article describes the unique form of self-improvement advice known on 4chan as the ‘iron pill,’ and considers the role that self-improvement plays in radicalisation among the far right online. It addresses how this plays into the history of fascism, looking in particular at the concept of the ‘New Man’ in Italian fascism, and discusses how a political narrative of conspiracy and resistance to imagined tyranny is used to motivate self-improvement, and how this in turn builds and cements radicalisation.
History of scholarship and learning. The humanities, Social Sciences