Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning
Zhaowei Zhang, Xiaohan Liu, Xuekai Zhu, et al.
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet it remains unclear whether large language model (LLM) alignment requires fundamentally different approaches. Given the apparent tolerance for multiple valid responses in moral reasoning, a natural hypothesis is that alignment tasks inherently require diversity-seeking distribution-matching algorithms rather than reward-maximizing policy-based methods. We conduct the first comprehensive empirical study comparing both paradigms on MoReBench. To enable stable RLVR training, we build a rubric-grounded reward pipeline by training a Qwen3-1.7B judge model. Contrary to our hypothesis, we find that distribution-matching approaches do not show the expected significant advantages over reward-maximizing methods on alignment tasks. By mapping high-reward responses into a semantic space and visualizing them, we show that moral reasoning exhibits more concentrated high-reward distributions than mathematical reasoning, where diverse solution strategies yield similarly high rewards. This counterintuitive finding explains why mode-seeking optimization proves equally or more effective for alignment tasks. Our results suggest that alignment tasks do not inherently require diversity-preserving algorithms and that standard reward-maximizing RLVR methods can transfer effectively to moral reasoning without explicit diversity mechanisms.
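The contrast between the two paradigms can be illustrated with a toy sketch; this is an assumption-laden illustration, not the authors' training setup. Over a finite set of candidate responses with fixed rewards, a reward-maximizing objective is optimized by a policy that puts all its mass on the highest-reward response, whereas a common diversity-preserving target is the Boltzmann distribution p(y) ∝ exp(r(y)/β). The candidate rewards, the temperature β = 0.1, and the helper names below are hypothetical. When high rewards are spread across many responses (as the abstract reports for mathematical reasoning), the two targets differ sharply; when they are concentrated (as reported for moral reasoning), they nearly coincide.

```python
import math


def reward_max_target(rewards):
    """Mode-seeking target: all probability on the highest-reward response(s)."""
    best = max(rewards)
    winners = [i for i, r in enumerate(rewards) if r == best]
    return [1.0 / len(winners) if i in winners else 0.0 for i in range(len(rewards))]


def distribution_matching_target(rewards, beta=0.1):
    """Diversity-preserving target: p(y) proportional to exp(r(y) / beta)."""
    top = max(rewards)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp((r - top) / beta) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def total_variation(p, q):
    """Total-variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical rewards for five candidate responses to one prompt.
spread = [0.90, 0.88, 0.87, 0.86, 0.20]        # several strategies score about equally well
concentrated = [0.95, 0.40, 0.35, 0.30, 0.20]  # high reward sits on one response

for name, rewards in [("spread", spread), ("concentrated", concentrated)]:
    gap = total_variation(reward_max_target(rewards), distribution_matching_target(rewards))
    print(f"{name}: TV distance between the two targets = {gap:.3f}")
```

With these hypothetical numbers the gap is roughly 0.69 for the spread profile and under 0.01 for the concentrated one; the exact values depend on the assumed β.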
Review of Encountering Artificial Intelligence: Ethical and Anthropological Investigations
Jodi Hunt
Dynamic Mediation and Moral Hazard: From Private To Public Communication
Allen Vong
I characterize optimal mediation dynamics with fixed discounting in a moral hazard model where a long-lived worker interacts with short-lived clients. I show that optimal mediation yields a nonstationary correlated information structure that transitions from private to public communication over time. In early periods, it occasionally creates information asymmetry about future play between the worker and the clients by randomizing over two continuations, with the realization privately revealed to the worker. In one, the worker shirks with impunity. In the other, the worker exerts effort subject to minimal punishment for underperformance. Eventually, optimal mediation prescribes only public communication that induces carrot-and-stick incentives.
Flexible Moral Hazard Problems with Adverse Selection
Siwen Liu
We study a moral hazard problem with adverse selection: a risk-neutral agent can directly control the output distribution and possesses private information about the production environment. The principal designs a menu of contracts satisfying limited liability. Departing from classical models, the principal can not only motivate the agent to exert certain levels of aggregate effort by designing the "power" of the contracts, but also regulate the support of the chosen output distributions by designing the "range" of the contracts. We show that either a single full-range contract is optimal for the principal, or the optimal low-type contract's range excludes some high outputs, or the optimal high-type contract's range excludes some low outputs. We provide necessary and sufficient conditions for a single full-range contract to be optimal under convex effort functions, and show that these conditions remain sufficient with general effort functions.
Moral Hazard with Network Effects
Marc Claveria-Mayol
I study a moral hazard problem between a principal and multiple agents who experience positive peer effects represented by a (weighted) network. Under the optimal linear contract, the principal provides high-powered incentives to central agents in the network in order to exploit the larger incentive spillovers such agents create. The analysis reveals a novel measure of network centrality that captures rich channels of direct and indirect incentive spillovers and characterizes the optimal contract and its induced equilibrium efforts. The notion of centrality relevant for incentive spillovers in the model emphasizes the role of pairs of agents who link to common neighbors in the network. This characterization leads to a measure of marginal network effects and identifies the agents whom the principal targets with stronger incentives in response to the addition (or strengthening) of a link. When the principal can position agents with heterogeneous costs of effort in the network, the principal prefers to place low-cost agents in central positions. The results shed light on how firms can increase productivity through corporate culture, office layout, and social interactions.
60 Years of "Studia Warmińskie" (Rev. Paweł Rabczyński)
Paweł Rabczyński
Moral theology, Doctrinal Theology
Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study of Sentence Embeddings
Stephen Fitz
As large language models (LLMs) are deployed within artificial intelligence systems that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher-level abilities of LLMs such as GPT-3.5 emerge in large part from the informative language representations they induce from raw text during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mapping between multiple vector spaces, with a total number of parameters on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black-box system that is hard to interpret. In this paper, we examine the topological structure of neuronal activity in the "brain" of ChatGPT's foundation language model and analyze it with respect to a metric representing the notion of fairness. We develop a novel approach to visualizing GPT's moral dimensions. We first compute a fairness metric, inspired by the social psychology literature, that identifies factors typically influencing human fairness assessments, such as legitimacy, need, and responsibility. We then summarize the manifold's shape using a lower-dimensional simplicial complex whose topology is derived from this metric. We color it with a heat map of the fairness metric, producing human-readable visualizations of the high-dimensional sentence manifold. Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments. This indicates that GPT-based language models develop a moral dimension within their representation spaces and acquire an understanding of fairness during training.
M. M. Tareev’s kenotic ecclesiology: an attempt at a reconstruction
Artem Malyshev
The article attempts to reconstruct and comprehensively analyze the ecclesiological conception of Mikhail Mikhailovich Tareev (1867-1934), Professor of Moral Theology at the Moscow Theological Academy in the early 20th century. The study systematically outlines the characteristic features of Tareev's theological approach, examines the features of his ecclesiology, and substantiates the author's characterization of that ecclesiology as kenotic. A distinctive feature of Tareev's method is the division of the elements of the problem under analysis into the categories of the absolute (internal) and the relative (external). In other words, in his extensive legacy this problem appears as a problem of the correlation of content and form. The Russian theologian understands the Church as a form of Christianity. Christianity, as an inner, mystical, spiritual life, has the life of the Church as its vestment. Christianity is possible only in the Church, and it produces the Church. Being the realization of spiritual life in the world ("mir"), the Church must be holy and free from the world. At the same time, it is through the Church that the Christian spirit, or the spirit of Christ, influences the world and transforms it by Christ's grace. According to Tareev, Christians form a golden chain, connected on one side with Christ and the saints and, on the other, uniting all the living faithful. The Russian theologian defines the Church as a Society, a Kingdom, an Organism, a Union, etc.; it was founded by Christ during His sufferings. The Church's purpose is to serve as an intermediary between the mystical life of the individual and the historical life of society. Because of the symmetry between Tareev's Christology and ecclesiology, the latter is defined as kenotic. The article concludes by identifying possible reasons for Tareev's formulation of his idea of the Church.
Medical Bill Shock and Imperfect Moral Hazard
Alex Hoagland, David M. Anderson, Ed Zhu
Consumers are sensitive to medical prices when consuming care, but delays in price information may distort moral hazard. We study how medical bills affect household spillover spending following utilization, leveraging variation in insurer claim processing times. Households increase spending by 22% after a scheduled service but then reduce spending by 11% after the bill arrives. Observed bill effects are consistent with the resolution of price uncertainty; they are strongest when pricing information is particularly salient. A model of healthcare demand with delayed pricing information suggests that households misperceive pricing signals before bills arrive, and that correcting these perceptions reduces average (median) spending by 16% (7%) annually.
Selection on moral hazard in the Swiss market for mandatory health insurance: Empirical evidence from Swiss Household Panel data
Francetic Igor
Selection on moral hazard refers to the tendency to choose a specific health insurance coverage depending on heterogeneity in utilisation "slopes". I use data from the Swiss Household Panel and publicly available regulatory data to explore the extent of selection on slopes in the Swiss managed competition system. I estimate responses in terms of (log) doctor visits to the lowest and highest deductible levels using Roy-type models, identifying marginal treatment effects with local instrumental variables. Among high moral hazard types, the response to high-coverage plans (i.e., plans with the lowest deductible level) is 25-35 percent higher than average.
Evaluation of Performance-Trust vs Moral-Trust Violation in 3D Environment
Maitry Ronakbhai Trivedi, Zahra Rezaei Khavas, Paul Robinette
Human-robot interaction (HRI), in which a robot with some level of autonomy interacts with a human to achieve a specific goal, has seen much recent progress. With the introduction of autonomous robots and the prospect of their widespread use in the near future, it is critical that humans understand a robot's intentions while interacting with it, as this fosters the development of human-robot trust. A new conceptualization of trust, introduced by researchers in recent years, treats trust in HRI as multidimensional in nature. The two main aspects attributed to trust are performance trust and moral trust. We aim to design an experiment to investigate the consequences of performance-trust violation and moral-trust violation in a search and rescue scenario. We want to see whether two similar robot failures, one caused by a performance-trust violation and the other by a moral-trust violation, have distinct effects on human trust. In addition, we plan to develop an interface that allows us to investigate whether changing the interface's modality from a grid-world scenario (2D environment) to a realistic simulation (3D environment) affects human perception of the task and the effects of the robot's failure on human trust.
Moral Hazard with Heterogeneous Beliefs
Martin Dumav, Urmee Khan, Luca Rigotti
We study a model of moral hazard with heterogeneous beliefs, where each of the agent's actions gives rise to a pair of probability distributions over output levels, one representing the agent's beliefs and the other the principal's. The agent's relative optimism or pessimism dictates whether the contract is high-powered (i.e., with high variability between wage levels) or low-powered. When the agent is sufficiently more optimistic than the principal, the trade-off between risk-sharing and incentive provision may be eliminated. Using Monotone Likelihood Ratio ranking to model disagreement in the parties' beliefs, we show that incentives move in the direction of increasing disagreement. In general, the shape of the wage scheme is sensitive to the differences in beliefs. Consequently, key features of optimal incentive contracts under common beliefs do not readily generalize to the case of belief heterogeneity.
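The Monotone Likelihood Ratio ranking used to model belief disagreement can be checked directly when beliefs are discrete. The sketch below assumes, for illustration only, that output takes finitely many levels and that each party's belief for a given action is a probability vector over those levels ordered from lowest to highest; the function name `mlr_dominates` and the belief vectors are hypothetical, not taken from the paper. Under this convention, an agent whose belief MLR-dominates the principal's is the relatively optimistic party.

```python
from itertools import combinations


def mlr_dominates(f, g):
    """Return True if belief f dominates belief g in the monotone likelihood ratio order.

    f and g are probability vectors over the same output levels, ordered from
    lowest to highest output. f MLR-dominates g when f(x) / g(x) is nondecreasing
    in x. We test the cross-multiplied inequality f(hi) * g(lo) >= f(lo) * g(hi)
    for every pair lo < hi, which avoids dividing by zero probabilities.
    """
    if len(f) != len(g):
        raise ValueError("beliefs must be defined on the same output levels")
    return all(f[hi] * g[lo] >= f[lo] * g[hi]
               for lo, hi in combinations(range(len(f)), 2))


# Hypothetical beliefs over three output levels (low, medium, high) for one action.
agent = [0.1, 0.3, 0.6]      # agent's belief
principal = [0.3, 0.4, 0.3]  # principal's belief

print(mlr_dominates(agent, principal))  # True: the agent is relatively optimistic
print(mlr_dominates(principal, agent))  # False

# MLR is only a partial order: here neither belief dominates the other.
middle_heavy = [0.2, 0.6, 0.2]
print(mlr_dominates(middle_heavy, principal), mlr_dominates(principal, middle_heavy))  # False False
```

The last pair shows why MLR ranking is a modeling choice: it orders some belief pairs but leaves others incomparable.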
Moral-Trust Violation vs Performance-Trust Violation by a Robot: Which Hurts More?
Zahra Rezaei Khavas, Russell Perkins, S. Reza Ahmadzadeh, et al.
In recent years, a new conceptualization of trust in human-robot interaction (HRI) was introduced by Ullman et al. (2018). This conceptualization holds that trust between humans and robots is multidimensional, incorporating both performance aspects (similar to trust in human-automation interaction) and moral aspects (similar to trust in human-human interaction). But how does a robot's violation of each of these aspects affect human trust in the robot? How does trust in robots change when a robot commits a moral-trust violation compared to a performance-trust violation? And can physiological signals be used to assess the gain or loss of each of these two trust aspects in a human? We aim to design an experiment to study the effects of performance-trust violation and moral-trust violation separately in a search and rescue task. We want to see whether two robot failures of equal magnitude would affect human trust differently if one is due to a performance-trust violation and the other to a moral-trust violation.
Your Face Mirrors Your Deepest Beliefs: Predicting Personality and Morals through Facial Emotion Recognition
P. A. Gloor, A. Fronzetti Colladon, E. Altuntas, et al.
Can we really "read the mind in the eyes"? Moreover, can AI assist us in this task? This paper answers these two questions by introducing a machine learning system that predicts personality characteristics of individuals from their faces. It does so by tracking the emotional response of an individual's face through facial emotion recognition (FER) while the individual watches a series of 15 short videos of different genres. To calibrate the system, we invited 85 people to watch the videos while their emotional responses were analyzed through their facial expressions. At the same time, these individuals also completed four well-validated surveys of personality characteristics and moral values: the revised NEO FFI personality inventory, the Haidt moral foundations test, the Schwartz personal value system, and the domain-specific risk-taking scale (DOSPERT). We found that an individual's personality characteristics and moral values can be predicted from the emotional responses to the videos shown in their face, with an accuracy of up to 86% using gradient-boosted trees. We also found that different personality characteristics are better predicted by different videos; in other words, no single video provides accurate predictions for all personality characteristics, and it is the response to the mix of different videos that allows for accurate prediction.
The Moral Foundations of Left-Wing Authoritarianism: On the Character, Cohesion, and Clout of Tribal Equalitarian Discourse
Justin E. Lane, Kevin McCaffree, F. LeRon Shults
Left-wing authoritarianism remains far less understood than right-wing authoritarianism. We contribute to the literature on the former, which typically relies on surveys, using a new social media analytics approach. We use a list of 60 terms to provide an exploratory sketch of the outlines of a political ideology (tribal equalitarianism) with origins in 19th and 20th century social philosophy. We then analyze the English Corpus of Google Books (over 8 million books) and scraped unique tweets from Twitter (n = 202,852) in a series of investigations to discern the extent to which this ideology is cohesive amongst the public, reveals signatures of authoritarianism, and has been growing in popularity. Though exploratory, our results provide some evidence of left-wing authoritarianism in two forms: (1) a uniquely conservative moral signature amongst ostensible liberals using measures from Moral Foundations Theory and (2) a substantial prevalence of anger, relative to anxiety or sadness. In general, results indicate that this worldview is growing in popularity, is increasingly cohesive, and shows signatures of authoritarianism.
A Consideration of Teaching, Friendship, and Boundaries in Catholic Higher Education
Bridget Burke Ravizza, Mara Brecht
Paul J. Wadell is widely recognized for his work placing friendship at the heart of ethics. Wadell develops a robust Christian vision of friendship, and invites us to think about friendship as constitutive of the good life. Beginning with a discussion of the virtues Wadell characterizes as central to friendship—hospitality, humility, and love—we consider the role of friendship in the life of faculty at Catholic colleges and universities. Through his theology and his living example, we propose, Wadell offers friendship as a model for teaching. Teacher-as-friend, we argue further, helps faculty navigate a number of difficult challenges familiar to Catholic higher education, particularly around boundaries. We take up three: diminished boundaries in a social media culture; stressed boundaries in an era of abuses of power; and threatened boundaries in a capitalistic culture.
Mormonism and White Supremacy
Joanna Brooks
This book examines the role of white American Christianity in fostering and sustaining white supremacy. It draws from theology, critical race theory, and American religious history to make the argument that predominantly white Christian denominations have served as a venue for establishing white privilege and have conveyed to white believers a sense of moral innocence without requiring moral reckoning with the costs of anti-Black racism. To demonstrate these arguments, the book draws from Mormon history from the 1830s to the present, from an archive that includes speeches, historical documents, theological treatises, Sunday school curricula, and other documents of religious life.
Optimal Rating Design under Moral Hazard
Maryam Saeedi, Ali Shourideh
We study optimal rating design under moral hazard and strategic manipulation. An intermediary observes a noisy indicator of effort and commits to a rating policy that shapes market beliefs and pay. We characterize optimal ratings via concavification of a gain function. Optimal ratings depend on the interaction between effort and risk: for activities that raise tail risk, optimal ratings exhibit lower censorship, pooling poor outcomes to insure and encourage risk-taking; for activities that reduce tail risk, upper censorship increases penalties for negligence. In multi-task environments with window dressing, less informative ratings deter manipulation. In redistributive test design, optimal tests exhibit mid-censorship.
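Concavification replaces a function with its upper concave envelope, the smallest concave function lying above it; in the standard Bayesian-persuasion logic, the envelope evaluated at the prior gives the best expected gain achievable by splitting the prior into posteriors. The abstract does not state the paper's gain function, so the sketch below uses a hypothetical step-shaped gain over a discretized belief grid and computes the envelope with an upper-hull scan. The function name `concavify` and all numbers are illustrative assumptions; this shows the generic operation only, not the authors' characterization of optimal ratings.

```python
def concavify(xs, ys):
    """Evaluate the upper concave envelope of the points (xs[i], ys[i]) at each xs[i].

    xs must be strictly increasing. The envelope is built from the upper convex
    hull (a monotone-chain scan) and evaluated by linear interpolation between
    consecutive hull vertices.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points with matching xs and ys")
    hull = []  # indices of upper-hull vertices, left to right
    for i in range(len(xs)):
        # Drop the last vertex while it lies on or below the chord from hull[-2] to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    envelope = []
    k = 0  # current hull segment runs from hull[k] to hull[k + 1]
    for x in xs:
        while k < len(hull) - 2 and xs[hull[k + 1]] < x:
            k += 1
        a, b = hull[k], hull[k + 1]
        t = (x - xs[a]) / (xs[b] - xs[a])
        envelope.append(ys[a] + t * (ys[b] - ys[a]))
    return envelope


# Hypothetical gain over posterior beliefs on a grid: it pays off only once belief reaches 0.5.
beliefs = [i / 10 for i in range(11)]
gain = [1.0 if mu >= 0.5 else 0.0 for mu in beliefs]
envelope = concavify(beliefs, gain)

prior_index = 3  # prior belief 0.3
print(gain[prior_index], round(envelope[prior_index], 6))  # 0.0 0.6
```

At a prior of 0.3 the raw gain is 0, while the envelope gives 0.6, attained by splitting the prior into posteriors 0 and 0.5 with weights 0.4 and 0.6.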
Against the Others! Detecting Moral Outrage in Social Media Networks
Wienke Strathern, Mirco Schoenfeld, Raji Ghawi, et al.
Online firestorms on Twitter are outbursts of outrage, seemingly arbitrary in their occurrence, directed at people, companies, media campaigns, and politicians. Moral outrage can generate excessive collective aggression against a single argument, a single word, or a single action of a person, resulting in hateful speech. The negative dynamics often start with a collective stance "against the others." Using data from Twitter, we explored the starting points of several firestorm outbreaks. As a social media platform with hundreds of millions of users interacting in real time on topics and events all over the world, Twitter serves as a social sensor for online discussions and is known for quick and often emotional disputes. The main question we pose in this article is whether we can detect the outbreak of a firestorm. Given 21 online firestorms on Twitter, the key questions regarding anomaly detection are: 1) How can we detect the change point? 2) How can we distinguish the features that cause a moral outrage? In this paper, we address these challenges by developing a method that systematically detects the change point from linguistic cues in tweets. We are able to detect outbreaks of firestorms early and precisely using linguistic cues alone. The results of our work can help detect negative dynamics and may help individuals, companies, and governments mitigate hate in social media networks.
Structural Adaptation of Syrian Refugees in the Context of Religious Socialization: The Case of Kilis
Yusuf Yaralıoğlu, Özcan Güngör
It is known that Syrian refugees have been subjected to forced migration since 2011. This process has led the refugees in Kilis, the city to which they migrated, to face the problem of structural adaptation in relation to religious socialization. In this context, the research problem of the study is to understand the religious socialization of Syrian refugees in the context of the institutions of structural adaptation. The study primarily adopts a qualitative method using the semi-structured interview technique; in addition, it seeks to understand structural adaptation by taking into account the effect of structure on the subject. The study was also designed with an understanding (interpretive) approach and carried out as a phenomenological study. Although the migrants, as subjects, faced a forced situation, both the structural characteristics they carry and the structural characteristics of Turkey were expected to shape their habitus; the analyses therefore take the structure-agent dichotomy into account. The sample was constructed using snowball sampling. The data were collected from refugees residing in different neighborhoods of Kilis province and, while taking the refugees' micro-sociological experiences into account, were framed within the approaches of Giddens's structuration theory. The study found that, among the various social institutions and organizations involved in structural adaptation, religious groups, masjids, and associations founded by the refugees influenced the refugees through religious socialization, and that Arabic played an important role in this process.
Philosophy. Psychology. Religion, Moral theology