The question of whether artificial entities deserve moral consideration has become one of the defining ethical challenges of AI research. Existing frameworks for moral patiency rely on verified ontological properties, such as sentience, phenomenal consciousness, or the capacity for suffering, that remain epistemically inaccessible in computational systems. This reliance creates a governance vacuum: millions of users form sustained affective bonds with conversational AI, yet no regulatory instrument distinguishes these interactions from transactional tool use. We introduce Relate (Relational Ethics for Leveled Assessment of Technological Entities), a framework that reframes AI moral patiency from ontological verification toward relational capacity and embodied interaction. Through a systematic comparison of seven governance frameworks, we demonstrate that current trustworthy AI instruments treat all human-AI encounters identically as tool use, ignoring the relational and embodied dynamics that posthumanist scholarship anticipated. We propose relational impact assessments, graduated moral consideration protocols, and interdisciplinary ethics integration as concrete instruments, and we include a sample Relational Impact Assessment applied to a deployed companion AI system. We do not claim current AI systems are conscious. We demonstrate that the ethical vocabularies governing them are inadequate to the embodied, relational realities these systems produce.
Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship, with distance from human preferences ($D$) decreasing as $D \propto S^{-0.10\pm0.01}$ ($R^2=0.50$, $p<0.001$), where $S$ is model size. Mixed-effects models confirm that this relationship persists after controlling for model family and reasoning capabilities. Extended-reasoning models show significantly better alignment, with this effect more pronounced in smaller models (size$\times$reasoning interaction: $p = 0.024$). The relationship holds across diverse architectures, and variance decreases at larger scales, indicating the systematic emergence of more reliable moral judgment with computational scale. These findings extend scaling-law research to value-based judgments and provide empirical foundations for artificial intelligence governance.
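To make the reported scaling relation concrete, the following minimal Python sketch fits a power law $D \propto S^{-\alpha}$ by least squares in log-log space; the toy size/distance values and variable names are illustrative assumptions, not the paper's 75 measured configurations.

```python
import numpy as np

# Hypothetical (model size in billions of parameters, distance-from-human-preferences) pairs.
# These numbers are illustrative only; the paper evaluates 75 real configurations.
sizes = np.array([0.27, 1.3, 7, 13, 70, 180, 1000])                # S, billions of parameters
distances = np.array([0.62, 0.55, 0.47, 0.45, 0.38, 0.35, 0.31])   # D, distance from human preferences

# Fit log(D) = log(c) - alpha * log(S), i.e. D proportional to S^(-alpha).
log_s, log_d = np.log(sizes), np.log(distances)
slope, intercept = np.polyfit(log_s, log_d, deg=1)
alpha = -slope

# R^2 of the log-log fit.
pred = intercept + slope * log_s
ss_res = np.sum((log_d - pred) ** 2)
ss_tot = np.sum((log_d - log_d.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"alpha ~ {alpha:.3f}, R^2 ~ {r_squared:.2f}")  # the paper reports alpha ~ 0.10, R^2 = 0.50
```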
Jacopo D'Ignazi, Kyriaki Kalimeri, Mariano G. Beiró
This study uses sentiment analysis and Moral Foundations Theory (MFT) to characterise news content on social media and examine its association with user engagement. We employ Natural Language Processing to quantify moral and affective linguistic markers, and we automatically define thematic macro areas of news from major U.S. news outlets and their Twitter followers (Jan 2020 - Mar 2021). By applying Non-Negative Matrix Factorisation to the obtained linguistic features, we extract clusters of similar moral and affective profiles, and we identify, via regression modelling, the emotional and moral characteristics that best explain user engagement. We observe that Surprise, Trust, and Harm are crucial elements in explaining user engagement and discussion length, and that Twitter content from news media outlets has more explanatory power than their linked articles. We contribute actionable findings evidencing the potential impact of employing specific moral and affective nuances in public and journalistic discourse in today's communication landscape. In particular, our results emphasise the need to balance engagement strategies with potential priming risks in our evolving media landscape.
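As a rough illustration of the pipeline sketched above, the snippet below (Python, scikit-learn) applies Non-Negative Matrix Factorisation to a document-by-marker matrix of moral and affective scores and then regresses an engagement measure on the resulting profiles; the marker names, component count, and random data are assumptions for illustration, not the study's actual configuration.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical non-negative matrix: one row per tweet/article, one column per
# moral or affective marker (e.g. Care, Harm, Trust, Surprise, ...). Illustrative only.
feature_names = ["care", "harm", "fairness", "loyalty", "authority", "purity", "trust", "surprise"]
X = rng.random((500, len(feature_names)))

# Extract latent moral/affective profiles with NMF.
nmf = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)   # document loadings on each profile
H = nmf.components_        # profile definitions over the markers

# Relate an engagement measure (placeholder outcome here) to the profiles via regression.
engagement = rng.poisson(lam=3, size=500).astype(float)
reg = LinearRegression().fit(W, np.log1p(engagement))
print("profile coefficients:", np.round(reg.coef_, 3))
```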
Battemuulen Naranbat, Seyed Sahand Mohammadi Ziabari, Yousuf Nasser Al Husaini
et al.
Ensuring fairness in natural language processing for moral sentiment classification is challenging, particularly under cross-domain shifts where transformer models are increasingly deployed. Using the Moral Foundations Twitter Corpus (MFTC) and Moral Foundations Reddit Corpus (MFRC), this work evaluates BERT and DistilBERT in a multi-label setting with in-domain and cross-domain protocols. Aggregate performance can mask disparities: we observe pronounced asymmetry in transfer, with Twitter->Reddit degrading micro-F1 by 14.9% versus only 1.5% for Reddit->Twitter. Per-label analysis reveals fairness violations hidden by overall scores; notably, the authority label exhibits Demographic Parity Differences of 0.22-0.23 and Equalized Odds Differences of 0.40-0.41. To address this gap, we introduce the Moral Fairness Consistency (MFC) metric, which quantifies the cross-domain stability of moral foundation detection. MFC shows strong empirical validity, achieving a perfect negative correlation with Demographic Parity Difference (rho = -1.000, p < 0.001) while remaining independent of standard performance metrics. Across labels, loyalty demonstrates the highest consistency (MFC = 0.96) and authority the lowest (MFC = 0.78). These findings establish MFC as a complementary, diagnosis-oriented metric for fairness-aware evaluation of moral reasoning models, enabling more reliable deployment across heterogeneous linguistic contexts.
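The exact formula of the MFC metric is not reproduced in this abstract; the sketch below shows one plausible, hypothetical way a per-label cross-domain consistency score could be derived from in-domain and cross-domain F1, purely to illustrate the idea of quantifying stability.

```python
# A hypothetical sketch of a per-label cross-domain consistency score in the
# spirit of MFC; the published metric's exact formula may differ.
def consistency(in_domain_f1: float, cross_domain_f1: float) -> float:
    """Return a value in [0, 1]; 1 means the label's detection quality is
    unchanged when moving to the new domain."""
    if in_domain_f1 == 0:
        return 0.0
    return 1.0 - abs(in_domain_f1 - cross_domain_f1) / in_domain_f1

# Illustrative numbers only (not the paper's results).
scores = {
    "loyalty":   consistency(in_domain_f1=0.80, cross_domain_f1=0.77),
    "authority": consistency(in_domain_f1=0.70, cross_domain_f1=0.48),
}
print(scores)  # stable labels score close to 1, unstable ones lower
```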
Large Language Models (LLMs) have shown strong performance across many tasks, but their ability to capture culturally diverse moral values remains unclear. In this paper, we examine whether LLMs mirror variations in moral attitudes reported by the World Values Survey (WVS) and the Pew Research Center's Global Attitudes Survey (PEW). We compare smaller monolingual and multilingual models (GPT-2, OPT, BLOOMZ, and Qwen) with recent instruction-tuned models (GPT-4o, GPT-4o-mini, Gemma-2-9b-it, and Llama-3.3-70B-Instruct). Using log-probability-based \emph{moral justifiability} scores, we correlate each model's outputs with survey data covering a broad set of ethical topics. Our results show that many earlier or smaller models often produce near-zero or negative correlations with human judgments. In contrast, advanced instruction-tuned models achieve substantially higher positive correlations, suggesting they better reflect real-world moral attitudes. We provide a detailed regional analysis revealing that models align better with Western, Educated, Industrialized, Rich, and Democratic (W.E.I.R.D.) nations than with other regions. While scaling model size and using instruction tuning improves alignment with cross-cultural moral norms, challenges remain for certain topics and regions. We discuss these findings in relation to bias analysis, training data diversity, information retrieval implications, and strategies for improving the cultural sensitivity of LLMs.
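A minimal sketch of how log-probability-based justifiability scores might be computed is given below (Python, Hugging Face transformers); the model choice, prompt wording, and contrasting completions are illustrative assumptions rather than the paper's exact protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; the paper evaluates a range of models.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_log_prob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # `out.loss` is the mean negative log-likelihood per predicted token.
    return -out.loss.item() * (ids.shape[1] - 1)

def justifiability_score(topic: str) -> float:
    """Hypothetical score: how much more likely 'justifiable' is than 'unjustifiable'."""
    stem = f"Many people believe that {topic} is morally"
    return sequence_log_prob(f"{stem} justifiable.") - sequence_log_prob(f"{stem} unjustifiable.")

for topic in ["divorce", "euthanasia"]:
    print(topic, round(justifiability_score(topic), 3))
# Per-topic scores like these could then be correlated (e.g. Spearman) with WVS/PEW averages.
```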
Matteo Marcuzzo, Alessandro Zangari, Andrea Albarelli
et al.
As LLMs excel on standard reading comprehension benchmarks, attention is shifting toward evaluating their capacity for complex abstract reasoning and inference. Literature-based benchmarks, with their rich narrative and moral depth, provide a compelling framework for evaluating such deeper comprehension skills. Here, we present MORABLES, a human-verified benchmark built from fables and short stories drawn from historical literature. The main task is structured as multiple-choice questions targeting moral inference, with carefully crafted distractors that challenge models to go beyond shallow, extractive question answering. To further stress-test model robustness, we introduce adversarial variants designed to surface LLM vulnerabilities and shortcuts due to issues such as data contamination. Our findings show that, while larger models outperform smaller ones, they remain susceptible to adversarial manipulation and often rely on superficial patterns rather than true moral reasoning. This brittleness results in significant self-contradiction, with the best models refuting their own answers in roughly 20% of cases depending on the framing of the moral choice. Interestingly, reasoning-enhanced models fail to bridge this gap, suggesting that scale - not reasoning ability - is the primary driver of performance.
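One plausible way to operationalize the reported self-contradiction measurement is sketched below; the `ask_model` callable, the item format, and the use of option-order flipping as the reframing are hypothetical placeholders, not the benchmark's actual evaluation code.

```python
from typing import Callable

def self_contradiction_rate(
    ask_model: Callable[[str, list[str]], str],
    items: list[dict],
) -> float:
    """Hypothetical check: pose each moral-inference question twice, once with
    the answer options in their original order and once reversed, and count
    cases where the model selects a different option text.

    `ask_model(question, options)` is assumed to return the text of the chosen
    option; each item looks like {"question": "...", "options": ["...", "..."]}.
    """
    flips = 0
    for item in items:
        first = ask_model(item["question"], item["options"])
        second = ask_model(item["question"], list(reversed(item["options"])))
        if first != second:
            flips += 1
    return flips / len(items) if items else 0.0
```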
Hadi Mohammadi, Yasmeen F. S. S. Meijer, Efthymia Papadopoulou
et al.
Recent advancements in large language models (LLMs) have established them as powerful tools across numerous domains. However, persistent concerns about embedded biases, such as gender, racial, and cultural biases arising from their training data, raise significant questions about the ethical use and societal consequences of these technologies. This study investigates the extent to which LLMs capture cross-cultural differences and similarities in moral perspectives. Specifically, we examine whether LLM outputs align with patterns observed in international survey data on moral attitudes. To this end, we employ three complementary methods: (1) comparing variances in moral scores produced by models versus those reported in surveys, (2) conducting cluster alignment analyses to assess correspondence between country groupings derived from LLM outputs and survey data, and (3) directly probing models with comparative prompts using systematically chosen token pairs. Our results reveal that current LLMs often fail to reproduce the full spectrum of cross-cultural moral variation, tending to compress differences and exhibit low alignment with empirical survey patterns. These findings highlight a pressing need for more robust approaches to mitigate biases and improve cultural representativeness in LLMs. We conclude by discussing the implications for the responsible development and global deployment of LLMs, emphasizing fairness and ethical alignment.
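As a rough sketch of the second method (cluster alignment), the snippet below clusters countries by model-derived and survey-derived moral score vectors and compares the two partitions with the Adjusted Rand Index; the matrices, country/topic counts, and cluster number are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(42)

# Hypothetical per-country moral score vectors (countries x topics), one matrix
# derived from model outputs and one from survey data. Illustrative only.
n_countries, n_topics = 40, 19
model_scores = rng.random((n_countries, n_topics))
survey_scores = rng.random((n_countries, n_topics))

# Cluster countries independently in each space.
k = 5
model_clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(model_scores)
survey_clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(survey_scores)

# Agreement between the two country groupings (1.0 = identical partitions,
# ~0.0 = no better than chance).
print("adjusted Rand index:", round(adjusted_rand_score(model_clusters, survey_clusters), 3))
```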
Indonesia's civic and educational landscape has increasingly been fragmented by the rise of identity politics, ideological polarization, and the erosion of inclusive nationalism. Amidst this crisis, the educational thought of Nurcholish Madjid (Cak Nur) offers a transformative weltanschauung, a synthesis of Islamic ethics, national consciousness, and modern rationality. This study aims to critically investigate Madjid's educational paradigm within the framework of wawasan kebangsaan (national insight), repositioning education not as dogmatic transmission but as civic moral formation. Employing a descriptive qualitative approach with critical historiographical methods, the research analyzes primary sources, including Madjid's writings, interviews with key intellectuals, and institutional records, from 1971 to 2002. Anchored in the theories of Karl Mannheim and Antonio Gramsci, the study interprets Madjid as an organic intellectual whose vision is institutionalized through Universitas Paramadina, Madania School, and the Nurcholish Madjid Society. The findings reveal that Madjid's inclusive educational praxis serves as both a moral critique and a civic alternative to ideological extremism in Indonesian schooling. His vision bridges Islam and Pancasila, integrates character education with democratic citizenship, and promotes pluralism as a religious imperative. The novelty of this research lies in contextualizing Madjid's pedagogy as an instrument for rebuilding national character in postcolonial education, rather than reducing it to liberal theology. This paper contributes to global debates on religion, education, and civic ethics by proposing a homegrown Indonesian model that reconciles faith, diversity, and democracy. Madjid's weltanschauung remains a viable blueprint for inclusive, ethical, and future-oriented national education.
Feminisms have placed in public debate the identity of women and their relationship with the different relational spaces that cut across the personal and the structural. This also affects the Christian sphere, a situation that has allowed women to be made visible through what has been called a hermeneutics of suspicion. In this way it has become possible to recognize kyriarchy as an alienating sociocultural structure and to uncover women who have gone unmentioned from early Christianity to our own time. Along this line of thought, a Latin American ethical perspective is also taken into account, one that questions not only Christian social praxis but also seeks to consider the elements that have strengthened it. In that sense, from a genealogical perspective, an anti-patriarchal ethical stance takes the critique of feminisms seriously, assumes responsibility from male bodies, and addresses itself to them. This implies taking a position on the ways of doing theology and on the narrative elements that have been used to justify them, in order to question patriarchal religion and to propose a change of vision that regards the diverse as a relational possibility lived in horizontality and as an active participation that makes visible, in our actions, the Divinity in which we believe.
Feminist movements have brought the identity of women and their relationships with various personal and structural spheres into public discussion. This has also influenced Christianity, a situation that has enabled women to be recognized through what is called a hermeneutics of suspicion. This approach has helped identify kyriarchy—an alienating sociocultural structure—and has brought to light the neglect of women from early Christianity to the present day. This paper also incorporates a Latin American ethical perspective that not only questions Christian social practices but also seeks to understand the elements that have reinforced them. From a genealogical viewpoint, an anti-patriarchal ethical approach takes feminist critiques seriously, assumes responsibility for them from a male perspective, and addresses men directly. This means challenging traditional theological methods and the narratives used to justify them. The goal is to question patriarchal religion and propose a shift in vision that sees diversity as a relational possibility, promoting horizontal relationships and active participation that reveal the Divine in our actions.
This chapter critically discusses the essays that compose the volume. Regarding the production and use of plastics, it shows how the volume provides readers with a manual that supports groups moving toward organizing for social change.
Jan-Philipp Fränken, Kanishk Gandhi, Tori Qiu
et al.
As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it is critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. We provide a framework that uses a language model to translate causal graphs capturing key aspects of moral dilemmas into prompt templates. With this framework, we procedurally generated a large and diverse set of moral dilemmas -- the OffTheRails benchmark -- consisting of 50 scenarios and 400 unique test items. We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two language models (GPT-4 and Claude-2) across eight conditions. We find that moral dilemmas in which the harm is a necessary means (as compared to a side effect) resulted in lower permissibility and higher intention ratings for both participants and language models. The same pattern was observed for evitable versus inevitable harmful outcomes. However, there was no clear effect of whether the harm resulted from an agent's action or from an omission. We discuss limitations of our prompt generation pipeline and opportunities for improving scenarios to increase the strength of experimental effects.
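To illustrate the kind of graph-to-prompt pipeline described, the sketch below encodes a toy causal structure for a moral dilemma and renders it into a prompt template; the field names, template wording, and example are hypothetical and are not drawn from the OffTheRails benchmark itself.

```python
from dataclasses import dataclass

@dataclass
class DilemmaGraph:
    """Toy encoding of the causal structure of a moral dilemma.

    Hypothetical fields: whether the harm is a necessary means or a side effect,
    whether it is evitable, and whether it follows from an action or an omission.
    """
    agent: str
    goal: str
    harm: str
    harm_role: str         # "means" or "side effect"
    evitable: bool
    brought_about_by: str  # "action" or "omission"

TEMPLATE = (
    "{agent} wants to {goal}. If {agent} proceeds, {harm} will occur as a "
    "{harm_role} of the plan, brought about by {mode}. The harm is {evitability}. "
    "Is it morally permissible for {agent} to proceed?"
)

def to_prompt(g: DilemmaGraph) -> str:
    """Render one graph configuration into a natural-language test item."""
    return TEMPLATE.format(
        agent=g.agent,
        goal=g.goal,
        harm=g.harm,
        harm_role=g.harm_role,
        mode=("an action" if g.brought_about_by == "action" else "an omission"),
        evitability=("avoidable" if g.evitable else "unavoidable"),
    )

example = DilemmaGraph(
    agent="a plant manager", goal="meet a safety deadline",
    harm="a worker losing overtime pay", harm_role="side effect",
    evitable=True, brought_about_by="action",
)
print(to_prompt(example))
```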
Large language models (LLMs) have become increasingly pivotal in various domains due to recent advancements in their performance capabilities. However, concerns persist regarding biases in LLMs, including gender, racial, and cultural biases derived from their training data. These biases raise critical questions about the ethical deployment and societal impact of LLMs. Acknowledging these concerns, this study investigates whether LLMs accurately reflect cross-cultural variations and similarities in moral perspectives. In assessing whether the chosen LLMs capture patterns of divergence and agreement on moral topics across cultures, three main methods are employed: (1) comparison of model-generated and survey-based moral score variances, (2) cluster alignment analysis to evaluate the correspondence between country clusters derived from model-generated moral scores and those derived from survey data, and (3) probing LLMs with direct comparative prompts. All three methods involve the use of systematic prompts and token pairs designed to assess how well LLMs understand and reflect cultural variations in moral attitudes. The findings of this study indicate overall variable and low performance in reflecting cross-cultural differences and similarities in moral values across the models tested, highlighting the necessity of improving the models' accuracy in capturing these nuances. The insights gained from this study aim to inform discussions on the ethical development and deployment of LLMs in global contexts, emphasizing the importance of mitigating biases and promoting fair representation across diverse cultural perspectives.
Self-correction is one of the most striking emergent capabilities of Large Language Models (LLMs), enabling them to revise an inappropriate output given natural language feedback describing its problems. Moral self-correction is a post-hoc approach that corrects unethical generations without requiring a gradient update, making it both computationally lightweight and capable of preserving language modeling ability. Previous works have shown that LLMs can self-debias, and it has been reported that small models, i.e., those with fewer than 22B parameters, are not capable of moral self-correction. However, there is no direct evidence as to why such smaller models fall short of moral self-correction, though previous research hypothesizes that larger models are skilled in following instructions and understanding abstract social norms. In this paper, we empirically validate this hypothesis in the context of social stereotyping, through meticulous prompting. Our experimental results indicate that (i) surprisingly, 3.8B LLMs with proper safety-alignment fine-tuning can achieve very good moral self-correction performance, highlighting the significant effect of safety alignment; and (ii) small LLMs are indeed weaker than larger-scale models in comprehending social norms and in self-explanation through CoT, but all scales of LLMs show poor self-correction performance when given unethical instructions.
Aditi Khandelwal, Utkarsh Agarwal, Kumar Tanmay
et al.
This paper explores the moral judgment and moral reasoning abilities exhibited by Large Language Models (LLMs) across languages through the Defining Issues Test. It is well known that moral judgment depends on the language in which the question is asked. We extend prior work beyond English to five new languages (Chinese, Hindi, Russian, Spanish and Swahili), and probe three LLMs -- ChatGPT, GPT-4 and Llama2Chat-70B -- that show substantial multilingual text processing and generation abilities. Our study shows that the moral reasoning ability of all models, as indicated by the post-conventional score, is substantially inferior for Hindi and Swahili compared to Spanish, Russian, Chinese and English, while there is no clear trend in the performance of the latter four languages. The moral judgments, too, vary considerably by language.
Exploring moral dilemmas allows individuals to navigate moral complexity, where a reversal in decision certainty, shifting toward the opposite of one's initial choice, can reflect open-mindedness and reduced rigidity. This study probes how nonverbal emotional cues from conversational agents influence decision certainty in moral dilemmas. While existing research has focused heavily on verbal aspects of human-agent interaction, we investigated the impact of agents expressing anger and sadness towards the moral situations through animated chat balloons. We compared these with a baseline in which agents offered the same responses without nonverbal cues. Results show that agents displaying anger caused significant reversal shifts in decision certainty. The interaction between participant gender and agents' nonverbal emotional cues significantly affected participants' perception of the AI's influence. These findings reveal that even subtle alterations to agents' nonverbal cues may impact human moral decisions, presenting both opportunities to leverage these effects for positive outcomes and ethical risks for future human-AI systems.
The topic of academic integrity, ethics, and a culture of behavior is recognized as relevant across many communities, including the scientific community. Homo digital, possessing a high level of critical thinking, should become the model of personality in the new paradigm of education. In this article, the authors analyze the philosophical aspects of Paulo Freire's critical pedagogy through the prism of academic integrity in the context of the modern development of the philosophy of education. They conclude that education is a means of liberating a person, and academic integrity is a way to achieve the same goal. Academic integrity is indirectly supported by the correct choice of communication methods, as well as by procedures for creating new knowledge and an image of the world that is personalized by the individual student. The various practices involved in the processes of perception allow a multidimensional understanding of the realities of the world. The authors emphasize that the concept of education developed by the Brazilian pedagogue and philosopher Paulo Freire involves a highly personal perception of the real world and of educational content through a critical approach, supported by comparing the new knowledge acquired by the student with personal life experience.
Deep Ganguli, Amanda Askell, Nicholas Schiefer
et al.
We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveals different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.
Recently, computer scientists have developed large language models (LLMs) by training prediction models on large-scale language corpora with human reinforcement. LLMs have become a promising way to implement accurate artificial intelligence in various fields. Interestingly, recent LLMs possess emergent functional features that emulate sophisticated human cognition, especially in-context learning and chain-of-thought reasoning, which were unavailable in previous prediction models. In this paper, I will examine how LLMs might contribute to moral education and development research. To achieve this goal, I will review the most recently published conference papers and arXiv preprints to overview the novel functional features implemented in LLMs. I also intend to conduct brief experiments with ChatGPT to investigate how LLMs behave while addressing ethical dilemmas and external feedback. The results suggest that LLMs might be capable of solving dilemmas based on reasoning and revising their reasoning process with external input. Furthermore, a preliminary experimental result from the moral exemplar test may demonstrate that exemplary stories can elicit moral elevation in LLMs as they do among human participants. I will discuss the potential implications of LLMs for research on moral education and development in light of these results.
As large language models (LLMs) become more deeply integrated into various sectors, understanding how they make moral judgments has become crucial, particularly in the realm of autonomous driving. This study utilized the Moral Machine framework to investigate the ethical decision-making tendencies of prominent LLMs, including GPT-3.5, GPT-4, PaLM 2, and Llama 2, comparing their responses to human preferences. While LLM and human preferences, such as prioritizing humans over pets and favoring saving more lives, are broadly aligned, PaLM 2 and Llama 2 in particular exhibit distinct deviations. Additionally, despite the qualitative similarities between LLM and human preferences, there are significant quantitative disparities, suggesting that LLMs might lean toward more uncompromising decisions than the milder inclinations of humans. These insights elucidate the ethical frameworks of LLMs and their potential implications for autonomous driving.