Results for "Islam. Bahai Faith. Theosophy, etc."

Showing 20 of ~617320 results · from arXiv, DOAJ, CrossRef

DOAJ Open Access 2026
Love and Compassion in Learning: A Qur'an and Hadith Perspective in the Context of Inclusive Education

Rita Sriayu, Mahyuddin Barni, Abd. Majid et al.

This research aims to analyze the concepts of love (mahabbah) and compassion (rahmah) from the perspective of the Quran and hadith, and their relevance to inclusive education practices in Islam. This study is based on the understanding that love and compassion are the essence of Islamic teachings, which are not only normatively taught in religious texts but also have significant implications for a learning process that is friendly, just, and respectful of diversity. The research method used is literature review, examining literature related to the Quran, Hadith, Islamic education, and inclusive education. The study results show that the Quran and Hadith clearly emphasize the importance of rahmah and mahabbah as the foundation for social and educational relationships. These values then become the basic principles of inclusive education in Islam, as they encourage an attitude of valuing differences, empathy, and providing equal learning opportunities for all students. In addition, the application of love and affection in inclusive learning can be realized through teacher modeling, a humanistic approach, teaching strategies responsive to individual needs, and the creation of a safe and supportive school culture. Thus, this research confirms that integrating the values of love and compassion is an important foundation for realizing holistic, humanistic, and inclusive education that aligns with the principles of Islamic education.

Islam, Education (General)
arXiv Open Access 2025
Matching, Unanticipated Experiences, Divorce, Flirting, Rematching, Etc

Burkhard C. Schipper, Tina Danting Zhang

We study dynamic decentralized two-sided matching in which players may encounter unanticipated experiences. As they become aware of these experiences, they may change their preferences over players on the other side of the market. Consequently, they may get ``divorced'' and rematch again with other agents, which may lead to further unanticipated experiences etc. A matching is stable if there is absence of pairwise common belief in blocking. Stable matchings can be destabilized by unanticipated experiences. Yet, we show that there exist self-confirming outcomes that are stable and do not lead to further unanticipated experiences. We introduce a natural decentralized matching process that, at each period assigns probability $1 - \varepsilon$ to the satisfaction of a mutual optimal blocking pair (if it exists) and picks any optimal blocking pair otherwise. The parameter $\varepsilon$ is interpreted as a friction of the matching market. We show that for any decentralized matching process, frictions are necessary for convergence to stability even without unawareness. Our process converges to self-confirming stable outcomes. Further, we allow for bilateral communication/flirting that changes the awareness and say that a matching is flirt-proof stable if there is absence of communication leading to pairwise common belief in blocking. We show that our natural decentralized matching process converges to flirt-proof self-confirming outcomes.

en econ.TH, cs.GT
arXiv Open Access 2025
Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

Sekh Mainul Islam, Pepa Atanasova, Isabelle Augenstein

Natural Language Explanations (NLEs) describe how Large Language Models (LLMs) make decisions by drawing on external Context Knowledge (CK) and Parametric Knowledge (PK). Understanding the interaction between these sources is key to assessing NLE grounding, yet these dynamics remain underexplored. Prior work has largely focused on (1) single-step generation and (2) modelled PK-CK interaction as a binary choice within a rank-1 subspace. This approach overlooks richer interactions and how they unfold over longer generations, such as complementary or supportive knowledge. We propose a novel rank-2 projection subspace that disentangles PK and CK contributions more accurately and use it for the first multi-step analysis of knowledge interactions across longer NLE sequences. Experiments across four QA datasets and three open-weight LLMs demonstrate that while rank-1 subspaces struggle to represent diverse interactions, our rank-2 formulation captures them effectively, highlighting PK alignment for supportive interactions and CK alignment for conflicting ones. Our multi-step analysis reveals, among others, that hallucinated generations exhibit strong alignment with the PK direction, whereas context-faithful generations maintain a more balanced alignment between PK and CK.

en cs.CL, cs.AI
arXiv Open Access 2025
Beyond Output Faithfulness: Learning Attributions that Preserve Computational Pathways

Siyu Zhang, Kenneth Mcmillan

Faithfulness metrics such as insertion and deletion evaluate how feature removal affects model outputs but overlook whether explanations preserve the computational pathway the network actually uses. We show that external metrics can be maximized through alternative pathways -- perturbations that reroute computation via different feature detectors while preserving output behavior. To address this, we propose activation preservation as a tractable proxy for preserving computational pathways. We introduce Faithfulness-guided Ensemble Interpretation (FEI), which jointly optimizes external faithfulness (via ensemble quantile optimization of insertion/deletion curves) and internal faithfulness (via selective gradient clipping). Across VGG and ResNet on ImageNet and CUB-200-2011, FEI achieves state-of-the-art insertion/deletion scores while maintaining significantly lower activation deviation, showing that both external and internal faithfulness are essential for reliable explanations.

en cs.LG, cs.AI
arXiv Open Access 2025
Multilingual Self-Taught Faithfulness Evaluators

Carlo Alfano, Aymen Al Marjani, Zeno Jonke et al.

The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown promise, they are predominantly English-focused and often require expensive human-labeled training data for fine-tuning specialized models. As LLMs see increased adoption in multilingual contexts, there is a need for accurate faithfulness evaluators that can operate across languages without extensive labeled data. This paper presents Self-Taught Evaluators for Multilingual Faithfulness, a framework that learns exclusively from synthetic multilingual summarization data while leveraging cross-lingual transfer learning. Through experiments comparing language-specific and mixed-language fine-tuning approaches, we demonstrate a consistent relationship between an LLM's general language capabilities and its performance in language-specific evaluation tasks. Our framework shows improvements over existing baselines, including state-of-the-art English evaluators and machine translation-based approaches.

en cs.CL, cs.LG
arXiv Open Access 2025
Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling

Hansam Cho, Seoung Bum Kim

Text-guided diffusion models have become essential for high-quality image synthesis, enabling dynamic image editing. In image editing, two crucial aspects are editability, which determines the extent of modification, and faithfulness, which reflects how well unaltered elements are preserved. However, achieving optimal results is challenging because of the inherent trade-off between editability and faithfulness. To address this, we propose Faithfulness Guidance and Scheduling (FGS), which enhances faithfulness with minimal impact on editability. FGS incorporates faithfulness guidance to strengthen the preservation of input image information and introduces a scheduling strategy to resolve misalignment between editability and faithfulness. Experimental results demonstrate that FGS achieves superior faithfulness while maintaining editability. Moreover, its compatibility with various editing methods enables precise, high-quality image edits across diverse tasks.

en cs.CV, cs.AI
arXiv Open Access 2025
Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains

Ben Malin, Tatiana Kalganova, Nikolaos Boulgouris

We present a methodology for improving the accuracy of faithfulness evaluation in Large Language Models (LLMs). The proposed methodology is based on the combination of elementary faithfulness metrics into a combined (fused) metric, for the purpose of improving the faithfulness of LLM outputs. The proposed strategy for metric fusion deploys a tree-based model to identify the importance of each metric, which is driven by the integration of human judgements evaluating the faithfulness of LLM responses. This fused metric is demonstrated to correlate more strongly with human judgements across all tested domains for faithfulness. Improving the ability to evaluate the faithfulness of LLMs allows greater confidence to be placed in models, enabling their implementation in a greater diversity of scenarios. Additionally, we homogenise a collection of datasets across question answering and dialogue-based domains and implement human judgements and LLM responses within this dataset, allowing for the reproduction and trialling of faithfulness evaluation across domains.

en cs.CL, cs.AI
DOAJ Open Access 2025
Pengembangan Alat Ukur Sabar Menjauhi Maksiat (Development of a Measurement Instrument for Patience in Avoiding Sinful Acts)

Rifqi Ramadhani, M. Nursalim Malay, Nurul Isnaini

Moral degradation among Indonesian adolescents shows a worrying trend, with increasing violations of socio-religious norms. Although several instruments for measuring patience exist, none specifically measures patience in the context of avoiding sinful acts (maksiat). This study aims to develop an instrument measuring patience in avoiding sinful acts, and to test its validity and reliability, based on the theory of Jauziyah (1998). The study used a psychometric approach involving 476 respondents (316 women, 160 men) aged 18-21 years. The instrument was developed through a literature study, expert judgment (two psychology experts and three experts in Islamic studies), and psychometric testing. Construct validity was tested using Confirmatory Factor Analysis (CFA) with LISREL 8.80 Full Version. Of the 45 initial items measuring five aspects (emotional resilience, control of intention, control of prejudice, control of speech, and control of action), 43 items proved valid with t-value > 1.96. Construct reliability ranged from 0.621 to 0.787, with the control-of-speech aspect having the highest reliability (0.787). The model fit tests met the good-fit criteria for all aspects (RMSEA = 0.000, Chi-Square / df = 2, NFI = 1.00, NNFI ≥ 0.90, CFI = 1.00, SRMR ≤ 0.05, GFI = 1.00, AGFI ≥ 0.99). This study produced a valid and reliable instrument for measuring patience in avoiding sinful acts, consisting of 43 items that measure five aspects of patience. The instrument can be used by mental health practitioners and educators to design and evaluate character development programs based on Islamic values, and it supports the development of evidence-based interventions to strengthen patience among Indonesian Muslim adolescents.

Islam, Psychology
arXiv Open Access 2024
Reconsidering Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs

Steve Azzolin, Antonio Longa, Stefano Teso et al.

As Graph Neural Networks (GNNs) become more pervasive, it becomes paramount to build reliable tools for explaining their predictions. A core desideratum is that explanations are faithful, i.e., that they portray an accurate picture of the GNN's reasoning process. However, a number of different faithfulness metrics exist, raising the question of what faithfulness is exactly and how to achieve it. We make three key contributions. We begin by showing that existing metrics are not interchangeable -- i.e., explanations attaining high faithfulness according to one metric may be unfaithful according to others -- and can systematically ignore important properties of explanations. We proceed to show that, surprisingly, optimizing for faithfulness is not always a sensible design goal. Specifically, we prove that for injective regular GNN architectures, perfectly faithful explanations are completely uninformative. This does not apply to modular GNNs, such as self-explainable and domain-invariant architectures, prompting us to study the relationship between architectural choices and faithfulness. Finally, we show that faithfulness is tightly linked to out-of-distribution generalization, in that simply ensuring that a GNN can correctly recognize the domain-invariant subgraph, as prescribed by the literature, does not guarantee that it is invariant unless this subgraph is also faithful. The code is publicly available on GitHub.

en cs.LG, cs.AI
arXiv Open Access 2024
Harnessing Ferro-Valleytricity in Penta-Layer Rhombohedral Graphene for Memory and Compute

Md Mazharul Islam, Shamiul Alam, Md Rahatul Islam Udoy et al.

Two-dimensional materials with multiple degrees of freedom, including spin, valleys, and orbitals, open up an exciting avenue for engineering multifunctional devices. Beyond spintronics, these degrees of freedom can lead to novel quantum effects such as valley-dependent Hall effects and orbital magnetism, which could revolutionize next-generation electronics. However, achieving independent control over valley polarization and orbital magnetism has been a challenge due to the need for large electric fields. A recent breakthrough involving penta-layer rhombohedral graphene has demonstrated the ability to individually manipulate anomalous Hall signals and orbital magnetic hysteresis, forming what is known as a valley-magnetic quartet. Here, we leverage the electrically tunable Ferro-valleytricity of penta-layer rhombohedral graphene to develop non-volatile memory and in-memory computation applications. We propose an architecture for a dense, scalable, and selector-less non-volatile memory array that harnesses the electrically tunable ferro-valleytricity. In our designed array architecture, non-destructive read and write operations are conducted by sensing the valley state through two different pairs of terminals, allowing for independent optimization of read/write peripheral circuits. The power consumption of our PRG-based array is remarkably low, with only ~ 6 nW required per write operation and ~ 2.3 nW per read operation per cell. This consumption is orders of magnitude lower than that of the majority of state-of-the-art cryogenic memories. Additionally, we engineer in-memory computation by implementing majority logic operations within our proposed non-volatile memory array without modifying the peripheral circuitry. Our framework presents a promising pathway toward achieving ultra-dense cryogenic memory and in-memory computation capabilities.

en cond-mat.mes-hall, cs.AR
arXiv Open Access 2024
Location Agnostic Source-Free Domain Adaptive Learning to Predict Solar Power Generation

Md Shazid Islam, A S M Jahid Hasan, Md Saydur Rahman et al.

The prediction of solar power generation is a challenging task due to its dependence on climatic characteristics that exhibit spatial and temporal variability. The performance of a prediction model may vary across different places due to changes in data distribution, resulting in a model that works well in one region but not in others. Furthermore, as a consequence of global warming, there is a notable acceleration in the alteration of weather patterns on an annual basis. This phenomenon introduces the potential for diminished efficacy of existing models, even within the same geographical region, as time progresses. In this paper, a domain adaptive deep learning-based framework is proposed to estimate solar power generation using weather features that can solve the aforementioned challenges. A feed-forward deep convolutional network model is trained for a known location dataset in a supervised manner and utilized to predict the solar power of an unknown location later. This adaptive data-driven approach exhibits notable advantages in terms of computing speed, storage efficiency, and its ability to improve outcomes in scenarios where state-of-the-art non-adaptive methods fail. Our method has shown an improvement of $10.47\%$, $7.44\%$, and $5.11\%$ in solar power prediction accuracy compared to the best-performing non-adaptive method for California (CA), Florida (FL) and New York (NY), respectively.

en cs.LG
arXiv Open Access 2024
Heat transfer in a planer diverging channel with a slot jet inlet

Md Insiat Islam Rabby, Mohammad Ali Rob Sharif, Mohammad Tarequl Islam et al.

This article delves into a numerical exploration of two-dimensional, incompressible, laminar flow within a confined diverging jet. The study aims to understand how variations in the inlet opening fraction and Reynolds number affect the heat transfer and flow patterns. The research employs the finite volume method with a collocated mesh to solve the governing equations. Across a broad spectrum of inlet opening fractions (0.2, 0.4, and 0.6) and Reynolds numbers (ranging from 500 to 900), the findings reveal that increasing the inlet opening fraction of the jet in the diverging channel can lead to a remarkable (53%) improvement in heat transfer while simultaneously reducing pressure loss by 90%. This outcome holds the potential to conserve energy by requiring less pumping power. Notably, this investigation is pioneering and offers novel and valuable insights into enhancing heat transfer through the utilization of a diverging channel.

en physics.flu-dyn, physics.comp-ph
arXiv Open Access 2024
Size and Smoothness Aware Adaptive Focal Loss for Small Tumor Segmentation

Md Rakibul Islam, Riad Hassan, Abdullah Nazib et al.

Deep learning has achieved remarkable accuracy in medical image segmentation, particularly for larger structures with well-defined boundaries. However, its effectiveness can be challenged by factors such as irregular object shapes and edges, non-smooth surfaces, and small target areas, which complicate the ability of networks to grasp the intricate and diverse nature of anatomical regions. In response to these challenges, we propose an Adaptive Focal Loss (A-FL) that takes both object boundary smoothness and size into account, with the goal of improving segmentation performance in intricate anatomical regions. The proposed A-FL dynamically adjusts itself based on an object's surface smoothness, size, and the class balancing parameter based on the ratio of targeted area and background. We evaluated the performance of the A-FL on the PICAI 2022 and BraTS 2018 datasets. In the PICAI 2022 dataset, the A-FL achieved an Intersection over Union (IoU) score of 0.696 and a Dice Similarity Coefficient (DSC) of 0.769, outperforming the regular Focal Loss (FL) by 5.5% and 5.4% respectively. It also surpassed the best baseline by 2.0% and 1.2%. In the BraTS 2018 dataset, A-FL achieved an IoU score of 0.883 and a DSC score of 0.931. Our ablation experiments also show that the proposed A-FL surpasses conventional losses (including Dice Loss, Focal Loss, and their hybrid variants) by a large margin in IoU, DSC, and other metrics. The code is available at https://github.com/rakibuliuict/AFL-CIBM.git.

en eess.IV, cs.AI
arXiv Open Access 2024
FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Chenhui Wang, Tao Chen, Zhihao Chen et al.

Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.

en cs.CV
DOAJ Open Access 2024
Influence of student factors on entrepreneurial intentions: Evidence from Nigeria

Olumuyiwa Oluseun Adeoye, Timilehin Olasoji Olubiyi

Purpose – This study aims to identify the factors influencing entrepreneurial intentions among private university students in South-West Nigeria, and to determine their significance. Methodology – The study population consisted of final-year students from the Faculty/College of Business and Social Sciences across 11 selected private universities in South-West Nigeria offering entrepreneurship courses. This study used a sample of 623 students. Data were collected using a self-developed instrument with a reliability coefficient of 0.847 for student-related factors and 0.80 for entrepreneurial intentions. The Relative Significance Index (RSI) and multiple regression analyses were used. Findings – The results revealed that most students perceived several factors as influential on entrepreneurial intention. The key factors were students’ personal factors, family history, technical abilities, and parental attitude. Despite recognizing these influences, many students lacked the skills to solve challenges and to effectively utilize technical literature and other information sources. Multiple regression analysis indicated that parental attitude, student personal factors, and technical abilities significantly influenced entrepreneurial intentions. Implications – This study highlights the importance of enhancing students' personal factors, technical abilities, and parental attitudes to foster entrepreneurial intention. Educational institutions and policymakers should focus on these areas to cultivate an entrepreneurial mindset among students. Originality – This study provides empirical evidence on the determinants of entrepreneurial intentions among private university students in Nigeria, contributing to a broader understanding of how personal, familial, and technical factors shape entrepreneurial aspirations.

Islamic law, Islam
DOAJ Open Access 2024
Examining Ibn Junaid's View about the Scope of hijab and Muslim Women's Shar'i Covering

Hosein Souzanchi

Ibn Junaid Eskafi is one of the great Shiite scholars of the 4th century, whose books have not reached us, but some of his views have been quoted in jurisprudence books. Allameh Helii quoted a sentence of his about the clothing of men and women and added an explanation of his own, which has been misunderstood, to the extent that some non-experts have stated that Ibn Junaid considers the extent of obligatory covering for men and women to be equal in front of others. In this article, using an analytical method and referring to the quoted sayings of Ibn Junaid, it becomes clear that such an understanding of Ibn Junaid's words was due to unfamiliarity with specialized jurisprudential texts, and that Ibn Junaid did not hold such a view, either regarding women's veiling in front of non-mahram or regarding the "Prayer Veil". Rather, the only difference between his opinion and that of others regarding women's veiling is that he does not consider it obligatory for women to cover their heads during prayer, on the condition that no non-mahram sees them during prayer. Also, by carefully examining the above-mentioned quotes from him by Allameh Helii and others, it becomes clear that Ibn Junaid's sentence, which has been misunderstood, referred only to the meaning of the word "Awrat", and that if any ruling follows from it, it is the general ruling of veiling, which both men and women are obliged to observe in front of everyone (that is, even in front of their own sex and in front of mahram). His fatwa in this context is the same as the fatwa of the Shia jurists in general.

The family. Marriage. Woman, Islam
DOAJ Open Access 2024
Analysis of Determinants Driving Interest Student Accountancy for Role in World Businessman

Rendra Bagus Saputra, Lintang Kurniawati

Starting a firm or engaging in entrepreneurship is a means for individuals to generate income, thereby bolstering a nation's economy and employment opportunities. This study sought to examine the influence of motivation, environment, social media, digital marketing, and love of money on entrepreneurial inclinations. 973 alumni graduated in the years 2020 and 2021. This study employed a qualitative research approach, gathering data through a questionnaire administered via Google Forms. The data were examined using the purposive sampling technique based on the Slovin formula. Motivation has a significant influence on the entrepreneurial aspirations of students. The environment plays a significant role in shaping the level of interest in entrepreneurship. Social media has a significant impact on the entrepreneurial aspirations of students. Digital marketing has a significant impact on students' entrepreneurial interests. An individual's level of interest in entrepreneurship can be greatly influenced by a strong desire for wealth.

Islam, Economics as a science
DOAJ Open Access 2024
Developing Digital-Based Islamic Religious Education Teaching Modules on the Subject Matter of Duha Prayer in Elementary Schools

Hamdi, Setria Utama Rizal, Nurul Hikmah et al.

Purpose – The purposes of this study are to produce digital-based Islamic religious education teaching module products developed using the eXe-Learning application and to find out whether digital-based Islamic Religious Education teaching module products are feasible to use in learning. Design/methods/approach – This study used development research (Research and Development) with the 4D development model (Define, Design, Development, and Dissemination). At the design validation stage, the digital-based Islamic Religious Education teaching module was validated by material experts and media experts. Data collection techniques used questionnaire instruments. Findings – The assessment by the material experts, consisting of 16 aspects, was 90%, categorized as very feasible. The evaluation by the first media expert, consisting of 20 aspects, was 82%, categorized as very feasible. The second media expert validation, as a whole, was 96%, categorized as “very feasible.” The average value of material and media experts was 89.33% (Very Feasible). Meanwhile, the student response to the digital-based Islamic Religious Education teaching module, consisting of 12 aspects, obtained 67.2%, which falls in the “feasible” category. Moreover, in terms of the module's effectiveness in learning, it had an effect in the form of increasing student skills in performing the Duha prayer. Research implications/limitations – In practical terms, this development research has contributed digital-based PAI teaching module products on the Duha prayer material for use in learning. The main field trial was limited to Class IV of SDIT Al-Furqon Palangka Raya. Originality/value – This development research supports the implementation of the Merdeka Curriculum in fulfilling the learning tools for the Merdeka Curriculum teaching modules, allowing it to contribute to the world of education.

Theory and practice of education, Islam
DOAJ Open Access 2024
ANALYSIS OF RUSSIAN FACTORS SUPPORTING NICOLAS MADURO IN THE VENEZUELAN CRISIS FROM ALEXANDER WENDT'S CONSTRUCTIVIST PERSPECTIVE

Jamal Din Aulia, Endah Kurniati

The crisis in Venezuela is a humanitarian crisis that affects political stability and the local economy. The economic crisis is caused by a reduction in the cost of oil supply. In addition, the political crisis began with internal problems between the government and the opposition, exacerbating the situation. This has drawn the attention of developing countries, such as Russia, which have become involved in the dynamics of the crisis. In Venezuela's crisis, Russia offered material and moral support to Nicolas Maduro's government. Russia thus attempts to shield Maduro from international claims. This research analyzes the reasons behind Russia's support for Nicolas Maduro in the Venezuelan crisis using the constructivist theory of Alexander Wendt. The results show three variables that explain Russia's involvement in the Venezuelan crisis. First, interdependence, which is related to corporation dependence between Russia and Venezuela. Second, homogeneity, which is based on their shared background. Lastly, common fate, which is based on the same destiny and adversary shared by Russia and Venezuela.

arXiv Open Access 2023
Evaluation of Faithfulness Using the Longest Supported Subsequence

Anirudh Mittal, Timo Schick, Mikel Artetxe et al.

As increasingly sophisticated language models emerge, their trustworthiness becomes a pivotal issue, especially in tasks such as summarization and question-answering. Ensuring their responses are contextually grounded and faithful is challenging due to the linguistic diversity and the myriad of possible answers. In this paper, we introduce a novel approach to evaluate faithfulness of machine-generated text by computing the longest noncontinuous substring of the claim that is supported by the context, which we refer to as the Longest Supported Subsequence (LSS). Using a new human-annotated dataset, we finetune a model to generate LSS. We introduce a new method of evaluation and demonstrate that these metrics correlate better with human ratings when LSS is employed, as opposed to when it is not. Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset. Our metric consistently outperforms other metrics on a summarization dataset across six different models. Finally, we compare several popular Large Language Models (LLMs) for faithfulness using this metric. We release the human-annotated dataset built for predicting LSS and our fine-tuned model for evaluating faithfulness.

en cs.CL, cs.AI

Page 16 of 30866