Responsible Evaluation of AI for Mental Health
Hiba Arnaout, Anmol Goel, H. Andrew Schwartz
et al.
Although artificial intelligence (AI) shows growing promise for mental health care, current approaches to evaluating AI tools in this domain remain fragmented and poorly aligned with clinical practice, social context, and first-hand user experience. This paper argues for a rethinking of responsible evaluation -- what is measured, by whom, and for what purpose -- by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity, providing a structured basis for evaluation. Through an analysis of 135 recent *CL publications, we identify recurring limitations, including over-reliance on generic metrics that do not capture clinical validity, therapeutic appropriateness, or user experience, limited participation from mental health professionals, and insufficient attention to safety and equity. To address these gaps, we propose a taxonomy of AI mental health support types -- assessment-, intervention-, and information synthesis-oriented -- each with distinct risks and evaluative requirements, and illustrate its use through case studies.
Differential Mental Disorder Detection with Psychology-Inspired Multimodal Stimuli
Zhiyuan Zhou, Jingjing Wu, Zhibo Lei
et al.
Differential diagnosis of mental disorders remains a fundamental challenge in real-world clinical practice, where multiple conditions often exhibit overlapping symptoms. However, most existing public datasets are developed under single-disorder settings and rely on limited data elicitation paradigms, restricting their ability to capture disorder-specific patterns. In this work, we investigate differential mental disorder detection through psychology-inspired multimodal stimuli, designed to elicit diverse emotional, cognitive, and behavioral responses grounded in findings from experimental psychology. Based on this paradigm, we collect a large-scale multimodal mental health dataset (MMH) covering depression, anxiety, and schizophrenia, with all diagnostic labels clinically verified by licensed psychiatrists. To effectively model the heterogeneous signals induced by diverse elicitation tasks, we further propose a paradigm-aware multimodal framework that encodes prior knowledge of inter-disorder differences as prompt-guided semantic descriptions, capturing task-specific affective and interaction contexts for multimodal representation learning in the new differential mental disorder detection task. Extensive experiments show that our framework consistently outperforms existing baselines, underscoring the value of psychology-inspired stimulus design for differential mental disorder detection.
A Conditional Companion: Lived Experiences of People with Mental Health Disorders Using LLMs
Aditya Kumar Purohit, Hendrik Heuer
Large Language Models (LLMs) are increasingly used for mental health support, yet little is known about how people with mental health challenges engage with them, how they evaluate their usefulness, and what design opportunities they envision. We conducted 20 semi-structured interviews with people in the UK who live with mental health conditions and have used LLMs for mental health support. Through reflexive thematic analysis, we found that participants engaged with LLMs in conditional and situational ways: for immediacy, the desire for non-judgement, self-paced disclosure, cognitive reframing, and relational engagement. Simultaneously, participants articulated clear boundaries informed by prior therapeutic experience: LLMs were effective for mild-to-moderate distress but inadequate for crises, trauma, and complex social-emotional situations. We contribute empirical insights into the lived use of LLMs for mental health, highlight boundary-setting as central to their safe role, and propose design and governance directions for embedding them responsibly within the care ecosystem.
Student Mental Health Screening via Fitbit Data Collected During the COVID-19 Pandemic
Rebecca Lopez, Avantika Shrestha, ML Tlachac
et al.
College students experience many stressors, resulting in high levels of anxiety and depression. Wearable technology provides unobtrusive sensor data that can be used for the early detection of mental illness. However, current research is limited concerning the variety of psychological instruments administered, physiological modalities, and time series parameters. In this research, we collect the Student Mental and Environmental Health (StudentMEH) Fitbit dataset from students at our institution during the pandemic. We provide a comprehensive assessment of the ability of predictive machine learning models to screen for depression, anxiety, and stress using different Fitbit modalities. Our findings indicate potential in physiological modalities such as heart rate and sleep to screen for mental illness, with F1 scores as high as 0.79 for anxiety, the former modality reaching 0.77 for stress screening, and the latter modality achieving 0.78 for depression. This research highlights the potential of wearable devices to support continuous mental health monitoring and the importance of identifying the best data aggregation levels and appropriate modalities for screening for different mental ailments.
Domain-Specific Constitutional AI: Enhancing Safety in LLM-Powered Mental Health Chatbots
Chenhan Lyu, Yutong Song, Pengfei Zhang
et al.
Mental health applications have emerged as a critical area in computational health, driven by rising global rates of mental illness, the integration of AI in psychological care, and the need for scalable solutions in underserved communities. These include therapy chatbots, crisis detection, and wellness platforms handling sensitive data, requiring specialized AI safety beyond general safeguards due to emotional vulnerability, risks like misdiagnosis or symptom exacerbation, and precise management of vulnerable states to avoid severe outcomes such as self-harm or loss of trust. Despite AI safety advances, general safeguards inadequately address mental health-specific challenges, including crisis intervention accuracy to avert escalations, therapeutic guideline adherence to prevent misinformation, scale limitations in resource-constrained settings, and adaptation to nuanced dialogues where generics may introduce biases or miss distress signals. We introduce an approach to apply Constitutional AI training with domain-specific mental health principles for safe, domain-adapted CAI systems in computational mental health applications.
EMINDS: Understanding User Behavior Progression for Mental Health Exploration on Social Media
Rui Sheng, Yifang Wang, Xingbo Wang
et al.
Mental health is an urgent societal issue, and social scientists are increasingly turning to online mental health communities (OMHCs) to analyze user behavior data for early intervention. However, existing sequence mining techniques fall short of the urgent need to explore the behavior progression of different groups (e.g., recovery or deterioration groups) and track the potential long-term impact of behaviors on mental health status. To address this issue, we introduce EMINDS, a visual analytics system built on a novel automatic mining pipeline that extracts distinct behavior stages and assesses the potential impact of frequent stage patterns on mental health status over time. The system includes a set of interactive visualizations that summarize the meaning of each behavior stage and the evolution of different stage patterns. We feature a pattern-centric Sankey diagram to reveal contextual information about the impact of stage patterns on mental health, helping experts understand the specific changes in sequences before and after a stage pattern. We evaluated the effectiveness and usability of EMINDS through two case studies and expert interviews, which examined the potential stage patterns impacting long-term mental health by analyzing user behaviors on Reddit.
Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs
Mohammad Shahab Sepehri, Berk Tinaz, Zalan Fabian
et al.
Mental visualization, the ability to construct and manipulate visual representations internally, is a core component of human cognition and plays a vital role in tasks involving reasoning, prediction, and abstraction. Despite the rapid progress of Multimodal Large Language Models (MLLMs), current benchmarks primarily assess passive visual perception, offering limited insight into the more active capability of internally constructing visual patterns to support problem solving. Yet mental visualization is a critical cognitive skill in humans, supporting abilities such as spatial navigation, predicting physical trajectories, and solving complex visual problems through imaginative simulation. To bridge this gap, we introduce Hyperphantasia, a synthetic benchmark designed to evaluate the mental visualization abilities of MLLMs through four carefully constructed puzzles. Each puzzle is procedurally generated and presented at three difficulty levels, enabling controlled analysis of model performance across increasing complexity. Our comprehensive evaluation of state-of-the-art models reveals a substantial gap between the performance of humans and MLLMs. Additionally, we explore the potential of reinforcement learning to improve visual simulation capabilities. Our findings suggest that while some models exhibit partial competence in recognizing visual patterns, robust mental visualization remains an open challenge for current MLLMs.
OAM-Assisted Self-Healing Is Directional, Proportional and Persistent
Marek Klemes, Lan Hu, Greg Bowles
et al.
In this paper we demonstrate the postulated mechanism of self-healing specifically due to orbital-angular-momentum (OAM) in radio vortex beams having equal beam-widths. In previous work we experimentally demonstrated self-healing effects in OAM beams at 28 GHz and postulated a theoretical mechanism to account for them. In this work we further characterize the OAM self-healing mechanism theoretically and confirm those characteristics with systematic and controlled experimental measurements on a 28 GHz outdoor link. Specifically, we find that the OAM self-healing mechanism is an additional self-healing mechanism in structured electromagnetic beams which is directional with respect to the displacement of an obstruction relative to the beam axis. We also confirm our previous findings that the amount of OAM self-healing is proportional to the OAM order, and additionally find that it persists beyond the focusing region into the far field. As such, OAM-assisted self-healing brings an advantage over other so-called non-diffracting beams both in terms of the minimum distance for onset of self-healing and the amount of self-healing obtainable. We relate our findings by extending theoretical models in the literature and develop a unifying electromagnetic analysis to account for self-healing of OAM-bearing non-diffracting beams more rigorously.
physics.optics, eess.SP
Mental Multi-class Classification on Social Media: Benchmarking Transformer Architectures against LSTM Models
Khalid Hasan, Jamil Saquer, Yifan Zhang
Millions of people openly share mental health struggles on social media, providing rich data for early detection of conditions such as depression and bipolar disorder. However, most prior Natural Language Processing (NLP) research has focused on single-disorder identification, leaving a gap in understanding the efficacy of advanced NLP techniques for distinguishing among multiple mental health conditions. In this work, we present a large-scale comparative study of state-of-the-art transformer versus Long Short-Term Memory (LSTM)-based models to classify mental health posts into exclusive categories of mental health conditions. We first curate a large dataset of Reddit posts spanning six mental health conditions and a control group, using rigorous filtering and statistical exploratory analysis to ensure annotation quality. We then evaluate five transformer architectures (BERT, RoBERTa, DistilBERT, ALBERT, and ELECTRA) against several LSTM variants (with or without attention, using contextual or static embeddings) under identical conditions. Experimental results show that transformer models consistently outperform the alternatives, with RoBERTa achieving 91-99% F1-scores and accuracies across all classes. Notably, attention-augmented LSTMs with BERT embeddings approach transformer performance (up to 97% F1-score) while training 2-3.5 times faster, whereas LSTMs using static embeddings fail to learn useful signals. These findings represent the first comprehensive benchmark for multi-class mental health detection, offering practical guidance on model selection and highlighting an accuracy-efficiency trade-off for real-world deployment of mental health NLP systems.
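The per-class F1-scores reported in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `per_class_f1` and `macro_f1` and the toy labels are assumptions made for illustration, and the numbers bear no relation to the reported results.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1-score.

    F1 = 2*TP / (2*TP + FP + FN), computed one class at a time.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not data from the study.
y_true = ["depression", "depression", "bipolar", "control"]
y_pred = ["depression", "bipolar", "bipolar", "control"]
print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Reporting F1 per class, rather than a single accuracy figure, makes it visible when a model performs well on frequent conditions but poorly on rare ones.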
Menta: A Small Language Model for On-Device Mental Health Prediction
Tianyi Zhang, Xiangyuan Xue, Lingyan Ruan
et al.
Mental health conditions affect hundreds of millions globally, yet early detection remains limited. While large language models (LLMs) have shown promise in mental health applications, their size and computational demands hinder practical deployment. Small language models (SLMs) offer a lightweight alternative, but their use for social media-based mental health prediction remains largely underexplored. In this study, we introduce Menta, the first optimized SLM fine-tuned specifically for multi-task mental health prediction from social media data. Menta is jointly trained across six classification tasks using a LoRA-based framework, a cross-dataset strategy, and a balanced accuracy-oriented loss. Evaluated against nine state-of-the-art SLM baselines, Menta achieves an average improvement of 15.2% across tasks covering depression, stress, and suicidality compared with the best-performing non-fine-tuned SLMs. It also achieves higher accuracy on depression and stress classification tasks compared to 13B-parameter LLMs, while being approximately 3.25x smaller. Moreover, we demonstrate real-time, on-device deployment of Menta on an iPhone 15 Pro Max, requiring only approximately 3GB RAM. Supported by a comprehensive benchmark against existing SLMs and LLMs, Menta highlights the potential for scalable, privacy-preserving mental health monitoring. Code is available at: https://hong-labs.github.io/menta-project/
Transforming mental health care in Kenya: The critical role of routine outcomes monitoring in specialized services
Clara Paz, Anne A. Obondo, Ian Kanyanya
et al.
This short communication advocates for the strategic implementation of Routine Outcome Monitoring (ROM) systems within Kenya's specialized mental health institutions, especially teaching and referral hospitals. The proposed shift aims to transform these institutions into learning health systems, dynamic structures that systematically capture and utilize data from everyday clinical practice to drive continuous improvement in care quality, accountability, and population mental health outcomes. Kenya's mental health system, while advancing in policy and infrastructure, still faces a critical gap in routine, data-driven practices that support outcome-based care. ROM offers a practical and scalable solution to bridge this divide. We emphasize that this transition requires more than technological adoption. It demands a systemic reorientation: establishing digital standards for outcome tracking, allocating protected time for data reflection, strengthening referral pathways, and aligning local practices with national health indicators. These changes can help institutions become national exemplars, where evidence from clinical encounters informs policy, enhances training, and fosters research capacity. ROM is framed not as a technical add-on but as a transformative tool capable of reshaping how care is delivered, evaluated, and improved over time. It can anchor Kenya's mental health care in real-time learning, ensuring that services adapt responsively to the needs of patients and communities. With political will, stakeholder collaboration, and thoughtful implementation, Kenya has the opportunity to lead among LMICs in establishing a mental health system that is evidence-informed, equitable, and sustainable.
Mental healing, Public aspects of medicine
Is Bipolar Disorder Worked With in NHS Talking Therapies, and What Are the Views of Staff and Service Users? Results From a Linked Staff and Service User Survey and Freedom of Information Request
Thomas Richardson, Kim Wright, Rebecca Strawbridge
et al.
CBT is effective for Bipolar Disorder (BD); however, access is often poor. Despite IAPT-SMI pilot sites, there has been no roll out of CBT for BD in NHS Talking Therapies Services. This study aimed to examine the extent to which BD is seen in these services. A survey was conducted of 147 service users with BD and 106 staff. A freedom of information (FOI) request was also responded to by 48 NHS trusts. Forty-nine percent of those with BD had tried to access NHS Talking Therapies, with this being before a formal diagnosis for 42% of those who had tried to access. Twenty-nine percent were told that they could not be worked with as they had BD. The main reasons for referral were depression, followed by anxiety disorders and PTSD. Staff surveys and FOI requests showed that relapse prevention work was rarely conducted with BD, though comorbid conditions, in particular anxiety and PTSD, were often treated. BD was rarely routinely screened for, and staff were rarely trained about working with BD specifically. FOI requests showed that a formal BD diagnosis made up only 0.2% of overall referrals, with those with BD being significantly more likely to be discharged after an initial assessment (OR = 4.69). Few people with a formal BD diagnosis are seen within NHS Talking Therapies services; however, increased screening may help with earlier diagnosis of those who present with depression. Comorbid anxiety and PTSD are usually worked with in these services. Staff have limited confidence and additional training is warranted.
Mental healing, Psychiatry
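The abstract above reports an odds ratio (OR = 4.69) for discharge after initial assessment. As a rough illustration of how an odds ratio is derived from a 2x2 table, here is a minimal Python sketch. The function name `odds_ratio` and all counts are hypothetical. They are not the study's data and will not reproduce the reported value.

```python
def odds_ratio(group_events, group_non_events, other_events, other_non_events):
    """Odds ratio from a 2x2 table.

    OR = (a / b) / (c / d) = (a * d) / (b * c), where
    a, b = events and non-events in the group of interest,
    c, d = events and non-events in the comparison group.
    """
    if group_non_events == 0 or other_events == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (group_events * other_non_events) / (group_non_events * other_events)


# Hypothetical counts for illustration only (not taken from the study):
# 30 of 100 referrals in one group discharged after assessment,
# versus 10 of 100 referrals in the comparison group.
print(round(odds_ratio(30, 70, 10, 90), 2))  # 3.86
```

An odds ratio above 1 indicates higher odds of the outcome in the group of interest; a confidence interval is needed to judge whether the difference is statistically meaningful.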
Managing delayed-onset post-traumatic stress disorder triggered by the Syrian war, COVID-19, and an earthquake: A case report on therapy for a sexual abuse survivor using continuous exposure and digital communication
Hussam Gharib, Mohamed Bassam Hayek, Ahmad Shathel Omar Dakkak
et al.
A clinically significant number of patients suffer from delayed-onset post-traumatic stress disorder (delayed-onset PTSD), where symptoms appear at least six months after the traumatic event. This case report describes the combined use of pharmacotherapy with cognitive-behavioral therapy (CBT) and Narrative Exposure Therapy (NET) to treat symptoms of delayed-onset PTSD in a young Syrian girl who experienced recurring flashbacks of repeated sexual abuse that began several years prior. The death of her brother, the COVID-19 pandemic, and a subsequent earthquake in Aleppo, Syria, in 2023 triggered memories of prolonged physical abuse that were not contextually related. The patient coped with these distressing memories and unwanted thoughts through behavioral avoidance. She received treatment through psychiatric clinic sessions and WhatsApp communication (due to confidentiality concerns), combining pharmacological therapy with psychological support. The treatment included cognitive restructuring, narrative exposure, and identifying triggers, leading to the reprocessing of trauma-related memories. Her PTSD symptoms reached a non-observable stage, even when she was confronted with thoughts or contextual situations.
Mental healing, Public aspects of medicine
Diversity, biology, and history of psilocybin-containing fungi: Suggestions for research and technological development.
R. V. Van Court, M. Wiseman, K. W. Meyer
et al.
Therapeutic use of psilocybin has become a focus of recent international research, with preliminary data showing promise to address a range of treatment-resistant mental health conditions. However, use of psilocybin as a healing entheogen has a long history through traditional consumption of mushrooms from the genus Psilocybe. The forthcoming adoption of new psilocybin-assisted therapeutic practices necessitates identification of preferred sources of psilocybin; consequently, comprehensive understanding of psilocybin-containing fungi is fundamental to consumer safety. Here we examine psilocybin-producing fungi and discuss their biology, diversity, and ethnomycological uses. We also review recent work focused on elucidation of psilocybin biosynthetic production pathways, especially those from the genus Psilocybe, and their evolutionary history. Current research on psilocybin therapies is discussed, and recommendations for necessary future mycological research are outlined.
Wearable Sensor for Continuous Sweat Biomarker Monitoring
Yuting Qiao, Lijuan Qiao, Zhiming Chen
et al.
In recent years, wearable sensors have driven the rapid development of real-time, noninvasive monitoring in medical care, sports, and other fields. Sweat contains a wide range of biomarkers such as metabolites, electrolytes, and various hormones. Combined with wearable technology, sweat can reflect human fatigue, disease, mental stress, dehydration, and so on. This paper comprehensively describes the analysis of sweat components such as glucose, lactic acid, electrolytes, pH, cortisol, vitamins, ethanol, and drugs by wearable sensing technology, and the application of wearable sweat devices in glasses, patches, fabrics, tattoos, and paper. Development trends for wearable sweat devices are also outlined. It is believed that if the problems of sweat collection, air permeability, biocompatibility, sensing array construction, continuous monitoring, self-healing technology, power consumption, real-time data transmission, specific recognition, and others are solved, wearable sweat sensors can give wearers genuinely meaningful information about their health.
Chronic Pain-Induced Depression: A Review of Prevalence and Management
Roja T Meda, S. P. Nuguru, Sriker Rachakonda
et al.
Chronic pain is ongoing pain that has persisted beyond standard tissue healing time along with comorbidities such as depression. This article discusses studies that have shown the prevalence of chronic pain and chronic pain-induced depression and explained methods of prevention for these conditions. The molecular mechanisms such as monoamine neurotransmitters, brain-derived neurotrophic factor, inflammatory factors, and glutamate that are similar in chronic pain and depression have also been discussed. This article reviews the methods of management that utilize the identification of these molecular mechanisms to treat this condition further. It also emphasizes the importance of the awareness of chronic pain-induced depression for the upcoming advances in the subject of mental health.
Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models
Abdelrahaman A. Hassan, Radwa J. Hanafy, Mohammed E. Fouda
The growing prevalence and complexity of mental health disorders present significant challenges for accurate diagnosis and treatment, particularly in understanding the interplay between co-occurring conditions. Mental health disorders, such as depression and anxiety, often co-occur, yet current datasets derived from social media posts typically focus on single-disorder labels, limiting their utility in comprehensive diagnostic analyses. This paper addresses this critical gap by proposing a novel methodology for cleaning, sampling, labeling, and combining data to create versatile multi-label datasets. Our approach introduces a synthetic labeling technique to transform single-label datasets into multi-label annotations, capturing the complexity of overlapping mental health conditions. To achieve this, two single-label datasets are first merged into a foundational multi-label dataset, enabling realistic analyses of co-occurring diagnoses. We then design and evaluate various prompting strategies for large language models (LLMs), ranging from single-label predictions to unrestricted prompts capable of detecting any present disorders. After rigorously assessing multiple LLMs and prompt configurations, the optimal combinations are identified and applied to label six additional single-disorder datasets from RMHD. The result is SPAADE-DR, a robust, multi-label dataset encompassing diverse mental health conditions. This research demonstrates the transformative potential of LLM-driven synthetic labeling in advancing mental health diagnostics from social media data, paving the way for more nuanced, data-driven insights into mental health care.
Touch, feel, heal. The use of hospital green spaces and landscape as sensory-therapeutic gardens: a case study in a university clinic
Mihaela Dinu Roman Szabo, Adelina Dumitras, Diana-Maria Mircea
et al.
It has been documented that patients with mental or physical disabilities can benefit from being placed within the setting of a natural environment. Consequently, the concept of creating spaces that can enhance health preservation or patient recovery, while also augmenting environmental and aesthetic value, has emerged as a contemporary discourse. Green areas around hospitals offer a valuable opportunity to incorporate healing gardens that benefit patients and others. The aim of this paper is to propose a design for a sensory-therapeutic garden based on key principles derived from selected academic literature, focusing on the application of these principles in a healthcare setting in Cluj-Napoca, Romania. The design was also informed by onsite data collection and analysis, and it aims to create a healing landscape that addresses the needs of patients, healthcare providers, and visitors. This study seeks to augment the discourse in the field by demonstrating the practical application of key therapeutic garden design principles in a specific context and how these principles impacted the design process.
Human Emotions Recognition, Analysis and Transformation by the Bioenergy Field in Smart Grid Using Image Processing
Gunjan Chhabra, Edeh Michael Onyema, Sunil Kumar
et al.
The passage of electric signals throughout the human body produces an electromagnetic field, known as the human biofield, which carries information about a person’s psychological health. The human biofield can be rehabilitated by using healing techniques such as sound therapy and many others in a smart grid. However, psychiatrists and psychologists often face difficulties in clarifying the mental state of a patient in a quantifiable form. Therefore, the objective of this research work was to transform human emotions using sound healing therapy and produce visible results, confirming the transformation. The present research was based on the amalgamation of image processing and machine learning techniques, including a real-time aura-visualization interpretation and an emotion-detection classifier. The experimental results highlight the effectiveness of healing emotions through the aforementioned techniques. The accuracy of the proposed method, specifically, the module combining both emotion and aura, was determined to be ~88%. Additionally, the participants’ feedback was recorded and analyzed based on the prediction capability of the proposed module and their overall satisfaction. The participants were strongly satisfied with the prediction capability (~81%) of the proposed module and future recommendations (~84%). The results indicate the positive impact of sound therapy on emotions and the biofield. In the future, experimentation using different therapies and integrating more advanced techniques are anticipated to open new gateways in healthcare.
Opportunities in Mental Health Support for Informal Dementia Caregivers Suffering from Verbal Agitation
Taewook Kim, Hyeok Kim, Angela Roberts
et al.
People with dementia (PwD) often present verbal agitation such as cursing, screaming, and persistently complaining. Verbal agitation can impose mental distress on informal caregivers (e.g., family, friends), which may cause severe mental illnesses, such as depression and anxiety disorders. To improve informal caregivers' mental health, we explore design opportunities by interviewing 11 informal caregivers suffering from verbal agitation of PwD. In particular, we first characterize how the predictability of verbal agitation impacts informal caregivers' mental health and how caregivers' coping strategies vary before, during, and after verbal agitation. Based on our findings, we propose design opportunities to improve the mental health of informal caregivers suffering from verbal agitation: distracting PwD (in-situ support; before), prompting just-in-time maneuvers (information support; during), and comfort and education (social & information support; after). We discuss our reflections on cultural disparities between participants. Our work envisions a broader design space for supporting informal caregivers' well-being and describes when and how that support could be provided.