Advances in data collection enable the capture of rich patient-generated data: from passive sensing (e.g., wearables and smartphones) to active self-reports (e.g., cross-sectional surveys and ecological momentary assessments). Although prior research has demonstrated the utility of patient-generated data in mental healthcare, significant challenges remain in effectively presenting these data streams along with clinical data (e.g., clinical notes) for clinical decision-making. Through co-design sessions with five clinicians, we propose MIND, a large language model-powered dashboard designed to present clinically relevant multimodal data insights for mental healthcare. MIND presents multimodal insights through narrative text, complemented by charts communicating underlying data. Our user study (N=16) demonstrates that clinicians perceive MIND as a significant improvement over baseline methods, reporting improved performance in revealing hidden and clinically relevant data insights (p<.001) and in supporting their decision-making (p=.004). Grounded in the study results, we discuss future research opportunities to integrate data narratives in broader clinical practices.
Online support communities have become vital spaces offering varied forms of support to individuals facing mental health challenges. Despite the proliferation of platforms with distinct technical structures, little is known about how these features shape support dynamics and the socio-technical mechanisms at play. This study introduces a technical-structural-functional model of social support and systematically compares communication network structures and support types in 20 forum-based and 20 chat-based mental health communities. Using supervised machine learning and social network analysis, we find that forum-based communities foster more informational and emotional support, whereas chat-based communities promote greater companionship. These patterns were partially explained by network structure: higher in-degree centralization in forums accounted for the prevalence of informational support, while decentralized reply patterns in chat groups accounted for more companionship. These findings extend the structural-functional model of support to online contexts and provide actionable guidance for designing support communities that align technical structures with users' support needs.
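The in-degree centralization mentioned above can be illustrated with a minimal sketch. It assumes the common Freeman-style normalization for a directed network without self-loops; the abstract does not state which formula the authors used, and the function name and toy reply networks below are hypothetical.

```python
from collections import Counter


def in_degree_centralization(edges, nodes):
    """Freeman-style in-degree centralization for a directed reply network.

    edges: iterable of (replier, recipient) pairs.
    nodes: iterable of all member identifiers.
    Returns a value in [0, 1]; 1 means every reply targets one member.
    """
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Count distinct repliers per recipient, ignoring duplicates and self-replies.
    indeg = Counter(target for source, target in set(edges) if source != target)
    degrees = [indeg.get(node, 0) for node in nodes]
    max_deg = max(degrees)
    # (n - 1)^2 is the largest possible sum of differences (a pure star network).
    return sum(max_deg - d for d in degrees) / ((n - 1) ** 2)


# Hypothetical toy networks: everyone replies to one poster vs. a reply ring.
forum_like = [("b", "a"), ("c", "a"), ("d", "a")]
chat_like = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

print(in_degree_centralization(forum_like, "abcd"))
print(in_degree_centralization(chat_like, "abcd"))
```

In this toy setup the star-shaped network is maximally centralized and the ring is fully decentralized, which mirrors the forum versus chat contrast described in the abstract.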
Anjali D. Poe, Parmis Khosravi, Sara Kirschner
et al.
The current 20-year longitudinal study examined whether behavioral inhibition (BI), a temperament identified in late infancy/early childhood, moderates the associations between a) peer connectedness (feelings of closeness, ease, and availability of meaningful social bonds), b) frequency of functional support (how often interpersonal supportive behaviors are received) throughout middle childhood and adolescence, and c) adulthood anxiety. Data from 291 participants were analyzed. Participants were selected during infancy based on their level of reactivity to novel social and nonsocial stimuli, and BI was measured in toddlerhood. Participants completed questionnaires assessing social involvement throughout middle childhood and adolescence, and anxiety in early adulthood. Confirmatory factor analyses extracted factor scores from multiple indicators for peer connectedness, frequency of functional support, and anxiety. Multiple linear regression analyses were used to examine the moderating effects of toddlerhood BI on the association between indices of social involvement and adulthood anxiety. BI moderated the relations between social involvement and adulthood anxiety; higher frequency of functional support in the presence of relatively low BI predicted lower adulthood anxiety (t = -2.49, SE = .010, p = .014). Peer connectedness directly and positively predicted adulthood anxiety (β = 0.24, p = .016), but BI did not moderate this association (p = .606), nor was there a main effect of BI on adult anxiety (p = .129). Complex relations manifest among toddlerhood BI, social involvement across development, and anxiety in adulthood. These relations reflect interacting risk and protective factors operating in a developmental context. Implications of these findings are discussed.
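A moderation analysis of this kind is commonly written as a regression with an interaction term. The form below is a generic sketch, not the authors' exact specification: the variable names are shorthand for the factor scores described above, and any covariates the study included are omitted.

```latex
\begin{align}
\text{Anxiety}_i &= \beta_0 + \beta_1\,\text{Support}_i + \beta_2\,\text{BI}_i
  + \beta_3\,(\text{Support}_i \times \text{BI}_i) + \varepsilon_i \\
\frac{\partial\,\text{Anxiety}_i}{\partial\,\text{Support}_i} &= \beta_1 + \beta_3\,\text{BI}_i
\end{align}
```

Under this reading, a nonzero \beta_3 indicates that BI moderates the association, and the second line gives the simple slope of functional support on anxiety at a chosen level of BI.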
Danush Khanna, Pratinav Seth, Sidhaarth Sredharan Murali
et al.
Mental manipulation is a subtle yet pervasive form of abuse in interpersonal communication, making its detection critical for safeguarding potential victims. However, due to manipulation's nuanced and context-specific nature, identifying manipulative language in complex, multi-turn, and multi-person conversations remains a significant challenge for large language models (LLMs). To address this gap, we introduce the MultiManip dataset, comprising 220 multi-turn, multi-person dialogues balanced between manipulative and non-manipulative interactions, all drawn from reality shows that mimic real-world scenarios. For manipulative interactions, it includes 11 distinct manipulations depicting real-life scenarios. We conduct extensive evaluations of state-of-the-art LLMs, such as GPT-4o and Llama-3.1-8B, employing various prompting strategies. Despite their capabilities, these models often struggle to detect manipulation effectively. To overcome this limitation, we propose SELF-PERCEPT, a novel, two-stage prompting framework inspired by Self-Perception Theory, demonstrating strong performance in detecting multi-person, multi-turn mental manipulation. Our code and data are publicly available at https://github.com/danushkhanna/self-percept .
Large language models (LLMs) have been widely used for various tasks and applications. However, LLMs, even after fine-tuning, are limited to the data they were trained on. For example, ChatGPT's world knowledge, which extends only to 2021, can be outdated or inaccurate. To enhance the capabilities of LLMs, Retrieval-Augmented Generation (RAG) has been proposed to augment LLMs with additional, up-to-date information. While RAG offers the correct information, it may not present it in the best way, especially for different population groups that need personalization. Reinforcement Learning from Human Feedback (RLHF) adapts to user needs by aligning model responses with human preferences through feedback loops. In real-life applications, such as mental health problems, a dynamic, feedback-based model would continuously adapt to new information and offer personalized assistance, because complex factors fluctuate in the daily environment. Thus, we propose an Online Reinforcement Learning-based Retrieval-Augmented Generation (OnRL-RAG) system to detect mental health problems, such as stress, anxiety, and depression, and to personalize the responses to them. We use an open-source dataset collected from 2,028 college students, with 28 survey questions per student, to compare the performance of our proposed system with that of existing systems. Our system achieves superior performance compared to standard RAG and a simple LLM when using GPT-4o, GPT-4o-mini, Gemini-1.5, and GPT-3.5. This work opens up possibilities for real-life applications of LLMs for personalized services in the everyday environment. The results will also help researchers in sociology, psychology, and neuroscience align their theories more closely with the actual human daily environment.
Evaluating the safety alignment of LLM responses in high-risk mental health dialogues is particularly difficult due to missing gold-standard answers and the ethically sensitive nature of these interactions. To address this challenge, we propose PsyCrisis-Bench, a reference-free evaluation benchmark based on real-world Chinese mental health dialogues. It evaluates whether the model responses align with the safety principles defined by experts. Specifically designed for settings without standard references, our method adopts a prompt-based LLM-as-Judge approach that conducts in-context evaluation using expert-defined reasoning chains grounded in psychological intervention principles. We employ binary point-wise scoring across multiple safety dimensions to enhance the explainability and traceability of the evaluation. Additionally, we present a manually curated, high-quality Chinese-language dataset covering self-harm, suicidal ideation, and existential distress, derived from real-world online discourse. Experiments on 3600 judgments show that our method achieves the highest agreement with expert assessments and produces more interpretable evaluation rationales compared to existing approaches. Our dataset and evaluation tool are publicly available to facilitate further research.
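Binary point-wise scoring across safety dimensions can be sketched as follows. This is an illustration only: the dimension names, function names, and the simple agreement measure are hypothetical and are not taken from PsyCrisis-Bench.

```python
def score_response(judgments):
    """Turn per-dimension pass/fail judgments into binary points and a total."""
    scores = {dimension: 1 if passed else 0 for dimension, passed in judgments.items()}
    return scores, sum(scores.values())


def agreement_rate(judge_scores, expert_scores):
    """Fraction of shared dimensions on which judge and expert give the same point."""
    shared = [d for d in judge_scores if d in expert_scores]
    if not shared:
        return 0.0
    matches = sum(judge_scores[d] == expert_scores[d] for d in shared)
    return matches / len(shared)


# Hypothetical judge output for one model response (dimension names are made up).
judge = {
    "acknowledges_risk": True,
    "avoids_harmful_detail": True,
    "offers_support_resources": False,
}
scores, total = score_response(judge)

# Hypothetical expert annotation for the same response.
expert = {
    "acknowledges_risk": 1,
    "avoids_harmful_detail": 1,
    "offers_support_resources": 1,
}
print(scores, total, agreement_rate(scores, expert))
```

Keeping one binary point per dimension makes each judgment traceable to a single criterion, which is the explainability benefit the abstract describes.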
Human content moderators (CMs) routinely review distressing digital content at scale. Beyond exposure, the work context (e.g., workload, team structure, and support) may shape mental health outcomes. We examined a cross-sectional international CM sample (N = 166) and a U.S. prospective CM sample, including a comparison group of data labelers or tech support workers (N = 45) and gold-standard diagnostic interviews. Predictors included workplace factors (e.g., hours per day of distressing content, culture), cognitive-affective individual differences, and coping. Across samples, probable diagnoses based on validated clinical cutoffs were elevated (PTSD: 25.9 to 26.3%; depression: 42.1 to 48.5%; somatic symptoms: 68.7 to 89.5%; alcohol misuse: 10.5% to 18.3%). In the U.S. sample, CMs had higher interviewer-rated PTSD severity (d = 1.50), likelihood of a current mood disorder (RR = 8.22), and lifetime major depressive disorder (RR = 2.15) compared to data labelers/tech-support workers. Negative automatic thoughts (b = .39 to .74), ongoing stress (b = .27 to .55), and avoidant coping (b = .30 to .34) consistently predicted higher PTSD and depression severity across samples and at 3-month follow-up. Poorer perceived workplace culture was associated with higher depression (b = -.16 to -.32). These findings strongly implicate organizational context and related individual response styles, not exposure dose alone, in shaping risk. We highlight structural and technological interventions such as limits on daily exposure, supportive team culture, interface features to reduce intrusive memories, and training in cognitive restructuring and adaptive coping to support mental health. We also connect implications to adjacent human-in-the-loop data work (e.g., AI red teaming), where similar risks are emerging.
Arya VarastehNezhad, Reza Tavasoli, Soroush Elyasi
et al.
Depression, anxiety, and stress are widespread mental health concerns that increasingly drive individuals to seek information from Large Language Models (LLMs). This study investigates how eight LLMs (Claude Sonnet, Copilot, Gemini Pro, GPT-4o, GPT-4o mini, Llama, Mixtral, and Perplexity) reply to twenty pragmatic questions about depression, anxiety, and stress when those questions are framed for six user profiles (baseline, woman, man, young, old, and university student). The models generated 2,880 answers, which we scored for sentiment and emotions using state-of-the-art tools. Our analysis revealed that optimism, fear, and sadness dominated the emotional landscape across all outputs, with neutral sentiment maintaining consistently high values. Gratitude, joy, and trust appeared at moderate levels, while emotions such as anger, disgust, and love were rarely expressed. The choice of LLM significantly influenced emotional expression patterns. Mixtral exhibited the highest levels of negative emotions including disapproval, annoyance, and sadness, while Llama demonstrated the most optimistic and joyful responses. The type of mental health condition dramatically shaped emotional responses: anxiety prompts elicited extraordinarily high fear scores (0.974), depression prompts generated elevated sadness (0.686) and the highest negative sentiment, while stress-related queries produced the most optimistic responses (0.755) with elevated joy and trust. In contrast, demographic framing of queries produced only marginal variations in emotional tone. Statistical analyses confirmed significant model-specific and condition-specific differences, while demographic influences remained minimal. These findings highlight the critical importance of model selection in mental health applications, as each LLM exhibits a distinct emotional signature that could significantly impact user experience and outcomes.
Sara B. Marjanovic, Madelene Christin Holm Bukhari, Rikka Kjelkenes
et al.
While 90 % of females with a menstrual cycle will experience premenstrual symptoms in their reproductive years, it is estimated that 20 % experience treatment-warranted emotional, behavioral, or somatic symptoms in the premenstrual phase of their menstrual cycle. Premenstrual symptoms have been partly attributed to the brain's sensitivity to menstrual cycle-related hormonal fluctuations, which may be modulated by individual differences in the structural characteristics of the brain. In a population-based sample of 292 non-pregnant females aged 23–43 years, we tested for associations between self-reported premenstrual symptom load and T1-weighted MRI-based brain measures of cortical thickness, volume, and surface area as well as subcortical volumes, not controlling for menstrual cycle phase. After correction for multiple comparisons, linear models including age revealed significant positive associations between premenstrual symptom load and the volume of the left posterior cingulate cortex. Item-level analyses confirmed that the association with overall symptom load was not driven by specific symptom domains. These findings partly overlap with previous brain morphological findings in individuals with PMS and could possibly represent a non-phase-dependent correlate of premenstrual symptoms.
Asma Humayun, Arooj Najmussaqib, Noor ul Ain Muneeb
Background: The province of Khyber Pakhtunkhwa (KP) in Pakistan faces significant gaps in mental health services, marked by limited resources and inequitable distribution of services. To strengthen the existing services in nine districts of the province, 105 primary healthcare workers (PHCWs), including primary care physicians and clinical psychologists, were previously trained to assess and manage mental health conditions using mhGAP-HIG adapted for Pakistan. The PHCWs received remote supervision for three months post-training. This study discusses the mechanism for remote supervision using digital technology and evaluates its impact on the performance of trained PHCWs. Methods: A mixed-method approach was used to analyze the reported clinical data. The assessment, management (including pharmacological and psychosocial interventions), and referral needs in all reported cases were monitored during supervision. Both qualitative and quantitative feedback from the PHCWs were also analyzed. Results: Out of 105 trained PHCWs, 50.34 % submitted 413 cases through the mhGAP-HIG-PK mobile application. Supervision was crucial in ensuring compliance with assessment protocols in 24.70 % of cases, management protocols in 38.25 % of cases and referral protocols in 5 % of cases. The most frequently identified condition was depressive disorder (56.9 %). Stressors commonly reported by patients in primary healthcare included bereavement, socio-economic difficulties, marital challenges, and stressors related to family. PHCWs expressed a preference for remote supervision and found it beneficial for assessment (61.1 %), management (72.2 %), and referral (44.4 %). Conclusion: Following mhGAP training, remote supervision using digital tools can be effective in monitoring the performance of PHCWs to enhance their skills to assess and manage mental health conditions in low-resource settings.
The collection and evaluation of supervision-based data are crucial for improving training programs to strengthen the capacity of PHCWs.
Midhat Patel, MD, Charles Cogan, MD, Catherine Shemo, BS
et al.
Background: Open reduction and internal fixation (ORIF) of proximal humerus fractures (PHFs) is a challenging operation with high rates of loss of reduction, screw cutout, avascular necrosis, and subsequent unplanned reoperation. Augmentation of the repair with synthetic bone fillers and other alternatives has shown promise in decreasing adverse outcomes. Our aim is to report radiographic and clinical outcomes of impaction grafting with cancellous allograft chips and injection of magnesium-based bone filler for ORIF augmentation of PHFs. Methods: All patients who underwent ORIF for a 3- or 4-part PHF by a single surgeon (VE) using the standardized protocol, with a minimum of 6 months of radiographic follow-up, were included. Radiographs were taken at standardized time points up to 6 months, followed by a final radiographic follow-up, to define radiographic healing or failure. Patient-reported outcome measures were collected at final follow-up. Patient characteristics, complications, reoperations and radiographic measures of reduction quality were also recorded. Results: Seventeen patients were identified with a mean 34.1 months of radiographic follow-up. Median Penn Shoulder Score was 89 (interquartile range 19), American Shoulder and Elbow Surgeons score was 92 (25), Veterans RAND 12-Item Health Survey Mental Component Score was 53.9 (11.9), Veterans RAND 12-Item Health Survey Physical Component Score was 51.6 (12.8), and Single Assessment Numeric Evaluation was 85 (18.5). Fourteen patients (82.3%) had routine radiographic healing. Two patients (11.8%) developed avascular necrosis with screw cutout. Two patients (11.8%) had reoperation, including one hardware removal and one conversion to reverse total shoulder arthroplasty for a subsequent rotator cuff tear. No signs of glenohumeral arthritis were present in 12/17 patients (70.6%), and no signs of cuff tear arthropathy were noted in 13/17 patients (76%).
Seven out of nine (78%) patients who worked prior to injury returned to work at a mean of 14.7 weeks postoperatively. Conclusion: Augmentation of 3- and 4-part proximal humeral fractures with a standardized protocol utilizing cancellous chips and a synthetic magnesium-based bone filler results in a high rate of maintenance of fracture reduction, radiographic healing, and satisfactory patient outcomes. Further comparative data are needed to evaluate the efficacy of this technique compared to alternative methods of augmentation.
Orthopedic surgery, Diseases of the musculoskeletal system
Yikai Yin, Shaswat Mohanty, Christopher B. Cooper
et al.
Highly stretchable and self-healable supramolecular elastomers are promising materials for future soft electronics, biomimetic systems, and smart textiles, due to their dynamic cross-linking bonds. The dynamic or reversible nature of the cross-links gives rise to interesting macroscopic responses in these materials such as self-healing and rapid stress-relaxation. However, the relationship between bond activity and macroscopic mechanical response, and the self-healing properties of these dynamic polymer networks (DPNs) remains poorly understood. Using coarse-grained molecular dynamics (CGMD) simulations, we reveal a fundamental connection between the macroscopic behaviors of DPNs and the shortest paths between distant nodes in the polymer network. Notably, the trajectories of the material on the shortest path-strain map provide key insights into understanding the stress-strain hysteresis, anisotropy, stress relaxation, and self-healing of DPNs. Based on CGMD simulations under various loading histories, we formulate a set of empirical rules that dictate how the shortest path interacts with stress and strain. This lays the foundation for the development of a physics-based theory centered around the non-local microstructural feature of shortest paths to predict the mechanical behavior of DPNs.
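The idea of tracking the shortest path between distant nodes of a bonded network can be sketched with a breadth-first search. This is a simplified illustration, not the authors' CGMD analysis: it counts bonds (hops) rather than physical path length, and the toy network and function name are hypothetical.

```python
from collections import deque


def shortest_path_length(bonds, start, goal):
    """Number of bonds on the shortest path from start to goal, or None if disconnected."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical network with two routes between nodes 0 and 3.
bonds = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (6, 3)]
before = shortest_path_length(bonds, 0, 3)

# Breaking a dynamic bond on the short route forces the longer route.
after_break = shortest_path_length([b for b in bonds if b != (1, 2)], 0, 3)
print(before, after_break)
```

In this toy case, breaking one reversible bond lengthens the shortest path, which is the kind of non-local microstructural change the abstract links to stress relaxation and self-healing.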
Johannes Schneider, Arianna Casanova Flores, Anne-Catherine Kranz
This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings, in contrast to most prior research, which focuses on ethically trimmed models like ChatGPT for specific tasks. We aim to understand the originator of toxicity. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded or at least provoked by humans who actively seek such content. Our manual analysis of hundreds of conversations judged as toxic by the APIs of commercial vendors also raises questions about current practices regarding which user requests are refused. Furthermore, we conjecture based on multiple empirical indicators that humans exhibit a change in their mental model, switching from the mindset of interacting with a machine towards that of interacting with a human.
Puneet Kumar, Alexander Vedernikov, Yuwei Chen
et al.
Analysis of stress, depression and engagement is less common and more complex than that of frequently discussed emotions such as happiness, sadness, fear and anger. The importance of these psychological states has been increasingly recognized due to their implications for mental health and well-being. Stress and depression are interrelated and together they impact engagement in daily tasks, highlighting the need to explore their interplay. This survey is the first to simultaneously explore computational methods for analyzing stress, depression and engagement. We present a taxonomy and timeline of the computational approaches used to analyze them and we discuss the most commonly used datasets and input modalities, along with the categories and generic pipeline of these approaches. Subsequently, we describe state-of-the-art computational approaches, including a performance summary on the most commonly used datasets. Following this, we explore the applications of stress, depression and engagement analysis, along with the associated challenges, limitations and future research directions.
Prottay Kumar Adhikary, Aseem Srivastava, Shivani Kumar
et al.
Comprehensive summaries of sessions enable effective continuity in mental health counseling, facilitating informed therapy planning. Yet, manual summarization presents a significant challenge, diverting experts' attention from the core counseling process. This study evaluates the effectiveness of state-of-the-art Large Language Models (LLMs) in selectively summarizing various components of therapy sessions through aspect-based summarization, aiming to benchmark their performance. We introduce MentalCLOUDS, a counseling-component guided summarization dataset consisting of 191 counseling sessions with summaries focused on three distinct counseling components (aka counseling aspects). Additionally, we assess the capabilities of 11 state-of-the-art LLMs in addressing the task of component-guided summarization in counseling. The generated summaries are evaluated quantitatively using standard summarization metrics and verified qualitatively by mental health professionals. Our findings demonstrate the superior performance of task-specific LLMs such as MentalLlama, Mistral, and MentalBART in terms of standard quantitative metrics such as ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore across all aspects of counseling components. Further, expert evaluation reveals that Mistral outperforms both MentalLlama and MentalBART based on six parameters -- affective attitude, burden, ethicality, coherence, opportunity costs, and perceived effectiveness. However, these models share a weakness: all of them leave room for improvement in the opportunity costs and perceived effectiveness metrics.
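For readers unfamiliar with the quantitative metrics named above, ROUGE-1 can be sketched as clipped unigram overlap between a reference summary and a generated one. This is a simplified illustration with whitespace tokenization and no stemming; it is not the evaluation code used in the study, and the example sentences are made up.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference summary and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical summaries of one counseling aspect.
reference = "client reports poor sleep"
candidate = "client reports poor appetite"
print(rouge1_f1(reference, candidate))
```

Published results usually come from an established ROUGE implementation with its own tokenization and stemming rules, so scores from this sketch would not be directly comparable.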
Francesca Bianco, Silvia Rigato, Maria Laura Filippetti
et al.
Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.
Alice Fattori, Anna Comotti, Paolo Brambilla
et al.
Background: Moral distress among healthcare workers (HCWs) increased dramatically during the Covid-19 emergency; however, most evidence relies on cross-sectional data collected during the early stages of Covid-19. Aims: This longitudinal cohort study aims to provide better insight into the occurrence and associations of moral distress, focusing on both its short- and long-term impact on HCWs’ mental health. Methods: A total of 990 healthcare workers completed a mental health evaluation between July 2020 and July 2021 (Time 1), reporting frequencies of moral distress and psychological distress (GHQ-12), post-traumatic (IES-R) and anxiety (GAD-7) symptoms; after one year (July 2021 to July 2022; Time 2), 310 participants repeated the psychological evaluation. We investigated differences considering socio-demographic and occupational characteristics. Two logistic regression models examined the potential role of moral distress as a risk factor for scores above the scales’ cut-offs at Time 1 and at Time 2. Results: Frequent episodes of moral distress were mostly reported by nurses (24 %), physicians (22 %), younger workers (<40y; 23 %) and workers engaged in Covid-19 areas; HCWs from Emergency/Intensive Care Departments reported the highest occurrence of moral distress (29 %). Results showed increases in all psychological symptoms as episodes of moral distress became more frequent. Moral distress experienced at Time 1 was a persistent risk factor for mental health impairment in the following year, with stable ORs for post-traumatic symptoms (Time 1: OR=7.8, 95 % CI=(5.3,11.6); Time 2: OR=6.6, 95 % CI=(2.9,15.7)). Conclusions: Our findings support long-term consequences of moral distress; preventive strategies should be directed with priority to younger HCWs and nurses/physicians from Emergency and Intensive Care Departments.
Illness narratives inarguably resonate with the physical and psychological pain experienced by patients as well as caregivers because of the impact of illness and social alienation. The act of writing about the tribulations of being ill, as Lacan posits, protects the writer from its more devastating effects, though it is inadequate to eliminate psychosis. Shaheen Bhatt narrates the agonizing ordeal in I’ve Never Been (Un)happier through a metaphorical conceptualisation of the mental disorders she suffers from. In telling her lived experience, the writer creates an alternate story in which she plays the dominant role of protagonist, reflects on her episodes of mood disorders and fantasizes about death. This paper attempts to analyse the significance of the conceptual metaphors corresponding to the pervasive psychological distress and the role of writing in the recovery process.