With the growing demand for real-time video enhancement in live applications, existing methods often struggle to balance speed and effective exposure control, particularly under uneven lighting. We introduce RRNet (Rendering Relighting Network), a lightweight and configurable framework that achieves a state-of-the-art tradeoff between visual quality and efficiency. By estimating parameters for a minimal set of virtual light sources, RRNet enables localized relighting through a depth-aware rendering module without requiring pixel-aligned training data. This object-aware formulation preserves facial identity and supports real-time, high-resolution performance using a streamlined encoder and lightweight prediction head. To facilitate training, we propose a generative AI-based dataset creation pipeline that synthesizes diverse lighting conditions at low cost. With its interpretable lighting control and efficient architecture, RRNet is well suited for practical applications such as video conferencing, AR-based portrait enhancement, and mobile photography. Experiments show that RRNet consistently outperforms prior methods in low-light enhancement, localized illumination adjustment, and glare removal.
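A toy illustration of what estimating a virtual light source and rendering it depth-aware can look like: a single Lambertian point light shading normals derived from a depth map. The shading model, function names, and parameters below are illustrative assumptions, not RRNet's actual rendering module.

```python
import numpy as np

def relight(image, depth, light_pos, intensity=1.0):
    """Toy Lambertian relighting from one virtual point light.

    Assumes depth increases away from the camera and the light sits above
    the surface (positive z). A stand-in sketch, not RRNet's module.
    """
    h, w = depth.shape
    # Surface normals from depth gradients of the height field z(x, y).
    dzdy, dzdx = np.gradient(depth)
    normals = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # Unit direction from each 3-D surface point to the light.
    ys, xs = np.mgrid[0:h, 0:w]
    points = np.dstack([xs.astype(float), ys.astype(float), depth])
    to_light = light_pos - points
    to_light /= np.linalg.norm(to_light, axis=2, keepdims=True)
    # Lambertian term, clamped so back-facing pixels receive no light.
    shading = np.clip((normals * to_light).sum(axis=2), 0.0, None)
    return np.clip(image * (1.0 + intensity * shading[..., None]), 0.0, 1.0)

img = np.full((32, 32, 3), 0.4)                       # flat gray portrait stand-in
depth = np.tile(np.linspace(0.0, 5.0, 32), (32, 1))   # a tilted plane
out = relight(img, depth, light_pos=np.array([16.0, 16.0, 30.0]))
```

In a learned system the light position and intensity would be the parameters predicted by the network; here they are fixed by hand.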
Contemporary cultural tourism faces a critical digital authenticity paradox: social media engagement necessitates platform integration, yet algorithms prioritize engagement-driven content over culturally accurate heritage representations. This systematic review develops an initial framework for addressing authenticity preservation challenges through structured analysis of platform-mediated heritage representation. Following PRISMA guidelines, researchers searched the Scopus and ScienceDirect databases for peer-reviewed articles published 2020–2025 using the search terms “Cultural Tourism” AND “Heritage Tourism” AND “Photograph” AND “Social media” AND “Authenticity.” Inclusion criteria encompassed English-language journal articles and conference papers in the social sciences, business, management, and humanities. VOSviewer software facilitated bibliometric analysis through keyword co-occurrence mapping with a minimum threshold of three occurrences. From 68 articles, the analysis revealed five thematic clusters (Ecosystem Tourism, Social Media and Technology, Tourism Management, Authenticity, and Photography & Storytelling), informing an integrated Input-Process-Integration-Output framework. Input encompasses cultural contexts and authenticity evaluation criteria; Process integrates social media dynamics with tourism management strategies; Integration synthesizes authentic contexts through platform-adapted digital storytelling; Output addresses platform-mediated tourist experiences. The framework establishes systematic relationships between heritage preservation and digital platform mechanisms and addresses the conflict between algorithmic optimization and heritage preservation. It offers practical guidance for tourism organizations navigating Instagram, TikTok, Facebook, and emerging platforms while preserving authentic cultural representation.
Purpose: While deep learning (DL) has demonstrated significant utility in ocular diseases, no clinically validated algorithm currently exists for diagnosing neuromyelitis optica (NMO). This study aimed to develop a proof-of-concept multimodal artificial intelligence (AI) diagnostic model that synergistically integrates ultrawide-field fundus photographs (UWFs) with clinical examination data for predicting the onset and stage of suspected NMO. Methods: The study utilized UWFs of 330 eyes from 285 NMO patients and 1,288 eyes from 770 non-NMO participants, along with clinical examination reports, to develop an AI model for predicting the onset or stage of suspected NMO. The performance of the AI model was evaluated based on the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Results: The multimodal AI diagnostic model achieved an AUC of 0.9923, a maximum Youden index of 0.9389, a sensitivity of 97.0%, and a specificity of 96.9% in predicting the presence of NMO on the test dataset. Conclusion: Our study demonstrates the feasibility of DL algorithms for diagnosing and predicting NMO.
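The reported operating point can be cross-checked against the maximum Youden index, since J = sensitivity + specificity − 1. A quick arithmetic sketch, not code from the study:

```python
# Youden index J for a diagnostic operating point: J = sensitivity + specificity - 1.
def youden_index(sensitivity: float, specificity: float) -> float:
    return sensitivity + specificity - 1.0

# Reported values: 97.0% sensitivity and 96.9% specificity give J ~ 0.939,
# consistent with the reported maximum Youden index of 0.9389.
print(round(youden_index(0.970, 0.969), 3))  # → 0.939
```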
Abdussalam Elhanashi, Sergio Saponara, Qinghe Zheng
et al.
Artificial intelligence (AI)-based object detection in radiology can assist in clinical diagnosis and treatment planning. This article examines the AI-based object detection models currently used across many imaging modalities, including X-ray, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Ultrasound (US). The key models, from convolutional neural networks (CNNs) to contemporary transformer and hybrid models, are analyzed based on their ability to detect pathological features, such as tumors, lesions, and tissue abnormalities. In addition, this review offers a closer look at the strengths and weaknesses of these models in terms of accuracy, robustness, and speed in real clinical settings. The common issues related to these models, including limited data, annotation quality, and interpretability of AI decisions, are discussed in detail. Moreover, the need for robust models that remain applicable across different populations and imaging modalities is addressed. The importance of privacy and ethics in general data use, as well as safety and regulations for healthcare data, is emphasized. The future potential of these models lies in their accessibility in low-resource settings, usability in shared learning spaces while maintaining privacy, and improvement in diagnostic accuracy through multimodal learning. This review also highlights the importance of interdisciplinary collaboration among artificial intelligence researchers, radiologists, and policymakers. Such cooperation is essential to address current challenges and to fully realize the potential of AI-based object detection in radiology.
Photography, Computer applications to medicine. Medical informatics
Personalized dual-person portrait customization has considerable potential applications, such as preserving emotional memories and facilitating wedding photography planning. However, the absence of a benchmark dataset hinders the pursuit of high-quality customization in dual-person portrait generation. In this paper, we propose the PairHuman dataset, which is the first large-scale benchmark dataset specifically designed for generating dual-person portraits that meet high photographic standards. The PairHuman dataset contains more than 100K images that capture a variety of scenes, attire, and dual-person interactions, along with rich metadata, including detailed image descriptions, person localization, human keypoints, and attribute tags. We also introduce DHumanDiff, a baseline specifically crafted for dual-person portrait generation that features enhanced facial consistency while balancing personalized person generation with semantic-driven scene creation. Finally, the experimental results demonstrate that our dataset and method produce highly customized portraits with superior visual quality that are tailored to human preferences. Our dataset is publicly available at https://github.com/annaoooo/PairHuman.
Perceiving and producing aesthetic judgments is a fundamental yet underexplored capability for multimodal large language models (MLLMs). However, existing benchmarks for image aesthetic assessment (IAA) are narrow in perception scope or lack the diversity needed to evaluate systematic aesthetic production. To address this gap, we introduce AesTest, a comprehensive benchmark for multimodal aesthetic perception and production, distinguished by the following features: 1) It consists of curated multiple-choice questions spanning ten tasks, covering perception, appreciation, creation, and photography. These tasks are grounded in psychological theories of generative learning. 2) It integrates data from diverse sources, including professional editing workflows, photographic composition tutorials, and crowdsourced preferences. It ensures coverage of both expert-level principles and real-world variation. 3) It supports various aesthetic query types, such as attribute-based analysis, emotional resonance, compositional choice, and stylistic reasoning. We evaluate both instruction-tuned IAA MLLMs and general MLLMs on AesTest, revealing significant challenges in building aesthetic intelligence. We will publicly release AesTest to support future research in this area.
Bokeh rendering simulates the shallow depth-of-field effect in photography, enhancing visual aesthetics and guiding viewer attention to regions of interest. Although recent approaches perform well, rendering controllable bokeh without additional depth inputs remains a significant challenge. Existing classical and neural controllable methods rely on accurate depth maps, while generative approaches often struggle with limited controllability and efficiency. In this paper, we propose BokehFlow, a depth-free framework for controllable bokeh rendering based on flow matching. BokehFlow directly synthesizes photorealistic bokeh effects from all-in-focus images, eliminating the need for depth inputs. It employs a cross-attention mechanism to enable semantic control over both focus regions and blur intensity via text prompts. To support training and evaluation, we collect and synthesize four datasets. Extensive experiments demonstrate that BokehFlow achieves visually compelling bokeh effects and offers precise control, outperforming existing depth-dependent and generative methods in both rendering quality and efficiency.
Low-light image enhancement remains a challenging task, particularly in the absence of paired training data. In this study, we present LucentVisionNet, a novel zero-shot learning framework that addresses the limitations of traditional and deep learning-based enhancement methods. The proposed approach integrates multi-scale spatial attention with a deep curve estimation network, enabling fine-grained enhancement while preserving semantic and perceptual fidelity. To further improve generalization, we adopt a recurrent enhancement strategy and optimize the model using a composite loss function comprising six tailored components, including a novel no-reference image quality loss inspired by human visual perception. Extensive experiments on both paired and unpaired benchmark datasets demonstrate that LucentVisionNet consistently outperforms state-of-the-art supervised, unsupervised, and zero-shot methods across multiple full-reference and no-reference image quality metrics. Our framework achieves high visual quality, structural consistency, and computational efficiency, making it well-suited for deployment in real-world applications such as mobile photography, surveillance, and autonomous navigation.
Uncrewed aerial vehicles (UAVs) performing tasks such as transportation and aerial photography are vulnerable to intentional projectile attacks from humans. Dodging such a sudden and fast projectile poses a significant challenge for UAVs, requiring ultra-low latency responses and agile maneuvers. Drawing inspiration from baseball, in which pitchers' body movements are analyzed to predict the ball's trajectory, we propose a novel real-time dodging system that leverages an RGB-D camera. Our approach integrates human pose estimation with depth information to predict the attacker's motion trajectory and the subsequent projectile trajectory. Additionally, we introduce an uncertainty-aware dodging strategy to enable the UAV to dodge incoming projectiles efficiently. Our perception system achieves high prediction accuracy and outperforms the baseline in effective distance and latency. The dodging strategy addresses temporal and spatial uncertainties to ensure UAV safety. Extensive real-world experiments demonstrate the framework's reliable dodging capabilities against sudden attacks and its outstanding robustness across diverse scenarios.
Image convolution with complex kernels is a fundamental operation in photography, scientific imaging, and animation effects, yet direct dense convolution is computationally prohibitive on resource-limited devices. Existing approximations, such as simulated annealing or low-rank decompositions, either lack efficiency or fail to capture non-convex kernels. We introduce a differentiable kernel decomposition framework that represents a target spatially-variant, dense, complex kernel using a set of sparse kernel samples. Our approach features (i) a decomposition that enables differentiable optimization of sparse kernels, (ii) a dedicated initialization strategy for non-convex shapes to avoid poor local minima, and (iii) a kernel-space interpolation scheme that extends single-kernel filtering to spatially varying filtering without retraining or additional runtime overhead. Experiments on Gaussian and non-convex kernels show that our method achieves higher fidelity than simulated annealing and significantly lower cost than low-rank decompositions. Our approach provides a practical solution for mobile imaging and real-time rendering, while remaining fully differentiable for integration into broader learning pipelines.
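The core idea, representing a dense kernel with a small set of samples whose weights are fitted by differentiable optimization, can be sketched in one dimension. The basis of narrow Gaussian taps and the plain gradient-descent loop below are illustrative assumptions, not the paper's actual decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a dense 1-D Gaussian kernel standing in for an expensive complex kernel.
x = np.linspace(-3.0, 3.0, 61)
target = np.exp(-0.5 * x**2)
target /= target.sum()

# Sparse representation: 9 narrow taps with learnable weights (hypothetical basis).
centers = np.linspace(-3.0, 3.0, 9)
basis = np.stack([np.exp(-0.5 * ((x - c) / 0.6) ** 2) for c in centers])
basis /= basis.sum(axis=1, keepdims=True)

# Because the decomposition is linear in the weights, the squared error is
# differentiable and can be minimized by plain gradient descent.
w = rng.normal(scale=0.01, size=len(centers))
for _ in range(3000):
    residual = w @ basis - target
    w -= 1.0 * (2.0 * basis @ residual)  # gradient of ||w @ basis - target||^2

rel_err = np.linalg.norm(w @ basis - target) / np.linalg.norm(target)
print(f"relative L2 error: {rel_err:.4f}")
```

Since convolution is linear, fitting the kernel itself suffices: filtering with the fitted sparse taps approximates filtering with the dense kernel at a fraction of the taps.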
F. Neamonitou, K.K. Neamonitos, S. Stavrianos
et al.
Angiofibromas are a common facial manifestation of tuberous sclerosis (TS). However, current treatments have proven ineffective due to high recurrence rates and noncompliance. To address this issue, we developed a new triple laser therapy protocol for more effective management of angiofibromas and conducted tests to validate its efficacy. This is a prospective study of 10 patients with TS (4 women and 6 men, mean age 26.3 years [15–37 years]) with angiofibromata who received triple sequential laser therapy at our private dermatological clinic from January 2000 to December 2022. We evaluated the outcome with the Facial Angiofibromata Severity Index (FASI) via clinical photography (0, 6 months, 1 year, and 2 years) and the Dermatology Life Quality Index (DLQI). All patients had a successful recovery without any complications. Among these 10 patients, 4 experienced localized recurrences at their 6-month follow-up. These recurrences were treated with a second single carbon dioxide laser session. After 2 years of follow-up, we observed no recurring facial cutaneous manifestations. Furthermore, all patients experienced a decrease in their FASI score after treatment. According to the Visual Analogue Scale, patients reported 95% satisfaction, and the DLQI indicated only a minor impact on their everyday lives. We believe that this three-step laser treatment protocol is effective, safe, and easy for patients with facial angiofibromata to comply with, providing a satisfactory outcome adaptable to daily dermatological and plastic surgery practice.
The field of medical image segmentation is challenged by domain generalization (DG) due to domain shifts in clinical datasets. The DG challenge is exacerbated by the scarcity of medical data and privacy concerns. Traditional single-source domain generalization (SSDG) methods primarily rely on stacking data augmentation techniques to minimize domain discrepancies. In this paper, we propose Random Amplitude Spectrum Synthesis (RASS) as a training augmentation for medical images. RASS enhances model generalization by simulating distribution changes from a frequency perspective. This strategy introduces variability by applying amplitude-dependent perturbations to ensure broad coverage of potential domain variations. Furthermore, we propose random mask shuffle and reconstruction components, which enhance the ability of the backbone to process structural information and increase resilience to intra- and cross-domain changes. The proposed Random Amplitude Spectrum Synthesis for Single-Source Domain Generalization (RAS^4DG) is validated on 3D fetal brain images and 2D fundus photography, and achieves improved DG segmentation performance compared to other SSDG models.
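The amplitude-perturbation idea can be illustrated with a minimal FFT sketch: perturb the Fourier amplitude spectrum while keeping the phase, which carries structural content. The multiplicative uniform noise here is an assumption for illustration; RASS's actual perturbation schedule differs.

```python
import numpy as np

def amplitude_perturb(img: np.ndarray, strength: float = 0.3, seed: int = 0) -> np.ndarray:
    """Randomly scale the Fourier amplitude spectrum while preserving the phase."""
    rng = np.random.default_rng(seed)
    f = np.fft.fft2(img)
    amp, phase = np.abs(f), np.angle(f)
    # Multiplicative noise on the amplitude mimics appearance/domain shifts;
    # the untouched phase keeps the underlying structure intact.
    amp = amp * (1.0 + strength * rng.uniform(-1.0, 1.0, size=amp.shape))
    # The perturbed spectrum is no longer exactly Hermitian, so take the real part.
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

image = np.random.default_rng(1).random((64, 64))
augmented = amplitude_perturb(image)
```

With `strength=0.0` the function reconstructs the input exactly (up to floating point), which is a handy sanity check on the round trip.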
Luca Savant Aira, Diego Valsesia, Andrea Bordone Molini
et al.
Multi-image super-resolution (MISR) makes it possible to increase the spatial resolution of a low-resolution (LR) acquisition by combining multiple images carrying complementary information in the form of sub-pixel offsets in the scene sampling, and it can be significantly more effective than its single-image counterpart. Its main difficulty lies in accurately registering and fusing the multi-image information. Currently studied settings, such as burst photography, typically assume small geometric disparity between the LR images and rely on optical flow for image registration. We study a MISR method that can increase the resolution of sets of images acquired with arbitrary, and potentially wildly different, camera positions and orientations, generalizing the currently studied MISR settings. Our proposed model, called EpiMISR, moves away from optical flow and explicitly uses the epipolar geometry of the acquisition process, together with transformer-based processing of radiance feature fields, to substantially improve over state-of-the-art MISR methods in the presence of large disparities in the LR images.
Combining images, comparing and linking them in chains, clusters and texts is a cultural practice that was not invented with digitisation. It dates back to the nineteenth century, when the invention of photography facilitated the task of copying artworks and other cultural material, and putting them in different contexts. Later, with the invention of the moving image, the gesture of montage was developed as an entirely new device of narration and thinking. Alain Bergala refers to this cultural practice when he proposes, in The Cinema Hypothesis, the combination of film clips as a film-pedagogical praxis as well as a research method. This article investigates the theoretical, cultural and practical aspects of this method, in revisiting a wide range of writings by Jacques Rancière, Roland Barthes, André Malraux and Wsewolod Pudowkin, as well as materials from Aby Warburg’s Bilderatlas and the found footage film Why Don’t You Love Me? by Christoph Girardet and Matthias Müller (1999). Furthermore, by comparing an extract from Grigris by Mahamat-Saleh Haroun (2013) to Sandro Botticelli’s The Birth of Venus (1485/6), the didactic potential of this method is explored. The article thus considers the pedagogical, aesthetic, cultural and filmic aspects of the practice of ‘montage’ in its most basic sense: the combination of (audio)visual material.
Purpose: Diseases affecting the cornea are a major cause of corneal blindness globally. The pressing issue we face today is the lack of diagnostic devices in rural areas to diagnose these conditions. The aim of this study is to establish the sensitivity and accuracy of smartphone photography using a smart eye camera (SEC) in ophthalmologic community outreach programs. Methods: In this pilot study, a prospective non-randomized comparative analysis of inter-observer variability of anterior segment imaging recorded using an SEC was performed. One hundred consecutive patients with corneal pathologies who visited the cornea specialty outpatient clinic were enrolled. They were examined with a conventional non-portable slit lamp by a cornea consultant, and the diagnoses were recorded. These were compared with the diagnoses made by two other consultants based on SEC videos of the anterior segment of the same 100 patients. The accuracy of the SEC was assessed using sensitivity, specificity, PPV, and NPV. The kappa statistic was used to measure agreement between the two consultants, using STATA 17.0 (Texas, USA). Results: The two consultants agreed on the diagnoses made using the SEC. Agreement above 90% was found for all diagnoses and was statistically significant (P-value < 0.001). Sensitivity and negative predictive value both exceeded 90%. Conclusion: The SEC can be used successfully in community outreach programs such as field visits, eye camps, teleophthalmology, and community centers, where either a clinical setup is lacking or ophthalmologists are not available.
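Inter-observer agreement of the kind reported above is typically quantified with Cohen's kappa, which discounts the agreement expected by chance. A self-contained sketch; the toy labels are illustrative, not study data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1.0 - expected)

# Two raters diagnosing 8 toy cases (k = keratitis, d = dystrophy, s = scar).
a = ["k", "k", "d", "d", "s", "s", "k", "d"]
b = ["k", "k", "d", "d", "s", "k", "k", "d"]
print(round(cohens_kappa(a, b), 3))  # → 0.805
```

Note the division fails when expected agreement is exactly 1 (both raters always give the same single label); real analyses handle that degenerate case separately.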
This article reflects on two exhibitions, in 2018 and 2019, about the Nazi persecution of German Sinti and Roma. One was produced by an Anglo-German curatorial team and toured Britain and Continental Europe. The second was designed by South Korean curators and installed temporarily in a gallery in downtown Seoul. The two exhibitions drew on the same photographic archive, narrated the persecution histories of Romani subjects of the photographs, and used the story of their relationship with the non-Romani photographer to ask questions about responsibility and to prompt visitors to reflect on their own status as “implicated subjects” in contemporary forms of discrimination. Given different expectations of the level of knowledge that visitors bring to the exhibition and different communicative tools familiar to them (the Seoul curators included creative artists), the two curatorial teams took very different approaches to informing and moving their audiences – and to meeting the recognized challenges of representing Romani history and identity – not least in the ways in which the exhibition’s message was mediated in face-to-face conversations on site. The aesthetic approach adopted in Seoul did not fully succeed in maintaining the balance between explanation and exoticization. The evaluation relies on visitor surveys (quantitative and qualitative) and interviews with guides.
Colonies and colonization. Emigration and immigration. International migration, Communities. Classes. Races
Graph Neural Networks (GNNs) are a family of neural networks that operate on graph-structured data by modeling the relationships between nodes. In recent years there has been increased interest in GNNs and their derivatives, such as Graph Attention Networks (GAT), Graph Convolutional Networks (GCN), and Graph Recurrent Networks (GRN), and their use in computer vision has also grown. The number of GNN applications in this field continues to expand; it includes video analysis and understanding, action and behavior recognition, computational photography, image and video synthesis from zero or few shots, and many more. This contribution aims to collect papers published on GNN-based approaches to computer vision. They are described and summarized from three perspectives. First, we investigate the architectures of Graph Neural Networks and their derivatives used in this area to provide accurate and explainable recommendations for ensuing investigations. Second, we present the datasets used in these works. Finally, using graph analysis, we examine relations between GNN-based studies in computer vision and potential sources of inspiration identified outside of this field.
Hugh S. Hudson, Laura Peticolas, Calvin Johnson
et al.
The total solar eclipse of August 21, 2017, crossed the whole width of North America, the first occasion for this during the modern age of consumer electronics. Accordingly, it became a great opportunity to engage the public and to enlist volunteer observers with relatively high-level equipment; our program ("Eclipse Megamovie") took advantage of this as a means of creating a first-ever public database of such eclipse photography. This resulted in a large outreach program, involving many hundreds of individuals, supported almost entirely on a volunteer basis and with the institutional help of Google, the Astronomical Society of the Pacific, and the University of California, Berkeley. The project home page is http://eclipsemegamovie.org, which contains the movie itself. We hope that our comments here will help with planning for similar activities in the total eclipse of April 8, 2024.
Optical imaging systems are widely used in the drone industry (optical navigation, terrain scanners), industry (distance measurement, 3D visualization), security (optical scanners and motion sensors), robotics (motion sensors, terrain or field scanning), etc. Lidar (Light Detection and Ranging) systems have different structures depending on their purpose, and their principle of operation is based on measuring the time difference between the emission of a laser or other light source and its reflection from the object, where the reflected photons fall on the photodetector. This time interval is on the order of hundreds of picoseconds. Accuracy, imaging speed, and overall system efficiency depend on the performance of the photodetector module. Advances in fabrication technology and in efficiently detecting large photon fluxes now make it possible to develop photodetectors with a nanolayer micropixel structure. The paper presents the development of a modern, highly sensitive micropixel avalanche photodetector. The developed photodetector has high speed, low noise, and high resolution. The improvement of these parameters allows the developed photodetectors to become an indispensable component of lidar systems.
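The timing figures above map directly to range resolution: a lidar measures the round-trip time of light, so distance = c·t/2. A small illustrative sketch:

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_distance_m(round_trip_time_s: float) -> float:
    """Target range from a lidar round-trip time-of-flight measurement."""
    return C * round_trip_time_s / 2.0

# A round-trip timing resolution of ~100 ps corresponds to about 1.5 cm of
# range, which is why photodetector speed dominates lidar accuracy.
print(f"{tof_distance_m(100e-12) * 100:.2f} cm")  # → 1.50 cm
```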