Towards Interpretable Foundation Models for Retinal Fundus Images
Samuel Ofosu Mensah, Maria Camila Roa Carvajal, Kerol Djoumessi
et al.
Foundation models are used to extract transferable representations from large amounts of unlabeled data, typically via self-supervised learning (SSL). However, many of these models rely on architectures that offer limited interpretability, which is a critical issue in high-stakes domains such as medical imaging. We propose Dual-IFM, a foundation model that is interpretable-by-design in two ways: First, it provides local interpretability for individual images through class evidence maps that are faithful to the decision-making process. Second, it provides global interpretability for entire datasets through a 2D projection layer that allows for direct visualization of the model's representation space. We trained our model on over 800,000 color fundus photographs from various sources to learn generalizable, interpretable representations for different downstream tasks. Our results show that our model reaches a performance range similar to that of state-of-the-art foundation models with up to $16\times$ the number of parameters, while providing interpretable predictions on out-of-distribution data. Our results suggest that large-scale SSL pretraining paired with inherent interpretability can lead to robust representations for retinal imaging.
Crítica al régimen topofóbico de representación: contravisualidades del feminicidio y la desaparición forzada en México (Critique of the Topophobic Regime of Representation: Counter-Visualities of Femicide and Enforced Disappearance in Mexico)
Sergio Rodríguez-Blanco, Violeta Santiago-Hernández
This article offers a critical reading of the topophobic representation of enforced disappearance and femicide in Mexico through Sara Ahmed’s affective turn. Focusing on the art installation ¡Visite Ciudad Juárez! by Ambra Polidori and the journalistic report Los desaparecidos by Nolen and Márquez, it argues that emotions such as fear, pain, and disgust function as social technologies that shape bodies and territories. The study examines the ethical limits of representing suffering and proposes indignation as a potential site of political resistance.
French literature - Italian literature - Spanish literature - Portuguese literature, Social sciences (General)
SABE-YOLO: Structure-Aware and Boundary-Enhanced YOLO for Weld Seam Instance Segmentation
Rui Wen, Wu Xie, Yong Fan
et al.
Accurate weld seam recognition is essential in automated welding systems, as it directly affects path planning and welding quality. With the rapid advancement of industrial vision, weld seam instance segmentation has emerged as a prominent research focus in both academia and industry. However, existing approaches still face significant challenges in boundary perception and structural representation. Due to the inherently elongated shapes, complex geometries, and blurred edges of weld seams, current segmentation models often struggle to maintain high accuracy in practical applications. To address this issue, a novel structure-aware and boundary-enhanced YOLO (SABE-YOLO) is proposed for weld seam instance segmentation. First, a Structure-Aware Fusion Module (SAFM) is designed to enhance structural feature representation through strip pooling attention and element-wise multiplicative fusion, targeting the difficulty in extracting elongated and complex features. Second, a C2f-based Boundary-Enhanced Aggregation Module (C2f-BEAM) is constructed to improve edge feature sensitivity by integrating multi-scale boundary detail extraction, feature aggregation, and attention mechanisms. Finally, the inner minimum point distance-based intersection over union (Inner-MPDIoU) is introduced to improve localization accuracy for weld seam regions. Experimental results on the self-built weld seam image dataset show that SABE-YOLO outperforms YOLOv8n-Seg by 3 percentage points in the AP(50–95) metric, reaching 46.3%. Meanwhile, it maintains a low computational cost (18.3 GFLOPs) and a small number of parameters (6.6M), while achieving an inference speed of 127 FPS, demonstrating a favorable trade-off between segmentation accuracy and computational efficiency. The proposed method provides an effective solution for high-precision visual perception of complex weld seam structures and demonstrates strong potential for industrial application.
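The Inner-MPDIoU term above combines two published bounding-box ideas: MPDIoU, which penalizes the squared distances between corresponding box corners normalized by the image diagonal, and Inner-IoU, which measures overlap on auxiliary boxes scaled about each box centre. A minimal sketch of that combination follows; the exact formulation, the default `ratio`, and all function names are our assumptions, not the paper's code:

```python
def iou(b1, b2):
    """Standard IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(b1) + area(b2) - inter
    return inter / union if union > 0 else 0.0

def scale_box(b, ratio):
    """Shrink/grow a box about its centre (the 'inner' auxiliary box)."""
    cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    hw, hh = (b[2] - b[0]) * ratio / 2, (b[3] - b[1]) * ratio / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def inner_mpdiou(pred, gt, img_w, img_h, ratio=0.8):
    """Sketch of Inner-MPDIoU: MPDIoU's corner-distance penalty applied
    on top of the IoU of centre-scaled auxiliary boxes."""
    inner = iou(scale_box(pred, ratio), scale_box(gt, ratio))
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2   # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2   # bottom-right corners
    norm = img_w ** 2 + img_h ** 2                          # image diagonal squared
    return inner - d1 / norm - d2 / norm
```

Identical boxes score 1, while displaced boxes are penalized by both lower overlap and the corner-distance terms, which is what improves localization for elongated weld-seam regions.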
Photography, Computer applications to medicine. Medical informatics
Ce qui précède le seuil (What Precedes the Threshold)
Sylvie Nayral
Among the accounts of the invention of photography, the perception of the research carried out by Nicéphore Niépce has been conditioned by a number of factors that have focused attention on the results obtained, to the detriment of the wandering timeframe of his research, which the present text aims to re-read from a non-teleological perspective. The extensive correspondence maintained with his brother Claude prior to 1827, the year considered a defining moment, offers an insight into research that cannot be summarized in the only heliography to have been discovered, the Point de vue du Gras. The exchange of letters depicts not one but two inventors whose career choices led them to embark on a common adventure. The two brothers never gave up on this shared dream, which shaped the heliography project led by Nicéphore alone. Over eleven years of production, the fleeting points of view were erased and lost, their novelty subdued over time.
PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement
Tong Li, Lizhi Wang, Hansen Feng
et al.
Low-light image enhancement (LLIE) is a fundamental task in computational photography, aiming to improve illumination, reduce noise, and enhance image quality. While recent advancements focus on designing increasingly complex neural network models, we observe a peculiar phenomenon: resetting certain parameters to random values unexpectedly improves enhancement performance for some images. Drawing inspiration from biological genes, we term this phenomenon the gene effect. The gene effect limits enhancement performance, as even random parameters can sometimes outperform learned ones, preventing models from fully utilizing their capacity. In this paper, we investigate the reason and propose a solution. Based on our observations, we attribute the gene effect to static parameters, analogous to how fixed genetic configurations become maladaptive when environments change. Inspired by biological evolution, where adaptation to new environments relies on gene mutation and recombination, we propose parameter dynamic evolution (PDE) to adapt to different images and mitigate the gene effect. PDE employs a parameter orthogonal generation technique and the corresponding generated parameters to simulate gene recombination and gene mutation, respectively. Experiments validate the effectiveness of our techniques. The code will be released to the public.
GenSpace: Benchmarking Spatially-Aware Image Generation
Zehan Wang, Jiayang Xu, Ziang Zhang
et al.
Humans can intuitively compose and arrange scenes in 3D space for photography. However, can advanced AI image generators plan scenes with similar 3D spatial awareness when creating images from text or image prompts? We present GenSpace, a novel benchmark and evaluation pipeline to comprehensively assess the spatial awareness of current image generation models. Standard evaluations using general Vision-Language Models (VLMs), however, frequently fail to capture the detailed spatial errors. To handle this challenge, we propose a specialized evaluation pipeline and metric, which reconstructs 3D scene geometry using multiple visual foundation models and provides a more accurate and human-aligned measure of spatial faithfulness. Our findings show that while AI models create visually appealing images and can follow general instructions, they struggle with specific 3D details like object placement, relationships, and measurements. We summarize three core limitations in the spatial perception of current state-of-the-art image generation models: 1) Object Perspective Understanding, 2) Egocentric-Allocentric Transformation, and 3) Metric Measurement Adherence, highlighting possible directions for improving spatial intelligence in image generation.
Aerial Image Stitching Using IMU Data from a UAV
Selim Ahmet Iz, Mustafa Unel
Unmanned Aerial Vehicles (UAVs) are widely used for aerial photography and remote sensing applications. One of the main challenges is to stitch together multiple images into a single high-resolution image that covers a large area. Feature-based image stitching algorithms are commonly used but can suffer from errors and ambiguities in feature detection and matching. To address this, several approaches have been proposed, including using bundle adjustment techniques or direct image alignment. In this paper, we present a novel method that uses a combination of IMU data and computer vision techniques for stitching images captured by a UAV. Our method involves several steps such as estimating the displacement and rotation of the UAV between consecutive images, correcting for perspective distortion, and computing a homography matrix. We then use a standard image stitching algorithm to align and blend the images together. Our proposed method leverages the additional information provided by the IMU data, corrects for various sources of distortion, and can be easily integrated into existing UAV workflows. Our experiments demonstrate the effectiveness and robustness of our method, outperforming some of the existing feature-based image stitching algorithms in terms of accuracy and reliability, particularly in challenging scenarios such as large displacements, rotations, and variations in camera pose.
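The homography step above follows from standard projective geometry: for a pure rotation between frames, H = K R K⁻¹, and with translation t over a roughly planar scene with normal n at distance d, H = K (R − t nᵀ/d) K⁻¹. The sketch below illustrates that computation; the Euler-angle convention, function names, and planar-ground model are our assumptions, not necessarily the authors' pipeline:

```python
import numpy as np

def rotation_matrix(roll, pitch, yaw):
    """ZYX Euler angles (radians) -> 3x3 rotation matrix."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def imu_homography(K, roll, pitch, yaw, t=None, n=None, d=None):
    """Homography induced by the inter-frame camera motion from IMU angles.
    Rotation-only: H = K R K^-1.  With translation t over a plane with
    unit normal n at distance d: H = K (R - t n^T / d) K^-1."""
    R = rotation_matrix(roll, pitch, yaw)
    M = R if t is None else R - np.outer(t, n) / d
    H = K @ M @ np.linalg.inv(K)
    return H / H[2, 2]          # normalize so H[2,2] == 1
```

The resulting H can then be passed to any standard warp-and-blend stitcher; with zero motion it reduces to the identity, as expected.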
MFSR-GAN: Multi-Frame Super-Resolution with Handheld Motion Modeling
Fadeel Sher Khan, Joshua Ebenezer, Hamid Sheikh
et al.
Smartphone cameras have become ubiquitous imaging tools, yet their small sensors and compact optics often limit spatial resolution and introduce distortions. Combining information from multiple low-resolution (LR) frames to produce a high-resolution (HR) image has been explored to overcome the inherent limitations of smartphone cameras. Despite the promise of multi-frame super-resolution (MFSR), current approaches are hindered by datasets that fail to capture the characteristic noise and motion patterns found in real-world handheld burst images. In this work, we address this gap by introducing a novel synthetic data engine that uses multi-exposure static images to synthesize LR-HR training pairs while preserving sensor-specific noise characteristics and image motion found during handheld burst photography. We also propose MFSR-GAN: a multi-scale RAW-to-RGB network for MFSR. Compared to prior approaches, MFSR-GAN emphasizes a "base frame" throughout its architecture to mitigate artifacts. Experimental results on both synthetic and real data demonstrate that MFSR-GAN trained with our synthetic engine yields sharper, more realistic reconstructions than existing methods for real-world MFSR.
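A synthetic burst engine of the kind described can be illustrated, in a much simplified form, as shifting the HR capture per frame, downsampling, and adding signal-dependent (shot) plus constant (read) noise. This toy only gestures at the idea; the paper's engine works from multi-exposure statics with sensor-specific noise models, and every parameter and name below is our assumption:

```python
import numpy as np

def synth_burst(hr, n_frames=4, scale=2, max_shift=2.0,
                shot_gain=0.01, read_sigma=0.002, rng=None):
    """Toy synthetic-burst engine: each LR frame is a randomly shifted,
    downsampled copy of the HR image (values in [0, 1]) with shot and
    read noise added, loosely mimicking handheld burst capture."""
    rng = np.random.default_rng(rng)
    frames = []
    for _ in range(n_frames):
        # random handheld jitter, rounded to whole pixels for simplicity
        dy, dx = rng.uniform(-max_shift, max_shift, size=2)
        shifted = np.roll(hr, (int(round(dy)), int(round(dx))), axis=(0, 1))
        lr = shifted[::scale, ::scale]                      # naive downsample
        sigma = np.sqrt(shot_gain * np.clip(lr, 0, 1)) + read_sigma
        frames.append(np.clip(lr + rng.normal(0, sigma), 0.0, 1.0))
    return np.stack(frames)
```

The first frame can be designated the "base frame" for a network such as MFSR-GAN, with the remaining frames supplying complementary sub-pixel information.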
Survey on Single-Image Reflection Removal using Deep Learning Techniques
Kangning Yang, Huiming Sun, Jie Cai
et al.
The phenomenon of reflection is quite common in digital images, posing significant challenges for various applications such as computer vision, photography, and image processing. Traditional methods for reflection removal often struggle to achieve clean results while maintaining high fidelity and robustness, particularly in real-world scenarios. Over the past few decades, numerous deep learning-based approaches for reflection removal have emerged, yielding impressive results. In this survey, we conduct a comprehensive review of the current literature by focusing on key venues such as ICCV, ECCV, CVPR, and NeurIPS, as these conferences and journals have been central to advances in the field. Our review follows a structured paper selection process, and we critically assess both single-stage and two-stage deep learning methods for reflection removal. The contribution of this survey is three-fold: first, we provide a comprehensive summary of the most recent work on single-image reflection removal; second, we outline task hypotheses, current deep learning techniques, publicly available datasets, and relevant evaluation metrics; and third, we identify key challenges and opportunities in deep learning-based reflection removal, highlighting the potential of this rapidly evolving research area.
Perception-Inspired Color Space Design for Photo White Balance Editing
Yang Cheng, Ziteng Cui, Shenghan Su
et al.
White balance (WB) is a key step in the image signal processor (ISP) pipeline that mitigates color casts caused by varying illumination and restores the scene's true colors. Currently, sRGB-based WB editing for post-ISP WB correction is widely used to address color constancy failures in the ISP pipeline when the original camera RAW is unavailable. However, additive color models (e.g., sRGB) are inherently limited by fixed nonlinear transformations and entangled color channels, which often impede their generalization to complex lighting conditions. To address these challenges, we propose a novel framework for WB correction that leverages a perception-inspired Learnable HSI (LHSI) color space. Built upon a cylindrical color model that naturally separates luminance from chromatic components, our framework further introduces dedicated parameters to enhance this disentanglement and a learnable mapping to adaptively refine its flexibility. Moreover, a new Mamba-based network is introduced, tailored to the characteristics of the proposed LHSI color space. Experimental results on benchmark datasets demonstrate the superiority of our method, highlighting the potential of perception-inspired color space design in computational photography. The source code is available at https://github.com/YangCheng58/WB_Color_Space.
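For reference, the classic fixed (non-learnable) RGB→HSI conversion behind cylindrical models of this kind separates intensity, saturation, and hue as sketched below. The paper's LHSI space replaces parts of this mapping with learnable parameters, which this textbook sketch does not attempt to reproduce:

```python
import math

def rgb_to_hsi(r, g, b):
    """Classic RGB -> HSI conversion for inputs in [0, 1].
    Returns (H in degrees, S in [0, 1], I in [0, 1])."""
    i = (r + g + b) / 3.0                      # intensity: channel mean
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i   # saturation
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0                                # achromatic: hue undefined, use 0
    else:
        h = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        if b > g:                              # hues past 180 degrees
            h = 360.0 - h
    return h, s, i
```

Because luminance (I) is decoupled from the chromatic components (H, S), a WB adjustment can act on chroma without disturbing brightness, which is the disentanglement the LHSI design builds on.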
Modeling and analysis of fractal transformation of distorted images of the Earth’s surface obtained by optoelectronic surveillance systems
A. S. Andrusenko, A. N. Grigor’ev, D. S. Korshunov
The results of a study of methods for processing optoelectronic images of the Earth's surface are presented. The application of fractal transformations to solve the problems of automated and automatic analysis of terrain images, ensuring the separation of natural and anthropogenic objects without the use of machine learning, is shown. The analysis of existing works has shown the absence of studies linking the result of fractal transformation with the image quality recorded in real conditions of optoelectronic photography. There is no justification for choosing a specific fractal transformation for the applied processing of images with certain typical distortions. The purpose of this work was to identify the dependence of the signal-to-noise ratio of the fractal dimension on the quality of the source images, and to determine the type of fractal transformation that is most resistant to the effects of the considered negative factors. Methods of fractal transformation for thematic image processing are defined, which include the prism method and the differential cube counting method, and their description is presented. To study the selected methods, real images of the Earth's surface were used, simulating distorted images of the terrain. Image distortions determined by the instability of shooting conditions and the properties of the optoelectronic complex are considered: defocusing, smearing, and noise. The mathematical models used to describe them are summarized. A technique for analyzing the signal-to-noise ratio of the fractal transformation is described, involving the processing of reference and distorted images of the terrain. The aspects of distortion modeling and the indicators characterizing the level of image distortion are indicated. To implement the experiment, images of terrain representing various plots were selected. For each plot, the dependences of the signal-to-noise ratio on the indicators characterizing the studied distortions were obtained.
By estimating the signal-to-noise ratio, the influence of distorting factors on the resulting fractal dimension field was analyzed. The results of the experiment confirmed the possibility of using fractal transformations for thematic processing of distorted optoelectronic images. It is shown that the dependence of the signal-to-noise ratio on the distortion index has a pronounced nonlinear character. It is established that the prism method is more stable for distortions of the defocusing and smearing type, while the differential cube counting method is more stable in the presence of noise. For processing images of an area represented mainly by forest vegetation, the best result is obtained with the differential cube counting method.
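The differential cube (box) counting method referenced above estimates fractal dimension from how the number of boxes needed to cover the image's intensity surface scales with box size. A compact sketch follows; the grid sizes, the +1 offsets, and the square-image assumption are our implementation choices, not necessarily the authors':

```python
import numpy as np

def dbc_dimension(img, sizes=(2, 4, 8, 16)):
    """Differential box-counting estimate of the fractal dimension of a
    square M x M grayscale image with values in 0..255; each grid size
    in `sizes` must divide M."""
    M = img.shape[0]
    G = 256.0
    logs, logN = [], []
    for s in sizes:
        h = s * G / M                      # box height in gray levels
        N = 0
        for y in range(0, M, s):
            for x in range(0, M, s):
                blk = img[y:y + s, x:x + s]
                # boxes spanned by the intensity surface over this block
                N += int(np.ceil((blk.max() + 1) / h)
                         - np.ceil((blk.min() + 1) / h)) + 1
        logs.append(np.log(1.0 / s))
        logN.append(np.log(N))
    slope, _ = np.polyfit(logs, logN, 1)   # D = slope of log N vs log(1/s)
    return slope
```

A perfectly flat image yields D = 2 (a plane), while noisier, rougher surfaces push D toward 3; comparing D fields of reference and distorted images is what the signal-to-noise analysis in the study operates on.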
Comparison of Three Methods for Lip Print Analysis: Lipstick, Latent, and Digital Photography for Sex Determination and Permanency Assessment Over Time
Sumanta K. Kolay, Rohit Sharma, Ajit K. Chaudhary
et al.
Background:
Lip print analysis, also known as cheiloscopy, is a valuable tool in forensic science for personal identification and sex determination.
Methods:
A total of 116 individuals (58 males and 58 females) were enrolled in this study. Lip prints were obtained using three methods: lipstick application, latent print development, and digital photography. The obtained lip prints were analyzed for accuracy, sex-specific variations, and stability over a 6-month period.
Results:
Lip print patterns exhibited significant differences between males and females across all three methods. The lipstick method demonstrated consistent and identifiable lip print patterns, with 25 males (43.1%) and 39 females (67.2%) displaying pattern 1. In contrast, the latent method showed less consistent results, with lip print pattern 1 observed in 25 males (43.1%) and 28 females (48.3%). Digital photography emerged as a promising alternative, offering detailed documentation capabilities, with pattern 1 observed in 30 males (51.7%) and 37 females (63.8%). However, no statistically significant association was found between lip print patterns and sex in the latent and digital methods. A longitudinal assessment revealed variations in lip print patterns over time, with some individuals showing changes in pattern distribution.
Conclusion:
These findings underscore the utility of lip print analysis in forensic science and emphasize the need for further research to enhance the interpretation of lip print evidence.
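The significance reported for the lipstick method can be sanity-checked with a Pearson chi-square statistic on a 2×2 table of pattern 1 versus all other patterns by sex, using the counts in the Results (25/58 males, 39/58 females). Collapsing the remaining patterns into one cell is our simplification; the authors' exact test may differ:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a
    2x2 contingency table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)              # row totals
    cols = (a + c, b + d)              # column totals
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n    # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    return chi2
```

With the lipstick counts, `chi_square_2x2([[25, 33], [39, 19]])` is about 6.83, above the 3.84 critical value at df = 1 and α = 0.05, consistent with the significant sex association reported for that method.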
Pharmacy and materia medica, Analytical chemistry
Seeing Photographically and the Memory of Photography
Paul Frosh
“Seeing photographically” is an act of cultural memory. In an era of AI-generated images, screenshots, “disappearing” or “view once” photographs, and myriad other practices that challenge the definitional boundaries of photography, the phrase invokes past understandings of the medium’s sensory affordances, transferring them into a continually changing present. Focusing on a case study of the digital “rescue” of found film chemical photography, the article excavates cultural memory processes that relocate photographic seeing to digital arenas. The memory of “seeing photographically” does more, it claims, than preserve photography as a “zombie category” that disguises the reality of computational imagery. Rather, it helps construct and maintain a media ideology of what photography was and is, and of its continuing cultural, and especially existential, significance. Mobilizing worldviews, social values, and moral obligations associated with photography in the past, “seeing photographically” reanimates them in contemporary contexts of media ubiquity, intensified visibility, and existential anxiety, with profound ramifications.
Communication. Mass media
Implementing an Optimized and Secured Multimedia Streaming Protocol in a Participatory Sensing Scenario
Andrea Vaiuso
Multimedia streaming protocols are becoming increasingly popular in crowdsensing due to their ability to deliver high-quality video content over the internet in real time. Streaming multimedia content, as in live video streaming, requires high bandwidth and large storage capacity to ensure sufficient throughput. Crowdsensing can distribute information about shared video content among multiple users in a network, reducing storage, computational, and bandwidth requirements. However, crowdsensing introduces several security constraints that must be taken into account to ensure the confidentiality, integrity, and availability of the data. In the specific case of video streaming, commonly referred to as visual crowdsensing (VCS) in this context, data is transmitted over wireless networks, making it vulnerable to security breaches and susceptible to eavesdropping and interception by attackers. Multimedia content often contains sensitive user data and may be subject to various privacy laws, including data protection regulations such as the GDPR (General Data Protection Regulation) and laws related to photography and video recording. For this reason, a secure protocol optimized for distributed real-time data streaming becomes increasingly important in crowdsensing and smart-environment contexts. In this article, we discuss the use of a symmetric AES-CTR-based encryption protocol for securing data streaming over a crowdsensed network.
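The CTR construction behind an AES-CTR protocol turns a block cipher into a stream cipher: successive nonce-plus-counter blocks are encrypted under the key, and the payload is XORed with the resulting keystream, so encryption and decryption are the same operation. The sketch below substitutes HMAC-SHA256 for the AES block cipher purely to stay within the Python standard library; a real deployment would use AES from a vetted cryptographic library, and all names here are ours:

```python
import hmac
import hashlib

def ctr_keystream(key, nonce, length):
    """CTR-mode keystream: process successive (nonce || counter) blocks
    under the key.  HMAC-SHA256 stands in for the AES block cipher in
    this stdlib-only sketch."""
    stream = b""
    counter = 0
    while len(stream) < length:
        block = nonce + counter.to_bytes(8, "big")
        stream += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return stream[:length]

def ctr_crypt(key, nonce, data):
    """Encrypt or decrypt: XOR with the keystream makes CTR symmetric.
    The nonce must never be reused with the same key."""
    ks = ctr_keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

Because CTR never feeds ciphertext back into the cipher, each frame of a video stream can be encrypted independently and in parallel, which suits the low-latency requirements of live VCS.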
Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement
Eashan Adhikarla, Kai Zhang, Rosaura G. VidalMata
et al.
Despite recent strides made by AI in image processing, the issue of mixed exposure, pivotal in many real-world scenarios like surveillance and photography, remains inadequately addressed. Traditional image enhancement techniques and current transformer models are limited by a primary focus on either overexposure or underexposure. To bridge this gap, we introduce the Unified-Exposure Guided Transformer (Unified-EGformer). Our proposed solution is built upon advanced transformer architectures, equipped with local pixel-level refinement and global refinement blocks for color correction and image-wide adjustments. We employ a guided attention mechanism to precisely identify exposure-compromised regions, ensuring adaptability across various real-world conditions. U-EGformer, with a lightweight design featuring a peak memory footprint of only $\sim$1134 MB (0.1 million parameters) and an inference time of 95 ms (9.61x faster than the average), is a viable choice for real-time applications such as surveillance and autonomous navigation. Additionally, our model is highly generalizable, requiring minimal fine-tuning to handle multiple tasks and datasets with a single architecture.
Exposure Bracketing Is All You Need For A High-Quality Image
Zhilu Zhang, Shuohao Zhang, Renlong Wu
et al.
It is highly desired but challenging to acquire high-quality photos with clear content in low-light environments. Although multi-image processing methods (using burst, dual-exposure, or multi-exposure images) have made significant progress in addressing this issue, they typically focus on specific restoration or enhancement problems, and do not fully explore the potential of utilizing multiple images. Motivated by the fact that multi-exposure images are complementary in denoising, deblurring, high dynamic range imaging, and super-resolution, we propose to utilize exposure bracketing photography to get a high-quality image by combining these tasks in this work. Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data and then adapts it to real-world unlabeled images. In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed. Moreover, we construct a data simulation pipeline to synthesize pairs and collect real-world images from 200 nighttime scenarios. Experiments on both datasets show that our method performs favorably against the state-of-the-art multi-image processing ones. Code and datasets are available at https://github.com/cszhilu1998/BracketIRE.
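The complementarity of multi-exposure frames can be illustrated with the classic (pre-learning) merge: divide each frame by its exposure time to move to a common radiance domain, then average with well-exposedness weights so each pixel draws on the frames that captured it best. This is a textbook-style toy, not the paper's TMRNet:

```python
import numpy as np

def merge_bracket(frames, exposures):
    """Toy exposure-bracket merge: scale each frame (pixels in [0, 1])
    by its exposure time to estimate radiance, then average with a hat
    weight that favours well-exposed mid-tone pixels."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    num = np.zeros_like(frames[0])
    den = np.zeros_like(frames[0])
    for f, t in zip(frames, exposures):
        w = 1.0 - np.abs(2.0 * f - 1.0)    # hat weight: 1 at 0.5, 0 at 0 and 1
        num += w * f / t                    # per-frame radiance estimate f / t
        den += w
    return num / np.maximum(den, 1e-8)      # guard against all-zero weights
```

Short exposures contribute the highlights that long exposures clip, and vice versa, which is exactly the complementarity the bracketing pipeline exploits for joint denoising, deblurring, and HDR reconstruction.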
Deep Generative Adversarial Network for Occlusion Removal from a Single Image
Sankaraganesh Jonna, Moushumi Medhi, Rajiv Ranjan Sahay
Nowadays, the enhanced capabilities of inexpensive imaging devices have led to a tremendous increase in the acquisition and sharing of multimedia content over the Internet. Despite advances in imaging sensor technology, annoying conditions like \textit{occlusions} hamper photography and may deteriorate the performance of applications such as surveillance, detection, and recognition. Occlusion segmentation is difficult because of scale variations, illumination changes, and so on. Similarly, recovering a scene from foreground occlusions also poses significant challenges due to the complexity of accurately estimating the occluded regions and maintaining coherence with the surrounding context. In particular, image de-fencing presents its own set of challenges because of the diverse variations in shape, texture, color, patterns, and the often cluttered environment. This study focuses on the automatic detection and removal of occlusions from a single image. We propose a fully automatic, two-stage convolutional neural network for fence segmentation and occlusion completion. We leverage generative adversarial networks (GANs) to synthesize realistic content, including both structure and texture, in a single shot for inpainting. To assess zero-shot generalization, we evaluated our trained occlusion detection model on our proposed fence-like occlusion segmentation dataset. The dataset can be found on GitHub.
Detonation driving rules for cylindrical casings under asymmetrical multipoint initiations
Yuan Li, Xiaogang Li, Yuquan Wen
et al.
The detonation wave-aiming warhead can effectively enhance lethality efficiency. Rules for casing rupture and velocity distribution under asymmetrical initiations have not previously been investigated adequately. In this study, X-ray photography and numerical modelling are used to examine casing expansions under centre-point, asymmetrical one-point, and asymmetrical two-point (with central angles of 45° and 90°) initiations. The results indicate that early casing ruptures are caused by local high pressures induced by the initiation, detonation wave interaction, and Mach wave onset. The fragment shapes are controlled by the impact angle of the detonation wave. The fragment velocity distributions differ under different initiation types, and the end rarefaction waves can affect the velocity distribution. This study can serve as a reference for the design and optimization of high-efficiency warheads.
Japońskie animacje a folklor w narracjach o tożsamości narodowej
Marek Bochniarz, Izumi Yoshida
Japanese animation and folklore in narratives about national identity
This paper studies the activity of Japanese directors who decided to make film adaptations of the Momotaro story, a classic legend based on Japanese folklore and the most popular one in the modern period. This activity is studied within the framework of cultural politics. The works examined belong to the early period of Japanese animation and war propaganda films, as well as studio-based anime productions which were made in the postwar period.
Photography, Dramatic representation. The theater
A Generic Fundus Image Enhancement Network Boosted by Frequency Self-supervised Representation Learning
Heng Li, Haofeng Liu, Huazhu Fu
et al.
Fundus photography is prone to image quality degradation that impacts clinical examination performed by ophthalmologists or intelligent systems. Though enhancement algorithms have been developed to promote fundus observation on degraded images, high data demands and limited applicability hinder their clinical deployment. To circumvent this bottleneck, a generic fundus image enhancement network (GFE-Net) is developed in this study to robustly correct unknown fundus images without supervised or extra data. Leveraging image frequency information, self-supervised representation learning is conducted to learn robust structure-aware representations from degraded images. Then, with a seamless architecture that couples representation learning and image enhancement, GFE-Net can accurately correct fundus images while preserving retinal structures. Comprehensive experiments are implemented to demonstrate the effectiveness and advantages of GFE-Net. Compared with state-of-the-art algorithms, GFE-Net achieves superior performance in data dependency, enhancement performance, deployment efficiency, and scale generalizability. Follow-up fundus image analysis is also facilitated by GFE-Net, whose modules are respectively verified to be effective for image enhancement.