Querying GI Endoscopy Images: A VQA Approach
Gaurav Parajuli
Visual Question Answering (VQA) combines natural language processing (NLP) with image understanding to answer questions about a given image. It has enormous potential for the development of medical diagnostic AI systems; such a system could help clinicians diagnose gastrointestinal (GI) diseases accurately and efficiently. Although many multimodal LLMs available today have excellent VQA capabilities in the general domain, they perform very poorly on VQA tasks in specialized domains such as medical imaging. This study, a submission to ImageCLEFmed-MEDVQA-GI 2025 subtask 1, explores the adaptation of the Florence2 model to answer medical visual questions on GI endoscopy images. We also evaluate model performance using standard metrics such as ROUGE, BLEU, and METEOR.
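The abstract's evaluation metrics are n-gram overlap scores between a generated answer and a reference. As a minimal illustration (not the evaluation code used in the study), a unigram-overlap ROUGE-1 F1 can be computed in a few lines:

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a reference and a candidate answer."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

BLEU and METEOR follow the same overlap idea with higher-order n-grams, brevity penalties, and (for METEOR) stemming and synonym matching.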
Channel characterization in screen-to-camera based optical camera communication
Vaigai Nayaki Yokar, Hoa Le Minh, Zabih Ghassemlooy
et al.
With the growth of optical camera communication (OCC), screen-to-camera communication can be established. This opens a new field of visible light communication (VLC) known as the smartphone-to-smartphone visible light communication (S2SVLC) system. In this paper, we experimentally demonstrate an S2SVLC system using a smartphone screen and a smartphone camera over a link span of 20 cm. We analyze the Lambertian order of the smartphone screen and carry out a channel characterization of the screen-to-camera VLC link under specific test conditions.
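The Lambertian order mentioned above is conventionally derived from the source's semi-angle at half power; a minimal sketch, assuming the standard VLC radiation model (the abstract itself gives no formula):

```python
import math

def lambertian_order(half_power_angle_deg: float) -> float:
    """Lambertian order m = -ln(2) / ln(cos(phi_1/2)), where phi_1/2 is the
    semi-angle at half power of the emitting surface (here a smartphone screen)."""
    return -math.log(2) / math.log(math.cos(math.radians(half_power_angle_deg)))
```

For an ideal Lambertian emitter with a 60° semi-angle at half power, this gives m = 1.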
MIDOG 2025: Mitotic Figure Detection with Attention-Guided False Positive Correction
Andrew Broad, Jason Keighley, Lucy Godson
et al.
We present a novel approach which extends the existing Fully Convolutional One-Stage Object Detector (FCOS) for mitotic figure detection. Our composite model adds a Feedback Attention Ladder CNN (FAL-CNN) model for classification of normal versus abnormal mitotic figures, feeding into a fusion network that is trained to generate adjustments to bounding boxes predicted by FCOS. Our network aims to reduce the false positive rate of the FCOS object detector, to improve the accuracy of object detection and enhance the generalisability of the network. Our model achieved an F1 score of 0.655 for mitosis detection on the preliminary evaluation dataset.
Rethinking Learned Image Compression: Context is All You Need
Jixiang Luo
Learned image compression (LIC) has recently made rapid progress compared to traditional methods, so this paper discusses the question 'Where is the boundary of LIC?'. We split this problem into two sub-problems: 1) Where is the boundary of rate-distortion performance in terms of PSNR? 2) How can the compression gain be further improved to reach that boundary? To this end, we analyze the effectiveness of scaling the parameters of the encoder, decoder, and context model, the three components of LIC, and conclude that scaling LIC amounts to scaling its context model and decoder. Extensive experiments demonstrate that overfitting can actually serve as an effective context. By optimizing the context, we further improve PSNR and achieve state-of-the-art performance, with a BD-rate gain of 14.39% over VVC.
Hyperspectral Unmixing of Agricultural Images taken from UAV Using Adapted U-Net Architecture
Vytautas Paura, Virginijus Marcinkevičius
Hyperspectral unmixing is an algorithm that extracts material (usually called endmember) data from hyperspectral data-cube pixels along with their abundances. Due to the lower spatial resolution of hyperspectral sensors, the data in each pixel may contain mixed information from multiple endmembers. In this paper we create a hyperspectral unmixing dataset from blueberry-field data gathered by a hyperspectral camera mounted on a UAV. We also propose a hyperspectral unmixing algorithm based on the U-Net network architecture to achieve more accurate unmixing results on existing and newly created hyperspectral unmixing datasets.
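The mixing process that unmixing inverts is usually modelled linearly: each pixel spectrum is a convex combination of endmember spectra. A toy sketch of that forward model (illustrative only; the paper's U-Net learns the inverse mapping):

```python
def mix_pixel(endmembers, abundances):
    """Linear mixing model: pixel spectrum = sum_i a_i * endmember_i,
    with abundances non-negative and summing to one.
    `endmembers` is a list of per-material spectra (equal-length band lists)."""
    assert abs(sum(abundances) - 1.0) < 1e-9 and all(a >= 0 for a in abundances)
    n_bands = len(endmembers[0])
    return [sum(a * e[b] for a, e in zip(abundances, endmembers))
            for b in range(n_bands)]
```

Unmixing recovers the abundances `a_i` (and possibly the endmembers) from the observed mixed spectrum.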
Parallax in angular sensitive powder diffraction tomography
Peter Modregger, Ahmar Khaliq, Felix Wittwer
While a few methods for the determination of depth-resolved strain distributions are available, each with inherent limitations, tomographic reconstruction has been applied to this problem in only a limited sense. One of the challenges is the potential impact of geometric parallax, which constitutes a non-negligible lateral offset of diffraction information arising from different sample depths at the detector. Here, the effect of parallax was investigated and two main results emerged. First, the impact of parallax was found to be additive to other offset contributions, which implies a straightforward correction. Second, for tomographic scans utilizing a full 360° rotation, parallax was found to have no impact on reconstructions of angular information.
Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm
Marien Renaud, Julien Hermant, Nicolas Papadakis
Plug-and-Play methods for image restoration are iterative algorithms that solve a variational problem to recover a clean image from a degraded observation. These algorithms are known to be flexible to changes of degradation and to perform state-of-the-art restoration. Recently, significant efforts have been made to explore new stochastic algorithms based on the Plug-and-Play or REgularization by Denoising (RED) frameworks, such as SNORE, a convergent stochastic gradient descent algorithm. A variant of this algorithm, named SNORE Prox, reaches state-of-the-art performance, especially for inpainting tasks. However, the convergence of SNORE Prox, which can be seen as a stochastic proximal gradient descent, has not been analyzed so far. In this paper, we prove the convergence of SNORE Prox under nonconvex assumptions.
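SNORE Prox itself couples a learned denoiser with proximal steps; as an illustrative stand-in (not the authors' algorithm), the general shape of a proximal stochastic gradient iteration can be shown on a toy 1-D problem with a projection prox:

```python
import random

def prox_nonneg(x):
    """Proximal operator of the indicator of [0, inf): projection onto x >= 0."""
    return max(x, 0.0)

def prox_sgd(x0, steps=2000, tau=0.01, seed=0):
    """Toy proximal stochastic gradient descent on f(x) = (x - 2)^2 / 2:
    take a noisy gradient step, then apply the proximal operator."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        grad = (x - 2.0) + rng.gauss(0.0, 0.1)  # unbiased noisy gradient
        x = prox_nonneg(x - tau * grad)
    return x
```

In SNORE Prox the stochastic gradient comes from a denoiser applied to a noised iterate and the prox handles the data-fidelity term; the convergence question the paper answers is whether such an iteration still converges under nonconvex assumptions.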
Phase retrieval via non-rigid image registration
Erik Malm
Phase retrieval is the numerical procedure of recovering a complex-valued signal from knowledge about its amplitude and some additional information. Here, an indirect registration procedure, based on the large deformation diffeomorphic metric mapping (LDDMM) formalism, is investigated as a phase retrieval method for coherent diffractive imaging. The method attempts to find a deformation which transforms an initial, template image to match an unknown target image by comparing the diffraction pattern to the data. The exterior calculus framework is used to treat different types of deformations in a unified and coordinate-free way. The algorithm's performance with respect to measurement noise, image topology, and the particular action used is explored through numerical examples.
Invariant feature extraction for multi-modal image matching
Chenzhong Gao, Wei Li
This paper aims to provide an effective invariant feature extraction and matching algorithm for multi-modal images, for the application of multi-source data analysis. Focusing on the differences and correlation of multi-modal images, a feature-based matching algorithm is implemented. The key technologies include phase congruency (PC) and Shi-Tomasi feature points for keypoint detection, the LogGabor filter and a weighted partial main orientation map (WPMOM) for feature extraction, and a multi-scale process to deal with scale differences and optimize matching results. Experimental results on practical data from multiple sources prove that the algorithm performs effectively on multi-modal images, achieving accurate spatial alignment and showing practical application value and good generalization.
Generalised Diffusion Probabilistic Scale-Spaces
Pascal Peter
Diffusion probabilistic models excel at sampling new images from learned distributions. Originally motivated by drift-diffusion concepts from physics, they apply image perturbations such as noise and blur in a forward process that results in a tractable probability distribution. A corresponding learned reverse process generates images and can be conditioned on side information, which leads to a wide variety of practical applications. Most of the research focus currently lies on practice-oriented extensions. In contrast, the theoretical background remains largely unexplored, in particular the relations to drift-diffusion. In order to shed light on these connections to classical image filtering, we propose a generalised scale-space theory for diffusion probabilistic models. Moreover, we show conceptual and empirical connections to diffusion and osmosis filters.
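The tractable forward process referenced above is, in the standard DDPM noise case, a closed-form Gaussian perturbation of the clean image. A minimal sketch of that one step (standard formulation, not the paper's generalised scale-space):

```python
import math
import random

def forward_diffusion(x0, alpha_bar, seed=0):
    """Sample the DDPM forward process at cumulative noise level alpha_bar:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps,  eps ~ N(0, 1).
    `x0` is a flat list of pixel values; alpha_bar in [0, 1]."""
    rng = random.Random(seed)
    return [math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
            for v in x0]
```

At alpha_bar = 1 the image is untouched; as alpha_bar → 0 the signal is fully replaced by noise, which is exactly the scale-space-like progression whose links to classical drift-diffusion filtering the paper formalises.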
Apoptosis classification using attention based spatio temporal graph convolution neural network
Akash Awasthi
Accurate classification of apoptosis plays an important role in cell biology research. Many state-of-the-art approaches use deep CNNs to perform apoptosis classification, but these approaches do not account for cell interaction. Our paper proposes an attention-based spatio-temporal graph convolutional network to classify cell death based on the target cells in a video. This method considers the interaction of multiple target cells at each time stamp. We model the whole video sequence as a set of graphs and classify each target cell in the video as dead or alive. Our method captures both spatial and temporal relationships.
Opening the Black Box of Learned Image Coders
Zhihao Duan, Ming Lu, Zhan Ma
et al.
End-to-end learned lossy image coders (LICs), as opposed to hand-crafted image codecs, have shown increasing superiority in terms of rate-distortion performance. However, they are mainly treated as black-box systems and their interpretability is not well studied. In this paper, we show that LICs learn a set of basis functions to transform the input image into a compact representation in the latent space, analogous to the orthogonal transforms used in image coding standards. Our analysis provides insights to help understand how learned image coders work and could benefit future design and development.
Orientation recognition and correction of Cardiac MRI with deep neural network
Jiyao Liu
In this paper, the problem of orientation correction in cardiac MRI images is investigated and a framework for orientation recognition via deep neural networks is proposed. For multi-modality MRI, we introduce a transfer learning strategy to transfer our proposed model from single modality to multi-modality. We embed the proposed network into the orientation correction command-line tool, which can implement orientation correction on 2D DICOM and 3D NIFTI images. Our source code, network models and tools are available at https://github.com/Jy-stdio/MSCMR_orient/
A geometry method for LED mapping
Junlin Huang, Shangsheng Wen, Weipeng Guan
In this letter, using inputs from an RGB-D camera, an industrial camera, and a wheel odometer, we propose a geometry-based detection method by which the 3-D modulated LED map can be acquired with the aid of the visual odometry algorithm from the ORB-SLAM2 system when the decoding result of the LED-ID is inaccurate. Subsequently, an enhanced cost function is proposed to optimize the mapping result of the LEDs. An average 3-D mapping error of 8.5 cm is obtained in a real-world experiment. This work can be viewed as preliminary work on visible light positioning systems, offering a way to avoid labor-intensive manual site surveys of LEDs.
ChestX-Det10: Chest X-ray Dataset on Detection of Thoracic Abnormalities
Jingyu Liu, Jie Lian, Yizhou Yu
Instance-level detection of thoracic diseases or abnormalities is crucial for automatic diagnosis in chest X-ray images. Most existing works on chest X-rays focus on disease classification and weakly supervised localization. In order to push forward the research on disease classification and localization on chest X-rays, we provide a new benchmark called ChestX-Det10, including box-level annotations of 10 categories of disease/abnormality for $\sim$3,500 images. The annotations are available at https://github.com/Deepwise-AILab/ChestX-Det10-Dataset.
Effect of the regularization hyperparameter on deep learning-based segmentation in LGE-MRI
Olivier Rukundo
The extent to which an arbitrarily selected L2 regularization hyperparameter value affects the outcome of semantic segmentation with deep learning is demonstrated. The demonstrations rely on training U-Net on small LGE-MRI datasets with arbitrarily selected L2 regularization values. The remaining hyperparameters are manually adjusted or tuned only when 10% of all epochs elapse before the validation accuracy reaches 90% during training. The deep learning segmentation outcomes are evaluated objectively and subjectively against the manual ground-truth segmentation.
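The L2 regularization hyperparameter studied above enters the training objective as a weight-decay penalty; a minimal sketch of that objective (illustrative, not the paper's training code):

```python
def l2_regularized_loss(data_loss, weights, lam):
    """Training objective with an L2 (weight decay) penalty:
    L = data_loss + lam * sum(w^2).
    `lam` is the regularization hyperparameter whose arbitrary choice
    the study shows can change the segmentation outcome."""
    return data_loss + lam * sum(w * w for w in weights)
```

Larger `lam` pulls the weights toward zero, trading data fit for smoother solutions; the study's point is that this trade-off measurably shifts U-Net segmentation quality.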
Comparison between CS and JPEG in terms of image compression
Danko Petric, Marija Milinkovic
This paper compares two approaches, JPEG and Compressive Sensing, in terms of image compression. The comparison measures image quality versus the number of samples used for image recovery. Images are compared visually, and a numerical quality value, PSNR, is also calculated and compared for the two approaches. It is shown that images recovered using the Compressive Sensing approach have higher PSNR values than images under JPEG compression. The difference is larger for grayscale images with few details, such as medical X-ray images. The theory is supported by experimental results.
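The PSNR quality value used in the comparison above has a standard definition; a minimal sketch for flat lists of pixel values:

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE),
    where MSE is the mean squared error between the two images."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

Higher PSNR means the recovered image is closer to the original, which is how the paper ranks the Compressive Sensing and JPEG reconstructions.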
Texture Object Segmentation Based on Affine Invariant Texture Detection
Jianwei Zhang, Xu Chen, Xuezhong Xiao
To solve the issue of segmenting richly textured images, a novel detection method based on the affine invariance principle is proposed. Considering the similarity between texture areas, we first apply affine transforms to obtain numerous shapes and utilize the KLT algorithm to verify their similarity. The transforms include rotation, proportional scaling, and perspective deformation to cope with a variety of situations. We then propose an improved LBP method combined with Canny edge detection to handle boundaries in the segmentation process. Moreover, the method provides user-friendly human-computer interaction for splitting the matched texture area from the original images.
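The LBP descriptor the method improves upon encodes, for each pixel, which of its eight neighbours are at least as bright as the centre. A minimal sketch of the basic (unimproved) operator on a single 3x3 patch:

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for a 3x3 patch (a list of 3 rows).
    Each neighbour contributes one bit: 1 if it is >= the centre value."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << i
    return code  # integer in [0, 255]
```

Histograms of these codes over a region form a texture signature; the paper combines such codes with Canny edges to sharpen segmentation boundaries.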
High efficiency compression for object detection
Hyomin Choi, Ivan V. Bajic
Image and video compression has traditionally been tailored to human vision. However, modern applications such as visual analytics and surveillance rely on computers seeing and analyzing the images before (or instead of) humans. For these applications, it is important to adjust compression to computer vision. In this paper we present a bit allocation and rate control strategy that is tailored to object detection. Using the initial convolutional layers of a state-of-the-art object detector, we create an importance map that can guide bit allocation to areas that are important for object detection. The proposed method enables bit rate savings of 7% or more compared to default HEVC, at the equivalent object detection rate.
Optimization of phase retrieval in the Fresnel domain by the modified Gerchberg-Saxton algorithm
Soheil Mehrabkhani, Melvin Kuester
The modified Gerchberg-Saxton algorithm (MGSA) is one of the standard methods for phase retrieval. In this work we apply the MGSA in the paraxial domain. For three given physical parameters, i.e. wavelength, propagation distance, and pixel size, the computational width in the Fresnel transform is fixed. This width can be larger than the real dimension of the input or output images; consequently, it can induce a padding around the real input and output without given amplitude (intensity) values. To solve this problem, we propose a very simple and efficient solution and compare it to other approaches. We demonstrate that the new modified GSA provides almost perfect results without losing the time efficiency of the simplest method.
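The MGSA builds on the classic Gerchberg-Saxton iteration, which alternately enforces measured amplitudes in the two domains while keeping the current phases. A minimal 1-D sketch with a naive DFT (illustrative of the classic GS loop only; the paper's Fresnel-domain modification and padding treatment are not reproduced here):

```python
import cmath

def dft(x):
    """Naive forward DFT of a complex sequence."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def gerchberg_saxton(src_amp, tgt_amp, iters=50):
    """Classic GS: alternately impose the measured amplitudes in the object
    (source) and Fourier (target) domains, retaining the current phases."""
    x = [a + 0j for a in src_amp]  # start from zero phase
    for _ in range(iters):
        X = dft(x)
        # keep Fourier phase, impose measured Fourier amplitude
        X = [t * cmath.exp(1j * cmath.phase(v)) for t, v in zip(tgt_amp, X)]
        x = idft(X)
        # keep object phase, impose measured object amplitude
        x = [s * cmath.exp(1j * cmath.phase(v)) for s, v in zip(src_amp, x)]
    return x
```

The Fourier-domain amplitude error of this iteration is non-increasing, which is why the loop is a reliable workhorse; the MGSA replaces the plain DFT with a Fresnel propagation whose fixed computational width motivates the padding problem addressed in the paper.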