Results for "eess.IV"

Showing 20 of ~1 results · from DOAJ, arXiv

arXiv Open Access 2025
PathSeqSAM: Sequential Modeling for Pathology Image Segmentation with SAM2

Mingyang Zhu, Yinting Liu, Mingyu Li et al.

Current methods for pathology image segmentation typically treat 2D slices independently, ignoring valuable cross-slice information. We present PathSeqSAM, a novel approach that treats 2D pathology slices as sequential video frames using SAM2's memory mechanisms. Our method introduces a distance-aware attention mechanism that accounts for variable physical distances between slices and employs LoRA for domain adaptation. Evaluated on the KPI Challenge 2024 dataset for glomeruli segmentation, PathSeqSAM demonstrates improved segmentation quality, particularly in challenging cases that benefit from cross-slice context. We have publicly released our code at https://github.com/JackyyyWang/PathSeqSAM.

en eess.IV, cs.CV
arXiv Open Access 2024
Hierarchical Loss And Geometric Mask Refinement For Multilabel Ribs Segmentation

Aleksei Leonov, Aleksei Zakharov, Sergey Koshelev et al.

Automatic rib segmentation and numbering can increase computed tomography assessment speed and reduce radiologists' mistakes. We introduce a model for multilabel rib segmentation with a hierarchical loss function, which improves multilabel segmentation quality. We also propose a postprocessing technique to further increase labeling quality. Our model achieved a new state-of-the-art 98.2% label accuracy on the public RibSeg v2 dataset, surpassing the previous result by 6.7%.

en eess.IV, cs.CV
arXiv Open Access 2024
CRIS: Collaborative Refinement Integrated with Segmentation for Polyp Segmentation

Ankush Gajanan Arudkar, Bernard J. E. Evans

Accurate detection of colorectal cancer and early prevention heavily rely on precise polyp identification during gastrointestinal colonoscopy. Due to limited data, many current state-of-the-art deep learning methods for polyp segmentation often rely on post-processing of masks to reduce noise and enhance results. In this study, we propose an approach that integrates mask refinement and binary semantic segmentation, leveraging a novel collaborative training strategy that surpasses current widely-used refinement strategies. We demonstrate the superiority of our approach through comprehensive evaluation on established benchmark datasets and its successful application across various medical image segmentation architectures.

en eess.IV, cs.CV
arXiv Open Access 2024
Efficient Image Compression Using Advanced State Space Models

Bouzid Arezki, Anissa Mokraoui, Fangchen Feng

Transformers have led to learning-based image compression methods that outperform traditional approaches. However, these methods often suffer from high complexity, limiting their practical application. To address this, various strategies such as knowledge distillation and lightweight architectures have been explored, aiming to enhance efficiency without significantly sacrificing performance. This paper proposes a State Space Model-based Image Compression (SSMIC) architecture. This novel architecture balances performance and computational efficiency, making it suitable for real-world applications. Experimental evaluations confirm the effectiveness of our model in achieving a superior BD-rate while significantly reducing computational complexity and latency compared to competitive learning-based image compression methods.

en eess.IV
arXiv Open Access 2024
Demons registration for 2D empirical wavelet transforms

Charles-Gérard Lucas, Jérôme Gilles

The empirical wavelet transform is a fully adaptive time-scale representation that has been widely used in the last decade. Inspired by the empirical mode decomposition, it consists of filter banks based on harmonic mode supports. Recently, it has been generalized to build the filter banks from any generating function using mappings. In practice, the harmonic mode supports can have weakly constrained shapes in 2D, leading to numerical difficulties in computing the mappings and therefore the related wavelet filters. This work proposes an efficient numerical scheme to compute empirical wavelet coefficients using the demons registration algorithm. Results show that the proposed approach gives a numerically robust wavelet transform. An application to texture segmentation of scanning tunnelling microscope images is also presented.

en eess.IV
arXiv Open Access 2024
LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression

Shimon Murai, Heming Sun, Jiro Katto

Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality by using image captions as sub-information. This paper demonstrates that using a large multi-modal model (LMM), it is possible to generate captions and compress them within a single model. We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network, resulting in a 41.58% improvement in LPIPS BD-rate compared to existing methods. Our implementation and pre-trained weights are available at https://github.com/tokkiwa/ImageTextCoding.

en eess.IV, cs.CV
arXiv Open Access 2024
U-WNO: U-Net-enhanced Wavelet Neural Operator for fetal head segmentation

Pranava Seth, Deepak Mishra, Veena Iyer

This article describes the development of a novel U-Net-enhanced Wavelet Neural Operator (U-WNO), which combines wavelet decomposition, operator learning, and an encoder-decoder mechanism. This approach harnesses the strength of wavelets in time-frequency localization of functions and combines down-sampling and up-sampling operations to generate the segmentation map, enabling accurate tracking of patterns in the spatial domain and effective learning of the functional mappings needed for regional segmentation. By bridging the gap between theoretical advancements and practical applications, the U-WNO holds potential for significant impact in multiple scientific and industrial fields, facilitating more accurate decision-making and improved operational efficiency. The operator is demonstrated for different pregnancy trimesters, utilizing two-dimensional ultrasound images.

en eess.IV, cs.CV
arXiv Open Access 2023
Multi-Task Learning for Screen Content Image Coding

Rashid Zamanshoar Heris, Ivan V. Bajić

With the rise of remote work and collaboration, compression of screen content images (SCI) is becoming increasingly important. While there are efficient codecs for natural images, as well as codecs for purely-synthetic images, those SCIs that contain both synthetic and natural content pose a particular challenge. In this paper, we propose a learning-based image coding model developed for such SCIs. By training an encoder to provide a latent representation suitable for two tasks -- input reconstruction and synthetic/natural region segmentation -- we create an effective SCI image codec whose strong performance is verified through experiments. Once trained, the second task (segmentation) need not be used; the codec still benefits from the segmentation-friendly latent representation.

en eess.IV
arXiv Open Access 2023
Edge-weighted pFISTA-Net for MRI Reconstruction

Jianpeng Cao

Deep learning based on unrolled algorithms has served as an effective method for accelerated magnetic resonance imaging (MRI). However, many methods ignore the direct use of edge information to assist MRI reconstruction. In this work, we present the edge-weighted pFISTA-Net, which directly applies the detected edge map to the soft-thresholding part of pFISTA-Net. The soft-thresholding value of different regions is adjusted according to the edge map. Experimental results on a public brain dataset show that the proposed method yields reconstructions with lower error and better artifact suppression compared with state-of-the-art deep learning-based methods. The edge-weighted pFISTA-Net also shows robustness to different undersampling masks and edge detection operators. In addition, we extend the edge-weighted structure to a joint reconstruction and segmentation network and obtain improved reconstruction performance and more accurate segmentation results.

en eess.IV, cs.CV
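
The edge-weighted pFISTA-Net abstract above says the soft-thresholding value is adjusted per region according to an edge map. A minimal sketch of that general idea follows. It is not the paper's implementation: the function names, the `edge_scale` parameter, and the choice to lower the threshold on edges (so edge coefficients are shrunk less) are all assumptions made for illustration.

```python
def soft_threshold(x, tau):
    """Scalar soft-thresholding: shrink x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def edge_weighted_soft_threshold(coeffs, edge_map, base_tau, edge_scale=0.5):
    """Soft-threshold a 2D grid with a spatially varying threshold.

    coeffs    : list of rows of coefficients
    edge_map  : same shape, values in [0, 1] (1 = strong edge)
    base_tau  : threshold used where no edge is detected
    edge_scale: hypothetical knob; the threshold on a full-strength edge
                is base_tau * (1 - edge_scale)
    """
    result = []
    for coeff_row, edge_row in zip(coeffs, edge_map):
        result.append([
            soft_threshold(c, base_tau * (1.0 - edge_scale * e))
            for c, e in zip(coeff_row, edge_row)
        ])
    return result
```

With `base_tau=0.4` and `edge_scale=0.5`, a coefficient of 1.0 is shrunk by 0.4 in a flat region but only by 0.2 on a full-strength edge, so edge detail is attenuated less.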
arXiv Open Access 2023
HDR-VDP-3: A multi-metric for predicting image differences, quality and contrast distortions in high dynamic range and regular content

Rafal K. Mantiuk, Dounia Hammou, Param Hanji

High-Dynamic-Range Visual-Difference-Predictor version 3, or HDR-VDP-3, is a visual metric that can fulfill several tasks, such as full-reference image/video quality assessment, prediction of visual differences between a pair of images, or prediction of contrast distortions. Here we present a high-level overview of the metric, position it with respect to related work, explain the main differences compared to version 2.2, and describe how the metric was adapted for the HDR Video Quality Measurement Grand Challenge 2023.

en eess.IV, cs.CV
arXiv Open Access 2023
Graph-based Active Learning for Surface Water and Sediment Detection in Multispectral Images

Bohan Chen, Kevin Miller, Andrea L. Bertozzi et al.

We develop a graph active learning pipeline (GAP) to detect surface water and in-river sediment pixels in satellite images. The active learning approach is applied within the training process to optimally select specific pixels to generate a hand-labeled training set. Our method obtains higher accuracy with far fewer training pixels than both standard and deep learning models. According to our experiments, our GAP trained on a set of 3270 pixels reaches a better accuracy than the neural network method trained on 2.1 million pixels.

en eess.IV
arXiv Open Access 2022
Improving The Reconstruction Quality by Overfitted Decoder Bias in Neural Image Compression

Oussama Jourairi, Muhammet Balcilar, Anne Lambert et al.

End-to-end trainable models have reached the performance of traditional handcrafted compression techniques on videos and images. Since the parameters of these models are learned over large training sets, they are not optimal for any given image to be compressed. In this paper, we propose an instance-based fine-tuning of a subset of the decoder's bias to improve the reconstruction quality in exchange for extra encoding time and minor additional signaling cost. The proposed method is applicable to any end-to-end compression method, improving the state-of-the-art neural image compression BD-rate by 3-5%.

en eess.IV, cs.CV
arXiv Open Access 2022
Low-Light Image Restoration Based on Retina Model using Neural Networks

Yurui Ming, Yuanyuan Liang

We report the possibility of using a simple neural network for effortless restoration of low-light images inspired by the retina model, which mimics the neurophysiological principles and dynamics of various types of optical neurons. The proposed neural network model reduces computational overhead compared with traditional signal-processing models, and generates results comparable with complicated deep learning models from the subjective perceptual perspective. This work shows that directly simulating the functionalities of retinal neurons using neural networks not only avoids manually searching for the optimal parameters, but also paves the way to building corresponding artificial versions of certain neurobiological organizations.

en eess.IV, cs.AI
arXiv Open Access 2022
Performance Comparison of Deep Learning Architectures for Artifact Removal in Gastrointestinal Endoscopic Imaging

Taira Watanabe, Kensuke Tanioka, Satoru Hiwa et al.

Endoscopic images typically contain several artifacts. These artifacts significantly impact image analysis results in computer-aided diagnosis. Convolutional neural networks (CNNs), a type of deep learning, can remove such artifacts. Various architectures have been proposed for CNNs, and the accuracy of artifact removal varies depending on the choice of architecture. Therefore, it is necessary to determine the artifact removal accuracy for the selected architecture. In this study, we focus on endoscopic surgical instruments as artifacts, and determine and discuss the artifact removal accuracy using seven different CNN architectures.

en eess.IV, cs.CV
arXiv Open Access 2022
Realization Scheme for Visual Cryptography with Computer-generated Holograms

Tao Yu, Jinge Ma, Guilin Li et al.

We propose to realize visual cryptography in an indirect way with the help of computer-generated holograms. At present, visual cryptography is mainly recovered by superimposing shares on transparent film or by computer equipment, which greatly limits its application range. In this paper, the shares of the visual cryptography are encoded with computer-generated holograms, reproduced by optical means, and then superimposed and decrypted. This method can expand the application range of visual cryptography and further increase its security.

en eess.IV, cs.CR
arXiv Open Access 2022
COVID Detection and Severity Prediction with 3D-ConvNeXt and Custom Pretrainings

Daniel Kienzle, Julian Lorenz, Robin Schön et al.

Since COVID strongly affects the respiratory system, lung CT scans can be used for the analysis of a patient's health. We introduce a neural network for the prediction of the severity of lung damage and the detection of a COVID infection using three-dimensional CT data. To this end, we adapt the recent ConvNeXt model to process three-dimensional data. Furthermore, we design and analyze different pretraining methods specifically intended to improve the model's ability to handle three-dimensional CT data. We rank 2nd in the 1st COVID19 Severity Detection Challenge and 3rd in the 2nd COVID19 Detection Challenge.

en eess.IV, cs.CV
arXiv Open Access 2022
Motion Compensated Frequency Selective Extrapolation for Error Concealment in Video Coding

Jürgen Seiler, André Kaup

Although wireless and IP-based access to video content gives a new degree of freedom to the viewers, the risk of severe block losses caused by transmission errors is always present. The purpose of this paper is to present a new method for concealing block losses in erroneously received video sequences. For this, a motion compensated data set is generated around the lost block. Based on this aligned data set, a model of the signal is created that continues the signal into the lost areas. Since spatial as well as temporal information is used for the model generation, the proposed method is superior to methods that use either spatial or temporal information for concealment. Furthermore, it outperforms current state-of-the-art spatio-temporal concealment algorithms by up to 1.4 dB in PSNR.

en eess.IV, cs.MM
arXiv Open Access 2022
Optimized and Parallelized Processing Order for Improved Frequency Selective Signal Extrapolation

Jürgen Seiler, André Kaup

In recent years, multi-core processor designs have found their way into many computing devices. To exploit the capabilities of such devices in the best possible way, signal processing algorithms have to be adapted to operate in parallel tasks. In this contribution, an optimized processing order is proposed for Frequency Selective Extrapolation, a powerful signal extrapolation algorithm. Using this optimized order, the extrapolation can be carried out in parallel. The algorithm scales very well, resulting in an acceleration by a factor of up to 7.7 on an eight-core computer. Additionally, the optimized processing order aims at reducing the propagation of extrapolation errors over consecutive losses. Thus, in addition to the acceleration, a visually noticeable improvement in quality of up to 0.5 dB PSNR can be achieved.

en eess.IV
arXiv Open Access 2021
Electromagnetic neural source imaging under sparsity constraints with SURE-based hyperparameter tuning

Pierre-Antoine Bannier, Quentin Bertrand, Joseph Salmon et al.

Estimators based on non-convex sparsity-promoting penalties were shown to yield state-of-the-art solutions to the magneto-/electroencephalography (M/EEG) brain source localization problem. In this paper we tackle the model selection problem of these estimators: we propose to use a proxy of Stein's Unbiased Risk Estimator (SURE) to automatically select their regularization parameters. The effectiveness of the method is demonstrated on realistic simulations and 30 subjects from the Cam-CAN dataset. To our knowledge, this is the first time that sparsity-promoting estimators are automatically calibrated at such a scale. Results show that the proposed SURE approach outperforms cross-validation strategies and state-of-the-art Bayesian statistics methods both computationally and statistically.

en eess.IV, eess.SP
arXiv Open Access 2018
Image denoising through bivariate shrinkage function in framelet domain

Hamid Reza Shahdoosti

Denoising of coefficients in a sparse domain (e.g. wavelet) has been researched extensively because of its simplicity and effectiveness. The literature has mainly focused on designing the best global threshold. In contrast, this paper proposes a new denoising method using a bivariate shrinkage function in the framelet domain. In the proposed method, maximum a posteriori probability is used to estimate the denoised coefficient, and a non-Gaussian bivariate function is applied to model the statistics of framelet coefficients. For every framelet coefficient, there is a corresponding threshold depending on the local statistics of framelet coefficients. Experimental results show that using a bivariate shrinkage function in the framelet domain yields significantly superior image quality and higher PSNR than some well-known denoising methods.

en eess.IV, cs.CV
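
The framelet denoising abstract above describes a MAP estimate under a non-Gaussian bivariate model, with a threshold that depends on local coefficient statistics. A well-known rule of this kind is the bivariate shrinkage function of Şendur and Selesnick, originally formulated for wavelet coefficients. The sketch below shows that rule for a single coefficient. Whether the paper uses exactly this form in the framelet domain is an assumption, and the function names and the window-based variance estimate are illustrative only.

```python
import math


def local_signal_std(window, sigma_n):
    """Estimate the local signal std from noisy coefficients in a window.

    Uses sigma^2 = max(mean(y^2) - sigma_n^2, 0), clipped at zero.
    """
    mean_sq = sum(v * v for v in window) / len(window)
    return math.sqrt(max(mean_sq - sigma_n ** 2, 0.0))


def bivariate_shrink(y1, y2, sigma_n, sigma):
    """Shrink coefficient y1 using its parent coefficient y2.

    y1, y2  : noisy coefficient and its parent at the next coarser scale
    sigma_n : noise standard deviation
    sigma   : local signal standard deviation (e.g. from local_signal_std)
    """
    r = math.sqrt(y1 * y1 + y2 * y2)
    if r == 0.0 or sigma <= 0.0:
        # No energy, or no estimated signal: the coefficient is set to zero.
        return 0.0
    threshold = math.sqrt(3.0) * sigma_n ** 2 / sigma
    gain = max(r - threshold, 0.0) / r
    return gain * y1
```

Because `sigma` is estimated from a neighbourhood of each coefficient, the effective threshold varies across the image rather than being one global value, which is the contrast the abstract draws with global-threshold methods.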