Results for "eess.IV"

Showing 20 of ~772779 results · from arXiv, DOAJ, CrossRef

arXiv Open Access 2025
Comparison of Neural Models for X-ray Image Classification in COVID-19 Detection

Jimi Togni, Romis Attux

This study presents a comparative analysis of methods for detecting COVID-19 infection in radiographic images. The images, sourced from publicly available datasets, were categorized into three classes: 'normal,' 'pneumonia,' and 'COVID.' For the experiments, transfer learning was employed using eight pre-trained networks: SqueezeNet, DenseNet, ResNet, AlexNet, VGG, GoogleNet, ShuffleNet, and MobileNet. DenseNet achieved the highest accuracy of 97.64% using the ADAM optimization function in the multiclass approach. In the binary classification approach, the highest precision was 99.98%, obtained by the VGG, ResNet, and MobileNet networks. A comparative evaluation was also conducted using heat maps.

en eess.IV, cs.LG
arXiv Open Access 2025
MDN: Mamba-Driven Dualstream Network For Medical Hyperspectral Image Segmentation

Shijie Lin, Boxiang Yun, Wei Shen et al.

Medical Hyperspectral Imaging (MHSI) offers potential for computational pathology and precision medicine. However, existing CNN and Transformer models struggle to balance segmentation accuracy and speed due to high spatial-spectral dimensionality. In this study, we leverage Mamba's global context modeling to propose a dual-stream architecture for joint spatial-spectral feature extraction. To address the limitation of Mamba's unidirectional aggregation, we introduce a recurrent spectral sequence representation to capture low-redundancy global spectral features. Experiments on a public Multi-Dimensional Choledoch dataset and a private Cervical Cancer dataset show that our method outperforms state-of-the-art approaches in segmentation accuracy while minimizing resource usage and achieving the fastest inference speed. Our code will be available at https://github.com/DeepMed-Lab-ECNU/MDN.

en eess.IV
arXiv Open Access 2025
Skull stripping with purely synthetic data

Jong Sung Park, Juhyung Ha, Siddhesh Thakur et al.

While many skull stripping algorithms have been developed for multi-modal and multi-species cases, there is still a lack of a fundamentally generalizable approach. We present PUMBA (PUrely synthetic Multimodal/species invariant Brain extrAction), a strategy to train a model for brain extraction with no real brain images or labels. Our results show that even without any real images or anatomical priors, the model achieves comparable accuracy in multi-modal, multi-species and pathological cases. This work presents a new direction of research for any generalizable medical image segmentation task.

en eess.IV, cs.CV
arXiv Open Access 2025
Variable Rate Image Compression via N-Gram Context based Swin-transformer

Priyanka Mudgal

This paper presents an N-gram context-based Swin Transformer for learned image compression. Our method achieves variable-rate compression with a single model. By incorporating N-gram context into the Swin Transformer, we overcome its limitation of neglecting larger regions during high-resolution image reconstruction due to its restricted receptive field. This enhancement expands the regions considered for pixel restoration, thereby improving the quality of high-resolution reconstructions. Our method increases context awareness across neighboring windows, leading to a -5.86% improvement in BD-Rate over existing variable-rate learned image compression techniques. Additionally, our model improves the quality of regions of interest (ROI) in images, making it particularly beneficial for object-focused applications in fields such as manufacturing and industrial vision systems.

en eess.IV, cs.CV
arXiv Open Access 2025
Confidence-Weighted Semi-Supervised Learning for Skin Lesion Segmentation Using Hybrid CNN-Transformer Networks

Saqib Qamar

Automated skin lesion segmentation through dermoscopic analysis is essential for early skin cancer detection, yet remains challenging due to limited annotated training data. We present MIRA-U, a semi-supervised framework that combines uncertainty-aware teacher-student pseudo-labeling with a hybrid CNN-Transformer architecture. Our approach employs a teacher network pre-trained via masked image modeling to generate confidence-weighted soft pseudo-labels, which guide a U-shaped CNN-Transformer student network featuring cross-attention skip connections. This design enhances pseudo-label quality and boundary delineation, surpassing reconstruction-based and CNN-only baselines, particularly in low-annotation regimes. Extensive evaluation on ISIC-2016 and PH2 datasets demonstrates superior performance, achieving a Dice Similarity Coefficient (DSC) of 0.9153 and Intersection over Union (IoU) of 0.8552 using only 50% labeled data. Code is publicly available on GitHub.

en eess.IV, cs.CV
arXiv Open Access 2025
Leveraging Overfitting for Low-Complexity and Modality-Agnostic Joint Source-Channel Coding

Haotian Wu, Gen Li, Pier Luigi Dragotti et al.

This paper introduces Implicit-JSCC, a novel overfitted joint source-channel coding paradigm that directly optimizes channel symbols and a lightweight neural decoder for each source. This instance-specific strategy eliminates the need for training datasets or pre-trained models, enabling a storage-free, modality-agnostic solution. As a low-complexity alternative, Implicit-JSCC achieves efficient image transmission with around 1000x lower decoding complexity, using as few as 607 model parameters and 641 multiplications per pixel. This overfitted design inherently addresses source generalizability and achieves state-of-the-art results in the high SNR regimes, underscoring its promise for future communication systems, especially streaming scenarios where one-time offline encoding supports multiple online decoding.

en eess.IV, cs.IT
arXiv Open Access 2025
Potential Contrast: Properties, Equivalences, and Generalization to Multiple Classes

Wallace Peaslee, Anna Breger, Carola-Bibiane Schönlieb

Potential contrast is typically used as an image quality measure and quantifies the maximal possible contrast between samples from two classes of pixels in an image after an arbitrary grayscale transformation. It has been applied in cultural heritage to evaluate multispectral images using a small number of labeled pixels. In this work, we introduce a normalized version of potential contrast that removes dependence on image format and also prove equalities that enable generalization to more than two classes and to continuous settings. Finally, we exemplify the utility of multi-class normalized potential contrast through an application to a medieval music manuscript with visible bleedthrough from the back of the page. We share our implementations, based on both original algorithms and our new equalities, including generalization to multiple classes, at https://github.com/wallacepeaslee/Multiple-Class-Normalized-Potential-Contrast.

en eess.IV, math.ST
arXiv Open Access 2025
Improving Diagnostic Accuracy of Pigmented Skin Lesions With CNNs: an Application on the DermaMNIST Dataset

Nerma Kadric, Amila Akagic, Medina Kapo

Pigmented skin lesions represent localized areas of increased melanin and can indicate serious conditions like melanoma, a major contributor to skin cancer mortality. The MedMNIST v2 dataset, inspired by MNIST, was recently introduced to advance research in biomedical imaging and includes DermaMNIST, a dataset for classifying pigmented lesions based on the HAM10000 dataset. This study assesses ResNet-50 and EfficientNetV2L models for multi-class classification using DermaMNIST, employing transfer learning and various layer configurations. One configuration achieves results that match or surpass existing methods. This study suggests that convolutional neural networks (CNNs) can drive progress in biomedical image analysis, significantly enhancing diagnostic accuracy.

en eess.IV, cs.AI
arXiv Open Access 2024
METRIC: a complete methodology for performances evaluation of automatic target Detection, Recognition and Tracking algorithms in infrared imagery

Jérôme Gilles, Stéphane Landeau, Tristan Dagobert et al.

In this communication, we deal with the question of automatic target detection, recognition and tracking (ATD/R/T) algorithms performance assessment. We propose a complete methodology of evaluation which approaches objective image datasets development and adapted metrics definition for the different tasks (detection, recognition and tracking). We present some performance results which are currently processed in a French-MoD program called 2ACI ("Acquisition Automatique de Cibles par Imagerie").

en eess.IV, cs.CV
arXiv Open Access 2024
Enhancing Bronchoscopy Depth Estimation through Synthetic-to-Real Domain Adaptation

Qingyao Tian, Huai Liao, Xinyan Huang et al.

Monocular depth estimation has shown promise in general imaging tasks, aiding in localization and 3D reconstruction. While effective in various domains, its application to bronchoscopic images is hindered by the lack of labeled data, challenging the use of supervised learning methods. In this work, we propose a transfer learning framework that leverages synthetic data with depth labels for training and adapts domain knowledge for accurate depth estimation in real bronchoscope data. Our network demonstrates improved depth prediction on real footage using domain adaptation compared to training solely on synthetic data, validating our approach.

en eess.IV, cs.CV
arXiv Open Access 2024
Comparing ImageNet Pre-training with Digital Pathology Foundation Models for Whole Slide Image-Based Survival Analysis

Kleanthis Marios Papadopoulos, Tania Stathaki

The abundance of information present in Whole Slide Images (WSIs) renders them an essential tool for survival analysis. Several Multiple Instance Learning frameworks proposed for this task utilize a ResNet50 backbone pre-trained on natural images. By leveraging recently released histopathological foundation models such as UNI and Hibou, the predictive prowess of existing MIL networks can be enhanced. Furthermore, deploying an ensemble of digital pathology foundation models yields higher baseline accuracy, although the benefits appear to diminish with more complex MIL architectures. Our code will be made publicly available upon acceptance.

en eess.IV, cs.CV
arXiv Open Access 2023
An Improved Upper Bound on the Rate-Distortion Function of Images

Zhihao Duan, Jack Ma, Jiangpeng He et al.

Recent work has shown that Variational Autoencoders (VAEs) can be used to upper-bound the information rate-distortion (R-D) function of images, i.e., the fundamental limit of lossy image compression. In this paper, we report an improved upper bound on the R-D function of images implemented by (1) introducing a new VAE model architecture, (2) applying variable-rate compression techniques, and (3) proposing a novel \ourfunction{} to stabilize training. We demonstrate that at least 30% BD-rate reduction w.r.t. the intra prediction mode in VVC codec is achievable, suggesting that there is still great potential for improving lossy image compression. Code is made publicly available at https://github.com/duanzhiihao/lossy-vae.

en eess.IV
arXiv Open Access 2022
An incomplete taxonomy of self-assigned color specialties

Jan Morovic

Every discipline in science or professional practice can be sub-divided into specialties and subspecialties. E.g., in medicine a doctor could indicate that their specialty is ophthalmology and their sub-specialty the retina, or in chemistry a specialty could be analytical chemistry and a sub-specialty spectroscopy. But, what are the specialties and sub-specialties of the color discipline? The present report shares the anonymized results of a 152-participant, online survey conducted between 23rd May and 8th June 2022 that suggests 11 top-level color specialties.

en eess.IV
arXiv Open Access 2020
Benchmarking the Gerchberg-Saxton Algorithm

Peter J. Christopher, George S. D. Gordon, Timothy D. Wilkinson

Due to the proliferation of spatial light modulators (SLMs), digital holography is finding widespread use in fields from augmented reality to medical imaging to additive manufacturing to lithography to optical tweezing to telecommunications. There are numerous types of SLM available with a multitude of algorithms for generating holograms. Each algorithm has limitations in terms of convergence speed, power efficiency, accuracy and data storage requirement. Here, we consider probably the most common algorithm for computer-generated holography, Gerchberg-Saxton, and examine the trade-off in convergent quality, performance and efficiency. In particular, we focus on measuring and understanding the factors that control runtime and convergence.

en eess.IV, physics.optics
arXiv Open Access 2020
On the Information Leakage of Camera Fingerprint Estimates

Samuel Fernández-Menduiña, Fernando Pérez-González

Camera fingerprints based on sensor PhotoResponse Non-Uniformity (PRNU) have gained broad popularity in forensic applications due to their ability to univocally identify the camera that captured a certain image. This fingerprint of a given sensor is extracted through some estimation method that requires a few images known to be taken with such sensor. In this paper, we show that the fingerprints extracted in this way leak a considerable amount of information from those images used in the estimation, thus constituting a potential threat to privacy. We propose to quantify the leakage via two measures: one based on the Mutual Information, and another based on the output of a membership inference test. Experiments with practical fingerprint estimators on a real-world image dataset confirm the validity of our measures and highlight the seriousness of the leakage and the importance of implementing techniques to mitigate it. Some of these techniques are presented and briefly discussed.

en eess.IV
arXiv Open Access 2020
Real time computer generation of three-dimensional point cloud holograms through GPU implementation of compressed sensing Gerchberg-Saxton algorithm

Paolo Pozzi, Jonathan Mapelli

Phase-only spatial light modulators can be employed to structure laser light in complex three dimensional focusing patterns, with a variety of applications. While spatial light modulators have typical refresh frequencies of tens of Hz, the computation time of three dimensional holograms ranges between a few seconds and a few minutes, therefore limiting the use of the maximum refresh rate of spatial light modulators to either pre-calculated sequences of high quality holograms, or low quality holograms only for real time update. Here, we propose the implementation of a recently developed compressed sensing Gerchberg-Saxton algorithm on a consumer graphical processor allowing the generation of high quality holograms at video rate.

en eess.IV, physics.optics
arXiv Open Access 2020
Using Deep Convolutional Neural Networks to Diagnose COVID-19 From Chest X-Ray Images

Yi Zhong

The COVID-19 epidemic has become a major safety and health threat worldwide. Imaging diagnosis is one of the most effective ways to screen COVID-19. This project utilizes several open-source or public datasets to present an open-source dataset of COVID-19 CXRs, named COVID-19-CXR-Dataset, and introduces a deep convolutional neural network model. The model is validated on 740 test images and achieves 87.3% accuracy, 89.67% precision, and 84.46% recall, and correctly classifies 98 out of 100 COVID-19 x-ray images in the test set with more than 81% prediction probability under the condition of 95% confidence interval. This project may serve as a reference for other researchers aiming to advance the development of deep learning applications in medical imaging.

en eess.IV, cs.CV
arXiv Open Access 2020
High Definition image classification in Geoscience using Machine Learning

Yajun An, Zachary Golden, Tarka Wilcox et al.

High Definition (HD) digital photos taken with drones are widely used in the study of Geoscience. However, the collected data often include blurry images, and it takes a lot of time and effort to distinguish clear images from blurry ones. In this work, we apply machine learning techniques, such as Support Vector Machine (SVM) and Neural Network (NN), to classify HD images in Geoscience as clear or blurry, and therefore automate data cleaning in Geoscience. We compare the results of classification based on features abstracted from several mathematical models. Some of the implementation of our machine learning tool is freely available at: https://github.com/zachgolden/geoai.

en eess.IV, cs.CV
arXiv Open Access 2020
Poisson Image Deconvolution by a Plug-and-Play Quantum Denoising Scheme

Sayantan Dutta, Adrian Basarab, Bertrand Georgeot et al.

This paper introduces a new Plug-and-Play (PnP) alternating direction method of multipliers (ADMM) scheme based on a recently proposed denoiser using the Schroedinger equation's solutions of quantum physics. The efficiency of the proposed algorithm is evaluated for Poisson image deconvolution, which is very common in imaging applications such as limited photon acquisition. Numerical results show the superiority of the proposed scheme compared to recent state-of-the-art techniques, for both low and high signal-to-noise-ratio scenarios. This performance gain is mostly explained by the flexibility of the embedded quantum denoiser for different types of noise affecting the observations.

en eess.IV, eess.SP

Page 22 of 38639