A Novel Real-Time Full-Color 3D Holographic (Diffractive) Video Capture, Processing, and Transmission Pipeline Using Off-The-Shelf Hardware
Ankur Samanta, Gregor Mackenzie, Tyler Rathkamp
et al.
This paper details the world's first live 3D holographic (diffractive) video call using off-the-shelf hardware. We introduce a novel pipeline that facilitates the capture, processing, and transmission of RGBZ data, using an iPhone for image and depth capture with VividQ's SDK for hologram generation and hardware for display.
ResNet101 and DAE for Enhance Quality and Classification Accuracy in Skin Cancer Imaging
Sibasish Dhibar
Skin cancer is a critical health issue that requires timely detection for higher survival rates. Traditional computer vision techniques struggle with the high variability of skin lesion features, a gap partially bridged by convolutional neural networks (CNNs). To overcome the remaining issues, we introduce an innovative convolutional ensemble network approach that combines a deep autoencoder (DAE) with ResNet101. This method uses convolution-based deep neural networks for the detection of skin cancer. Experiments on the public ISIC-2018 dataset demonstrate strong performance across several metrics. The method achieves 96.03% accuracy, 95.40% precision, 96.05% recall, an F-measure of 0.9576, and an AUC of 0.98.
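As a reminder of how the metrics reported above relate to one another, here is a minimal sketch for the binary case. The function name and the confusion counts are hypothetical and are not taken from the paper; multi-class results such as those on ISIC-2018 are usually obtained by averaging per-class values, which this sketch does not do.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # The F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure


# Hypothetical counts for illustration only (not from the paper).
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

The F-measure always lies between precision and recall, so it summarises both in a single number.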
Computer aided diagnosis system for Alzheimers disease using principal component analysis and machine learning based approaches
Lilia Lazli
Alzheimer's disease (AD) is a severe neurological brain disorder. It is not curable, but earlier detection can greatly help to improve symptoms. Machine-learning-based approaches are popular and well-motivated models for medical image processing tasks such as computer-aided diagnosis. These techniques can improve the process for accurate diagnosis of AD. In this paper, we investigate the performance of these techniques for AD detection and classification using brain MRI and PET images from the OASIS database. The proposed system uses artificial neural networks and support vector machines as classifiers, and principal component analysis as a feature extraction technique. The results indicate that the combined scheme achieves good accuracy and offers a significant advantage over the other approaches.
IOI: Invisible One-Iteration Adversarial Attack on No-Reference Image- and Video-Quality Metrics
Ekaterina Shumitskaya, Anastasia Antsiferova, Dmitriy Vatolin
No-reference image- and video-quality metrics are widely used in video processing benchmarks. The robustness of learning-based metrics under video attacks has not been widely studied. In addition to being successful, attacks that can be employed in video processing benchmarks must be fast and imperceptible. This paper introduces an Invisible One-Iteration (IOI) adversarial attack on no-reference image- and video-quality metrics. We compared our method against eight prior approaches using image and video datasets via objective and subjective tests. Our method exhibited superior visual quality across various attacked metric architectures while maintaining comparable attack success and speed. We made the code available on GitHub: https://github.com/katiashh/ioi-attack.
A Tutorial on Explainable Image Classification for Dementia Stages Using Convolutional Neural Network and Gradient-weighted Class Activation Mapping
Kevin Kam Fung Yuen
This paper presents a tutorial on an explainable approach using a Convolutional Neural Network (CNN) and Gradient-weighted Class Activation Mapping (Grad-CAM) to classify four progressive dementia stages based on open MRI brain images. The detailed implementation steps are demonstrated with an explanation. Whilst the proposed CNN architecture is demonstrated to achieve more than 99% accuracy on the test dataset, the computational procedure of the CNN remains a black box. Grad-CAM visualisation is used in an attempt to explain this very high accuracy and may provide useful information for physicians. Future work motivated by these results is discussed.
Quantum Implicit Neural Compression
Takuya Fujihashi, Toshiaki Koike-Akino
Signal compression based on implicit neural representation (INR) is an emerging technique to represent multimedia signals with a small number of bits. While INR-based signal compression achieves high-quality reconstruction for relatively low-resolution signals, the accuracy of high-frequency details is significantly degraded with a small model. To improve the compression efficiency of INR, we introduce quantum INR (quINR), which leverages the exponentially rich expressivity of quantum neural networks for data compression. Evaluations on benchmark datasets show that the proposed quINR-based compression could improve rate-distortion performance in image compression compared with traditional codecs and classic INR-based coding methods, with gains of up to 1.2 dB.
Automated Optical Reading of Scanned ECGs
Manuel Pazos-Santomé, Fernando Martín-Rodríguez, Mónica Fernández-Barciela
The electrocardiogram (ECG) is a valuable tool for medical diagnosis used worldwide. Its use has contributed significantly to the prevention of cardiovascular diseases, including infarctions. Although physicians need to see the printed curves for a diagnosis, there now exist automated tools based on machine learning that can help in the diagnosis of arrhythmias and other pathologies. These tools operate on digitized ECG data, which are merely one-dimensional discrete signals (a kind of information that is very similar to digitized audio). Thus, it is useful to have both the graphical information and the digitized data. This is possible with modern digital equipment. Nevertheless, there still exist many analog electrocardiogram machines that plot results on paper with a printed grid measured in millimeters. This paper presents a novel image analysis method that is capable of reading a printed ECG and converting it into a sampled digital signal.
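To make the idea of converting a printed trace into a sampled signal concrete, the sketch below scans a binarized image column by column and rescales pixel positions using the millimeter grid. It is not the authors' method: the function name, the input format, and the default calibration of 25 mm/s and 10 mm/mV (common paper settings) are assumptions made for illustration.

```python
def trace_to_signal(image, px_per_mm, baseline_row, mm_per_mv=10.0, mm_per_s=25.0):
    """Convert a binarized ECG trace into a list of (time_s, voltage_mv) samples.

    image: list of rows, each a list of 0/1 values where 1 marks trace ink.
    px_per_mm: scan resolution, assumed known from the printed grid.
    baseline_row: row index of the 0 mV line (assumed known).
    """
    height = len(image)
    width = len(image[0])
    samples = []
    for col in range(width):
        ink_rows = [row for row in range(height) if image[row][col]]
        if not ink_rows:
            continue  # no ink in this column: leave a gap in the signal
        center = sum(ink_rows) / len(ink_rows)
        # Image rows grow downward, so a smaller row index means a higher voltage.
        voltage_mv = (baseline_row - center) / px_per_mm / mm_per_mv
        time_s = col / px_per_mm / mm_per_s
        samples.append((time_s, voltage_mv))
    return samples


# Tiny hypothetical 3x4 image: a short upward deflection in the second column.
toy_image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
]
toy_samples = trace_to_signal(toy_image, px_per_mm=1.0, baseline_row=2)
```

A real scan would additionally need grid removal, skew correction, and lead separation, none of which is attempted here.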
Spatially-varying Regularization with Conditional Transformer for Unsupervised Image Registration
Junyu Chen, Yihao Liu, Yufan He
et al.
In the past, optimization-based registration models have used spatially-varying regularization to account for deformation variations in different image regions. However, deep learning-based registration models have mostly relied on spatially-invariant regularization. Here, we introduce an end-to-end framework that uses neural networks to learn a spatially-varying deformation regularizer directly from data. The hyperparameter of the proposed regularizer is conditioned into the network, enabling easy tuning of the regularization strength. The proposed method is built upon a Transformer-based model, but it can be readily adapted to any network architecture. We thoroughly evaluated the proposed approach using publicly available datasets and observed a significant performance improvement while maintaining smooth deformation. The source code of this work will be made available after publication.
Weighted Anisotropic-Isotropic Total Variation for Poisson Denoising
Kevin Bui, Yifei Lou, Fredrick Park
et al.
Poisson noise commonly occurs in images captured by photon-limited imaging systems such as those used in astronomy and medicine. As the distribution of Poisson noise depends on the pixel intensity value, noise levels vary from pixel to pixel. Hence, denoising a Poisson-corrupted image while preserving important details can be challenging. In this paper, we propose a Poisson denoising model that incorporates the weighted anisotropic-isotropic total variation (AITV) as a regularizer. We then develop an alternating direction method of multipliers combined with a proximal operator for an efficient implementation. Lastly, numerical experiments demonstrate that our algorithm outperforms other Poisson denoising methods in terms of image quality and computational efficiency.
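For readers unfamiliar with the regularizer, the following sketch evaluates the weighted anisotropic-isotropic total variation of a small image, using the usual definition as the anisotropic TV minus alpha times the isotropic TV. The function name, the forward-difference discretization with zero gradient at the last row and column, and the sample inputs are assumptions for illustration; the abstract does not specify them.

```python
import math


def aitv(u, alpha):
    """Weighted anisotropic-isotropic TV: ||Du||_1 - alpha * ||Du||_{2,1}."""
    rows, cols = len(u), len(u[0])
    aniso = 0.0
    iso = 0.0
    for i in range(rows):
        for j in range(cols):
            # Forward differences; the gradient is set to zero at the last row/column.
            dx = u[i][j + 1] - u[i][j] if j + 1 < cols else 0.0
            dy = u[i + 1][j] - u[i][j] if i + 1 < rows else 0.0
            aniso += abs(dx) + abs(dy)
            iso += math.hypot(dx, dy)
    return aniso - alpha * iso


# An axis-aligned edge: the two TV terms coincide, so the penalty vanishes at alpha = 1.
edge_penalty = aitv([[0.0, 1.0], [0.0, 1.0]], alpha=1.0)
# A corner: the gradient has two non-zero components, so the penalty stays positive.
corner_penalty = aitv([[0.0, 1.0], [1.0, 1.0]], alpha=0.5)
```

Because the anisotropic term is never smaller than the isotropic one, the value is non-negative for alpha between 0 and 1.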
Transformer-based Variable-rate Image Compression with Region-of-interest Control
Chia-Hao Kao, Ying-Chieh Weng, Yi-Hsin Chen
et al.
This paper proposes a transformer-based learned image compression system. It is capable of achieving variable-rate compression with a single model while supporting region-of-interest (ROI) functionality. Inspired by prompt tuning, we introduce prompt generation networks to condition the transformer-based compression autoencoder. Our prompt generation networks generate content-adaptive tokens according to the input image, an ROI mask, and a rate parameter. The separation of the ROI mask and the rate parameter allows an intuitive way to achieve variable-rate and ROI coding simultaneously. Extensive experiments validate the effectiveness of our proposed method and confirm its superiority over the other competing methods.
Convergent ADMM Plug and Play PET Image Reconstruction
Florent Sureau, Mahdi Latreche, Marion Savanier
et al.
In this work, we investigate hybrid PET reconstruction algorithms based on coupling a model-based variational reconstruction with the application of a separately learnt deep neural network (DNN) operator in an ADMM Plug and Play framework. Following recent results in optimization, fixed-point convergence of the scheme can be achieved by enforcing an additional constraint on network parameters during learning. We propose such an ADMM algorithm and show in a realistic [18F]-FDG synthetic brain exam that the proposed scheme indeed leads experimentally to convergence to a meaningful fixed point. When the proposed constraint is not enforced during learning of the DNN, the proposed ADMM algorithm was observed experimentally not to converge.
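For orientation, a generic Plug and Play ADMM iteration in scaled form can be written as below. This is the textbook scheme, not necessarily the exact variant used by the authors. All symbols are assumptions introduced here: $f$ is the data-fidelity term (for PET, typically a Poisson negative log-likelihood), $D_\theta$ is the learnt DNN operator, $\rho > 0$ is the penalty parameter, and $u$ is the scaled dual variable. The abstract does not state the form of the constraint on $\theta$.

```latex
\begin{aligned}
x^{k+1} &= \arg\min_{x} \; f(x) + \frac{\rho}{2}\,\bigl\| x - z^{k} + u^{k} \bigr\|_2^2, \\
z^{k+1} &= D_\theta\bigl( x^{k+1} + u^{k} \bigr), \\
u^{k+1} &= u^{k} + x^{k+1} - z^{k+1}.
\end{aligned}
```

A fixed point of this scheme satisfies $x = z$, which is the sense in which convergence is discussed above.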
3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images
Yifu Zhang, Zuozhu Liu, Yang Feng
et al.
Accurate representation of tooth position is extremely important in treatment. 3D dental image segmentation is a widely used method; however, labelled 3D dental datasets are scarce, so the task often has to be solved with only a small number of samples. To address this problem, we build on a pretrained SAM and propose a novel 3D-U-SAM network for 3D dental image segmentation. Specifically, to solve the problem of using 2D pre-trained weights on 3D datasets, we adopt a convolution approximation method; to retain more details, we design skip connections that fuse features at all levels, following U-Net. The effectiveness of the proposed method is demonstrated in ablation experiments, comparison experiments, and sample size experiments.
Technical description of the EPFL submission to the JPEG DNA CfP
Davi Lazzarotto, Jorge Encinas Ramos, Michela Testolina
et al.
This document provides a technical description of the codec proposed by EPFL to the JPEG DNA Call for Proposals. The codec, which we refer to as V-DNA for its versatility, enables the encoding of raw images and already compressed JPEG 1 bitstreams, but the underlying algorithm could be used to encode and transcode any kind of data. The codec is composed of two main modules: the image compression module, handled by the state-of-the-art JPEG XL codec, and the DNA encoding module, implemented using a modified Raptor Code implementation following the RU10 (Raptor Unsystematic) description. The code for encoding and decoding, as well as the objective metrics results, plots, and biochemical constraints analysis, are available on the ISO Documents system with document number WG1M101013-ICQ-EPFL submission to the JPEG DNA CfP.
Component-wise Power Estimation of Electrical Devices Using Thermal Imaging
Christian Herglotz, Simon Grosche, Akarsh Bharadwaj
et al.
This paper presents a novel method to estimate the power consumption of distinct active components on an electronic carrier board by using thermal imaging. The components and the board can be made of heterogeneous materials such as plastic, coated microchips, and metal bonds or wires, and a special coating for high emissivity is not required. The thermal images are recorded while the components on the board are dissipating power. To enable reliable estimates, a segmentation of the thermal image must be available, which can be obtained by manual labeling, object detection methods, or exploiting layout information. Evaluations show that with low-resolution consumer infrared cameras and dissipated powers larger than 300 mW, mean estimation errors of 10% can be achieved.
Spatio-Temporal Perception-Distortion Trade-off in Learned Video SR
Nasrin Rahimi, A. Murat Tekalp
The perception-distortion trade-off is well understood for single-image super-resolution. However, its extension to video super-resolution (VSR) is not straightforward, since popular perceptual measures only evaluate naturalness of spatial textures and do not take naturalness of flow (temporal coherence) into account. To this end, we propose a new measure of spatio-temporal perceptual video quality emphasizing naturalness of optical flow via the perceptual straightness hypothesis (PSH) for a meaningful spatio-temporal perception-distortion trade-off. We also propose a new architecture for perceptual VSR (PVSR) to explicitly enforce naturalness of flow and achieve a realistic spatio-temporal perception-distortion trade-off according to the proposed measures. Experimental results with PVSR support the hypothesis that a meaningful perception-distortion trade-off for video should account for the naturalness of motion in addition to naturalness of texture.
Joint Multi-Echo/Respiratory Motion-Resolved Compressed Sensing Reconstruction of Free-Breathing Non-Cartesian Abdominal MRI
Youngwook Kee, MungSoo Kang, Seongho Jeong
et al.
We propose a novel respiratory motion-resolved MR image reconstruction method that jointly treats multi-echo k-space raw data. Continuously acquired non-Cartesian multi-echo/multi-coil k-space data with free breathing are sorted/binned into the motion states from end-expiratory to end-inspiratory phases based on a respiratory motion signal. Temporal total variation applied to the motion state dimension of each echo is then coupled in the $\ell_2$ sense for joint reconstruction of the multiple echoes. Reconstructed source images of the proposed method are compared with conventional echo-by-echo motion-resolved reconstruction, and R2* of the proposed and echo-by-echo methods are compared with respect to a clinical reference. We demonstrate that inconsistency between echoes is successfully suppressed in the proposed joint reconstruction method, producing high-quality source images and R2* measurements when compared with the clinical reference.
Low-Dose CT Using Denoising Diffusion Probabilistic Model for 20$\times$ Speedup
Wenjun Xia, Qing Lyu, Ge Wang
Low-dose computed tomography (LDCT) has been an important topic in the field of radiology over the past decades. LDCT reduces ionizing radiation-induced patient health risks, but it also results in a low signal-to-noise ratio (SNR) and a potential compromise in diagnostic performance. In this paper, to improve LDCT denoising performance, we introduce the conditional denoising diffusion probabilistic model (DDPM) and show encouraging results with high computational efficiency. Specifically, given the high sampling cost of the original DDPM model, we adapt a fast ordinary differential equation (ODE) solver for much-improved sampling efficiency. The experiments show that the accelerated DDPM can achieve a 20$\times$ speedup without compromising image quality.
On Advances, Challenges and Potentials of Remote Sensing Image Analysis in Marine Debris and Suspected Plastics Monitoring
Oktay Karakuş
Marine plastic pollution is an emerging environmental problem since it pollutes the ocean, air, and food whilst endangering ocean wildlife through ingestion and entanglement. During the last decade, an enormous effort has been spent on finding possible solutions to marine plastic pollution. Remote sensing imagery plays a crucial role in these efforts since it provides informative earth observation products, and the current technology offers further essential development. Despite the advances in the last decade, there is still a way to go for marine plastic monitoring research, where challenges are rarely highlighted. This paper contributes to the literature with a critical review and aims to highlight literature milestones in marine debris and suspected plastics (MD&SP) monitoring by promoting the computational imaging methodology behind these approaches, along with detailed discussions of challenges and potential future research directions.
Illumination-Based Color Reconstruction for the Dynamic Vision Sensor
Khen Cohen, Omer Hershko, Homer Levy
et al.
This work demonstrates a novel, state-of-the-art method to reconstruct colored images via the Dynamic Vision Sensor (DVS). The DVS is an image sensor that indicates only a binary change in brightness, with no information about the captured wavelength (color) or intensity level. We present a novel method to reconstruct a full-spatial-resolution colored image with the DVS and an active colored light source. We analyze the DVS response and present two reconstruction algorithms: one linear and one based on a convolutional neural network. In addition, we demonstrate our algorithm's robustness to changes in environmental conditions such as illumination and distance. Finally, in comparison with previous works, we show that we reach state-of-the-art results.
Near-Lossless Deep Feature Compression for Collaborative Intelligence
Hyomin Choi, Ivan V. Bajic
Collaborative intelligence is a new paradigm for efficient deployment of deep neural networks across the mobile-cloud infrastructure. By dividing the network between the mobile and the cloud, it is possible to distribute the computational workload such that the overall energy and/or latency of the system is minimized. However, this necessitates sending deep feature data from the mobile to the cloud in order to perform inference. In this work, we examine the differences between deep feature data and natural image data, and propose a simple and effective near-lossless deep feature compressor. The proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and even more against other popular image codecs. Finally, we suggest an approach for reconstructing the input image from compressed deep features in the cloud, which could serve to supplement the inference performed by the deep model.
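To illustrate what "near-lossless" means in practice, the sketch below shows a uniform quantizer whose reconstruction error never exceeds a chosen bound `delta`, similar in spirit to the error bound used by near-lossless image coders. It is a generic illustration and not the compressor proposed in the paper. The function names are hypothetical, and integer-valued feature samples are assumed.

```python
def quantize(value, delta):
    """Map an integer sample to a bin index with reconstruction error at most delta."""
    step = 2 * delta + 1
    # Python floor division keeps the error bound valid for negative samples too.
    return (value + delta) // step


def dequantize(index, delta):
    """Reconstruct the sample at the centre of its quantization bin."""
    return index * (2 * delta + 1)


# Largest absolute reconstruction error over a range of hypothetical sample values.
max_error = max(abs(dequantize(quantize(v, 2), 2) - v) for v in range(-50, 51))
```

Setting `delta` to 0 gives a step of 1 and therefore lossless reconstruction; larger values trade bounded distortion for fewer distinct symbols to encode.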