Users prefer Jpegli over same-sized libjpeg-turbo or MozJPEG
Martin Bruse, Luca Versari, Zoltan Szabadka, et al.
We performed pairwise comparisons by human raters of JPEG images from MozJPEG, libjpeg-turbo and our new Jpegli encoder. When compressing images at a quality similar to libjpeg-turbo quality 95, the Jpegli images were 54% likely to be preferred over both libjpeg-turbo and MozJPEG images, but used only 2.8 bits per pixel, compared to 3.8 and 3.5 bits per pixel for libjpeg-turbo and MozJPEG, respectively. The raw ratings and source images are publicly available for further analysis and study.
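As a rough illustration of the size difference implied by these figures, the relative savings can be computed directly from the reported bits per pixel. This is only an arithmetic sketch; the helper name `relative_savings` is hypothetical and not part of any encoder.

```python
def relative_savings(bpp_new: float, bpp_ref: float) -> float:
    """Fraction of bits saved by bpp_new relative to bpp_ref."""
    return 1.0 - bpp_new / bpp_ref


# Bits per pixel as reported in the abstract above.
JPEGLI_BPP = 2.8
LIBJPEG_TURBO_BPP = 3.8
MOZJPEG_BPP = 3.5

savings_vs_turbo = relative_savings(JPEGLI_BPP, LIBJPEG_TURBO_BPP)
savings_vs_moz = relative_savings(JPEGLI_BPP, MOZJPEG_BPP)

if __name__ == "__main__":
    print(f"vs libjpeg-turbo: {savings_vs_turbo:.1%}")
    print(f"vs MozJPEG:       {savings_vs_moz:.1%}")
```

Under these numbers, the Jpegli files are roughly a quarter smaller than the libjpeg-turbo files and a fifth smaller than the MozJPEG files.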
End-to-End Optimization of JPEG-Based Deep Learning Process for Image Classification
Siyu Qi, Lahiru D. Chamain, Zhi Ding
Among major deep learning (DL) applications, distributed learning involving image classification requires effective image compression codecs deployed on low-cost sensing devices for efficient transmission and storage. Traditional codecs such as JPEG, designed for perceptual quality, are not configured for DL tasks. This work introduces an integrative end-to-end trainable model for image compression and classification consisting of a JPEG image codec and a DL-based classifier. We demonstrate how this model can optimize the widely deployed JPEG codec settings to improve classification accuracy under bandwidth constraints. Our tests on CIFAR-100 and ImageNet also demonstrate improved validation accuracy over preset JPEG configurations.
Stochastic Super-Resolution For Gaussian Textures
Emile Pierret, Bruno Galerne
Super-resolution (SR) is an ill-posed inverse problem which consists in proposing high-resolution images consistent with a given low-resolution one. While most SR algorithms are deterministic, stochastic SR deals with designing a stochastic sampler generating any realistic SR solution. The goal of this paper is to show that stochastic SR is a well-posed and solvable problem when restricted to Gaussian stationary textures. Using Gaussian conditional sampling and exploiting the stationarity assumption, we propose an efficient algorithm based on the fast Fourier transform. We also demonstrate the practical relevance of the approach for SR with a reference image. Although limited to stationary microtextures, our approach compares favorably in terms of speed and visual quality to some state-of-the-art methods designed for a larger class of images.
SubZero: Subspace Zero-Shot MRI Reconstruction
Heng Yu, Yamin Arefeen, Berkin Bilgic
The recently introduced zero-shot self-supervised learning (ZS-SSL) has shown potential in accelerated MRI in a scan-specific scenario, enabling high-quality reconstructions without access to a large training dataset. ZS-SSL has been further combined with the subspace model to accelerate 2D T2-shuffling acquisitions. In this work, we propose a parallel network framework and introduce an attention mechanism to improve subspace-based zero-shot self-supervised learning and enable higher acceleration factors. We name our method SubZero and demonstrate that it can achieve improved performance compared with current methods in T1 and T2 mapping acquisitions.
Gaussian Blur and Relative Edge Response
Austin C. Bergstrom, David Conran, David W. Messinger
It is often convenient to use Gaussian blur in studying image quality or in data augmentation pipelines for training convolutional neural networks. Because of their convenience, Gaussians are sometimes used as first-order approximations of optical point spread functions. Here, we derive and evaluate closed-form relationships between Gaussian blur parameters and relative edge response, finding good agreement with measured results. Additionally, we evaluate the extent to which Gaussian approximations of optical point spread functions can be used to predict relative edge response, finding that Gaussian relationships provide a reasonable approximation in limited circumstances but not across a wide range of optical parameters.
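To make the kind of relationship discussed here concrete, the following sketch uses the textbook result that an ideal step edge blurred by a Gaussian has an error-function edge profile. It assumes a continuous (unsampled) edge, a Gaussian standard deviation `sigma` given in pixels, and relative edge response (RER) defined as the rise of the normalized edge profile between -0.5 and +0.5 pixels. These assumptions and the function names are illustrative, and the paper's own derivation may differ, for example by accounting for pixel sampling.

```python
import math


def gaussian_esf(x: float, sigma: float) -> float:
    """Normalized edge profile of a unit step blurred by a Gaussian (std sigma, in pixels)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def gaussian_rer(sigma: float) -> float:
    """RER taken as the edge-profile rise across one pixel centred on the edge.

    Under the assumptions above this equals erf(1 / (2 * sqrt(2) * sigma)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return gaussian_esf(0.5, sigma) - gaussian_esf(-0.5, sigma)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        print(f"sigma={sigma:.1f} px -> RER={gaussian_rer(sigma):.4f}")
```

In this idealized model the RER falls monotonically as the blur widens, which is the qualitative behaviour one would expect from any such closed-form relationship.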
INR-LDDMM: Fluid-based Medical Image Registration Integrating Implicit Neural Representation and Large Deformation Diffeomorphic Metric Mapping
Chulong Zhang, Xiaokun Liang
We propose a fluid-based registration framework for medical images based on implicit neural representation. By integrating implicit neural representation and Large Deformation Diffeomorphic Metric Mapping (LDDMM), we employ a Multilayer Perceptron (MLP) as a velocity generator while optimizing velocity and image similarity. Moreover, we adopt a coarse-to-fine approach to address the tendency of deformable registration methods to get trapped in local optima, thus aiding the management of significant deformations in medical image registration. Our algorithm has been validated on a paired CT-CBCT dataset of 50 patients, taking the Dice coefficient of transferred annotations as the evaluation metric. Compared to existing methods, our approach achieves state-of-the-art performance.
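For readers unfamiliar with the evaluation metric, the Dice coefficient of two binary masks A and B is 2|A and B| / (|A| + |B|). The sketch below computes it on flattened masks in plain Python; the variable names and the toy masks are illustrative and are not taken from the paper's data.

```python
def dice_coefficient(mask_a, mask_b) -> float:
    """Dice overlap of two equal-length binary masks (flattened to 1-D sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: a transferred annotation versus the target annotation.
warped = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]

if __name__ == "__main__":
    print(f"Dice = {dice_coefficient(warped, target):.3f}")
```

A value of 1 means the transferred and target annotations coincide exactly, and 0 means they do not overlap at all.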
Joint denoising and HDR for RAW video sequences
A. Buades, O. Martorell, M. Sánchez-Beeckman
We propose a patch-based method for the simultaneous denoising and fusion of a sequence of RAW multi-exposed images. A spatio-temporal criterion is used to select similar patches along the sequence, and a weighted principal component analysis allows the multi-exposed data to be denoised and fused at the same time. The overall strategy denoises and fuses the set of images without the need to recover each denoised image in the multi-exposure set, leading to a very efficient procedure. Several experiments show that the proposed method obtains state-of-the-art fusion results with real RAW data.
Improved Wavelets for Image Compression from Unitary Circuits
James C. McCord, Glen Evenbly
We benchmark the efficacy of several novel orthogonal, symmetric, dilation-3 wavelets, derived from a unitary-circuit-based construction, for image compression. The performance of these wavelets is compared across several photo databases against the CDF-9/7 wavelets in terms of the minimum number of non-zero wavelet coefficients needed to obtain a specified image quality, as measured by the multi-scale structural similarity index (MS-SSIM). The new wavelets are found to consistently offer better compression efficiency than the CDF-9/7 wavelets across a broad range of image resolutions and quality requirements, averaging 7-8% improved compression efficiency on high-resolution photo images when high quality (MS-SSIM = 0.99) is required.
Multi-Modality Image Inpainting using Generative Adversarial Networks
Aref Abedjooy, Mehran Ebrahimi
Deep learning techniques, especially Generative Adversarial Networks (GANs), have significantly improved image inpainting and image-to-image translation tasks over the past few years. To the best of our knowledge, the problem of combining the image inpainting task with multi-modality image-to-image translation remains unexplored. In this paper, we propose a model to address this problem. The model is evaluated on combined night-to-day image translation and inpainting, with promising qualitative and quantitative results.
Frame-type Sensitive RDO Control for Content-Adaptive-encoding
Vibhoothi, François Pitié, Anil Kokaram
Video transcoding is an increasingly important application in the streaming media industry. It has become important to investigate the optimisation of transcoder parameters for a single clip simply because of the immense number of playbacks for popular clips. In this paper, we explore the use of a canned optimiser to estimate the optimal RD tradeoff achievable for a particular clip. We show that by adjusting the Lagrange multiplier in RD optimisation on keyframes alone, we can achieve more than 10$\times$ the BD-Rate gains previously possible without affecting quality at any operating point.
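For context, rate-distortion (RD) optimisation in an encoder is conventionally posed as choosing the coding option that minimises the Lagrangian cost J = D + lambda * R. The sketch below illustrates that standard formulation with made-up candidates. The `Candidate` type, the numbers, and the keyframe scale factor `k` in `frame_lambda` are all hypothetical, and nothing here reproduces the encoder or the optimiser used in the paper.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str
    rate: float        # bits spent (hypothetical units)
    distortion: float  # e.g. sum of squared errors (hypothetical units)


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R."""
    return c.distortion + lam * c.rate


def best_candidate(cands: Sequence[Candidate], lam: float) -> Candidate:
    """Pick the candidate with the lowest Lagrangian cost."""
    return min(cands, key=lambda c: rd_cost(c, lam))


def frame_lambda(base_lam: float, is_keyframe: bool, k: float) -> float:
    """Hypothetical frame-type-sensitive multiplier: scale lambda on keyframes only."""
    return base_lam * k if is_keyframe else base_lam


CANDIDATES = [
    Candidate("coarse", rate=100.0, distortion=50.0),
    Candidate("medium", rate=200.0, distortion=20.0),
    Candidate("fine", rate=400.0, distortion=10.0),
]

if __name__ == "__main__":
    for lam in (0.01, 0.1, 1.0):
        print(lam, best_candidate(CANDIDATES, lam).name)
```

A small lambda favours low distortion and a large lambda favours low rate, so scaling lambda for one frame type shifts where bits are spent across the clip.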
ORB-based SLAM accelerator on SoC FPGA
Vibhakar Vemulapati, Deming Chen
Simultaneous Localization and Mapping (SLAM) is one of the main components of autonomous navigation systems. With the increase in popularity of drones, autonomous navigation on low-power systems is seeing widespread application. Most SLAM algorithms are computationally intensive and struggle to run in real-time on embedded devices with reasonable accuracy. ORB-SLAM is an open-source feature-based SLAM that achieves high accuracy with reduced computational complexity. We propose an SoC-based ORB-SLAM system that accelerates the computationally intensive visual feature extraction and matching on hardware. Our FPGA system based on a Zynq-family SoC runs 8.5x, 1.55x and 1.35x faster compared to an ARM CPU, an Intel desktop CPU, and a state-of-the-art FPGA system respectively, while averaging a 2x improvement in accuracy compared to prior work on FPGA.
Recognition of Cardiac MRI Orientation via Deep Neural Networks and a Method to Improve Prediction Accuracy
Houxin Zhou
In most medical image processing tasks, the orientation of an image affects the computed result. However, manually reorienting images wastes time and effort. In this paper, we study the problem of recognizing orientation in cardiac MRI and use a deep neural network to solve it. For multiple sequences and modalities of MRI, we propose a transfer learning strategy, which adapts our proposed model from a single modality to multiple modalities. We also propose a prediction method that uses voting. The results show that a deep neural network is an effective way to recognize cardiac MRI orientation and that the voting prediction method can improve accuracy.
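The abstract does not specify what is voted over, so the following is only a generic majority-vote sketch. It assumes several orientation predictions are available for the same volume (for instance one per slice, which is an assumption made here) and returns the most frequent label. The label strings are placeholders.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label among the individual predictions."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common(1) yields the label with the highest count; on a tie the
    # label encountered first in `predictions` is returned.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-slice predictions for one volume.
slice_predictions = ["orient_0", "orient_3", "orient_0", "orient_0", "orient_5"]

if __name__ == "__main__":
    print(majority_vote(slice_predictions))
```

Voting of this kind can correct isolated misclassifications as long as most of the individual predictions are right.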
Segmentation of 3D Dental Images Using Deep Learning
Omar Boudraa
3D image segmentation is a recent and crucial step in many medical analysis and recognition schemes. In fact, it represents a relevant research subject and a fundamental challenge due to its importance and influence. This paper provides a multi-phase Deep Learning-based system that hybridizes various efficient methods in order to get the best 3D segmentation output. First, to reduce the amount of data and accelerate the processing time, the application of the Decimate compression technique is suggested and justified. We then use a CNN model to segment dental images into fifteen separate classes. Finally, a special KNN-based transformation is applied to remove isolated meshes and correct dental forms. Experiments demonstrate the precision and robustness of the selected framework applied to 3D dental images within a private clinical benchmark.
Super-resolved multi-temporal segmentation with deep permutation-invariant networks
Diego Valsesia, Enrico Magli
Multi-image super-resolution from multi-temporal satellite acquisitions of a scene has recently enjoyed great success thanks to new deep learning models. In this paper, we go beyond classic image reconstruction at a higher resolution by studying a super-resolved inference problem, namely semantic segmentation at a spatial resolution higher than that of the sensing platform. We expand upon recently proposed models exploiting temporal permutation invariance with a multi-resolution fusion module able to infer the rich semantic information needed by the segmentation task. The model presented in this paper has recently won the AI4EO challenge on Enhanced Sentinel 2 Agriculture.
Multiple Selection Approximation for Improved Spatio-Temporal Prediction in Video Coding
Jürgen Seiler, André Kaup
In this contribution, a novel spatio-temporal prediction algorithm for video coding is introduced. This algorithm exploits temporal as well as spatial redundancies for effectively predicting the signal to be encoded. To achieve this, the algorithm operates in two stages. Initially, motion compensated prediction is applied on the block being encoded. Afterwards this preliminary temporal prediction is refined by forming a joint model of the initial predictor and the spatially adjacent already transmitted blocks. The novel algorithm is able to outperform earlier refinement algorithms in speed and prediction quality. Compared to pure motion compensated prediction, the mean data rate can be reduced by up to 15% and up to 1.16 dB gain in PSNR can be achieved for the considered sequences.
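To put the reported PSNR gain in perspective, recall that PSNR = 10 * log10(peak^2 / MSE), so a gain in dB maps directly to a reduction factor in mean squared error. The sketch below assumes 8-bit samples (peak value 255); the function names are illustrative.

```python
import math


def psnr(mse: float, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    ratio = mse_ratio_for_gain(1.16)
    print(f"MSE shrinks by a factor of {ratio:.3f}")
    print(f"i.e. about {1.0 - 1.0 / ratio:.1%} lower MSE")
```

By this conversion, a 1.16 dB gain corresponds to roughly 23% lower mean squared error at the same peak value.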
Despeckling Sentinel-1 GRD images by deep learning and application to narrow river segmentation
Nicolas Gasnier, Emanuele Dalsasso, Loïc Denis, et al.
This paper presents a despeckling method for Sentinel-1 GRD images based on the recently proposed framework "SAR2SAR": a self-supervised training strategy. Training the deep neural network on collections of Sentinel-1 GRD images leads to a despeckling algorithm that is robust to space-variant spatial correlations of speckle. Despeckled images improve the detection of structures like narrow rivers. We apply a detector based on exogenous information and a linear feature detector and show that rivers are better segmented when the processing chain is applied to images pre-processed by our despeckling neural network.
Image Segmentation, Compression and Reconstruction from Edge Distribution Estimation with Random Field and Random Cluster Theories
Robert A. Murphy
Random field and random cluster theory are used to describe certain mathematical results concerning the probability distribution of image pixel intensities characterized as generic $2D$ integer arrays. The size of the smallest bounded region within an image is estimated for segmenting an image, from which the equilibrium distribution of intensities can be recovered. From the estimated bounded regions, properties of the sub-optimal and equilibrium distributions of intensities are derived, which leads to an image compression methodology whereby only slightly more than half of all pixels are required for a worst-case reconstruction of the original image. A custom deep belief network and heuristic allow for the unsupervised segmentation, detection and localization of objects in an image. An example illustrates the mathematical results.
Deep Curriculum Learning for PolSAR Image Classification
Hamidreza Mousavi, Maryam Imani, Hassan Ghassemian
Following the great success of curriculum learning in the area of machine learning, a novel deep curriculum learning method, entitled DCL, is proposed in this paper for the classification of fully polarimetric synthetic aperture radar (PolSAR) data. This method utilizes the entropy-alpha target decomposition method to estimate the degree of complexity of each PolSAR image patch before applying it to the convolutional neural network (CNN). Also, an accumulative mini-batch pacing function is used to introduce more difficult patches to the CNN. Experiments on the widely used AIRSAR Flevoland data set reveal that the proposed curriculum learning method can not only increase classification accuracy but also lead to faster training convergence.
i-RIM applied to the fastMRI challenge
Patrick Putzky, Dimitrios Karkalousos, Jonas Teuwen, et al.
We, team AImsterdam, summarize our submission to the fastMRI challenge (Zbontar et al., 2018). Our approach builds on recent advances in invertible learning to infer models as presented in Putzky and Welling (2019). Both our single-coil and our multi-coil models share the same basic architecture.
Implementation Of Digital Image Processing And Computation Technology On Measurement And Testing Of Various Knit Fabric Parameters
Andrian Wijayono, Valentinus Galih Vidia Putra
One of the big challenges facing industry today, including the knit fabric industry, is how to produce quality products. Improvements to the evaluation and quality control processes of non-woven production have been widely developed to support higher production quality. Information and computational technology is now widely applied to the quality control of textile material production, including the use of image processing technology in the evaluation of knit fabric. This chapter explains various methods of applying image processing technology to the evaluation and quality control of textile production.