K-Space Transformer for Undersampled MRI Reconstruction
Ziheng Zhao, Tianjiao Zhang, Weidi Xie
et al.
This paper considers the problem of undersampled MRI reconstruction. We propose a novel Transformer-based framework for directly processing signals in k-space, going beyond the regular-grid limitation of ConvNets. We adopt an implicit representation of the k-space spectrogram, treating spatial coordinates as inputs, and dynamically query the sparsely sampled points to reconstruct the spectrogram, i.e. learning the inductive bias in k-space. To strike a balance between computational cost and reconstruction quality, we build the decoder with a hierarchical structure that generates low-resolution and high-resolution outputs respectively. To validate the effectiveness of the proposed method, we conduct extensive experiments on two public datasets, demonstrating superior or comparable performance to state-of-the-art approaches.
Joint cardiac $T_1$ mapping and cardiac function estimation using a deep manifold framework
Qing Zou, Mathews Jacob
In this work, we propose a continuous-acquisition strategy using a gradient echo (GRE) inversion recovery sequence based on spiral trajectories to simultaneously obtain $T_1$ mapping and CINE imaging. The acquisition is performed in a free-breathing, ungated fashion. An approach based on a variational auto-encoder (VAE) is used to estimate motion from the central k-space data. The motion signal is then used to train a deep manifold reconstruction algorithm for image reconstruction. Once the network is trained, we can excite the latent vectors (the estimated motion signals and the contrast signal) in any desired way to generate the image frames in the time series. We can estimate the $T_1$ map from generated image frames in which only the contrast varies, and we can also generate breath-hold CINE at different contrasts.
Enhancing VVC with Deep Learning based Multi-Frame Post-Processing
Duolikun Danier, Chen Feng, Fan Zhang
et al.
This paper describes a CNN-based multi-frame post-processing approach based on a perceptually-inspired Generative Adversarial Network architecture, CVEGAN. This method has been integrated with the Versatile Video Coding Test Model (VTM) 15.2 to enhance the visual quality of the final reconstructed content. The evaluation results on the CLIC 2022 validation sequences show consistent coding gains over the original VVC VTM at the same bitrates when assessed by PSNR. The integrated codec has been submitted to the Challenge on Learned Image Compression (CLIC) 2022 (video track), and the team name associated with this submission is BVI_VC.
Compound Multi-branch Feature Fusion for Real Image Restoration
Chi-Mao Fan, Tsung-Jung Liu, Kuan-Hsien Liu
Image restoration is a challenging, ill-posed, and long-standing problem. However, most learning-based restoration methods target a single degradation type, which limits their generalization. In this paper, we propose a multi-branch restoration model inspired by the Human Visual System (specifically, Retinal Ganglion Cells) that can address multiple restoration tasks in one general framework. Experiments show that the proposed multi-branch architecture, called CMFNet, achieves competitive results on four datasets covering image dehazing, raindrop removal, and deblurring, all common applications for autonomous cars. The source code and pretrained models for the three restoration tasks are available at https://github.com/FanChiMao/CMFNet.
Sparse Video Representation Using Steered Mixture-of-Experts With Global Motion Compensation
Rolf Jongebloed, Erik Bochinski, Thomas Sikora
Steered Mixtures-of-Experts (SMoE) present a unified framework for sparse representation and compression of image data with arbitrary dimensionality. Recent work has shown great improvements in the performance of such models for image and light-field representation. For video, however, the straightforward application yields limited success, as the SMoE framework leads to a piece-wise linear representation of the underlying imagery that is disrupted by nonlinear motion. We incorporate a global motion model into the SMoE framework, which allows better temporal steering of the kernels. This drastically increases its capability to exploit correlations between adjacent frames: adding only 2 to 8 motion parameters per frame to the model decreases the required number of kernels by 54.25% on average while maintaining the same reconstruction quality, yielding higher compression gains.
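As a rough illustration of the mixture-of-experts idea (not the authors' video model, which uses steered multidimensional Gaussian kernels and motion compensation), a minimal 1D SMoE regression with Gaussian gating of linear experts can be sketched as:

```python
import numpy as np

def smoe_reconstruct(x, centers, bandwidths, slopes, intercepts):
    """Minimal 1D Steered Mixture-of-Experts: each expert is a linear
    function, gated by a normalized Gaussian kernel at each sample."""
    # Gaussian gating weights, one column per expert
    g = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / bandwidths[None, :]) ** 2)
    g /= g.sum(axis=1, keepdims=True)          # normalize gates per sample
    experts = slopes[None, :] * x[:, None] + intercepts[None, :]
    return (g * experts).sum(axis=1)           # gated sum of expert outputs

# Two constant experts (values 1 and 3) blend smoothly across [0, 1]
x = np.linspace(0.0, 1.0, 5)
y = smoe_reconstruct(x, centers=np.array([0.0, 1.0]),
                     bandwidths=np.array([0.3, 0.3]),
                     slopes=np.array([0.0, 0.0]),
                     intercepts=np.array([1.0, 3.0]))
```

At the midpoint the two gates are equal, so the reconstruction is the average of the two experts; near each center the corresponding expert dominates.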
Dynamic Background Subtraction by Generative Neural Networks
Fateme Bahri, Nilanjan Ray
Background subtraction is a significant task in computer vision and an essential step in many real-world applications. One of the challenges for background subtraction methods is dynamic background, which consists of stochastic movements in some parts of the background. In this paper, we propose a new background subtraction method, called DBSGen, which uses two generative neural networks, one for dynamic motion removal and another for background generation. Finally, the foreground moving objects are obtained by a pixel-wise distance threshold based on a dynamic entropy map. The proposed method has a unified framework that can be optimized in an end-to-end and unsupervised fashion. The method is evaluated on dynamic background sequences and outperforms most state-of-the-art methods. Our code is publicly available at https://github.com/FatemeBahri/DBSGen.
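The final thresholding step can be sketched as follows; the function and parameter names are hypothetical, and the entropy-based threshold rule is a simplification of the paper's formulation:

```python
import numpy as np

def foreground_mask(frame, background, entropy_map, base_thresh=0.1, scale=0.5):
    """Pixel-wise thresholding sketch: pixels whose distance to the
    generated background exceeds a per-pixel threshold are foreground.
    The threshold is raised where the entropy map flags dynamic background."""
    dist = np.abs(frame.astype(float) - background.astype(float))
    thresh = base_thresh + scale * entropy_map   # looser where background is dynamic
    return dist > thresh

frame = np.array([[0.9, 0.2], [0.5, 0.1]])
bg    = np.array([[0.1, 0.1], [0.1, 0.1]])
ent   = np.array([[0.0, 0.0], [1.0, 0.0]])   # bottom-left pixel is dynamic background
mask  = foreground_mask(frame, bg, ent)
```

The bottom-left pixel differs from the background but is suppressed because its entropy-raised threshold marks the deviation as dynamic background rather than a moving object.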
Off-resonance artifact correction for magnetic resonance imaging: a review
Melissa W. Haskell, Jon-Fredrik Nielsen, Douglas C. Noll
In magnetic resonance imaging (MRI), inhomogeneity in the main magnetic field used for imaging, referred to as off-resonance, can lead to image artifacts ranging from mild to severe depending on the application. Off-resonance artifacts, such as signal loss, geometric distortions, and blurring, can compromise the clinical and scientific utility of MR images. In this review, we describe sources of off-resonance in MRI, how off-resonance affects images, and strategies to prevent and correct for off-resonance. Given recent advances and the great potential of low field and/or portable MRI, we also highlight the advantages and challenges of imaging at low field with respect to off-resonance.
Multi-Modality Image Super-Resolution using Generative Adversarial Networks
Aref Abedjooy, Mehran Ebrahimi
Over the past few years, deep learning-based techniques such as Generative Adversarial Networks (GANs) have significantly improved solutions to image super-resolution and image-to-image translation problems. In this paper, we propose a solution to the joint problem of image super-resolution and multi-modality image-to-image translation. The problem can be stated as the recovery of a high-resolution image in one modality, given a low-resolution observation of the same image in an alternative modality. We present two models to address this problem and evaluate them on the recovery of high-resolution day images given low-resolution night images of the same scene. Promising qualitative and quantitative results are presented for each model.
Resolution enhancement of placenta histological images using deep learning
Arash Rabbani, Masoud Babaei
In this study, a method has been developed to improve the resolution of histological human placenta images. For this purpose, a paired series of high- and low-resolution images was collected to train a deep neural network model that predicts the image residuals required to improve the resolution of the input images. A modified version of the U-Net neural network was tailored to find the relationship between the low-resolution and residual images. After training for 900 epochs on an augmented dataset of 1,000 images, a relative mean squared error of 0.003 was achieved on 320 test images. The proposed method not only improved the contrast of the low-resolution images at the edges of cells but also added critical details and textures that mimic high-resolution images of the placental villous space.
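The residual-learning idea, where the network predicts only a correction that is added back to the input image, can be sketched as below; the toy Laplacian-based predictor merely stands in for the trained U-Net:

```python
import numpy as np

def enhance(low_res, predict_residual):
    """Residual learning: the model predicts only the high-frequency
    residual, which is added back to the input image."""
    return low_res + predict_residual(low_res)

def toy_residual(img):
    """Stand-in for the trained network: a crude Laplacian sharpener
    that boosts contrast at edges (illustration only)."""
    pad = np.pad(img, 1, mode="edge")
    lap = (4 * img - pad[:-2, 1:-1] - pad[2:, 1:-1]
           - pad[1:-1, :-2] - pad[1:-1, 2:])
    return 0.25 * lap

img = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)   # a vertical edge
out = enhance(img, toy_residual)
```

Pixels just inside the bright side of the edge are pushed up and pixels just outside are pushed down, i.e. the residual sharpens edge contrast, mirroring the paper's observation about cell boundaries.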
DCNNV-19: A Deep Convolutional Neural Network for COVID-19 Detection in Chest Computed Tomographies
Victor Felipe Reis-Silva
This technical report proposes the use of a deep convolutional neural network as a preliminary diagnostic method for analyzing chest computed tomography images from patients with symptoms of Severe Acute Respiratory Syndrome (SARS) and suspected COVID-19, especially when a delayed RT-PCR result and the absence of urgent care could cause serious temporary, long-term, or permanent health damage. The model was trained on 83,391 images, validated on 15,297, and tested on 22,185, achieving an F1-score of 98%, a Cohen's kappa of 97.59%, an accuracy of 98.4%, and a loss of 5.09%. This attests to a highly accurate automated classification that provides results faster than the current gold-standard exam, real-time reverse-transcriptase polymerase chain reaction (RT-PCR).
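Cohen's kappa, one of the reported metrics, corrects raw agreement for the agreement expected by chance; a minimal implementation:

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum(
        (sum(t == c for t in y_true) / n) * (sum(p == c for p in y_pred) / n)
        for c in labels
    )
    return (observed - expected) / (1 - expected)

# Perfect agreement gives kappa = 1.0
k = cohens_kappa([0, 0, 1, 1], [0, 0, 1, 1])
```

Unlike raw accuracy, kappa drops to 0 when agreement is no better than chance, which is why it is a useful companion metric on imbalanced medical datasets.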
PARSE Challenge 2022: Pulmonary Arteries Segmentation using Swin U-Net Transformer (Swin UNETR) and U-Net
Akansh Maurya, Kunal Dashrath Patil, Rohan Padhy
et al.
In this work, we present our proposed method for segmenting the pulmonary arteries from CT scans using Swin UNETR and U-Net-based deep neural network architectures. Six models, three based on Swin UNETR and three based on 3D U-Net with residual units, were ensembled using a weighted average to produce the final segmentation masks. Our team achieved a multi-level dice score of 84.36 percent with this method. The code for our work is available at https://github.com/akansh12/parse2022. This work is part of the MICCAI PARSE 2022 challenge.
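The weighted-average ensembling and a (single-level) Dice score can be sketched as follows; the weights and arrays are illustrative, not the challenge submission:

```python
import numpy as np

def ensemble_masks(prob_maps, weights, thresh=0.5):
    """Weighted average of per-model probability maps, then thresholding
    into a binary segmentation mask."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                  # normalize ensemble weights
    avg = np.tensordot(w, np.stack(prob_maps), axes=1)
    return avg > thresh

def dice(a, b):
    """Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

m1 = np.array([[0.9, 0.2], [0.8, 0.4]])           # model 1 probabilities
m2 = np.array([[0.7, 0.1], [0.3, 0.6]])           # model 2 probabilities
mask = ensemble_masks([m1, m2], weights=[2, 1])   # model 1 trusted more
```

Averaging probabilities before thresholding lets a confident model outvote an uncertain one, which is the usual motivation for weighted ensembling of segmentation networks.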
Triple Motion Estimation and Frame Interpolation based on Adaptive Threshold for Frame Rate Up-Conversion
Hanieh Naderi, Mohammad Rahmati
In this paper, we propose a novel motion-compensated frame rate up-conversion (MC-FRUC) algorithm. The proposed algorithm creates interpolated frames by first estimating motion vectors using unilateral (joint forward and backward) and bilateral motion estimation. The motion vectors are then combined based on an adaptive threshold to create high-quality interpolated frames and reduce block artifacts. Since motion-compensated frame interpolation along unilateral motion trajectories yields holes, a new algorithm is introduced to resolve this problem. Experimental results show that the quality of the frames interpolated by the proposed algorithm is much higher than that of existing algorithms.
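The bilateral interpolation step can be sketched for a single global, integer motion vector (block-wise estimation, adaptive combination, and hole filling omitted):

```python
import numpy as np

def bilateral_interpolate(prev, nxt, mv):
    """Bilateral MC interpolation sketch: the in-between frame samples the
    previous frame half a motion vector back along the trajectory and the
    next frame half a motion vector forward, then averages the two."""
    dy, dx = mv
    a = np.roll(prev, (dy // 2, dx // 2), axis=(0, 1))     # prev shifted forward by mv/2
    b = np.roll(nxt, (-(dy // 2), -(dx // 2)), axis=(0, 1))  # next shifted back by mv/2
    return 0.5 * (a + b)

prev = np.zeros((1, 8)); prev[0, 2] = 1.0   # object at column 2
nxt  = np.zeros((1, 8)); nxt[0, 6] = 1.0    # object moved to column 6
mid  = bilateral_interpolate(prev, nxt, (0, 4))
```

Both shifted samples land on column 4, the midpoint of the trajectory, so the interpolated frame places the object there with full intensity; with unilateral trajectories the two samples can disagree, which is what produces the holes the paper addresses.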
Introducing Vision Transformer for Alzheimer's Disease classification task with 3D input
Zilun Zhang, Farzad Khalvati
Many high-performance classification models utilize complex CNN-based architectures for Alzheimer's Disease classification. We aim to investigate two relevant questions regarding classification of Alzheimer's Disease using MRI: "Do Vision Transformer-based models perform better than CNN-based models?" and "Is it possible to use a shallow 3D CNN-based model to obtain satisfying results?" To achieve these goals, we propose two models that can take in and process 3D MRI scans: Convolutional Voxel Vision Transformer (CVVT) architecture, and ConvNet3D-4, a shallow 4-block 3D CNN-based model. Our results indicate that the shallow 3D CNN-based models are sufficient to achieve good classification results for Alzheimer's Disease using MRI scans.
Noisier2Noise: Learning to Denoise from Unpaired Noisy Data
Nick Moran, Dan Schmidt, Yu Zhong
et al.
We present a method for training a neural network to perform image denoising without access to clean training examples or access to paired noisy training examples. Our method requires only a single noisy realization of each training example and a statistical model of the noise distribution, and is applicable to a wide variety of noise models, including spatially structured noise. Our model produces results which are competitive with other learned methods which require richer training data, and outperforms traditional non-learned denoising methods. We present derivations of our method for arbitrary additive noise, an improvement specific to Gaussian additive noise, and an extension to multiplicative Bernoulli noise.
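For additive Gaussian noise, the method rests on the identity that $2\,\mathbb{E}[y\mid z] - z$ is an unbiased estimate of the clean image $x$, where $y = x + n$ is the observed noisy image and $z = y + m$ is a doubly noisy copy with synthetic noise $m$ drawn from the same distribution. A Monte Carlo sketch of the underlying expectation (replacing the trained network with the raw samples):

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.0                                # "clean" signal value
n = rng.normal(0.0, 1.0, 200_000)      # noise already present in the data
m = rng.normal(0.0, 1.0, 200_000)      # extra synthetic noise we add ourselves
y = x + n                              # the single noisy realization we observe
z = y + m                              # doubly noisy input fed to the network

# A perfect network would output E[y | z]; here we only verify that the
# correction step 2*y - z = x + n - m is unbiased for x on average.
estimate = np.mean(2 * y - z)
```

In practice the network is trained to predict $y$ from $z$, and the same $2\hat{y} - z$ correction is applied to its output at test time.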
Tracking-Assisted Segmentation of Biological Cells
Deepak K. Gupta, Nathan de Bruijn, Andreas Panteli
et al.
U-Net and its variants have been demonstrated to work sufficiently well for biological cell tracking and segmentation. However, these methods still suffer in the presence of complex processes such as cell collision, mitosis, and apoptosis. In this paper, we augment U-Net with Siamese matching-based tracking and propose to track individual nuclei over time. By modelling the behavioural pattern of the cells, we achieve improved segmentation and tracking performance through a re-segmentation procedure. Our preliminary investigations on the Fluo-N2DH-SIM+ and Fluo-N2DH-GOWT1 datasets demonstrate that absolute improvements of up to 3.8% and 3.4% can be obtained in segmentation and tracking accuracy, respectively.
A Multimodal Vision Sensor for Autonomous Driving
Dongming Sun, Xiao Huang, Kailun Yang
This paper describes a multimodal vision sensor that integrates three types of cameras: a stereo camera, a polarization camera, and a panoramic camera. Each sensor provides a specific dimension of information: the stereo camera measures depth per pixel, the polarization camera obtains the degree of polarization, and the panoramic camera captures a 360-degree landscape. Data fusion and advanced environment perception can be built upon this combination of sensors. Designed especially for autonomous driving, the vision sensor ships with a robust semantic segmentation network. In addition, we demonstrate how cross-modal enhancement can be achieved by registering the color image and the polarization image, and give an example of water hazard detection. To prove the multimodal vision sensor's compatibility with different devices, a brief runtime performance analysis is carried out.
Improving localization-based approaches for breast cancer screening exam classification
Thibault Févry, Jason Phang, Nan Wu
et al.
We trained and evaluated a localization-based deep CNN for breast cancer screening exam classification on over 200,000 exams (over 1,000,000 images). Our model achieves an AUC of 0.919 in predicting malignancy in patients undergoing breast cancer screening, reducing the error rate of the baseline (Wu et al., 2019a) by 23%. In addition, the model generates bounding boxes for benign and malignant findings, providing interpretable predictions.
Vertebra partitioning with thin-plate spline surfaces steered by a convolutional neural network
Nikolas Lessmann, Jelmer M. Wolterink, Majd Zreik
et al.
Thin-plate splines can be used for interpolation of image values, but can also be used to represent a smooth surface, such as the boundary between two structures. We present a method for partitioning vertebra segmentation masks into two substructures, the vertebral body and the posterior elements, using a convolutional neural network that predicts the boundary between the two structures. This boundary is modeled as a thin-plate spline surface defined by a set of control points predicted by the network. The neural network is trained using the reconstruction error of a convolutional autoencoder to enable the use of unpaired data.
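A thin-plate spline surface through a set of control points, as used here to model the boundary, solves a small linear system for kernel weights plus an affine part; a minimal 2D sketch (standard TPS fitting, not the authors' network):

```python
import numpy as np

def tps_kernel(d):
    """TPS radial basis U(r) = r^2 log(r^2), with U(0) = 0."""
    d2 = d ** 2
    return d2 * np.log(np.where(d2 > 0.0, d2, 1.0))

def fit_tps(pts, vals):
    """Solve the standard TPS system [[K, P], [P^T, 0]] for kernel
    weights w and affine coefficients a, so z interpolates the controls."""
    n = len(pts)
    K = tps_kernel(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), pts])
    A = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    b = np.concatenate([vals, np.zeros(3)])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]

def eval_tps(q, pts, w, a):
    """Evaluate z(q) = sum_i w_i U(|q - p_i|) + a0 + a1*x + a2*y."""
    U = tps_kernel(np.linalg.norm(q[None, :] - pts, axis=-1))
    return U @ w + a[0] + a[1] * q[0] + a[2] * q[1]

# A flat square of control points with one raised center
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
vals = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
w, a = fit_tps(pts, vals)
z = eval_tps(np.array([0.5, 0.5]), pts, w, a)
```

In the paper's setting, the network predicts the control points, and the resulting smooth surface separates the vertebral body from the posterior elements.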
Constrained Linear Data-feature Mapping for Image Classification
Juncai He, Yuyan Chen, Lian Zhang
et al.
In this paper, we propose a constrained linear data-feature mapping model as an interpretable mathematical model for image classification with convolutional neural networks (CNNs) such as ResNet. From this viewpoint, we establish detailed technical connections between traditional iterative schemes for constrained linear systems and the architecture of the basic blocks of ResNet. Based on these connections, we propose natural modifications of ResNet-type models that have fewer parameters while maintaining almost the same accuracy as the corresponding original models. Numerical experiments demonstrate the validity of this constrained learning data-feature mapping assumption.
Color-wise Attention Network for Low-light Image Enhancement
Yousef Atoum, Mao Ye, Liu Ren
et al.
The absence of nearby light sources while capturing an image degrades its visibility and quality, making computer vision tasks difficult. In this paper, a color-wise attention network (CWAN) is proposed for low-light image enhancement based on convolutional neural networks. Motivated by the human visual system when looking at dark images, CWAN learns an end-to-end mapping between low-light and enhanced images while searching for useful color cues in the low-light image to aid the color enhancement process. Once these regions are identified, CWAN focuses its attention mainly on synthesizing these local regions, as well as the global image. Both quantitative and qualitative experiments on challenging datasets demonstrate the advantages of our method over state-of-the-art methods.