Results for "eess.IV" - JURNALIN

arXiv Open Access 2026

Aligned Stable Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency

Yikai Wang, Junqiu Yu, Chenjie Cao et al.

Generative image inpainting can produce realistic, high-fidelity results even with large, irregular masks. However, existing methods still face key issues that make inpainted images look unnatural. In this paper, we identify two main problems: (1) Unwanted object insertion: generative models may hallucinate arbitrary objects in the masked region that do not match the surrounding context. (2) Color inconsistency: inpainted regions often exhibit noticeable color shifts, leading to smeared textures and degraded image quality. We analyze the underlying causes of these issues and propose efficient post-hoc solutions for pre-trained inpainting models. Specifically, we introduce the principled framework of Aligned Stable inpainting with UnKnown Areas prior (ASUKA). To reduce unwanted object insertion, we use reconstruction-based priors to guide the generative model, suppressing hallucinated objects while preserving generative flexibility. To address color inconsistency, we design a specialized VAE decoder that formulates latent-to-image decoding as a local harmonization task. This design significantly reduces color shifts and produces more color-consistent results. We implement ASUKA on two representative inpainting architectures: a U-Net-based model and a DiT-based model. We analyze and propose lightweight injection strategies that minimize interference with the model's original generation capacity while ensuring the mitigation of the two issues. We evaluate ASUKA using the Places2 dataset and MISATO, our proposed diverse benchmark. Experiments show that ASUKA effectively suppresses object hallucination and improves color consistency, outperforming standard diffusion, rectified flow models, and other inpainting methods. The dataset, models, and code will be released on GitHub.

en cs.CV, eess.IV

arXiv Open Access 2025

Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI

Baltasar Ramos, Cristian Garrido, Paulette Narváez et al.

Prostate cancer (PCa) is the most frequently diagnosed malignancy in men and the eighth leading cause of cancer death worldwide. Multiparametric MRI (mpMRI) has become central to the diagnostic pathway for men at intermediate risk, improving detection of clinically significant PCa (csPCa) while reducing unnecessary biopsies and over-diagnosis. However, mpMRI remains limited by false positives, false negatives, and moderate to substantial interobserver agreement. Time-dependent diffusion (TDD) MRI, a novel sequence that enables tissue microstructure characterization, has shown encouraging preclinical performance in distinguishing clinically significant from insignificant PCa. Combining TDD-derived metrics with machine learning may provide robust, zone-specific risk prediction with less dependence on reader training and improved accuracy compared to current standard-of-care. This study protocol outlines the rationale and describes the prospective evaluation of a home-developed AI-enhanced TDD-MRI software (PROSTDAI) in routine diagnostic care, assessing its added value against PI-RADS v2.1 and validating results against MRI-guided prostate biopsy.

en eess.IV, cs.CV

arXiv Open Access 2025

Memory-Efficient Super-Resolution of 3D Micro-CT Images Using Octree-Based GANs: Enhancing Resolution and Segmentation Accuracy

Evgeny Ugolkov, Xupeng He, Hyung Kwak et al.

We present a memory-efficient algorithm for significantly enhancing the quality of segmented 3D micro-Computed Tomography (micro-CT) images of rocks using a generative model. The proposed model achieves a 16x increase in resolution and corrects inaccuracies in segmentation caused by the overlapping X-ray attenuation in micro-CT measurements across different minerals. The generative model employed is a 3D Octree-based convolutional Wasserstein generative adversarial network with gradient penalty. To address the challenge of high memory consumption inherent in standard 3D convolutional layers, we implemented an Octree structure within the 3D progressive growing generator model. This enabled the use of memory-efficient 3D Octree-based convolutional layers. The approach is pivotal in overcoming the long-standing memory bottleneck in volumetric deep learning, making it possible to reach 16x super-resolution in 3D, a scale that is challenging to attain due to cubic memory scaling. For training, we utilized segmented 3D low-resolution micro-CT images along with unpaired segmented complementary 2D high-resolution laser scanning microscope images. Post-training, resolution improved from 7 to 0.44 µm/voxel with accurate segmentation of constituent minerals. Validated on Berea sandstone, this framework demonstrates substantial improvements in pore characterization and mineral differentiation, offering a robust solution to one of the primary computational limitations in modern geoscientific imaging.

en eess.IV, cs.CV

arXiv Open Access 2025

ZACH-ViT: A Zero-Token Vision Transformer with ShuffleStrides Data Augmentation for Robust Lung Ultrasound Classification

Athanasios Angelakis, Amne Mousa, Micah L. A. Heldeweg et al.

Differentiating cardiogenic pulmonary oedema (CPE) from non-cardiogenic and structurally normal lungs in lung ultrasound (LUS) videos remains challenging due to the high visual variability of non-cardiogenic inflammatory patterns (NCIP/ARDS-like), interstitial lung disease, and healthy lungs. This heterogeneity complicates automated classification as overlapping B-lines and pleural artefacts are common. We introduce ZACH-ViT (Zero-token Adaptive Compact Hierarchical Vision Transformer), a 0.25 M-parameter Vision Transformer variant that removes both positional embeddings and the [CLS] token, making it fully permutation-invariant and suitable for unordered medical image data. To enhance generalization, we propose ShuffleStrides Data Augmentation (SSDA), which permutes probe-view sequences and frame orders while preserving anatomical validity. ZACH-ViT was evaluated on 380 LUS videos from 95 critically ill patients against nine state-of-the-art baselines. Despite the heterogeneity of the non-cardiogenic group, ZACH-ViT achieved the highest validation and test ROC-AUC (0.80 and 0.79) with balanced sensitivity (0.60) and specificity (0.91), while all competing models collapsed to trivial classification. It trains 1.35x faster than Minimal ViT (0.62M parameters) with 2.5x fewer parameters, supporting real-time clinical deployment. These results show that aligning architectural design with data structure can outperform scale in small-data medical imaging.

en cs.LG, cs.CV

arXiv Open Access 2024

Compressive radio-interferometric sensing with random beamforming as rank-one signal covariance projections

Olivier Leblanc, Yves Wiaux, Laurent Jacques

Radio-interferometry (RI) observes the sky at unprecedented angular resolutions, enabling the study of several far-away galactic objects such as galaxies and black holes. In RI, an array of antennas probes cosmic signals coming from the observed region of the sky. The covariance matrix of the vector gathering all these antenna measurements offers, by leveraging the Van Cittert-Zernike theorem, an incomplete and noisy Fourier sensing of the image of interest. The number of noisy Fourier measurements -- or visibilities -- scales as $\mathcal O(Q^2B)$ for $Q$ antennas and $B$ short-time integration (STI) intervals. We address the challenges posed by this vast volume of data, which is anticipated to increase significantly with the advent of large antenna arrays, by proposing a compressive sensing technique applied directly at the level of the antenna measurements. First, this paper shows that beamforming -- a common technique of dephasing antenna signals -- usually used to focus some region of the sky, is equivalent to sensing a rank-one projection (ROP) of the signal covariance matrix. We build upon our recent work arXiv:2306.12698v3 [eess.IV] to propose a compressive sensing scheme relying on random beamforming, trading the $Q^2$-dependence of the data size for a smaller number $P$ ROPs. We provide image recovery guarantees for sparse image reconstruction. Secondly, the data size is made independent of $B$ by applying $M$ Bernoulli modulations of the ROP vectors obtained for the STI. The resulting sample complexities, theoretically derived in a simpler case without modulations and numerically obtained in phase transition diagrams, are shown to scale as $\mathcal O(K)$ where $K$ is the image sparsity. This illustrates the potential of the approach.

en eess.IV

arXiv Open Access 2024

ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography

Syed Jamal Safdar Gardezi, Lucas Aronson, Peter Wawrzyn et al.

Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and implemented with paired inputs of non-contrast and urographic sets of images trained to produce the nephrographic phase images, that were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal to noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE). Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016). Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase with a resultant 33% reduction in radiation dose for CTU examinations.
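As a rough illustration of the pixel-wise metrics named in the abstract above (MAE, RMSE, PSNR), the following Python sketch computes them on flat lists of intensities. It assumes intensities are normalized to [0, 1], so `data_range` defaults to 1.0; the function names and the normalization are assumptions made for illustration, not the authors' evaluation code. SSIM and NCC are left out for brevity.

```python
import math


def _check(pred, target):
    # Both inputs must be non-empty and have the same number of pixels.
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal in length")


def mae(pred, target):
    """Mean absolute error between two flat intensity lists."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def rmse(pred, target):
    """Root mean squared error between two flat intensity lists."""
    _check(pred, target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed maximum
    possible intensity span (1.0 for images normalized to [0, 1])."""
    _check(pred, target)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(data_range ** 2 / mse)
```

For example, `psnr([0.0, 0.5, 1.0], [0.0, 0.5, 0.9])` evaluates a three-pixel "image" whose only error is 0.1 in the last pixel.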

en eess.IV, cs.AI

arXiv Open Access 2024

Resource Constrained U-Net for Extraction of Retinal Vascular Trees

Georgiy Kiselev

This paper demonstrates the efficacy of a modified U-Net structure for the extraction of vascular tree masks from human fundus photographs. With limited compute resources and training data, the proposed model only slightly underperforms state-of-the-art methods.

en eess.IV, cs.CV

DOAJ Open Access 2023

Impact of Image Enhancement Methods on Automatic Transcription Trainings with eScriptorium

Pauline Jacsont, Elina Leblanc

This study stems from the Desenrollando el cordel (Untangling the cordel) project, which focuses on editing 19th-century Spanish prints. It evaluates the impact of image enhancement methods on the automatic transcription of low-quality documents, both in terms of printing and digitisation. We compare different methods (binarisation, deblur) and present the results obtained during the training of models with the Kraken tool. We demonstrate that binarisation methods give better results than the others, and that the combination of several techniques did not significantly improve the transcription prediction. This study shows the significance of using image enhancement methods with Kraken. It paves the way for further experiments with larger and more varied corpora to help future projects design their automatic transcription workflow.

History of scholarship and learning. The humanities, Bibliography. Library science. Information resources

arXiv Open Access 2022

Current State of Community-Driven Radiological AI Deployment in Medical Imaging

Vikash Gupta, Barbaros Selnur Erdal, Carolina Ramirez et al.

Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.

en cs.AI, cs.CY

arXiv Open Access 2022

Compressive Image Classification using Deterministic Sensing Matrices

Sheel Shah, Kushal Kejriwal

We look at the use of deterministic sensing matrices for compressed sensing and provide worst-case bounds on the classification accuracy of SVMs on compressively sensed data.

en eess.IV

arXiv Open Access 2022

UAS Imagery and Computer Vision for Site-Specific Weed Control in Corn

Ranjan Sapkota, Paulo Flores

Currently, weed control in a corn field is performed by a blanket application of herbicides, which does not consider the spatial distribution of weeds and uses an extensive amount of chemical herbicides. In order to reduce the amount of chemicals, we used drone-based high-resolution imagery and computer-vision techniques to perform site-specific weed control in corn.

en eess.IV

arXiv Open Access 2021

Mathematical Theory of Computational Resolution Limit in Multi-dimensions

Ping Liu, Hai Zhang

Resolving a linear combination of point sources from their band-limited Fourier data is a fundamental problem in imaging and signal processing. With the incomplete Fourier data and the inevitable noise in the measurement, there is a fundamental limit on the separation distance between point sources that can be resolved. This is the so-called resolution limit problem. Characterization of this resolution limit is still a long-standing puzzle despite the prevalent use of the classic Rayleigh limit. It is well known that the Rayleigh limit is heuristic and its drawbacks become prominent when dealing with data that is subjected to delicate processing, as modern computational imaging methods do. Therefore, more precise characterization of the resolution limit becomes increasingly necessary with the development of data processing methods. For this purpose, we developed a theory of "computational resolution limit" for both number detection and support recovery in one dimension in [arXiv:2003.02917[cs.IT], arXiv:1912.05430[eess.IV]]. In this paper, we extend the one-dimensional theory to multi-dimensions. More precisely, we define and quantitatively characterize the "computational resolution limit" for the number detection and support recovery problems in a general k-dimensional space. Our results indicate that there exists a phase transition phenomenon with respect to the super-resolution factor and the signal-to-noise ratio in each of the two recovery problems. Our main results are derived using a subspace projection strategy. Finally, to verify the theory, we proposed deterministic subspace projection based algorithms for the number detection and support recovery problems in dimensions two and three. The numerical results confirm the phase transition phenomenon predicted by the theory.

en eess.IV, eess.SP

arXiv Open Access 2021

Multi-source Domain Adaptation Using Gradient Reversal Layer for Mitotic Cell Detection

Satoshi Kondo

This is a write-up of our method submitted to the Mitosis Domain Generalization (MIDOG 2021) Challenge held at the MICCAI 2021 conference.

en eess.IV

S2 Open Access 2020

Tractography filtering using autoencoders

J. Legarreta, L. Petit, F. Rheault et al.

1 citation en Computer Science

arXiv Open Access 2020

Fisheye lens distortion correction

Dmitry Pozdnyakov

A new distortion correction algorithm for fisheye lenses with an equidistant mapping function is considered in the present study. The algorithm loses much less data and is more accurate than a classical approach such as the Brown-Conrady model.
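For context on the equidistant mapping function mentioned above: an ideal equidistant fisheye places a ray at incidence angle θ at radius r_d = f·θ from the image center, while an ideal rectilinear (pinhole) image places it at r_u = f·tan θ. The Python sketch below remaps a single point between the two ideal models. It is a textbook relation under stated assumptions (a perfectly equidistant lens, a known focal length `f` in pixels, and a known center `cx`, `cy`), not the algorithm proposed in the paper; the function name is hypothetical.

```python
import math


def equidistant_to_rectilinear(x, y, f, cx=0.0, cy=0.0):
    """Map a pixel from an ideal equidistant fisheye image (r_d = f * theta)
    to the corresponding rectilinear position (r_u = f * tan(theta))."""
    dx, dy = x - cx, y - cy
    r_d = math.hypot(dx, dy)       # distorted radius from the image center
    if r_d == 0.0:
        return (cx, cy)            # the center is a fixed point of the mapping
    theta = r_d / f                # incidence angle under the equidistant model
    if theta >= math.pi / 2:
        # Rays at 90 degrees or more have no finite rectilinear projection.
        raise ValueError("point lies outside the rectilinear field of view")
    r_u = f * math.tan(theta)      # undistorted radius
    scale = r_u / r_d
    return (cx + dx * scale, cy + dy * scale)
```

The `ValueError` branch shows why a rectilinear remap necessarily discards the outermost part of a wide fisheye field of view.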

en eess.IV, cs.CV

arXiv Open Access 2020

A Database of Dorsal Hand Vein Images

Felipe Wilches-Bernal, Bernardo Núñez-Álvares, Pedro Vizcaya

The dorsal hand vein has been demonstrated as a useful biometric for identity verification. This work details the procedure taken to collect two databases of dorsal hand veins in a biometric recognition project. The purpose of this work is to serve as a reference for the databases that are being shared with the public.

en eess.IV

arXiv Open Access 2020

Loss Ensembles for Extremely Imbalanced Segmentation

Jun Ma

This short paper briefly presents the details of our method for automatic intracranial aneurysm segmentation from brain MR scans. We use ensembles of multiple models trained with different loss functions. Our method ranked first in the ADAM challenge segmentation task. The code and trained models are publicly available at https://github.com/JunMa11/ADAM2020.

en eess.IV, cs.CV

arXiv Open Access 2019

RAUNet: Residual Attention U-Net for Semantic Segmentation of Cataract Surgical Instruments

Zhen-Liang Ni, Gui-Bin Bian, Xiao-Hu Zhou et al.

Semantic segmentation of surgical instruments plays a crucial role in robot-assisted surgery. However, accurate segmentation of cataract surgical instruments is still a challenge due to specular reflection and class imbalance issues. In this paper, an attention-guided network is proposed to segment cataract surgical instruments. A new attention module is designed to learn discriminative features and address the specular reflection issue. It captures global context and encodes semantic dependencies to emphasize key semantic features, boosting the feature representation. This attention module has very few parameters, which helps to save memory. Thus, it can be flexibly plugged into other networks. In addition, a hybrid loss that merges cross entropy and the logarithm of the Dice loss is introduced to train our network and address the class imbalance issue. A new dataset named Cata7 is constructed to evaluate our network. To the best of our knowledge, this is the first cataract surgical instrument dataset for semantic segmentation. On this dataset, RAUNet achieves state-of-the-art performance of 97.71% mean Dice and 95.62% mean IoU.
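One plausible reading of a hybrid loss that combines cross entropy with the logarithm of the Dice coefficient is L = CE − α·log(Dice). The Python sketch below implements that form for the binary, per-pixel case. The exact formulation, the weight `alpha` (0.5 here is an arbitrary placeholder), and the smoothing constant `eps` are assumptions for illustration and may differ from what the paper uses.

```python
import math


def hybrid_ce_log_dice(probs, labels, alpha=0.5, eps=1e-7):
    """Binary cross entropy minus alpha * log(soft Dice).

    probs  : predicted foreground probabilities, one per pixel
    labels : ground-truth labels (0 or 1), one per pixel
    alpha  : hypothetical weight of the log-Dice term
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    # Clip probabilities so the logarithms below stay finite.
    p = [min(max(q, eps), 1.0 - eps) for q in probs]
    ce = -sum(
        y * math.log(q) + (1 - y) * math.log(1.0 - q) for q, y in zip(p, labels)
    ) / len(labels)
    # Soft Dice coefficient in (0, 1]; eps avoids division by zero.
    intersection = sum(q * y for q, y in zip(p, labels))
    dice = (2.0 * intersection + eps) / (sum(p) + sum(labels) + eps)
    # -log(dice) grows as overlap shrinks, which penalizes missed
    # foreground even when foreground pixels are rare.
    return ce - alpha * math.log(dice)
```

Because the Dice term depends on overlap rather than on per-pixel counts, it stays informative when the foreground class occupies only a small fraction of the image.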

en cs.CV

arXiv Open Access 2019

IRXCT: Iterative Reconstruction and visualization application for X-ray Computed Tomography

D Trinca, R Madureira

This report describes the IRXCT Windows application for reconstruction and visualization of tomography tasks.

en eess.IV

arXiv Open Access 2019

A Superpixel Segmentation Based Technique for Multiple Sclerosis Lesion Detection

Saba Heidari Gheshlaghi, Amin Ranjbar, Amir Abolfazl Suratgar et al.

en eess.IV
