Muhammad Syahrul Fajar Utomo, Widyastuti Widyastuti
Results for "cs.CV"
Showing 20 of ~116476 results · from CrossRef, DOAJ, arXiv
Florian Bauer
Cameras play a crucial role in modern driver assistance systems and are an essential part of the sensor technology for automated driving. The quality of images captured by in-vehicle cameras strongly influences the performance of visual perception systems. This paper presents a feature-based algorithm to detect certain effects that can degrade image quality in automotive applications. The algorithm is based on an intelligent selection of significant features. Due to the small number of features, the algorithm performs well even with small data sets. Experiments with different data sets show that the algorithm can detect soiling adhering to camera lenses and classify different types of image degradation.
Manoj Kumar, Mostafa Dehghani, Neil Houlsby
We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), before and after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the result of an exhaustive search for alternative LayerNorm placement strategies in the Transformer block itself. In our experiments, incorporating this trivial modification often leads to improved accuracy over well-tuned Vision Transformers and never hurts.
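The placement can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes non-overlapping square patches, a plain linear patch embedding, and LayerNorms without learnable scale and shift; `patchify` and `dual_patchnorm_embed` are hypothetical helper names.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the last axis (LayerNorm without learnable scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of patches, cols of patches, patch, patch, C)
    return x.reshape(-1, patch * patch * c)

def dual_patchnorm_embed(image, weight, bias, patch):
    """LayerNorm -> linear patch embedding -> LayerNorm (hypothetical helper)."""
    tokens = layer_norm(patchify(image, patch))  # LayerNorm before the embedding
    embedded = tokens @ weight + bias            # linear patch embedding
    return layer_norm(embedded)                  # LayerNorm after the embedding

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32, 3))
W = rng.normal(size=(8 * 8 * 3, 64)) * 0.02
b = np.zeros(64)
out = dual_patchnorm_embed(img, W, b, patch=8)
print(out.shape)  # (16, 64)
```

The Transformer blocks that follow are left unchanged; only the two normalizations around the embedding are added.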
Zhenchao Jin
This paper presents SSSegmentation, an open-source supervised semantic image segmentation toolbox based on PyTorch. The design of this toolbox is inspired by MMSegmentation, but it is easier to use because it has fewer dependencies, and it achieves superior segmentation performance under a comparable training and testing setup. Moreover, the toolbox provides plenty of trained weights for popular and contemporary semantic segmentation methods, including DeepLab, PSPNet, OCRNet, and MaskFormer. We expect that this toolbox can contribute to the future development of semantic segmentation. Code and model zoos are available at https://github.com/SegmentationBLWX/sssegmentation/.
Shaoxu Li
We propose a method for synthesizing edited photo-realistic digital avatars from text instructions. Given a short monocular RGB video and text instructions, our method uses an image-conditioned diffusion model to edit one head image and a video stylization method to edit the other head images. Through iterative training and updating (three or more rounds), our method synthesizes edited, photo-realistic, animatable 3D neural head avatars with a deformable neural radiance field head synthesis method. In quantitative and qualitative studies on various subjects, our method outperforms state-of-the-art methods.
Raed Abu Zitar, Mohammad Al-Betar, Mohamad Ryalat et al.
This paper presents a review of techniques used for the detection and tracking of UAVs or drones. There are different techniques that depend on collecting measurements of the position, velocity, and image of the UAV and then using them in detection and tracking. Hybrid detection techniques are also presented. The paper is a quick reference for a wide spectrum of methods that are used in the drone detection process.
Chenghao Li, Chaoning Zhang
ChatGPT and its improved variant GPT-4 have revolutionized the NLP field, with a single model solving almost all text-related tasks. However, no such model exists for computer vision, especially for 3D vision. This article first gives a brief overview of the progress of deep learning in the text, image, and 3D fields from the model perspective. It then discusses how AI-generated content (AIGC) evolves from the data perspective. On top of that, it presents an outlook on the development of AIGC in 3D from the data perspective.
Martin Weigert, Uwe Schmidt
Instance segmentation and classification of nuclei is an important task in computational pathology. We show that StarDist, a deep learning nuclei segmentation method originally developed for fluorescence microscopy, can be extended and successfully applied to histopathology images. This is substantiated by conducting experiments on the Lizard dataset, and through entering the Colon Nuclei Identification and Counting (CoNIC) challenge 2022, where our approach achieved the first spot on the leaderboard for the segmentation and classification task for both the preliminary and final test phase.
Stephen Royle
Part of a series on the development of Early Career Researchers in the lab. The idea for the CV clinic came from the lab themselves. We had previously had a session on creating a research profile and a large part of that session was spent looking at CVs.
Christopher Zach, Virginia Estellers
In this work we address supervised learning of neural networks via lifted network formulations. Lifted networks are interesting because they allow training on massively parallel hardware and assign energy models to discriminatively trained neural networks. We demonstrate that the training methods for lifted networks proposed in the literature have significant limitations and show how to use a contrastive loss to address those limitations. We demonstrate that this contrastive training approximates back-propagation in theory and in practice and that it is superior to the training objective regularly used for lifted networks.
René Schuster, Oliver Wasenmüller, Didier Stricker
Scene flow describes 3D motion in a 3D scene. It can either be modeled as a single task or be reconstructed from the auxiliary tasks of stereo depth and optical flow estimation. While the second approach can achieve real-time performance by using real-time auxiliary methods, it typically produces non-dense results. In this presentation of a basic combination approach for scene flow estimation, we tackle the problem of non-density by interpolation.
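A minimal sketch of the basic combination idea, assuming a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, per-pixel depth maps at times t and t+1 (e.g. from stereo), and a finite forward optical-flow field. Pixels whose flow leaves the image, or whose depth is missing, come out as NaN; these holes are the non-density that an interpolation step would then have to fill. The interpolation itself is not shown, and the nearest-pixel depth lookup is a simplifying assumption.

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel coordinates (u, v) with depth z to 3D points."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

def scene_flow_from_depth_and_flow(depth_t, depth_t1, flow, fx, fy, cx, cy):
    """Combine two depth maps and optical flow into per-pixel 3D motion.

    depth_t, depth_t1: (H, W) depth maps at t and t+1 (NaN where unknown).
    flow: (H, W, 2) forward optical flow (du, dv) from t to t+1, assumed finite.
    Returns (H, W, 3) scene flow; NaN marks pixels without a valid estimate.
    """
    h, w = depth_t.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    u1 = u + flow[..., 0]
    v1 = v + flow[..., 1]

    # Nearest-pixel lookup of the depth at the flow target (no sub-pixel sampling).
    ui = np.rint(u1).astype(int)
    vi = np.rint(v1).astype(int)
    inside = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    z1 = np.full((h, w), np.nan)
    z1[inside] = depth_t1[vi[inside], ui[inside]]

    p0 = backproject(u, v, depth_t, fx, fy, cx, cy)
    p1 = backproject(u1, v1, z1, fx, fy, cx, cy)
    return p1 - p0  # NaN propagates wherever a depth value or the flow target is invalid
```

For a static scene (equal depth maps, zero flow) every vector is zero; flow that points outside the image yields only NaNs.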
Alexander Hagg, Frederik Hegger, Paul Plöger
Current object recognition methods fail on object sets that include diffuse, reflective, and transparent materials, although such objects are very common in domestic scenarios. We show that combining cues from multiple sensor modalities, including specular reflectance and unavailable depth information, allows us to capture a larger subset of household objects by extending a state-of-the-art object recognition method. This leads to a significant increase in recognition robustness over a larger set of commonly used objects.
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
We present a model that automatically divides broadcast videos into coherent scenes by learning a distance measure between shots. Experiments are performed to demonstrate the effectiveness of our approach by comparing our algorithm against recent proposals for automatic scene segmentation. We also propose an improved performance measure that aims to reduce the gap between numerical evaluation and expected results, and propose and release a new benchmark dataset.
Kevin J. Shih, Saurabh Singh, Derek Hoiem
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the VQA dataset which is the largest human-annotated visual question answering dataset to our knowledge.
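One common way to realize query-driven region selection is dot-product attention over region features. The sketch below assumes that region features and a question embedding already live in a shared d-dimensional space; it is a generic illustration with hypothetical names, not necessarily the scoring function used in the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_regions(region_feats, query_vec):
    """Weight image regions by their relevance to a text query.

    region_feats: (R, d) array, one feature row per candidate image region.
    query_vec: (d,) embedding of the question text.
    Returns (weights, pooled): weights sum to 1, and pooled is the
    relevance-weighted sum of the region features.
    """
    scores = region_feats @ query_vec   # one relevance score per region
    weights = softmax(scores)           # turn scores into a distribution over regions
    pooled = weights @ region_feats     # weighted sum of region features
    return weights, pooled
```

The pooled vector would then feed an answer classifier; regions with low relevance contribute little to it.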
A. C. Sparavigna, R. Marazzato
GIMP's Retinex filter can be used to enhance images, with good results on foggy images, as recently discussed. Since this filter has several parameters that can be adjusted to optimize the output image, different approaches can be chosen depending on the desired result. Here, as the criterion for optimizing the filter parameters, we consider maximizing the image entropy. Besides the Shannon entropy, we also use a generalized entropy.
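The selection criterion can be sketched as follows: compute the entropy of the 8-bit grayscale histogram of each filtered output and keep the parameter setting with the highest score. The abstract does not name the generalized entropy, so the Tsallis entropy with index q is used here as an assumption; the Retinex filter itself is represented by a hypothetical callable `filter_fn`, since GIMP is not invoked from this code.

```python
import numpy as np

def histogram_probs(img):
    """Normalized 256-bin histogram of an 8-bit grayscale image."""
    counts = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    return counts / counts.sum()

def shannon_entropy(img):
    """Shannon entropy in bits; empty bins are skipped (0 * log 0 = 0)."""
    p = histogram_probs(img)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def tsallis_entropy(img, q=2.0):
    """Tsallis entropy S_q = (1 - sum p^q) / (q - 1), defined here for q != 1."""
    p = histogram_probs(img)
    return float((1.0 - (p ** q).sum()) / (q - 1.0))

def best_parameters(img, filter_fn, candidates, score=shannon_entropy):
    """Return the candidate parameter dict whose filtered output maximizes `score`.

    filter_fn(img, **params) is a hypothetical stand-in for the Retinex filter
    and must return an 8-bit image.
    """
    return max(candidates, key=lambda params: score(filter_fn(img, **params)))
```

Passing `score=tsallis_entropy` switches the criterion to the generalized entropy without changing the search.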
Thomas Ruland
In their work "Global Optimization through Rotation Space Search", Richard Hartley and Fredrik Kahl introduce a global optimization strategy for problems in geometric computer vision, based on rotation space search using a branch-and-bound algorithm. At its core is Lemma 2 of their publication, the key foundation for a class of global optimization algorithms that has been adopted for a wide range of problems in subsequent publications. This lemma relates a metric on rotations represented by rotation matrices to a metric on rotations in axis-angle representation. This work focuses on a proof of this relationship, based on Rodrigues' Rotation Theorem for the composition of rotations in axis-angle representation.
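The relationship is usually stated as d∠(R_r, R_s) ≤ ‖r − s‖, where r and s are angle-axis vectors, R_r and R_s the corresponding rotation matrices, and d∠ the rotation angle of the relative rotation. Taking that form as an assumption, the sketch below builds rotation matrices with Rodrigues' formula and checks the inequality on random samples. A numerical check only illustrates the statement; it is no substitute for the proof.

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix for angle-axis vector r (angle |r|, axis r/|r|) via Rodrigues' formula."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of the unit axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def angular_distance(R, S):
    """Rotation angle in [0, pi] of the relative rotation R^T S."""
    c = (np.trace(R.T @ S) - 1.0) / 2.0
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def random_angle_axis(rng):
    """Random unit axis scaled to a random angle in [0, pi)."""
    axis = rng.normal(size=3)
    return axis / np.linalg.norm(axis) * rng.uniform(0.0, np.pi)

def check_lemma(num_samples=10000, seed=0):
    """Largest observed d(R_r, R_s) - |r - s|; values <= 0 (up to rounding) agree with the lemma."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(num_samples):
        r, s = random_angle_axis(rng), random_angle_axis(rng)
        gap = angular_distance(rodrigues(r), rodrigues(s)) - np.linalg.norm(r - s)
        worst = max(worst, gap)
    return worst
```

Equality is reached, for example, when r and s share the same axis, which is why the bound is useful for branch-and-bound: a cube of side length σ in angle-axis space maps to rotations within √3·σ/2 of the rotation at its centre.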
Chandranath Adak
This paper introduces an efficient edge detection method based on the Gabor filter and rough clustering. The input image is smoothed by a Gabor function, and rough clustering is used to focus on edge detection with a soft-computing approach. Hysteresis thresholding then produces the actual output, i.e. the edges of the input image. To show its effectiveness, the proposed technique is compared with several other edge detection methods.
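Hysteresis thresholding, the final step named above, keeps every pixel above a high threshold plus any pixel above a low threshold that is connected to one of them. A minimal sketch on an edge-strength map, assuming 8-connectivity and hand-picked thresholds; the Gabor smoothing and rough-clustering stages of the paper are not reproduced.

```python
from collections import deque

import numpy as np

def hysteresis_threshold(strength, low, high):
    """Binary edge map via hysteresis thresholding (assumes low <= high).

    Pixels >= high are strong edges. Pixels >= low are kept only if they are
    8-connected to a strong edge, directly or through other kept pixels.
    """
    strength = np.asarray(strength, dtype=float)
    h, w = strength.shape
    weak = strength >= low
    edges = strength >= high                 # start from the strong edges
    queue = deque(zip(*np.nonzero(edges)))
    while queue:                             # breadth-first growth into weak pixels
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True
                    queue.append((ny, nx))
    return edges
```

Isolated weak responses are discarded, which suppresses noise without breaking edges whose strength dips locally.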
Jaejun Lee, Taeseon Yun
This paper proposes an effective method for facial recognition using fuzzy theory and Shannon entropy. Combining fuzzy theory and Shannon entropy avoids the complexity of other methods. Shannon entropy calculates the ratio of an element between faces, and fuzzy theory calculates the membership of the entropy with respect to 1. More details are given in Section 3. The learning performance is better than that of other methods because the method is very simple and needs only two data samples per learning step. By using features that usually do not change over a person's lifetime, the method will achieve high accuracy.
Hugh L. Kennedy
Recursive, causal and non-causal, multidimensional digital filters, with infinite impulse responses and maximally flat magnitude and delay responses in the low-frequency region, are designed to negate correlated clutter and interference in the background and to accumulate power due to dim targets in the foreground of a surveillance sensor. Expressions relating mean impulse-response duration, frequency selectivity and group delay, to low-order linear-difference-equation coefficients are derived using discrete Laguerre polynomials and discounted least-squares regression, then verified through simulation.
Arnav Bhavsar
We report a method for super-resolution of range images. Our approach interprets the low-resolution (LR) image as sparse samples on the high-resolution (HR) grid. Based on this interpretation, we show that our recently reported approach, which reconstructs dense range images from sparse range data by exploiting a registered colour image, can be applied to the resolution enhancement of range images. Our method uses only a single colour image in addition to the range observation in the super-resolution process. Using the proposed approach, we demonstrate super-resolution results for large factors (e.g. 4) with good localization accuracy.
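The interpretation of an LR range image as sparse samples on the HR grid can be made concrete as follows. This sketch assumes an integer upscaling factor and that each LR pixel maps to the top-left position of its HR block (other alignments, such as block centres, are equally plausible); the colour-guided reconstruction that fills the gaps is not shown.

```python
import numpy as np

def lr_to_sparse_hr(lr, factor):
    """Place LR range samples on an HR grid.

    Returns (hr, mask): hr has shape (H*factor, W*factor) with NaN at
    unobserved positions, and mask is True where an LR sample was placed.
    """
    lr = np.asarray(lr, dtype=float)
    h, w = lr.shape
    hr = np.full((h * factor, w * factor), np.nan)
    mask = np.zeros(hr.shape, dtype=bool)
    hr[::factor, ::factor] = lr      # one observed sample per factor x factor block
    mask[::factor, ::factor] = True
    return hr, mask
```

For a factor of 4, only 1/16 of the HR pixels carry a measurement, which is why a sparse-to-dense reconstruction method applies directly.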