Recent literature has shown significant interest in 3D biometrics that employ monocular vision for robust authentication. Motivated by this, we provide insight into recent developments in this area. We present the similarities and differences between 3D monocular biometrics and classical biometrics, listing their strengths and challenges. Further, we provide an overview of recent techniques in 3D biometrics with monocular vision, as well as application systems adopted by industry. Finally, we discuss open research problems in this area.
Semantic segmentation of road elements in 2D images is a crucial task in the recognition of static objects such as lane lines and free space. In this paper, we propose DHSNet, which extracts object features with an end-to-end architecture along with a heatmap proposal. Deformable convolutions are also utilized in the proposed network. DHSNet combines low-level feature maps with high-level ones by using upsampling and downsampling operators in a U-shaped manner. In addition, DHSNet aims to capture static objects of various shapes and scales. We also predict a proposal heatmap to detect proposal points for more accurate targeting in the network.
This paper introduces DogHeart, a dataset comprising 1400 training, 200 validation, and 400 test images categorized as small, normal, and large based on VHS score. A custom CNN model is developed, featuring a straightforward architecture with 4 convolutional layers and 4 fully connected layers. Despite the absence of data augmentation, the model achieves 72% accuracy in classifying cardiomegaly severity. The study contributes to automated assessment of cardiac conditions in dogs, highlighting the potential for early detection and intervention in veterinary care.
Local feature extractors are the cornerstone of many computer vision tasks. However, their vulnerability to adversarial attacks can significantly compromise their effectiveness. This paper discusses approaches to attack sophisticated local feature extraction algorithms and models to achieve two distinct goals: (1) forcing a match between originally non-matching image regions, and (2) preventing a match between originally matching regions. At the end of the paper, we discuss the performance and drawbacks of different patch generation methods.
The paper proposes a new testing approach for Deep Neural Networks (DNNs) that uses gradient-free optimization to find perturbation chains that successfully falsify the tested DNN, going beyond existing grid-based or combinatorial testing. Applying it to an image segmentation task of detecting railway tracks in images, we demonstrate that the approach can successfully identify weaknesses of the tested DNN regarding particular combinations of common perturbations (e.g., rain, fog, blur, noise) on specific clusters of test images.
Multiple Deep Neural Networks (DNNs) integrated into a single Deep Learning (DL) inference pipeline, e.g., for Multi-Task Learning (MTL) or Ensemble Learning (EL), are very accurate but pose challenges for edge deployment. In these systems, models vary in their quantization tolerance and resource demands, requiring meticulous tuning to balance accuracy and latency. This paper introduces an automated heterogeneous quantization approach for DL inference pipelines with multiple DNNs.
The Kidney and Kidney Tumor Segmentation Challenge (KiTS) 2023 offers a platform for researchers to compare their solutions for segmentation from 3D CT. In this work, we describe our submission to the challenge, which uses the automated segmentation of Auto3DSeg, available in MONAI. Our solution achieves an average Dice score of 0.835 and a surface Dice score of 0.723, which ranks first and wins the KiTS 2023 challenge.
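For readers unfamiliar with the metric reported above, the following is a minimal sketch of the plain volumetric Dice coefficient between two binary masks. It is an illustration only: the function name `dice_score`, the flat-list mask representation, and the convention for two empty masks are assumptions made here, and the sketch does not reproduce the KiTS 2023 evaluation protocol or the surface Dice computation.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    ground_truth = [1, 0, 0, 0]
    print(dice_score(prediction, ground_truth))
```

A score of 1.0 means the masks coincide and 0.0 means they share no foreground voxel; a challenge score is typically an average of such values over cases and classes.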
This article examines the ways in which transmedia communication, combined with citizen participation, becomes a strategy for generating and strengthening processes of social mobilization. It offers a theoretical and conceptual review of transmedia narratives in the context of social mobilization and takes as its case for analysis, owing to its representativeness and recognition in social and cultural dynamics, the Hip-Hop collective Casa Kolacho, whose activities take place in Comuna 13 of Medellín, a reference territory for Hip-Hop production in the city. It also investigates the perspective of these young people involved in cultural and creative production related to Hip-Hop in the city of Medellín. To this end, the study combined an analysis of social media presenting content related to the productions associated with the Centro Cultural Casa Kolacho with interviews and follow-up of the activities of some graffiti artists and members of the Colectivo Casa Kolacho.
Residual Neural Networks [1] won first place in all five main tracks of the ImageNet and COCO 2015 competitions. This kind of network involves the creation of pluggable modules such that the output contains a residual from the input. The residual in that paper is the identity function. We propose to include residuals from all lower layers, suitably normalized, to create the residual. This way, all previous layers contribute equally to the output of a layer. We show that our approach is an improvement on [1] for the CIFAR-10 dataset.
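The idea of building the residual from all lower layers can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract does not say how the residuals are normalized, so the sketch assumes a plain average over all earlier outputs, and the names `forward`, `layers`, and `history` are hypothetical. Layers are modelled as ordinary functions on lists of floats.

```python
def forward(x, layers):
    """Apply each layer and add a residual averaged over all earlier outputs."""
    history = [list(x)]  # outputs of all lower layers, starting with the input
    for layer in layers:
        n = len(history)
        # Assumed normalization: element-wise mean of every earlier output,
        # so each lower layer contributes equally to the residual.
        residual = [sum(values) / n for values in zip(*history)]
        transformed = layer(history[-1])
        output = [t + r for t, r in zip(transformed, residual)]
        history.append(output)
    return history[-1]


if __name__ == "__main__":
    def double(v):
        return [2.0 * a for a in v]

    print(forward([1.0, 2.0], [double, double]))
```

With a single layer this reduces to the identity residual of [1], since the average over one earlier output is that output itself.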
This paper addresses the issue of matching rigid 3D object points with 2D image points through point registration based on the maximum likelihood principle in computer-simulated images. Perspective projection is necessary when transforming 3D coordinates into 2D. The problem is then recast into a missing data framework where unknown correspondences are handled via mixture models. Adopting the Expectation Conditional Maximization for Point Registration (ECMPR), two different rotation and translation optimization algorithms are compared in this paper. We analyze in detail the associated consequences in terms of estimation of the registration parameters, both theoretically and experimentally.
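The perspective projection step mentioned above can be sketched with a basic pinhole model. The sketch assumes a rigid transform (rotation matrix and translation vector) followed by division by depth; the function name `project_point` and the `focal_length` parameter are hypothetical, and the ECMPR registration itself is not implemented here.

```python
def project_point(point, rotation, translation, focal_length=1.0):
    """Project a 3D object point into 2D image coordinates (pinhole model)."""
    # Rigid transform into the camera frame: X_cam = R * X + t.
    camera = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    depth = camera[2]
    if depth <= 0:
        raise ValueError("point lies behind or on the camera plane")
    # Perspective division maps the camera-frame point onto the image plane.
    return (focal_length * camera[0] / depth, focal_length * camera[1] / depth)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(project_point([2.0, 4.0, 2.0], identity, [0.0, 0.0, 2.0]))
```

In a registration setting, the rotation and translation passed here are the unknowns that the optimization estimates.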
This paper presents a fast and robust method for fixed pattern noise nonuniformity correction of infrared focal plane arrays. The proposed method requires neither a shutter nor elaborate calibration and therefore enables real-time correction with no interruptions. Based on derivative estimation of the fixed pattern noise from pixel-sized translations of the focal plane array, the proposed method has the advantages of being invariant to the noise magnitude and robust to unknown camera and inter-scene movements while being virtually transparent to the end user.
This paper proposes a branched residual network for image classification. It is known that the high-level features of a deep neural network are more representative than its lower-level features. By sharing the low-level features, the network can allocate more memory to high-level features. The upper layers of our proposed network are branched, so that it mimics ensemble learning. By mimicking ensemble learning with a single network, we have achieved better performance on the ImageNet classification task.
The motivation for using qualitative shape descriptions is as follows: qualitative shape descriptions can implicitly act as a schema for measuring the similarity of shapes, which has the potential to be cognitively adequate. Then, shapes which are similar to each other would also be similar for a pattern recognition algorithm. There is substantial work in pattern recognition and computer vision dealing with shape similarity. Here with our approach to qualitative shape descriptions and shape similarity, the focus is on achieving a representation using only simple predicates that a human could even apply without computer support.
Generic 3D reconstruction from a single image is a difficult problem, since much information is lost in the projection. A domain-based approach to reconstruction, in which we solve a smaller set of problems for a particular use case, leads to greater returns. The project provides a way to automatically generate full 3D renditions of actual symmetric images that have some prior information provided in the pipeline by a recognition algorithm. We provide a critical analysis of how this can be enhanced and improved to provide a general framework for automatic reconstruction of any symmetric shape.
Gatys et al. (2015) showed that pair-wise products of features in a convolutional network are a very effective representation of image textures. We propose a simple modification to that representation which makes it possible to incorporate long-range structure into image generation, and to render images that satisfy various symmetry constraints. We show how this can greatly improve rendering of regular textures and of images that contain other kinds of symmetric structure. We also present applications to inpainting and season transfer.
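The pair-wise feature products referred to above are commonly computed as a Gram matrix over the channels of a feature map. The sketch below shows only that baseline representation; the modification proposed in this abstract is not described here in enough detail to reproduce, so it is not attempted. The function name `gram_matrix` and the flat-list layout (one list of spatial activations per channel) are assumptions for illustration, and no normalization is applied.

```python
def gram_matrix(features):
    """Gram matrix G[i][j] = sum over positions k of F[i][k] * F[j][k]."""
    channels = len(features)
    return [
        [
            sum(a * b for a, b in zip(features[i], features[j]))
            for j in range(channels)
        ]
        for i in range(channels)
    ]


if __name__ == "__main__":
    # Two channels, each with two spatial positions.
    feature_map = [[1.0, 2.0], [3.0, 4.0]]
    print(gram_matrix(feature_map))
```

Because the sum runs over all spatial positions, the result discards where features occur, which is why such a representation captures texture statistics but not long-range structure.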
We present a 3D zigzag rafter (first in literature) which allows us to obtain the exact sequence of spectral components after application of the 3D Discrete Cosine Transform (DCT-3D) over a cube. Such a cube represents part of a video or a group of images such as multislice data (e.g., Magnetic Resonance or Computed Tomography imaging) and multispectral or hyperspectral imagery (optical satellites). In addition, we present a new version of the traditional 2D zigzag, including the case of rectangular blocks. Finally, all the attached code is written in MATLAB and handles both blocks of pixels and blocks of blocks.
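As background for the 2D case with rectangular blocks, the conventional JPEG-style zigzag order can be sketched as below. This is a generic illustration in Python rather than the attached MATLAB code, it is not the new 2D version or the 3D scan proposed here, and the names `zigzag_indices` and `zigzag_scan` are hypothetical. The traversal walks the anti-diagonals of the block and alternates direction on each one.

```python
def zigzag_indices(rows, cols):
    """Return (row, col) pairs of a rows x cols block in zigzag order."""
    order = []
    for s in range(rows + cols - 1):
        # Cells on anti-diagonal s satisfy row + col == s.
        first_row = max(0, s - cols + 1)
        last_row = min(s, rows - 1)
        diagonal = [(i, s - i) for i in range(first_row, last_row + 1)]
        if s % 2 == 0:
            # Even diagonals are traversed from bottom-left to top-right.
            diagonal.reverse()
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a 2D block (list of equal-length rows) in zigzag order."""
    rows, cols = len(block), len(block[0])
    return [block[i][j] for i, j in zigzag_indices(rows, cols)]


if __name__ == "__main__":
    print(zigzag_scan([[1, 2, 3], [4, 5, 6]]))
```

Applied to a block of DCT coefficients, this ordering places low-frequency components first, which is the property a 3D generalization would need to preserve across the third axis.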
Scanning electron microscopy (SEM) is probably one of the most fascinating examination approaches and has been used for more than two decades for detailed inspection of micro-scale objects. Most scanning electron microscopes can only produce 2D images, which cannot support operational analysis of microscopic surface properties. Computer vision algorithms combined with advanced geometric and mathematical approaches can turn any SEM into a full 3D measurement device. This work presents a methodical literature review of automatic 3D surface reconstruction from scanning electron microscope images.
In this work we show how sublabel-accurate multilabeling approaches can be derived by approximating a classical label-continuous convex relaxation of nonconvex free-discontinuity problems. This insight allows us to extend these sublabel-accurate approaches from total variation to general convex and nonconvex regularizations. Furthermore, it leads to a systematic approach to the discretization of continuous convex relaxations. We study the relationship to existing discretizations and to discrete-continuous MRFs. Finally, we apply the proposed approach to obtain a sublabel-accurate and convex solution to the vectorial Mumford-Shah functional and show in several experiments that it leads to more precise solutions using fewer labels.