We propose PISE, a physics-informed deep ghost imaging framework for low-bandwidth edge perception. By combining adjoint operator initialization with semantic guidance, PISE improves classification accuracy by 2.57% and reduces variance by 9x at 5% sampling.
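The adjoint operator initialization mentioned above can be sketched as follows, assuming a linear measurement model y = A x in which each row of A is one illumination pattern. The names `measure` and `adjoint_init`, and the toy patterns, are hypothetical; the abstract does not specify PISE's actual operator or how the initial estimate is refined.

```python
from typing import List, Sequence


def measure(patterns: Sequence[Sequence[float]], image: Sequence[float]) -> List[float]:
    """Forward model y = A x: one scalar reading per illumination pattern."""
    return [sum(p * x for p, x in zip(row, image)) for row in patterns]


def adjoint_init(patterns: Sequence[Sequence[float]], readings: Sequence[float]) -> List[float]:
    """Adjoint (back-projection) estimate x0 = A^T y, used as a starting point."""
    n_pixels = len(patterns[0])
    return [
        sum(row[j] * y for row, y in zip(patterns, readings))
        for j in range(n_pixels)
    ]


# Toy example (assumed values): 2 patterns over a 4-pixel image.
patterns = [[1, 0, 1, 0],
            [0, 1, 1, 1]]
image = [2, 3, 5, 7]
readings = measure(patterns, image)
x0 = adjoint_init(patterns, readings)
```

The back-projected estimate `x0` has the same shape as the image and would typically be passed to a learned refinement stage.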
We propose a novel active learning framework for multi-view semantic segmentation. This framework relies on a new score that measures the discrepancy between point cloud distributions generated from the extra geometrical information derived from the model's prediction across different views. Our approach results in a data-efficient and explainable active learning method. The source code is available at https://github.com/chilai235/viewpclAL.
We introduce a novel method for overlaying cell type proportion data onto tissue images. This approach preserves spatial context while avoiding visual clutter or excessively obscuring the underlying slide. Our proposed technique involves clustering the data and aggregating neighboring points of the same cluster into polygons.
Existing knowledge distillation methods generally use a teacher-student approach, where the student network solely learns from a well-trained teacher. However, this approach overlooks the inherent differences in learning abilities between the teacher and student networks, thus causing the capacity-gap problem. To address this limitation, we propose a novel method called SLKD.
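For context, the conventional teacher-student objective that the abstract contrasts against can be sketched as a temperature-softened KL divergence between teacher and student predictions. This is the standard distillation baseline, not SLKD itself, whose details are not given here; the function names and the temperature of 4.0 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.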
This article presents an efficient end-to-end method for instance-level recognition, applied to the task of labeling and ranking landmark images. In a first step, we embed images in a high-dimensional feature space using convolutional neural networks trained with an additive angular margin loss and classify images using visual similarity. We then efficiently re-rank predictions and filter noise using similarity to out-of-domain images. Using this approach, we achieved 1st place in the 2020 edition of the Google Landmark Recognition challenge.
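As a rough illustration of an additive angular margin loss, the sketch below adds a margin to the angle between an embedding and its target class centre, then applies scaling and softmax cross-entropy. The function names, the margin of 0.5, and the scale of 64 are illustrative assumptions rather than values from the article, and the usual safeguard for angles where theta + margin exceeds pi is omitted.

```python
import math
from typing import List, Sequence


def arcface_logits(cosines: Sequence[float], target: int,
                   margin: float = 0.5, scale: float = 64.0) -> List[float]:
    """Scale cosine similarities, adding an angular margin to the target class."""
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))           # keep acos in its domain
        if j == target:
            c = math.cos(math.acos(c) + margin)  # cos(theta + m)
        logits.append(scale * c)
    return logits


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

Because the margin lowers the target logit, the embedding must sit closer to its class centre to reach the same loss, which is what encourages compact, well-separated classes.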
This paper describes the proposed methodology, the data used, and the results of our participation in Challenge Track 2 (Expr Challenge Track) of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2020. In this competition, we used a deep convolutional neural network (CNN) model to perform automatic facial expression recognition (AFER) on the given dataset. Our proposed model achieved an accuracy of 50.77% and an F1 score of 29.16% on the validation set.
TACO is an open image dataset for litter detection and segmentation, which is growing through crowdsourcing. Firstly, this paper describes this dataset and the tools developed to support it. Secondly, we report instance segmentation performance using Mask R-CNN on the current version of TACO. Despite its small size (1500 images and 4784 annotations), our results are promising on this challenging problem. However, to achieve satisfactory trash detection in the wild for deployment, TACO still needs many more manual annotations. These can be contributed using: http://tacodataset.org/
The current state-of-the-art alpha matting methods mainly rely on the trimap as the only secondary guidance to estimate alpha. This paper investigates the effects of utilising background information in addition to the trimap in the process of alpha calculation. To achieve this goal, a state-of-the-art method, AlphaGan, is adopted and modified to process the background information as an extra input channel. Extensive experiments are performed to analyse the effect of the background information in image and video matting, such as training with mildly and heavily distorted backgrounds. Based on quantitative evaluations performed on the Adobe Composition-1k dataset, the proposed pipeline significantly outperforms the state-of-the-art methods on the AlphaMatting benchmark metrics.
Landmark detection for clothes is a fundamental problem for many applications. In this paper, we propose a new training scheme for clothes landmark detection, $\textit{Aggregation and Finetuning}$. We investigate the homogeneity among landmarks of different categories of clothes and use it to design the training procedure. Extensive experiments show that our method outperforms current state-of-the-art methods by a large margin. Our method also won 1st place in the DeepFashion2 Challenge 2020 - Clothes Landmark Estimation Track with an AP of 0.590 on the test set, and 0.615 on the validation set. Code will be publicly available at https://github.com/lzhbrian/deepfashion2-kps-agg-finetune .
Since LiDARs became prevalent in autonomous driving, tremendous improvements have been made in learning on point clouds. However, recent progress largely focuses on detecting objects in a single 360-degree sweep, without extensively exploring temporal information. In this report, we describe a simple way to pass such information into the learning pipeline by adding timestamps to the point clouds, which shows consistent improvements across all three classes.
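One way to picture adding timestamps to point clouds is to append a per-point time channel before merging several sweeps into a single input. The sketch below is a minimal illustration under that assumption; the function name `stamp_and_merge` and the choice of the latest sweep as the time reference are hypothetical and not taken from the report.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
StampedPoint = Tuple[float, float, float, float]


def stamp_and_merge(sweeps: Sequence[Sequence[Point]],
                    sweep_times: Sequence[float]) -> List[StampedPoint]:
    """Append a relative timestamp to every point and merge all sweeps."""
    if len(sweeps) != len(sweep_times):
        raise ValueError("need exactly one timestamp per sweep")
    t_ref = sweep_times[-1]  # assumption: the latest sweep is the reference
    merged: List[StampedPoint] = []
    for points, t in zip(sweeps, sweep_times):
        dt = t - t_ref       # zero for the latest sweep, negative for older ones
        for x, y, z in points:
            merged.append((x, y, z, dt))
    return merged


# Two toy sweeps captured 0.1 s apart (assumed values).
cloud = stamp_and_merge([[(0.0, 0.0, 0.0)],
                         [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]],
                        [0.0, 0.1])
```

A detector that consumes (x, y, z) features can then take the fourth channel as an extra input feature without other changes to the pipeline.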
The rising availability of 3D cameras and the dramatic improvement of computer vision algorithms in the last decade have accelerated research on automatic movement assessment solutions. Such solutions can be implemented at home, using affordable equipment and dedicated software. In this paper, we divide the movement assessment task into secondary tasks and explain why they are needed and how they can be addressed. We review the recent solutions for automatic movement assessment from skeleton videos, comparing them by their objectives, features, movement domains and algorithmic approaches. In addition, we discuss the status of the research on this topic at a high level.
We propose a novel biophysical and dichromatic reflectance model that efficiently characterises spectral skin reflectance. We show how to fit the model to multispectral face images, enabling high-quality estimation of diffuse and specular shading as well as biophysical parameter maps (melanin and haemoglobin). Our method works from a single image without requiring complex controlled lighting setups, yet provides quantitatively accurate reconstructions and qualitatively convincing decomposition and editing.
Newton Costa, Valdinei Tadeu Paulino, Vicente Gianluppi et al.
The effects of potassium levels (0, 40, 80 and 120 kg of K2O ha-1) on green dry matter (GDM), chemical composition and nodulation of Stylosanthes capitata cv. Lavradeiro were evaluated under field conditions in Roraima's savannas. Potassium fertilization increased significantly (P
This paper addresses the problem of viewpoint estimation of an object in a given image. It presents five key insights that should be taken into consideration when designing a CNN that solves the problem. Based on these insights, the paper proposes a network in which (i) the architecture jointly solves detection, classification, and viewpoint estimation; (ii) new types of data are added and trained on; and (iii) a novel loss function, which takes into account both the geometry of the problem and the new types of data, is proposed. Our network improves the state-of-the-art results for this problem by 9.8%.
Image processing in astronomy is a major field of research and involves many techniques for improving the analysis of the properties of celestial objects or obtaining preliminary inferences from image data. In this paper, we provide a comprehensive case study of advanced image processing techniques applied to astronomical galaxy images for improved, faster analysis and more accurate inferences.
This paper has been withdrawn by the author due to a crucial sign error in Equation 2 and errors in the information in Table 1. The author asks to be allowed to correct this information and update the paper.
In this paper, we introduce heat kernel coupling (HKC) as a method of constructing multimodal spectral geometry on weighted graphs of different sizes without vertex-wise bijective correspondence. We show that Laplacian averaging can be derived as a limit case of HKC, and demonstrate its applications on several problems from the manifold learning and pattern recognition domain.
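As background, the standard heat kernel on a single weighted graph can be written from the eigendecomposition of its Laplacian. The notation is an assumption introduced here for illustration: W is the weighted adjacency matrix, D the diagonal degree matrix, and (\lambda_k, \phi_k) the eigenpairs of L. The coupling construction across graphs is not reproduced, since the abstract does not state it.

```latex
L = D - W, \qquad L = \Phi \Lambda \Phi^{\top}, \qquad
H_t = e^{-tL} = \Phi\, e^{-t\Lambda}\, \Phi^{\top}
    = \sum_{k} e^{-t\lambda_k}\, \phi_k \phi_k^{\top}
```

Small values of t keep the kernel close to the identity, while large values suppress all but the low-frequency eigenvectors.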
The watershed is one of the most used tools in image segmentation. We present how the concept was born and developed over time. Its implementation as an algorithm or a hardwired device evolved together with the technology that enabled it. We also present how it is used in practice, first together with markers and later within a multiscale framework, in order to produce not a unique partition but a complete hierarchy.
Many applications require comparing multimodal data with different structure and dimensionality that cannot be compared directly. Recently, there has been increasing interest in methods for learning and efficiently representing such multimodal similarity. In this paper, we present a simple algorithm for multimodal similarity-preserving hashing that maps multimodal data into the Hamming space while preserving the intra- and inter-modal similarities. We show that our method significantly outperforms the state-of-the-art method in the field.
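The idea of mapping data of different dimensionality into a shared Hamming space can be sketched with one projection per modality followed by sign binarization. This is a minimal illustration, not the paper's algorithm: the projection matrices below are hand-picked hypothetical values, whereas in a real method they would be learned so that similar items receive nearby codes.

```python
from typing import List, Sequence


def hash_code(x: Sequence[float], projection: Sequence[Sequence[float]]) -> List[int]:
    """Binary code: one bit per projection row, set when the dot product is >= 0."""
    return [1 if sum(w * v for w, v in zip(row, x)) >= 0 else 0
            for row in projection]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    return sum(p != q for p, q in zip(a, b))


# Modality A is 3-dimensional, modality B is 2-dimensional; both map to 4 bits.
proj_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, -1, 0]]
proj_b = [[1, 0], [0, 1], [1, 1], [1, -1]]

code_a = hash_code([0.5, -0.2, 0.1], proj_a)
code_b = hash_code([0.3, -0.4], proj_b)
distance = hamming(code_a, code_b)
```

Once both modalities share a code length, cross-modal comparison reduces to a cheap bit-count on the two codes.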