Results for "cs.CV"

Showing 20 of ~116,476 results · from CrossRef, DOAJ, arXiv

arXiv Open Access 2022
Data Clustering as an Emergent Consensus of Autonomous Agents

Piotr Minakowski, Jan Peszek

We present a data segmentation method based on a first-order density-induced consensus protocol. We provide a mathematically rigorous analysis of the consensus model leading to the stopping criteria of the data segmentation algorithm. To illustrate our method, the algorithm is applied to two-dimensional shape datasets and selected images from Berkeley Segmentation Dataset. The method can be seen as an augmentation of classical clustering techniques for multimodal feature space, such as DBSCAN. It showcases a curious connection between data clustering and collective behavior.

en cs.CV, nlin.AO
arXiv Open Access 2022
Facial Expression Recognition based on Multi-head Cross Attention Network

Jae-Yeop Jeong, Yeong-Gi Hong, Daun Kim et al.

In-the-wild facial expression recognition is essential for various interactive computing domains. In this paper, we propose an extended version of the DAN model to address the VA estimation and facial expression challenges introduced in ABAW 2022. Our method produced preliminary results of a 0.44 mean CCC value for the VA estimation task and a 0.33 average F1 score for the expression classification task.

en cs.CV
arXiv Open Access 2022
Suspicious and Anomaly Detection

Shubham Deshmukh, Favin Fernandes, Monali Ahire et al.

In this project we propose a CNN architecture to detect anomalous and suspicious activities; the activities chosen for the project are running, jumping, and kicking in public places, and carrying a gun, bat, or knife in public places. We compare the trained model with pre-existing models such as YOLO, VGG16, and VGG19. The trained model is then deployed for real-time detection, and the .tflite format of the trained .h5 model is used to build an Android classification app.

en cs.CV
arXiv Open Access 2022
Controllable Garment Transfer

Jooeun Son, Tomas Cabezon Pedroso, Carolene Siga et al.

Image-based garment transfer replaces the garment on the target human with the desired garment, enabling users to virtually view themselves in the desired garment. To this end, many approaches using generative models have been proposed and have shown promising results. However, most fail to provide on-the-fly garment modification functionality. We aim to add this customizable option of "garment tweaking" to our model to control garment attributes, such as sleeve length, waist width, and garment texture.

en cs.CV
arXiv Open Access 2022
A review of schemes for fingerprint image quality computation

Fernando Alonso-Fernandez, Julian Fierrez-Aguilar, Javier Ortega-Garcia

Fingerprint image quality heavily affects the performance of fingerprint recognition systems. This paper reviews existing approaches for fingerprint image quality computation. We also implement, test, and compare a selection of them using the MCYT database, which includes 9,000 fingerprint images. Experimental results show that most of the algorithms behave similarly.

en cs.CV, eess.IV
arXiv Open Access 2022
One-stage Action Detection Transformer

Lijun Li, Li'an Zhuo, Bang Zhang

In this work, we introduce our solution to the EPIC-KITCHENS-100 2022 Action Detection challenge. A One-stage Action Detection Transformer (OADT) is proposed to model the temporal connections between video segments. With the help of OADT, both the category and the time boundary can be recognized simultaneously. After ensembling multiple OADT models trained on different features, our model reaches 21.28% action mAP and ranks 1st on the test set of the Action Detection challenge.

en cs.CV, cs.AI
arXiv Open Access 2021
A Review on Human Pose Estimation

Rohit Josyula, Sarah Ostadabbas

Human Pose Estimation (HPE) is a problem that has been explored over the years, particularly in computer vision. But what exactly is it? To answer this, the concept of a pose must first be understood. A pose can be defined as the arrangement of human joints in a specific manner. The problem of Human Pose Estimation can therefore be defined as the localization of human joints or predefined landmarks in images and videos. There are several types of pose estimation, including body, face, and hand, as well as many aspects to each. This paper covers them, from the classical approaches to HPE through to deep-learning-based models.

en cs.CV
arXiv Open Access 2021
Memory Guided Road Detection

Praveen Venkatesh, Rwik Rana, Varun Jain

In self-driving car applications, there is a requirement to predict the location of the lane given an input RGB front-facing image. In this paper, we propose an architecture that increases the speed and robustness of road detection without a large hit in accuracy by introducing an underlying shared feature space that is propagated over time and serves as a flowing dynamic memory. By utilizing the gist of previous frames, we train the network to predict the current road with greater accuracy and less deviation from previous frames.

en cs.CV
arXiv Open Access 2021
Study of visual processing techniques for dynamic speckles: a comparative analysis

Amit Chatterjee, Jitendra Dhanotiya, Vimal Bhatia et al.

The main visual techniques used to obtain information from speckle patterns are the Fujii method, generalized difference, weighted generalized difference, mean windowed difference, structural function (SF), modified SF, etc. In this work, a comparative analysis of the major visual techniques is carried out for a natural gum sample. The results conclusively establish the SF-based method as an optimal tool for visual inspection of dynamic speckle data.

en cs.CV
arXiv Open Access 2021
Analysis of convolutional neural network image classifiers in a hierarchical max-pooling model with additional local pooling

Benjamin Walter

Image classification is considered, and a hierarchical max-pooling model with additional local pooling is introduced. Here the additional local pooling enables the hierarchical model to combine parts of the image that have a variable relative distance towards each other. Various convolutional neural network image classifiers are introduced and compared in view of their rate of convergence. The finite-sample performance of the estimates is analyzed by applying them to simulated and real data.

en cs.CV, cs.LG
arXiv Open Access 2021
TorchPRISM: Principal Image Sections Mapping, a novel method for Convolutional Neural Network features visualization

Tomasz Szandala

In this paper we introduce a tool called Principal Image Sections Mapping (PRISM), dedicated to PyTorch but easily ported to other deep learning frameworks. The presented software relies on Principal Component Analysis to visualize the most significant features recognized by a given Convolutional Neural Network. Moreover, it can display comparative feature sets across images processed in the same batch; PRISM therefore synergizes well with the Explanation-by-Example technique.

en cs.CV, cs.AI
arXiv Open Access 2021
Simple Distillation Baselines for Improving Small Self-supervised Models

Jindong Gu, Wei Liu, Yonglong Tian

While large self-supervised models have rivalled the performance of their supervised counterparts, small models still struggle. In this report, we explore simple baselines for improving small self-supervised models via distillation, called SimDis. Specifically, we present an offline-distillation baseline, which establishes a new state-of-the-art, and an online-distillation baseline, which achieves similar performance with minimal computational overhead. We hope these baselines will provide useful experience for relevant future research. Code is available at: https://github.com/JindongGu/SimDis/

en cs.CV
arXiv Open Access 2021
Successive Subspace Learning: An Overview

Mozhdeh Rouhsedaghat, Masoud Monajatipoor, Zohreh Azizi et al.

Successive Subspace Learning (SSL) offers a light-weight unsupervised feature learning method based on inherent statistical properties of data units (e.g. image pixels and points in point cloud sets). It has shown promising results, especially on small datasets. In this paper, we intuitively explain this method, provide an overview of its development, and point out some open questions and challenges for future research.

en cs.CV
arXiv Open Access 2021
Online Behavioral Analysis with Application to Emotion State Identification

Lei Gao, Lin Qi, Ling Guan

In this paper, we propose a novel discriminative model for online behavioral analysis with application to emotion state identification. The proposed model is able to extract more discriminative characteristics from behavioral data effectively and find the direction of optimal projection efficiently to satisfy requirements of online data analysis, leading to better utilization of the behavioral information to produce more accurate recognition results.

en cs.CV, cs.LG
arXiv Open Access 2021
RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

Lahav Lipson, Zachary Teed, Jia Deng

We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT. We introduce multi-level convolutional GRUs, which propagate information across the image more efficiently. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-Stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29%, and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at https://github.com/princeton-vl/RAFT-Stereo.

en cs.CV
arXiv Open Access 2021
Contrastive Learning with Large Memory Bank and Negative Embedding Subtraction for Accurate Copy Detection

Shuhei Yokoo

Copy detection, which is a task to determine whether an image is a modified copy of any image in a database, is an unsolved problem. Thus, we addressed copy detection by training convolutional neural networks (CNNs) with contrastive learning. Training with a large memory-bank and hard data augmentation enables the CNNs to obtain more discriminative representation. Our proposed negative embedding subtraction further boosts the copy detection accuracy. Using our methods, we achieved 1st place in the Facebook AI Image Similarity Challenge: Descriptor Track. Our code is publicly available here: \url{https://github.com/lyakaap/ISC21-Descriptor-Track-1st}

en cs.CV
arXiv Open Access 2020
A Sample Selection Approach for Universal Domain Adaptation

Omri Lifshitz, Lior Wolf

We study the problem of unsupervised domain adaptation in the universal scenario, in which only some of the classes are shared between the source and target domains. We present a scoring scheme that is effective in identifying the samples of the shared classes. The score is used to select which samples in the target domain to pseudo-label during training. Another loss term encourages diversity of labels within each batch. Taken together, our method is shown to outperform the current state of the art on the literature benchmarks by a sizable margin.

en cs.CV
arXiv Open Access 2020
A lightweight target detection algorithm based on Mobilenet Convolution

Shengquan Wang

Target detection algorithms based on deep learning require high-end GPU configurations, sometimes even high-performance deep learning workstations; this not only increases cost but also greatly limits practical deployment. This paper introduces a lightweight target detection algorithm that balances accuracy and computational efficiency, using MobileNet as the backbone. The processing speed is 30 fps on an RTX 2060 card for images with a resolution of 320*320.

en cs.CV, eess.IV
arXiv Open Access 2019
Geometry of the Hough transforms with applications to synthetic data

Mauro C. Beltrametti, Cristina Campi, Anna Maria Massone et al.

In the framework of the Hough transform technique for detecting curves in images, we provide a bound on the number of Hough transforms to be considered for a successful optimization of the accumulator function in the recognition algorithm. This bound is a consequence of geometric arguments. We also show the robustness of the results when applied to synthetic datasets strongly perturbed by noise. An algebraic approach, discussed in the appendix, leads to a better bound of theoretical interest in the exact case.

en cs.CV
arXiv Open Access 2018
EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction

Abhinav Dhall, Amanjot Kaur, Roland Goecke et al.

This paper details the sixth Emotion Recognition in the Wild (EmotiW) challenge. EmotiW 2018 is a grand challenge in the ACM International Conference on Multimodal Interaction 2018, Colorado, USA. The challenge aims at providing a common platform to researchers working in the affective computing community to benchmark their algorithms on `in the wild' data. This year EmotiW contains three sub-challenges: a) Audio-video based emotion recognition; b) Student engagement prediction; and c) Group-level emotion recognition. The databases, protocols and baselines are discussed in detail.

en cs.CV

Page 15 of 5824