Results for "cs.CV"

Showing 20 of ~116480 results · from CrossRef, DOAJ, arXiv

arXiv Open Access 2025
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements

Guangcong Zheng, Teng Li, Xianpan Zhou et al.

Recent advances in camera-controllable video generation have been constrained by the reliance on static-scene datasets with relative-scale camera annotations, such as RealEstate10K. While these datasets enable basic viewpoint control, they fail to capture dynamic scene interactions and lack metric-scale geometric consistency, which is critical for synthesizing realistic object motions and precise camera trajectories in complex environments. To bridge this gap, we introduce the first fully open-source, high-resolution dynamic-scene dataset with metric-scale camera annotations, available at https://github.com/ZGCTroy/RealCam-Vid.

en cs.CV
arXiv Open Access 2025
EcoScapes: LLM-Powered Advice for Crafting Sustainable Cities

Martin Röhn, Nora Gourmelon, Vincent Christlein

Climate adaptation is vital for the sustainability and sometimes the mere survival of our urban areas. However, small cities often struggle with limited personnel resources and integrating vast amounts of data from multiple sources for a comprehensive analysis. To overcome these challenges, this paper proposes a multi-layered system combining specialized LLMs, satellite imagery analysis and a knowledge base to aid in developing effective climate adaptation strategies. The corresponding code can be found at https://github.com/Photon-GitHub/EcoScapes.

en cs.CV
arXiv Open Access 2024
qlty: handling large tensors in scientific imaging

Petrus Zwart

In scientific imaging, deep learning has become a pivotal tool for image analytics. However, handling large volumetric datasets, which often exceed the memory capacity of standard GPUs, requires special attention in deep learning workflows. This paper introduces qlty, a toolkit designed to address these challenges through tensor management techniques. qlty offers robust methods for subsampling, cleaning, and stitching of large-scale spatial data, enabling effective training and inference even in resource-limited environments.
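
The subsample-and-stitch idea described in this abstract can be illustrated with a minimal sketch. It does not use the qlty API: the function names `split_windows` and `stitch_windows`, the parameters `window` and `step`, and the 1D toy signal are all assumptions made for this example. It assumes `step <= window` so that every sample is covered by at least one window.

```python
def split_windows(signal, window, step):
    """Return (start, chunk) pairs covering the signal with overlapping windows."""
    if window > len(signal):
        raise ValueError("window must not exceed the signal length")
    if step < 1 or step > window:
        raise ValueError("step must be between 1 and window to avoid gaps")
    starts = list(range(0, len(signal) - window + 1, step))
    # Add a final window flush with the end so no sample is left uncovered.
    if starts[-1] != len(signal) - window:
        starts.append(len(signal) - window)
    return [(s, signal[s:s + window]) for s in starts]


def stitch_windows(chunks, length):
    """Average overlapping chunks back into one signal of the given length."""
    total = [0.0] * length
    count = [0] * length
    for start, chunk in chunks:
        for offset, value in enumerate(chunk):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c for t, c in zip(total, count)]


signal = [float(i) for i in range(10)]
chunks = split_windows(signal, window=4, step=3)
# Stand-in for per-chunk model inference: each chunk is passed through unchanged.
restored = stitch_windows(chunks, len(signal))
```

In a real workflow the pass-through step would be replaced by a model call on each chunk, and the same averaging would blend predictions where windows overlap.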

en cs.CV, eess.IV
arXiv Open Access 2024
Detecting Korean Food Using Image using Hierarchical Model

Hoang Khanh Lam, Kahandakanaththage Maduni Pramuditha Perera

A solution was made available for Korean food lovers with dietary restrictions to identify a Korean dish before consuming it. Just by uploading a clear photo of the dish, people can find out what they are eating. Image processing techniques together with machine learning made this solution possible.

en cs.CV, cs.AI
arXiv Open Access 2024
Cascading Refinement Video Denoising with Uncertainty Adaptivity

Xinyuan Yu

Accurate alignment is crucial for video denoising. However, estimating alignment in noisy environments is challenging. This paper introduces a cascading refinement video denoising method that can refine alignment and restore images simultaneously. Better alignment enables restoration of more detailed information in each frame. Furthermore, better image quality leads to better alignment. This method has achieved SOTA performance by a large margin on the CRVD dataset. Simultaneously, aiming to deal with multi-level noise, an uncertainty map was created after each iteration. Because of this, redundant computation on the easily restored videos was avoided. By applying this method, the entire computation was reduced by 25% on average.

en cs.CV
arXiv Open Access 2024
A survey on Graph Deep Representation Learning for Facial Expression Recognition

Théo Gueuret, Akrem Sellami, Chaabane Djeraba

This comprehensive review delves deeply into the various methodologies applied to facial expression recognition (FER) through the lens of graph representation learning (GRL). Initially, we introduce the task of FER and the concepts of graph representation and GRL. Afterward, we discuss some of the most prevalent and valuable databases for this task. We explore promising approaches for graph representation in FER, including graph diffusion, spatio-temporal graphs, and multi-stream architectures. Finally, we identify future research opportunities and provide concluding remarks.

en cs.CV
arXiv Open Access 2024
Enhanced Facial Feature Extraction and Recignation Using Optimal Fully Dispersed Haar-like Filters

Zeinab Sedaghatjoo, Hossein Hosseinzadeh, Ahmad shirzadi

Haar-like filters are renowned for their simplicity, speed, and accuracy in various computer vision tasks. This paper proposes a novel algorithm to identify optimal fully dispersed Haar-like filters for enhanced facial feature extraction and recognition. Unlike traditional Haar-like filters, these novel filters allow pixels to move freely within images, enabling more effective capture of intricate local features...

en cs.CV, math.NA
arXiv Open Access 2024
Training Noise Token Pruning

Mingxing Rao, Bohan Jiang, Daniel Moyer

In the present work we present Training Noise Token (TNT) Pruning for vision transformers. Our method relaxes the discrete token dropping condition to continuous additive noise, providing smooth optimization in training, while retaining discrete dropping computational gains in deployment settings. We provide theoretical connections to Rate-Distortion literature, and empirical evaluations on the ImageNet dataset using ViT and DeiT architectures demonstrating TNT's advantages over previous pruning methods.

en cs.CV
arXiv Open Access 2023
DSeg: Direct Line Segments Detection

Berger Cyrille, Lacroix Simon

This paper presents a model-driven approach to detect image line segments. The approach incrementally detects segments on the gradient image using a linear Kalman filter that estimates the supporting line parameters and their associated variances. The algorithm is fast and robust with respect to image noise and illumination variations; it allows the detection of longer line segments than data-driven approaches and does not require tedious parameter tuning. An extension of the algorithm that exploits a pyramidal approach to enhance the quality of results is proposed. Results with varying scene illumination and comparisons to classic existing approaches are presented.
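
To make the idea of a linear Kalman filter estimating line parameters and their variances concrete, here is a generic sketch. It is not the DSeg algorithm: it assumes a static two-parameter state (slope `a`, intercept `b`), a scalar measurement model `y = a*x + b` with noise variance `r`, and the hypothetical function name `kalman_line_update`.

```python
def kalman_line_update(theta, P, x, y, r):
    """One Kalman update of line parameters (a, b) from a point (x, y).

    theta is (slope, intercept), P is the 2x2 covariance as nested lists,
    and r is the assumed measurement noise variance.
    """
    a, b = theta
    # Measurement row H = [x, 1] for the model y = a*x + b.
    ph = [P[0][0] * x + P[0][1], P[1][0] * x + P[1][1]]  # P @ H^T
    s = x * ph[0] + ph[1] + r  # innovation variance H P H^T + r
    k = [ph[0] / s, ph[1] / s]  # Kalman gain
    residual = y - (a * x + b)
    theta = (a + k[0] * residual, b + k[1] * residual)
    hp = [x * P[0][0] + P[1][0], x * P[0][1] + P[1][1]]  # H @ P
    P = [[P[i][j] - k[i] * hp[j] for j in range(2)] for i in range(2)]
    return theta, P


# Feed noise-free points from y = 2x + 1, starting from a vague prior.
theta = (0.0, 0.0)
P = [[1e3, 0.0], [0.0, 1e3]]
for x in range(10):
    theta, P = kalman_line_update(theta, P, float(x), 2.0 * x + 1.0, r=0.01)
```

After the loop, `theta` is close to the true slope and intercept, and the diagonal of `P` (the parameter variances) has shrunk well below the prior, which is the kind of per-parameter uncertainty an incremental detector could use to decide whether a new point still belongs to the segment.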

en cs.CV
arXiv Open Access 2023
Human Reaction Intensity Estimation with Ensemble of Multi-task Networks

JiYeon Oh, Daun Kim, Jae-Yeop Jeong et al.

Facial expression in-the-wild is essential for various interactive computing domains. In particular, "Emotional Reaction Intensity" (ERI) is an important topic in the facial expression recognition task. In this paper, we propose a multi-emotional task learning-based approach and present preliminary results for the ERI challenge introduced in the 5th affective behavior analysis in-the-wild (ABAW) competition. Our method achieved a mean PCC score of 0.3254.
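
For readers unfamiliar with the reported metric, the sketch below computes the Pearson correlation coefficient (PCC) and its mean over several outputs. It assumes the mean is taken over per-emotion PCC values; the function names `pearson_cc` and `mean_pcc` and the toy data are hypothetical and not taken from the paper.

```python
import math


def pearson_cc(pred, target):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


def mean_pcc(pred_columns, target_columns):
    """Average PCC over paired columns, e.g. one column per emotion."""
    scores = [pearson_cc(p, t) for p, t in zip(pred_columns, target_columns)]
    return sum(scores) / len(scores)


# Toy data: one perfectly correlated column and one perfectly anti-correlated one.
preds = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
targets = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
score = mean_pcc(preds, targets)
```

A mean PCC ranges from -1 to 1, so values can cancel across columns, as they do in this toy case.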

en cs.CV
arXiv Open Access 2022
UnconFuse: Avatar Reconstruction from Unconstrained Images

Han Huang, Liliang Chen, Xihao Wang

The report proposes an effective solution for 3D human body reconstruction from multiple unconstrained frames for the ECCV 2022 WCPA Challenge: From Face, Body and Fashion to 3D Virtual avatars I (track1: Multi-View Based 3D Human Body Reconstruction). We reproduce the reconstruction method presented in MVP-Human as our baseline and make some improvements to suit the particularities of this challenge. We finally achieve a score of 0.93 on the official testing set, taking 1st place on the leaderboard.

en cs.CV
arXiv Open Access 2018
Iterative Low-Rank Approximation for CNN Compression

Maksym Kholiavchenko

Deep convolutional neural networks contain tens of millions of parameters, making it impossible for them to run efficiently on embedded devices. We propose an iterative approach of applying low-rank approximation to compress deep convolutional neural networks. Since classification and object detection are the most favored tasks for embedded devices, we demonstrate the effectiveness of our approach by compressing AlexNet, VGG-16, YOLOv2 and Tiny YOLO networks. Our results show the superiority of the proposed method compared to non-repetitive ones. We demonstrate a higher compression ratio with less accuracy loss.
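
The arithmetic behind low-rank compression can be shown with a short sketch. It covers only the general fact that an m x n weight matrix factorized into an m x r and an r x n matrix stores r(m + n) values. It does not reproduce the paper's iterative procedure, and the layer size and rank below are hypothetical.

```python
def dense_params(m, n):
    """Number of weights in a dense m x n matrix."""
    return m * n


def low_rank_params(m, n, r):
    """Number of weights after factorizing into (m x r) and (r x n) matrices."""
    return r * (m + n)


def compression_ratio(m, n, r):
    """How many times fewer weights the rank-r factorization stores."""
    return dense_params(m, n) / low_rank_params(m, n, r)


# Hypothetical 4096 x 4096 fully connected layer approximated at rank 256.
original = dense_params(4096, 4096)         # 16777216 weights
factored = low_rank_params(4096, 4096, 256)  # 2097152 weights
ratio = compression_ratio(4096, 4096, 256)   # 8.0
```

The factorization only saves parameters when r < mn / (m + n), so the choice of rank trades compression against approximation error.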

en cs.CV
arXiv Open Access 2018
Unsupervised Domain Adaptation: A Multi-task Learning-based Method

Jing Zhang, Wanqing Li, Philip Ogunbona

This paper presents a novel multi-task learning-based method for unsupervised domain adaptation. Specifically, the source and target domain classifiers are jointly learned by considering the geometry of the target domain and the divergence between the source and target domains based on the concept of multi-task learning. Two novel algorithms are built on the method using Regularized Least Squares and Support Vector Machines respectively. Experiments on both synthetic and real-world cross-domain recognition tasks have shown that the proposed methods outperform several state-of-the-art domain adaptation methods.

en cs.CV
arXiv Open Access 2018
Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

Seyed Ali Jalalifar, Hosein Hasani, Hamid Aghajan

We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. By using a recurrent neural network, we obtained mouth landmarks based on audio features. We exploited the power of conditional generative adversarial networks to produce highly realistic faces conditioned on a set of landmarks. These two networks together are capable of producing a sequence of natural faces in sync with an input audio track.

en cs.CV
arXiv Open Access 2016
Persistence Lenses: Segmentation, Simplification, Vectorization, Scale Space and Fractal Analysis of Images

Martin Brooks

A persistence lens is a hierarchy of disjoint upper and lower level sets of a continuous luminance image's Reeb graph. The boundary components of a persistence lens's interior components are Jordan curves that serve as a hierarchical segmentation of the image, and may be rendered as vector graphics. A persistence lens determines a varilet basis for the luminance image, in which image simplification is realized by subspace projection. Image scale space, and image fractal analysis, result from applying a scale measure to each basis function.

en cs.CV, cs.CG
arXiv Open Access 2016
Shape and Centroid Independent Clustring Algorithm for Crowd Management Applications

Yasser Mohammad Seddiq, A. A. Alharbiy, Moayyad Hamza Ghunaim

Clustering techniques play an important role in data mining and its related applications. Among the challenging applications that require robust and real-time processing are crowd management and group trajectory applications. In this paper, a robust and low-complexity clustering algorithm is proposed. It is capable of processing data in a manner that is shape and centroid independent. The algorithm is of low complexity due to the novel technique to compute the matrix power. The algorithm was tested on real and synthetic data and test results are reported.

en cs.CV
arXiv Open Access 2015
Interactive multiclass segmentation using superpixel classification

Bérengère Mathieu, Alain Crouzil, Jean-Baptiste Puel

This paper addresses the problem of interactive multiclass segmentation. We propose a fast and efficient new interactive segmentation method called Superpixel Classification-based Interactive Segmentation (SCIS). From a few strokes drawn by a human user over an image, this method extracts relevant semantic objects. To get a fast calculation and an accurate segmentation, SCIS uses superpixel over-segmentation and support vector machine classification. In this paper, we demonstrate that SCIS significantly outperforms competing algorithms by evaluating its performance on the reference benchmarks of McGuinness and Santner.

en cs.CV
arXiv Open Access 2013
Persian Heritage Image Binarization Competition (PHIBC 2012)

Seyed Morteza Ayatollahi, Hossein Ziaei Nafchi

The first competition on the binarization of historical Persian documents and manuscripts (PHIBC 2012) was organized in conjunction with the first Iranian conference on pattern recognition and image analysis (PRIA 2013). The main objective of PHIBC 2012 is to evaluate the performance of binarization methodologies when applied to Persian heritage images. This paper reports on the methodology and performance of the three submitted algorithms, based on the evaluation measures used.

arXiv Open Access 2010
Image Segmentation by Using Threshold Techniques

Salem Saleh Al-amri, N. V. Kalyankar, Khamitkar S. D.

This paper studies image segmentation using five threshold methods: the Mean method, the P-tile method, the Histogram Dependent Technique (HDT), the Edge Maximization Technique (EMT), and the visual technique. The methods are compared with one another to choose the best technique for threshold-based image segmentation. They are applied to three satellite images to choose base guesses for threshold segmentation.
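
Two of the threshold methods named in this abstract can be sketched in a few lines. This is a textbook-style illustration, not the paper's implementation: the function names and the toy pixel list are hypothetical, and the P-tile variant assumes the object is brighter than the background and occupies a known fraction `p` of the image.

```python
def mean_threshold(pixels):
    """Mean method: use the average intensity as the threshold."""
    return sum(pixels) / len(pixels)


def p_tile_threshold(pixels, p):
    """P-tile method: pick the threshold so about a fraction p of pixels is at or above it."""
    ordered = sorted(pixels, reverse=True)
    k = max(1, round(p * len(ordered)))
    return ordered[k - 1]


def binarize(pixels, threshold):
    """Label pixels at or above the threshold as 1 (object), others as 0."""
    return [1 if value >= threshold else 0 for value in pixels]


# Toy flattened image with three bright pixels on a dark background.
pixels = [10, 20, 30, 200, 210, 220, 40, 50]
mask_mean = binarize(pixels, mean_threshold(pixels))
mask_ptile = binarize(pixels, p_tile_threshold(pixels, 0.25))
```

The mean method needs no prior knowledge but is sensitive to the object-to-background area ratio, while the P-tile method depends on knowing that ratio in advance.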

en cs.CV

Page 18 of 5824