Results for "cs.CV"

Showing 20 of ~116470 results · from DOAJ, arXiv, CrossRef

arXiv Open Access 2025
Inteval Analysis for two spherical functions arising from robust Perspective-n-Lines problem

Xiang Zheng, Haodong Jiang, Junfeng Wu

This report presents a comprehensive interval analysis of two spherical functions derived from the robust Perspective-n-Lines (PnL) problem. The study is motivated by the application of a dimension-reduction technique to achieve global solutions for the robust PnL problem. We establish rigorous theoretical results, supported by detailed proofs, and validate our findings through extensive numerical simulations.

en cs.CV, cs.RO
arXiv Open Access 2025
Auto3DSeg for Brain Tumor Segmentation from 3D MRI in BraTS 2023 Challenge

Andriy Myronenko, Dong Yang, Yufan He et al.

In this work, we describe our solution to the BraTS 2023 cluster of challenges using Auto3DSeg from MONAI. We participated in all 5 segmentation challenges and achieved 1st place in three of them: the Brain Metastasis, Brain Meningioma, and BraTS-Africa challenges, and 2nd place in the remaining two: the Adult and Pediatric Glioma challenges.

en cs.CV
arXiv Open Access 2025
Evaluating Deep Learning and Traditional Approaches Used in Source Camera Identification

Mansur Ozaman

One of the most important tasks in computer vision is identifying the device with which an image was taken, which facilitates further comprehensive analysis of the image. This paper presents a comparative analysis of three techniques used in source camera identification (SCI): Photo Response Non-Uniformity (PRNU), JPEG compression artifact analysis, and convolutional neural networks (CNNs). It evaluates each method in terms of device classification accuracy. Furthermore, the research discusses the scientific development that may be needed to implement the methods in real-life scenarios.

en cs.CV
arXiv Open Access 2025
Exploring Decision-Making Capabilities of LLM Agents: An Experimental Study on Jump-Jump Game

Juwu Li

The Jump-Jump game, as a simple yet challenging casual game, provides an ideal testing environment for studying LLM decision-making capabilities. The game requires players to precisely control jumping force based on current position and target platform distance, involving multiple cognitive aspects including spatial reasoning, physical modeling, and strategic planning. It illustrates the basic gameplay mechanics of the Jump-Jump game, where the player character (red circle) must jump across platforms with appropriate force to maximize score.

en cs.CV
arXiv Open Access 2024
Visualize and Paint GAN Activations

Rudolf Herdt, Peter Maass

We investigate how generated structures of GANs correlate with their activations in hidden layers, with the purpose of better understanding the inner workings of those models and being able to paint structures with unconditionally trained GANs. This gives us more control over the generated images, allowing us to generate them from a semantic segmentation map while not requiring such a segmentation in the training data. To this end we introduce the concept of tileable features, allowing us to identify activations that work well for painting.

en cs.CV, cs.LG
arXiv Open Access 2024
OPCap: Object-aware Prompting Captioning

Feiyang Huang

In the field of image captioning, the phenomenon where missing or nonexistent objects are used to explain an image is referred to as object bias (or hallucination). To mitigate this issue, we propose a target-aware prompting strategy. This method first extracts object labels and their spatial information from the image using an object detector. Then, an attribute predictor further refines the semantic features of the objects. These refined features are subsequently integrated and fed into the decoder, enhancing the model's understanding of the image context. Experimental results on the COCO and nocaps datasets demonstrate that OPCap effectively mitigates hallucination and significantly improves the quality of generated captions.

en cs.CV
arXiv Open Access 2024
Efficient Audio-Visual Fusion for Video Classification

Mahrukh Awan, Asmar Nadeem, Armin Mustafa

We present Attend-Fusion, a novel and efficient approach for audio-visual fusion in video classification tasks. Our method addresses the challenge of exploiting both audio and visual modalities while maintaining a compact model architecture. Through extensive experiments on the YouTube-8M dataset, we demonstrate that our Attend-Fusion achieves competitive performance with significantly reduced model complexity compared to larger baseline models.

en cs.CV
arXiv Open Access 2023
Scaling Up Computer Vision Neural Networks Using Fast Fourier Transform

Siddharth Agrawal

The deep learning-based computer vision field has recently been exploring larger convolution kernels to scale up Convolutional Neural Networks effectively. At the same time, new paradigms of models such as Vision Transformers find it difficult to scale up to larger, higher-resolution images due to their quadratic complexity in the length of the input sequence. In this report, the Fast Fourier Transform is utilised in various ways to provide some solutions to these issues.

en cs.CV, cs.LG
arXiv Open Access 2022
Patch DCT vs LeNet

David Sinclair

This paper compares the performance of a NN taking the output of a DCT (Discrete Cosine Transform) of an image patch with LeNet for classifying MNIST handwritten digits. The basis functions underlying the DCT bear a passing resemblance to some of the learned basis functions of the Visual Transformer but are an order of magnitude faster to apply.

en cs.CV
arXiv Open Access 2022
Neural Font Rendering

Daniel Anderson, Ariel Shamir, Ohad Fried

Recent advances in deep learning techniques and applications have revolutionized artistic creation and manipulation in many domains (text, images, music); however, fonts have not yet been integrated with deep learning architectures in a manner that supports their multi-scale nature. In this work we aim to bridge this gap, proposing a network architecture capable of rasterizing glyphs in multiple sizes, potentially paving the way for easy and accessible creation and manipulation of fonts.

en cs.CV
arXiv Open Access 2021
Unanswerable Questions about Images and Texts

Ernest Davis

Questions about a text or an image that cannot be answered raise distinctive issues for an AI. This note discusses the problem of unanswerable questions in VQA (visual question answering), in QA (question answering), and in AI generally.

en cs.CV, cs.AI
arXiv Open Access 2020
Voronoi Convolutional Neural Networks

Soroosh Yazdani, Andrea Tagliasacchi

In this technical report, we investigate extending convolutional neural networks to the setting where functions are not sampled in a grid pattern. We show that by treating the samples as the average of a function within a cell, we can find a natural equivalent of most layers used in CNN. We also present an algorithm for running inference for these models exactly using standard convex geometry algorithms.

en cs.CV
arXiv Open Access 2019
Semi-Supervised Semantic Matching

Zakaria Laskar, Juho Kannala

Convolutional neural networks (CNNs) have been successfully applied to solve the problem of correspondence estimation between semantically related images. Due to the non-availability of large training datasets, existing methods resort to self-supervised or unsupervised training paradigms. In this paper we propose a semi-supervised learning framework that imposes a cyclic consistency constraint on unlabeled image pairs. Together with the supervised loss, the proposed model achieves state-of-the-art results on a benchmark semantic matching dataset.

en cs.CV
arXiv Open Access 2018
SAR Image Despeckling Using Quadratic-Linear Approximated L1-Norm

Fatih Nar

Speckle noise, inherent in synthetic aperture radar (SAR) images, degrades the performance of various SAR image analysis tasks. Thus, speckle noise reduction is a critical preprocessing step for smoothing homogeneous regions while preserving details. This letter proposes a variational despeckling approach in which the L1-norm total variation regularization term is approximated in a quadratic and linear manner to increase accuracy while decreasing the computation time. The despeckling performance and computational efficiency of the proposed method are shown using synthetic and real-world SAR images.

en cs.CV
arXiv Open Access 2017
A Multiscale Patch Based Convolutional Network for Brain Tumor Segmentation

Jean Stawiaski

This article presents a multiscale patch based convolutional neural network for the automatic segmentation of brain tumors in multi-modality 3D MR images. We use multiscale deep supervision and inputs to train a convolutional network. We evaluate the effectiveness of the proposed approach on the BRATS 2017 segmentation challenge where we obtained dice scores of 0.755, 0.900, 0.782 and 95% Hausdorff distance of 3.63mm, 4.10mm, and 6.81mm for enhanced tumor core, whole tumor and tumor core respectively.

en cs.CV, q-bio.NC
arXiv Open Access 2017
Group Visual Sentiment Analysis

Zeshan Hussain, Tariq Patanam, Hardie Cate

In this paper, we introduce a framework for classifying images according to high-level sentiment. We subdivide the task into three primary problems: emotion classification on faces, human pose estimation, and 3D estimation and clustering of groups of people. We introduce novel algorithms for matching body parts to a common individual and clustering people in images based on physical location and orientation. Our results outperform several baseline approaches.

en cs.CV
arXiv Open Access 2016
A Topological Lowpass Filter for Quasiperiodic Signals

Michael Robinson

This article presents a two-stage topological algorithm for recovering an estimate of a quasiperiodic function from a set of noisy measurements. The first stage of the algorithm is a topological phase estimator, which detects the quasiperiodic structure of the function without placing additional restrictions on the function. By respecting this phase estimate, the algorithm avoids creating distortion even when it uses a large number of samples for the estimate of the function.

en cs.CV, math.DS
arXiv Open Access 2016
Automatic 3D Point Set Reconstruction from Stereo Laparoscopic Images using Deep Neural Networks

Balint Antal

In this paper, an automatic approach to predict 3D coordinates from stereo laparoscopic images is presented. The approach maps a vector of pixel intensities to 3D coordinates through training a six-layer deep neural network. The architectural aspects of the approach are presented in detail, and the method is evaluated on a publicly available dataset with promising results.

en cs.CV
CrossRef Open Access 2016
Transición hacia la paz y zonas marrones urbanas

Mauricio Uribe López

The transition from war to peace can entail a shift in the center of gravity of violence toward depressed micro-spaces of cities that constitute what can be called, adapting Guillermo O’Donnell's concept, urban brown zones. Highly violent post-conflict situations and situations of high societal violence, which correspond to the type of cases that can be characterized as cases of violent peace, require an urban citizen-security approach that is in tune with the local turn taken in critical approaches to peacebuilding.

arXiv Open Access 2014
Image Classification with A Deep Network Model based on Compressive Sensing

Yufei Gan, Tong Zhuo, Chu He

To simplify the parameters of the deep learning network, a cascaded compressive sensing model, "CSNet", is implemented for image classification. Firstly, we use a cascaded compressive sensing network to learn features from the data. Secondly, CSNet generates the features by binary hashing and block-wise histograms. Finally, a linear SVM classifier is used to classify these features. The experiments on the MNIST dataset indicate that higher classification accuracy can be obtained by this algorithm.

en cs.CV

Page 12 of 5824