Results for "Photography"

Showing 20 of ~223,400 results · from arXiv, DOAJ, CrossRef, Semantic Scholar

DOAJ Open Access 2026
DINO-EYE: self-supervised learning for identification of different optic disc phenotypes in primary open angle glaucoma

Lourdes Grassi, Zhe Fei, Esteban Morales et al.

The aim of this study was to develop a self-supervised learning (SSL) model that classifies optic disc phenotypes in primary open angle glaucoma (POAG) and to explore novel phenotypic patterns in optic disc photographs (ODPs). We collected 850 ODPs from patients with POAG and applied data augmentation to address class imbalance, yielding 10,493 images. Using the DINO Vision Transformer as the backbone, we trained an SSL model to extract 2048-dimensional latent features. These features were used both for supervised classification of six known phenotypes and for unsupervised clustering. Classification performance was evaluated with Random Forest and XGBoost models. UMAP (Uniform Manifold Approximation and Projection) was used for dimensionality reduction and feature visualization, and attention maps were generated for model interpretability. The DINO-EYE features enabled phenotype classification with 91% accuracy using Random Forest, rising to 92.1% after merging clinically similar phenotypes. Unsupervised clustering revealed coherent groupings, particularly for concentric thinning and extensive peripapillary atrophy (PPA), though no new phenotypes were unanimously confirmed by clinicians. The proposed model outperformed the RETFound SSL model in phenotype classification and demonstrated interpretable attention regions consistent with expert criteria. DINO-EYE effectively extracts clinically meaningful features from fundus images and enables accurate classification of optic disc phenotypes in POAG. It surpasses existing SSL models in performance and interpretability, offering promise for real-world glaucoma decision support and individualized care planning.

Medicine, Science
DOAJ Open Access 2026
Multi-Temporal Shoreline Monitoring and Analysis in Bangkok Bay, Thailand, Using Remote Sensing and GIS Techniques

Yan Wang, Adisorn Sirikham, Jessada Konpang et al.

Drastic alterations have been observed in the coastline of Bangkok Bay, Thailand, over the past three decades. Understanding how coastlines change plays a key role in developing strategies for coastal protection and sustainable resource use. This study investigates the temporal and spatial changes in the Bangkok Bay coastline from 1989 to 2024 using remote sensing and GIS techniques. The historical rate of coastline change for a typical segment was analyzed using the End Point Rate (EPR) method, and the underlying causes of these changes were discussed. From this analysis, the variation trend of total shoreline length and the erosion and sedimentation characteristics of a typical shoreline in Bangkok Bay over the past 35 years were obtained. An overall increase in coastline length was observed over the 35-year period from 1989 to 2024, with a net gain from 507.23 km to 571.38 km. The rate of growth transitioned from rapid to slow, with the most significant changes occurring during 1989–1994. The average and maximum erosion rates for the typical shoreline segment were notably high during 1989–1994, at −21.61 m/a and −55.49 m/a, respectively. The maximum sedimentation rate along the coastline was relatively high from 2014 to 2024, reaching 10.57 m/a. Overall, the entire coastline of the Samut Sakhon–Bangkok–Samut Prakan provinces underwent net erosion from 1989 to 2024, driven by a confluence of natural and anthropogenic factors.

Photography, Computer applications to medicine. Medical informatics
arXiv Open Access 2025
FLOL: Fast Baselines for Real-World Low-Light Enhancement

Juan C. Benito, Daniel Feijoo, Alvaro Garcia et al.

Low-Light Image Enhancement (LLIE) is a key task in computational photography and imaging. The problem of enhancing images captured at night or in dark environments has been well studied in the computer vision literature. However, current deep learning-based solutions struggle with efficiency and robustness in real-world scenarios (e.g., scenes with noise or saturated pixels). We propose a lightweight neural network that combines image processing in the frequency and spatial domains. Our baseline method, FLOL, is one of the fastest models for this task, achieving results comparable to the state of the art on popular real-world benchmarks such as LOLv2, LSRW, MIT-5K and UHD-LL. Moreover, we can process 1080p images in real time, in under 12 ms. Code and models are available at https://github.com/cidautai/FLOL

en cs.CV, cs.RO
arXiv Open Access 2025
Generative AI and the transformation of Work in Latin America -- Brazil

Carmen Bonfacio, Fernando Schapachnik, Fabio Porto

This survey explores how employers and employees in Brazil perceive the impact of generative AI (GenAI) on their work activities. GenAI is gradually transforming Brazil's workforce, particularly in micro and small businesses, though its adoption remains uneven. The survey examines the perceptions of employers and employees across five sectors: Sales, Customer Service, Graphic Design or Photography, Journalism or Content Production, and Software Development or Coding. The results are analyzed in light of six key dimensions of workforce impact. The findings reveal a mix of optimism, apprehension, and untapped potential in the integration of AI tools. This study serves as a foundation for developing inclusive strategies that maximize AI's benefits while safeguarding workers' rights. The IIA-LNCC supports open research and remains committed to shaping a future where technology and human potential progress together.

en cs.CY
arXiv Open Access 2024
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

Fan Bao, Chendong Xiang, Gang Yue et al.

We introduce Vidu, a high-performance text-to-video generator capable of producing 1080p videos up to 16 seconds long in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability to handle long videos. Vidu exhibits strong coherence and dynamism, and can generate both realistic and imaginative videos, as well as understand some professional photography techniques, on par with Sora -- the most powerful text-to-video generator reported to date. Finally, we perform initial experiments on other controllable video generation tasks, including canny-to-video generation, video prediction and subject-driven generation, which demonstrate promising results.

en cs.CV, cs.LG
arXiv Open Access 2024
LiDAR Depth Map Guided Image Compression Model

Alessandro Gnutti, Stefano Della Fiore, Mattia Savardi et al.

The incorporation of LiDAR technology into some high-end smartphones has unlocked numerous possibilities across various applications, including photography, image restoration, augmented reality, and more. In this paper, we introduce a novel direction that harnesses LiDAR depth maps to enhance the compression of the corresponding RGB camera images. To the best of our knowledge, this represents the initial exploration in this particular research direction. Specifically, we propose a Transformer-based learned image compression system capable of achieving variable-rate compression using a single model while utilizing the LiDAR depth map as supplementary information for both the encoding and decoding processes. Experimental results demonstrate that integrating LiDAR yields an average PSNR gain of 0.83 dB and an average bitrate reduction of 16% as compared to its absence.

en eess.IV
arXiv Open Access 2024
REACT: Two Datasets for Analyzing Both Human Reactions and Evaluative Feedback to Robots Over Time

Kate Candon, Nicholas C. Georgiou, Helen Zhou et al.

Recent work in Human-Robot Interaction (HRI) has shown that robots can leverage implicit communicative signals from users to understand how they are being perceived during interactions. For example, these signals can be gaze patterns, facial expressions, or body motions that reflect internal human states. To facilitate future research in this direction, we contribute the REACT database, a collection of two datasets of human-robot interactions that display users' natural reactions to robots during a collaborative game and a photography scenario. Further, we analyze the datasets to show that interaction history is an important factor that can influence human reactions to robots. As a result, we believe that future models for interpreting implicit feedback in HRI should explicitly account for this history. REACT opens the door to this possibility.

en cs.RO, cs.HC
DOAJ Open Access 2024
NHD-YOLO: Improved YOLOv8 using optimized neck and head for product surface defect detection with data augmentation

Faquan Chen, Miaolei Deng, Hui Gao et al.

Surface defect detection is an essential task for ensuring product quality. Many excellent object detectors have been employed to detect surface defects in recent years, with outstanding success. To further improve detection performance, a defect detector based on the state-of-the-art YOLOv8, named NHD-YOLO (improved YOLOv8 by neck, head and data), is proposed. Specifically, YOLOv8 is improved in three crucial aspects: neck, head, and data. First, a shortcut feature pyramid network is designed to effectively fuse features from the backbone by improving information transmission. Then, an adaptive decoupled head is proposed to alleviate the spatial misalignment of features between the classification and regression tasks. Finally, to enhance training on small objects, a data augmentation method named selective small object copy-and-paste is proposed. Extensive experiments are conducted on three real-world datasets: the defect detection dataset from Northeastern University (NEU-DET), printed circuit boards from Peking University (PKU-Market-PCB), and common objects in context (COCO). According to the results, NHD-YOLO achieves the highest detection accuracy and exhibits outstanding inference speed and generalisation performance.

Photography, Computer software
DOAJ Open Access 2024
Automatic Classification of Nodules from 2D Ultrasound Images Using Deep Learning Networks

Tewele W. Tareke, Sarah Leclerc, Catherine Vuillemin et al.

Objective: In clinical practice, thyroid nodules are typically visually evaluated by expert physicians using 2D ultrasound images. Based on their assessment, a fine needle aspiration (FNA) may be recommended. However, visually classifying thyroid nodules from ultrasound images may lead to unnecessary fine needle aspirations for patients. The aim of this study is to develop an automatic thyroid ultrasound image classification system to prevent unnecessary FNAs. Methods: An automatic computer-aided artificial intelligence system is proposed for classifying thyroid nodules using a fine-tuned deep learning model based on the DenseNet architecture, which incorporates an attention module. The dataset comprises 591 thyroid nodule images categorized based on the Bethesda score. Thyroid nodules are classified as either requiring FNA or not. The challenges encountered in this task include managing variability in image quality, addressing the presence of artifacts in ultrasound image datasets, tackling class imbalance, and ensuring model interpretability. We employed techniques such as data augmentation, class weighting, and gradient-weighted class activation maps (Grad-CAM) to enhance model performance and provide insights into decision making. Results: Our approach achieved excellent results with an average accuracy of 0.94, F1-score of 0.93, and sensitivity of 0.96. The use of Grad-CAM gives insight into the decision making and reinforces the reliability of the binary classification from the end-user's perspective. Conclusions: We propose a deep learning architecture that effectively classifies thyroid nodules as requiring FNA or not from ultrasound images. Despite challenges related to image variability, class imbalance, and interpretability, our method demonstrated high classification accuracy with minimal false negatives, showing its potential to reduce unnecessary FNAs in clinical settings.

Photography, Computer applications to medicine. Medical informatics
DOAJ Open Access 2024
Rotten Branches: The Fall of the Family in Thomas Mann's Buddenbrooks and Luchino Visconti's The Damned

Jędrzej Sławnikowski

The topic of this article is the depiction of the fall of the family in Luchino Visconti's film The Damned and Thomas Mann's novel Buddenbrooks, which served as a major inspiration for the former. The comparison of the two works centers on two pairs of opposite attributes: the fall of the Buddenbrooks is characterised as gradual and internal, whereas the von Essenbeck family from The Damned declines in a rapid and external manner. These differences are studied in connection with such issues as the role of historical events in both works, the similarities and dissimilarities between certain characters, as well as the impact of Shakespeare's Macbeth on the narrative structure of The Damned. The nature of Visconti's inspiration by Mann's work and the connection between the two artists are also discussed.

Photography, Dramatic representation. The theater
arXiv Open Access 2023
Lightweight High-Speed Photography Built on Coded Exposure and Implicit Neural Representation of Videos

Zhihong Zhang, Runzhao Yang, Jinli Suo et al.

The demand for compact cameras capable of recording high-speed scenes with high resolution is steadily increasing. However, achieving such capabilities often entails high bandwidth requirements, resulting in bulky, heavy systems unsuitable for low-capacity platforms. To address this challenge, leveraging a coded exposure setup to encode a frame sequence into a blurry snapshot and subsequently retrieve the latent sharp video presents a lightweight solution. Nevertheless, restoring motion from blur remains a formidable challenge due to the inherent ill-posedness of motion blur decomposition, the intrinsic ambiguity in motion direction, and the diverse motions present in natural videos. In this study, we propose a novel approach to address these challenges by combining the classical coded exposure imaging technique with the emerging implicit neural representation for videos. We strategically embed motion direction cues into the blurry image during the imaging process. Additionally, we develop a novel implicit neural representation based blur decomposition network to sequentially extract the latent video frames from the blurry image, leveraging the embedded motion direction cues. To validate the effectiveness and efficiency of our proposed framework, we conduct extensive experiments using benchmark datasets and real-captured blurry images. The results demonstrate that our approach significantly outperforms existing methods in terms of both quality and flexibility. The code for our work is available at https://github.com/zhihongz/BDINR

en cs.CV, eess.IV
arXiv Open Access 2023
Coincidental Generation

Jordan W. Suchow, Necdet Gürkan

Generative A.I. models have emerged as versatile tools across diverse industries, with applications in privacy-preserving data sharing, computational art, personalization of products and services, and immersive entertainment. Here, we introduce a new privacy concern in the adoption and use of generative A.I. models: that of coincidental generation, where a generative model's output is similar enough to an existing entity, beyond those represented in the dataset used to train the model, to be mistaken for it. Consider, for example, synthetic portrait generators, which are today deployed in commercial applications such as virtual modeling agencies and synthetic stock photography. Due to the low intrinsic dimensionality of human face perception, every synthetically generated face will coincidentally resemble an actual person. Such examples of coincidental generation all but guarantee the misappropriation of likeness and expose organizations that use generative A.I. to legal and regulatory risk.

en cs.CV, cs.CR
DOAJ Open Access 2023
Re-encoding Glamour from Ghana to England: Illustrated Magazines, Gender Norms and Black Identities through the Lens of James Barnor (1950s–1980s)

Margaux Lavernhe

How have gender and racial norms conveyed by illustrated magazines—whose circulation exploded in Africa in the 1960s—affected photographers’ local practices? And how, in turn, have photographers themselves generated this gendered visual order? This article aims to shed light on this two-fold question by proposing a diachronic analysis of the influence of models of femininity transmitted by the illustrated press on the visual imagination of a Ghanaian photographer—as seen in his photographs taken between the 1950s and the 1980s. It explores the links between the publications of the pan-African magazine Drum (the most widely circulated magazine in English-speaking Africa at the time) and their translation into the art of portraiture as practiced by James Barnor (1929-), a photographer with a transnational career, between Ghana and England. Because his professional and personal career path tracked the evolution of these gendered norms, James Barnor became both the repository and the instigator of an idealized vision of “the” African woman. By means of an intersectional focus, the issues of gender norms and of racial biases are examined in parallel to better understand how the photographer appropriated, throughout his career, the shifting codes of a “female glamour” reinvented for Africa during the post-independence period. While numerous studies have examined the modalities of this codification, in the present paper they are addressed through an in-depth exploration of the photographer’s archives, now held in Paris, combined with an analysis of early issues of Drum. The aim is to juxtapose images intended for publication, i.e. public, with private images in order to consider how the standards of fashion photography infused Barnor’s practices, which lie at the crossroads of different social worlds.
The corpus, composed of portraits of young women, is also informed by numerous interviews with the photographer and some of his models, which provide behind-the-scenes insights into the published images by exploring their political and social contexts. We first look at Drum’s editorial strategy from its launch in South Africa to its expansion throughout West Africa. While the magazine initially borrowed from white Western references such as Life, it gradually became, to some extent, a showcase for black pride on the continent and in the global diaspora. Then, we study Barnor’s early studio practice as already acutely aware of the codes of femininity enacted by the magazine: this is shown through his “recycling” of the poses and the composition of the images. During the ten years he spent in England, from 1959 to 1969, his collaboration with Drum gave rise to a gallery of portraits of anonymous young women, who became ordinary icons of an ideal African femininity in the context of the diaspora. Finally, in the 1970s, Barnor’s return to Ghana saw the reuse of these codes inherited from the globalized fashion industry, combined with the emerging iconography produced by African-American models, as a means to create social documentary. In this way, he contributed to an aesthetic of blackness that was constructed within a transnational framework.

Social Sciences
DOAJ Open Access 2022
Semantic and context features integration for robust object tracking

Jinzhen Yao, Jianlin Zhang, Zhixing Wang et al.

Siamese network-based object tracking learns features of a target object marked in the first frame and of the object in subsequent frames simultaneously, and then measures the similarity between the two feature sets to recognize and locate the object. Owing to their efficiency and high accuracy, Siamese networks have attracted much attention recently. However, tracking accuracy decreases significantly under scale changes, occlusion, and pose variations because of the way Siamese networks estimate feature similarity. To address this issue, the authors propose a tracking algorithm, named Semantic and context features integration for robust object tracking, that integrates local and global features of the object. Local features provide context information for tracking parts of the object, while global features contain semantic information for tracking the object as a whole. The authors meticulously design local and global classification and regression heads and integrate them into one uniform framework to achieve integrated tracking. This method effectively alleviates low accuracy in complex scenes involving scale changes, deformation, and occlusion. Numerous experiments demonstrate that this method achieves state-of-the-art (SOTA) performance at 45 FPS on a single RTX 2060 Super GPU on public tracking datasets, including VOT2016, VOT2019, OTB100, GOT-10k, and LaSOT, confirming its effectiveness and efficiency.

Photography, Computer software
DOAJ Open Access 2022
The surrogate labor of the eye: Farocki, Papa, and the eeefff collective

Tereza Stejskalová

The following essay explores the work of art as a site of encounter with human perceptual labor that plays a role in technical operations. It tackles the way such labor is deemed obsolete, soon to be replaced, and therefore surrogate even if it actually animates and reproduces automated vision systems. It explores how art goes about representing the ways in which such labor is undervalued and unrecognized. The text argues for reading in between the lines and images of Harun Farocki’s films, installations, and writings, where the obsolescence of human labor emerges more as an ideological screen than a fact. It focuses on moments in his oeuvre which indicate that human labor, including cognition as automation’s last frontier, is not automated away but persists, changes site, undergoes restructuring, and becomes more hidden. More recent works by the eeefff collective and Elisa Giardina Papa explore the intertwined roles of human affection and vision labor in the necessarily failed attempts to teach machines to see and feel, to “clean” algorithmic vision and affection of opacity and the queerness of real life. Both artists leave behind Farocki’s self-reflexive, detached spectator to involve the audience in more situated and embodied experiences of perception labor and the particular ways in which such labor has become outsourced and dispersed in semi-peripheries such as Sicily or Belarus. They try to express the price that people pay with their emotions and bodies for such work. Yet, in principle, they follow Farocki’s take on labor’s in/visibility in that they challenge the ruling ideologies that blind human vision to the realities of labor. The essay also pays attention to the ways in which both artistic and technical vision today are predetermined by the logic of the gig economy.

Arts in general, Aesthetics

Page 14 of 11,170