Results for "Photography"

Showing 20 of ~170,720 results · from arXiv, DOAJ, Semantic Scholar

S2 Open Access 2023
UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios

G. Wang, Yanfei Chen, Pei An et al.

Unmanned aerial vehicle (UAV) object detection plays a crucial role in civil, commercial, and military domains. However, the high proportion of small objects in UAV images and the limited platform resources lead to the low accuracy of most existing detection models embedded in UAVs, and it is difficult to strike a good balance between detection performance and resource consumption. To alleviate these problems, we optimize YOLOv8 and propose an object detection model for UAV aerial photography scenarios, called UAV-YOLOv8. Firstly, Wise-IoU (WIoU) v3 is used as the bounding box regression loss, and its wise gradient allocation strategy makes the model focus more on common-quality samples, thus improving the model’s localization ability. Secondly, an attention mechanism called BiFormer is introduced to optimize the backbone network, which improves the model’s attention to critical information. Finally, we design a feature processing module named Focal FasterNet block (FFNB) and propose two new detection scales based on this module, which allows shallow and deep features to be fully integrated. The proposed multiscale feature fusion network substantially increases the detection performance of the model and reduces the missed detection rate of small objects. The experimental results show that our model has fewer parameters than the baseline model and a mean detection accuracy 7.7% higher. Compared with other mainstream models, the overall performance of our model is much better. The proposed method effectively improves the ability to detect small objects. There is room to optimize the detection effectiveness of our model for small and feature-poor objects (such as bicycle-type vehicles), which we will address in subsequent research.
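As a rough point of reference for the loss above: Wise-IoU v3 adds a dynamic, gradient-allocating focusing weight on top of a plain IoU term. A minimal sketch of that base IoU for axis-aligned boxes (an illustrative helper, not the authors' code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# An IoU-based regression loss is then typically 1 - iou(pred, target);
# WIoU v3 rescales that term per sample based on its "outlier degree".
```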

760 citations en Computer Science, Medicine
S2 Open Access 2021
The Civil Contract of Photography

A. Azoulay, R. Mazali, Ruvik Danieli

In this compelling work, Ariella Azoulay reconsiders the political and ethical status of photography. Describing the power relations that sustain and make possible photographic meanings, Azoulay argues that anyone -- even a stateless person -- who addresses others through photographs or is addressed by photographs can become a member of the citizenry of photography. The civil contract of photography enables anyone to pursue political agency and resistance through photography. Photography, Azoulay insists, cannot be understood separately from the many catastrophes of recent history. The crucial arguments of her book concern two groups with flawed or nonexistent citizenship: the Palestinian noncitizens of Israel and women in Western societies. Azoulay analyzes Israeli press photographs of violent episodes in the Occupied Territories, and interprets various photographs of women -- from famous images by stop-motion photographer Eadweard Muybridge to photographs from Abu Ghraib prison. Azoulay asks this question: under what legal, political, or cultural conditions does it become possible to see and to show disaster that befalls those who can claim only incomplete or nonexistent citizenship? Drawing on such key texts in the history of modern citizenship as the Declaration of the Rights of Man together with relevant work by Giorgio Agamben, Jean-Francois Lyotard, Susan Sontag, and Roland Barthes, Azoulay explores the visual field of catastrophe, injustice, and suffering in our time. Her book is essential reading for anyone seeking to understand the disasters of recent history -- and the consequences of how these events and their victims have been represented.

575 citations en Sociology
S2 Open Access 2020
Perceptual Quality Assessment of Smartphone Photography

Yuming Fang, Hanwei Zhu, Yan Zeng et al.

As smartphones become people’s primary cameras for taking photos, the quality of their cameras and the associated computational photography modules has become a de facto standard for evaluating and ranking smartphones in the consumer market. We conduct the most comprehensive study to date of perceptual quality assessment of smartphone photography. We introduce the Smartphone Photography Attribute and Quality (SPAQ) database, consisting of 11,125 pictures taken by 66 smartphones, each carrying the richest set of annotations to date. Specifically, we collect a series of human opinions for each image, including image quality, image attributes (brightness, colorfulness, contrast, noisiness, and sharpness), and scene category labels (animal, cityscape, human, indoor scene, landscape, night scene, plant, still life, and others) in a well-controlled laboratory environment. The exchangeable image file format (EXIF) data for all images are also recorded to aid deeper analysis. We also make the first attempt at using the database to train blind image quality assessment (BIQA) models constructed from baseline and multi-task deep neural networks. The results provide useful insights into how EXIF data, image attributes, and high-level semantics interact with image quality, how next-generation BIQA models can be designed, and how better computational photography systems can be optimized on mobile devices. The database, along with the proposed BIQA models, is available at https://github.com/h4nwei/SPAQ.

441 citations en Computer Science
S2 Open Access 2018
Automated diabetic retinopathy detection in smartphone-based fundus photography using artificial intelligence

R. Rajalakshmi, R. Subashini, R. Anjana et al.

Objectives: To assess the role of artificial intelligence (AI)-based automated software for detection of diabetic retinopathy (DR) and sight-threatening DR (STDR) by fundus photography taken with a smartphone-based device, and to validate it against ophthalmologists’ grading. Methods: Three hundred and one patients with type 2 diabetes underwent retinal photography with Remidio ‘Fundus on phone’ (FOP), a smartphone-based device, at a tertiary care diabetes centre in India. Grading of DR was performed by the ophthalmologists using the International Clinical DR (ICDR) classification scale. STDR was defined by the presence of severe non-proliferative DR, proliferative DR or diabetic macular oedema (DME). The retinal photographs were graded using a validated AI DR screening software (EyeArtTM) designed to identify DR, referable DR (moderate non-proliferative DR or worse and/or DME) or STDR. The sensitivity and specificity of automated grading were assessed and validated against the ophthalmologists’ grading. Results: Retinal images of 296 patients were graded. DR was detected by the ophthalmologists in 191 (64.5%) and by the AI software in 203 (68.6%) patients, while STDR was detected in 112 (37.8%) and 146 (49.3%) patients, respectively. The AI software showed 95.8% (95% CI 92.9–98.7) sensitivity and 80.2% (95% CI 72.6–87.8) specificity for detecting any DR, and 99.1% (95% CI 95.1–99.9) sensitivity and 80.4% (95% CI 73.9–85.9) specificity for detecting STDR, with kappa agreements of k = 0.78 (p < 0.001) and k = 0.75 (p < 0.001), respectively. Conclusions: Automated AI analysis of FOP smartphone retinal imaging has very high sensitivity for detecting DR and STDR and can thus serve as an initial tool for mass retinal screening in people with diabetes.
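The sensitivity and specificity figures above are standard confusion-matrix ratios; a minimal sketch (the counts below are back-solved from the reported rates and are approximate, not the paper's raw data):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Any-DR detection: 191 ophthalmologist-positive and 105 negative patients;
# counts approximately recovered from the reported 95.8% / 80.2% rates.
sens, spec = sensitivity_specificity(tp=183, fn=8, tn=84, fp=21)
```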

412 citations en Medicine
S2 Open Access 2020
3D Photography Using Context-Aware Layered Depth Inpainting

Meng-Li Shih, Shih-Yang Su, J. Kopf et al.

We propose a method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. We use a Layered Depth Image with explicit pixel connectivity as the underlying representation, and present a learning-based inpainting model that iteratively synthesizes new local color-and-depth content into the occluded region in a spatially context-aware manner. The resulting 3D photos can be efficiently rendered with motion parallax using standard graphics engines. We validate the effectiveness of our method on a wide range of challenging everyday scenes and show fewer artifacts than the state of the art.

329 citations en Computer Science, Engineering
S2 Open Access 2022
An integrated imaging sensor for aberration-corrected 3D photography

Jiamin Wu, Yuduo Guo, Chao Deng et al.

Planar digital image sensors facilitate broad applications in a wide range of areas [1–5], and the number of pixels has scaled up rapidly in recent years [2,6]. However, the practical performance of imaging systems is fundamentally limited by spatially nonuniform optical aberrations originating from imperfect lenses or environmental disturbances [7,8]. Here we propose an integrated scanning light-field imaging sensor, termed a meta-imaging sensor, to achieve high-speed aberration-corrected three-dimensional photography for universal applications without additional hardware modifications. Instead of directly detecting a two-dimensional intensity projection, the meta-imaging sensor captures extra-fine four-dimensional light-field distributions through a vibrating coded microlens array, enabling flexible and precise synthesis of complex-field-modulated images in post-processing. Using the sensor, we achieve high-performance photography up to a gigapixel with a single spherical lens without a data prior, leading to orders-of-magnitude reductions in system capacity and costs for optical imaging. Even in the presence of dynamic atmosphere turbulence, the meta-imaging sensor enables multisite aberration correction across 1,000 arcseconds on an 80-centimetre ground-based telescope without reducing the acquisition speed, paving the way for high-resolution synoptic sky surveys. Moreover, high-density accurate depth maps can be retrieved simultaneously, facilitating diverse applications from autonomous driving to industrial inspections. A meta-imaging sensor detects an extra-fine 4D light field distribution using a vibrating microlens array, enabling high-resolution 3D photography up to a gigapixel with fast aberration correction, demonstrated on a telescope aimed at the Moon.

131 citations en Medicine
S2 Open Access 2024
Photorealism versus photography. AI-generated depiction in the age of visual disinformation

Liv Hausken

ABSTRACT In the spring of 2023, we witnessed a breakthrough in the development of AI-generated images accessible to the general public. Pictures of Pope Francis wearing a stylishly long, white puffer jacket or driving a motorcycle down a busy street went viral. The same thing happened with AI-generated pictures fabulating over what the press coverage of an imminent arrest of former US President Donald Trump would look like. Amnesty International used AI to generate images to mark the second anniversary of police violence against protesters in Colombia. Boris Eldagsen turned down an award for Best Creative Photograph from the Sony World Photography Awards, announcing that it had been generated with AI. Critical reactions in the public were not long in coming. The use of AI to generate images was discussed with sometimes shrill words and phrases such as fake, fraud, a threat to photography’s credibility, and fake news. This article seeks to intervene in this crisis-oriented debate by proposing three analytical moves: First, we need a concept of photorealism that is kept separate from the idea of photography. Secondly, we need a conceptual distinction between two basic functions of photography: depiction and detection. In addition to this primary distinction between image functions, the article proposes a third move to introduce a function-oriented genre concept. Through an interdisciplinary approach to photorealism, photography, and genre, these three analytical measures are presented and examined step by step and discussed through analyses of concrete and recent examples of AI-generated and AI-enhanced publicly available images in today’s society. Today’s crisis-oriented public debate about AI images serves neither democracy, art, journalism, nor photography. The purpose of the article is to contribute a simple and, at the same time, useful analytical tool in these discussions about the relationship between photography and current and future image technologies.

35 citations en
arXiv Open Access 2025
Understanding colors of Dufaycolor: Can we recover them using historical colorimetric and spectral data?

Jan Hubička, Linda Kimrová, Melichar Konečný

Dufaycolor, an additive color photography process produced from 1935 to the late 1950s, represents one of the most advanced iterations of this technique. This paper presents ongoing research and development of an open-source Color-Screen tool designed to reconstruct the original colors of additive color photographs. We discuss the incorporation of historical measurements of dyes used in the production of the color-screen filter (réseau) to achieve accurate color recovery.

en cs.CV, cs.GR
arXiv Open Access 2025
The photography transforms and their analytic inversion formulas

Duo Liu, Gangrong Qu, Shan Gao

The light field reconstruction from the focal stack can be mathematically formulated as an ill-posed integral equation inversion problem. Although previous research on this problem has made progress in both practice and theory, its forward problem and inversion in a general form still need to be studied. In this paper, to model the forward problem rigorously, we propose three types of photography transforms with different integral geometry characteristics that extend the forward operator to the arbitrary $n$-dimensional case. We prove that these photography transforms are equivalent to the Radon transform with a coupling relation between variables. We also obtain some properties of the photography transforms, including the Fourier slice theorem, the convolution theorem, and the convolution property of the dual operator, which are very similar to those of the classical Radon transform. Furthermore, the representation of the normal operator and the analytic inversion formula for the photography transforms are derived, and they are quite different from those of the classical Radon transform.
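For orientation, the classical Fourier slice theorem for the Radon transform, which the abstract says its photography-transform analogue closely resembles, reads (standard textbook statement, not the paper's generalized form):

```latex
(\mathcal{R}f)(\theta, s) = \int_{\{x \in \mathbb{R}^n \,:\, x \cdot \theta = s\}} f(x)\,\mathrm{d}x,
\qquad
\widehat{(\mathcal{R}f)}(\theta, \sigma) = \hat{f}(\sigma\theta),
```

where the hat on the left denotes the one-dimensional Fourier transform in $s$ and the hat on the right the $n$-dimensional Fourier transform of $f$.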

en math.FA, math-ph
arXiv Open Access 2025
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography

I-Sheng Fang, Jun-Cheng Chen

Large language models (LLMs) and multimodal large language models (MLLMs) have significantly advanced artificial intelligence. However, visual reasoning, i.e., reasoning involving both visual and textual inputs, remains underexplored. Recent advancements, including reasoning models such as OpenAI o1 and Gemini 2.0 Flash Thinking that incorporate image inputs, have opened up this capability. In this ongoing work, we focus specifically on photography-related tasks because a photo is a visual snapshot of the physical world in which the underlying physics (i.e., illumination, blur extent, etc.) interplays with the camera parameters. Successfully reasoning from the visual information of a photo to identify these numerical camera settings requires MLLMs to have a deeper understanding of the underlying physics for precise visual comprehension, a challenging and intelligent capability essential for practical applications like photography assistant agents. We aim to evaluate MLLMs on their ability to distinguish visual differences related to numerical camera settings, extending a methodology previously proposed for vision-language models (VLMs). Our preliminary results demonstrate the importance of visual reasoning in photography-related tasks. Moreover, they show that no single MLLM consistently dominates across all evaluation tasks, highlighting ongoing challenges and opportunities in developing MLLMs with better visual reasoning.

en cs.CV, cs.CL
arXiv Open Access 2025
MMP-2K: A Benchmark Multi-Labeled Macro Photography Image Quality Assessment Database

Jiashuo Chang, Zhengyi Li, Jianxun Lou et al.

Macro photography (MP) is a specialized field of photography that captures objects at an extremely close range, revealing tiny details. Although an accurate macro photography image quality assessment (MPIQA) metric could benefit macro photograph capture, which is vital in domains such as scientific research and medical applications, the lack of MPIQA data limits the development of MPIQA metrics. To address this limitation, we conducted a large-scale MPIQA study. Specifically, to ensure diversity in both content and quality, we sampled 2,000 MP images from 15,700 MP images collected from three public image websites. For each MP image, 17 quality ratings (out of 21, after outlier removal) and a detailed quality report of distortion magnitudes, types, and positions were gathered in a lab study. The images, quality ratings, and quality reports form our novel multi-labeled MPIQA database, MMP-2k. Experimental results show that state-of-the-art generic IQA metrics underperform on MP images. The database and supplementary materials are available at https://github.com/Future-IQA/MMP-2k.

arXiv Open Access 2025
Discovering an Image-Adaptive Coordinate System for Photography Processing

Ziteng Cui, Lin Gu, Tatsuya Harada

Curve- and lookup-table (LUT)-based methods directly map a pixel to the target output, making them highly efficient tools for real-time photography processing. However, due to the extreme memory cost of learning a full RGB-space mapping, existing methods either sample a discretized 3D lattice to build a 3D LUT or decompose the mapping into three separate curves (1D LUTs) on the RGB channels. Here, we propose a novel algorithm, IAC, that learns an image-adaptive Cartesian coordinate system in the RGB color space before performing curve operations. This end-to-end trainable approach enables us to efficiently adjust images with a jointly learned image-adaptive coordinate system and curves. Experimental results demonstrate that this simple strategy achieves state-of-the-art (SOTA) performance in various photography processing tasks, including photo retouching, exposure correction, and white-balance editing, while maintaining a lightweight design and fast inference speed.
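For context, the per-channel 1D-LUT decomposition mentioned above can be sketched as follows (a minimal illustration with a hypothetical `apply_channel_luts` helper, not the paper's IAC implementation):

```python
import numpy as np

def apply_channel_luts(image, luts):
    """Map each RGB channel through its own 1D curve sampled at N points.

    image: float array in [0, 1], shape (H, W, 3)
    luts:  array of shape (3, N) holding the sampled curve values
    """
    h, w, _ = image.shape
    n = luts.shape[1]
    xs = np.linspace(0.0, 1.0, n)  # sample positions of each curve
    out = np.empty_like(image)
    for c in range(3):
        # Piecewise-linear interpolation between the sampled curve points.
        out[..., c] = np.interp(image[..., c].ravel(), xs, luts[c]).reshape(h, w)
    return out

# Identity curves leave the image unchanged; editing the 3*N samples
# retouches the whole image at negligible cost.
identity = np.tile(np.linspace(0.0, 1.0, 17), (3, 1))
```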

en cs.CV
DOAJ Open Access 2025
Experimental and numerical study on cavitation flow characteristics of refrigerants with different thermophysical properties in confined micro-clearance

Shaohang Yan, Tianwei Lai, Zhen Wang et al.

In high-speed hydraulic machinery, efficiency and reliability are affected by cavitation in the bearings. Due to the confining effect of the bearing clearance, cavitation bubbles grow in a two-dimensional way. To uncover the cavitation process under confined, high-speed shearing, the high-speed cavitation flow of different refrigerants is studied experimentally on a high-speed shearing test rig with micro-clearance. The influence of thermophysical properties on the growth of cavitation bubbles is evaluated and analyzed. The confining effect of the micro-clearance and the high-speed shearing have a significant influence on the evolution of cavitation bubbles. A high-speed camera is used to record the morphology of cavitation bubbles for various refrigerants with different thermophysical properties. Furthermore, a thermal-sensitive cavitation model is used to analyze the bubble-foam alternation in the cavitation flow inside the micro-clearance. For the different refrigerants, the growth of the cavitation bubble area is exponential. Inside the micro-clearance, the cavitation-inducing pressure drops of the different refrigerants are analogous due to their similar thermodynamic properties. Based on the pressure drop during cavitation, the refrigerants are classified by introducing the dimensionless numbers σ·Re (Jie et al., 2009) [2] and σ·We. The pressure and temperature drops increase with these dimensionless numbers. Refrigerants with similar thermodynamic properties show a similar relationship between the dimensionless numbers and the supercooling degree.
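The dimensionless groups σ·Re and σ·We used above combine the cavitation number σ with the Reynolds and Weber numbers. A minimal sketch under standard textbook definitions (the characteristic length and velocity choices here are assumptions, not taken from the paper):

```python
def cavitation_number(p, p_vapor, rho, u):
    """sigma = (p - p_v) / (0.5 * rho * u**2), with p the reference pressure,
    p_v the vapor pressure, rho the liquid density, and u the shearing speed."""
    return (p - p_vapor) / (0.5 * rho * u ** 2)

def sigma_re(p, p_vapor, rho, u, mu, h):
    """sigma * Re, taking Re = rho * u * h / mu with h the clearance height."""
    return cavitation_number(p, p_vapor, rho, u) * rho * u * h / mu

def sigma_we(p, p_vapor, rho, u, surface_tension, h):
    """sigma * We, taking We = rho * u**2 * h / surface_tension."""
    return cavitation_number(p, p_vapor, rho, u) * rho * u ** 2 * h / surface_tension
```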

Engineering (General). Civil engineering (General)
DOAJ Open Access 2025
Dysonans ludonarracyjny w światocentrycznych grach wideo

Michał Mróz

The author explores the issue of “ludonarrative dissonance”, a term developed by the game designer C. Hocking in his critique of the game BioShock. The author explains Hocking’s arguments and then expands on the term, disagreeing with Hocking. In the case of BioShock, the author interprets the dissonance not as a design flaw but as a deliberate narrative strategy that momentarily distances the player from the game’s fiction to emphasize its metanarrative dimension. The author argues that ludonarrative dissonance is itself part of videogame poetics, thus echoing the works of F. Seraphine and P. Grabarczyk & B.W. Kampmann. The author then examines how ludonarrative dissonance may appear in vast, nonlinear open-world cRPGs. An analysis of examples from The Elder Scrolls: Skyrim, Fallout 3, and Fallout 4 reveals various instances of unintended dissonance. Finally, the author compares these games to Fallout: New Vegas, presenting it as an example of harmonizing the narrative – the main motifs and story – with the narrativity of gameplay, including rules, mechanics, and vast player agency.

Photography, Dramatic representation. The theater

Page 1 of 8536