Results for "Photography"
Showing 20 of ~170,902 results · from arXiv, DOAJ, Semantic Scholar
Gaochang Wu, B. Masiá, A. Jarabo et al.
Light field imaging has emerged as a technology that allows capturing richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene by integrating over the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher-dimensional representation of visual data offers powerful capabilities for scene understanding and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, material classification, etc. On the other hand, the high dimensionality of light fields also brings new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We focus on all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.
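As a concrete illustration of the post-capture refocusing capability mentioned above, the following is a minimal sketch (not from the survey) of shift-and-sum refocusing over a 4D light field array L[u, v, y, x]; the array layout, the refocus parameter alpha, and the integer-pixel shifts are simplifying assumptions.

```python
# Hedged sketch: post-capture refocusing by shift-and-sum over the angular
# dimensions of a 4D light field L[u, v, y, x]. The refocus parameter alpha
# shifts each sub-aperture view according to its angular offset before
# averaging, synthetically moving the focal plane.
import numpy as np

def refocus(light_field: np.ndarray, alpha: float) -> np.ndarray:
    """light_field: 4D array (U, V, Y, X); alpha: relative focal depth."""
    U, V, Y, X = light_field.shape
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((Y, X), dtype=np.float64)
    for u in range(U):
        for v in range(V):
            # Integer-pixel shift proportional to the view's angular offset;
            # a real implementation would use sub-pixel interpolation.
            dy = int(round(alpha * (u - uc)))
            dx = int(round(alpha * (v - vc)))
            out += np.roll(light_field[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (U * V)

# Example: refocus a random 5x5-view light field at two synthetic depths.
lf = np.random.rand(5, 5, 64, 64)
near, far = refocus(lf, alpha=1.0), refocus(lf, alpha=-1.0)
```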
Sajjad Abdoli, Freeman Lewin, Gediminas Vasiliauskas et al.
The development of modern Artificial Intelligence (AI) models, particularly diffusion-based models employed in computer vision and image generation tasks, is undergoing a paradigmatic shift in development methodologies. Traditionally dominated by a "Model Centric" approach, in which performance gains were primarily pursued through increasingly complex model architectures and hyperparameter optimization, the field is now recognizing a more nuanced "Data-Centric" approach. This emergent framework foregrounds the quality, structure, and relevance of training data as the principal driver of model performance. To operationalize this paradigm shift, we introduce the DataSeeds.AI sample dataset (the "DSD"), initially comprising approximately 10,610 high-quality, human peer-ranked photography images accompanied by extensive multi-tier annotations. The DSD is a foundational computer vision dataset designed to usher in a new standard for commercial image datasets. Representing a small fraction of DataSeeds.AI's 100 million-plus image catalog, the DSD provides a scalable foundation necessary for robust commercial and multimodal AI development. Through this in-depth exploratory analysis, we document the quantitative improvements generated by the DSD on specific models against known benchmarks and make the code and the trained models used in our evaluation publicly available.
Wontae Choi, Jaelin Lee, Hyung Sup Yun et al.
Accurate estimation of motion information is crucial in diverse computational imaging and computer vision applications. Researchers have investigated various methods to extract motion information from a single blurred image, including blur kernels and optical flow. However, existing motion representations are often of low quality, i.e., coarse-grained and inaccurate. In this paper, we propose the first high-resolution (HR) Motion Trajectory estimation framework using Diffusion models (MoTDiff). Different from existing motion representations, we aim to estimate a high-quality HR motion trajectory from a single motion-blurred image. The proposed MoTDiff consists of two key components: 1) a new conditional diffusion framework that uses multi-scale feature maps extracted from a single blurred image as a condition, and 2) a new training method that can promote precise identification of a fine-grained motion trajectory, consistent estimation of the overall shape and position of a motion path, and pixel connectivity along a motion trajectory. Our experiments demonstrate that the proposed MoTDiff can outperform state-of-the-art methods in both blind image deblurring and coded exposure photography applications.
Kang Liao, Size Wu, Zhonghua Wu et al.
Camera-centric understanding and generation are two cornerstones of spatial intelligence, yet they are typically studied in isolation. We present Puffin, a unified camera-centric multimodal model that extends spatial awareness along the camera dimension. Puffin integrates language regression and diffusion-based generation to interpret and create scenes from arbitrary viewpoints. To bridge the modality gap between cameras and vision-language, we introduce a novel paradigm that treats camera as language, enabling thinking with camera. This guides the model to align spatially grounded visual cues with photographic terminology while reasoning across geometric context. Puffin is trained on Puffin-4M, a large-scale dataset of 4 million vision-language-camera triplets. We incorporate both global camera parameters and pixel-wise camera maps, yielding flexible and reliable spatial generation. Experiments demonstrate Puffin's superior performance over specialized models for camera-centric generation and understanding. With instruction tuning, Puffin generalizes to diverse cross-view tasks such as spatial imagination, world exploration, and photography guidance. We will release the code, models, dataset pipeline, and benchmark to advance multimodal spatial intelligence research.
Binghong Chen, Tingting Chai, Wei Jiang et al.
Image denoising is essential in low-level vision applications such as photography and automated driving. Existing methods struggle to distinguish complex noise patterns in real-world scenes and consume significant computational resources due to their reliance on Transformer-based models. In this work, the Context-guided Receptance Weighted Key-Value model is proposed, combining enhanced multi-view feature integration with efficient sequence modeling. Our approach introduces the Context-guided Token Shift (CTS) paradigm, which effectively captures local spatial dependencies and enhances the model's ability to model real-world noise distributions. Additionally, the Frequency Mix (FMix) module, which extracts frequency-domain features, is designed to isolate noise in high-frequency spectra and is integrated with spatial representations through a multi-view learning process. To improve computational efficiency, the Bidirectional WKV (BiWKV) mechanism is adopted, enabling full pixel-sequence interaction with linear complexity while overcoming causal selection constraints. The model is validated on multiple real-world image denoising datasets, outperforming existing state-of-the-art methods quantitatively and reducing inference time by up to 40%. Qualitative results further demonstrate the ability of our model to restore fine details in various scenes.
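For readers unfamiliar with the idea of isolating noise in high-frequency spectra, here is a generic frequency-domain split using the FFT. This is only a hedged illustration of the general principle, not the paper's FMix module; the Gaussian low-pass mask and the cutoff value are assumptions.

```python
# Hedged sketch: a generic frequency-domain split illustrating how noise that
# concentrates in high-frequency spectra can be separated from structure.
# This is NOT the paper's FMix module; mask shape and cutoff are assumptions.
import numpy as np

def frequency_split(img: np.ndarray, cutoff: float = 0.1):
    """Split a 2D image into low- and high-frequency components via the FFT."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    lowpass = np.exp(-(radius / cutoff) ** 2)    # smooth low-pass mask
    spectrum = np.fft.fft2(img)
    low = np.real(np.fft.ifft2(spectrum * lowpass))
    high = img - low                              # residual = high frequencies
    return low, high

noisy = np.random.rand(128, 128)
structure, noise_like = frequency_split(noisy, cutoff=0.08)
```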
Wei Jiang, Jiahao Cui, Yizheng Wu et al.
Reconstructing high dynamic range (HDR) images from low dynamic range (LDR) bursts plays an essential role in computational photography. Impressive progress has been achieved by learning-based algorithms, which require LDR-HDR image pairs. However, these pairs are hard to obtain, which motivates researchers to delve into the problem of annotation-efficient HDR image reconstruction: how to achieve comparable performance with limited HDR ground truths (GTs). This work addresses the problem from the view of semi-supervised learning, where a teacher model generates pseudo HDR GTs for the LDR samples without GTs and a student model learns from these pseudo GTs. Nevertheless, confirmation bias, i.e., the risk that the student learns from artifacts in the pseudo HDR GTs, presents an impediment. To remove this impediment, an uncertainty-based masking process is proposed to discard unreliable parts of the pseudo GTs at both the pixel and patch levels, so that the student learns only from the trusted regions. With this novel masking process, our semi-supervised HDR reconstruction method not only outperforms previous annotation-efficient algorithms, but also achieves performance comparable to up-to-date fully-supervised methods using only 6.7% of the HDR GTs.
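The uncertainty-based masking idea can be illustrated with a short, hedged sketch: pixels of the teacher's pseudo HDR label whose uncertainty exceeds a threshold are excluded from the student's loss. The threshold tau, the L1 loss, and the tensor shapes are illustrative assumptions rather than the paper's exact procedure.

```python
# Hedged sketch (assumptions, not the paper's exact procedure): mask out
# unreliable pixels of a teacher's pseudo HDR label using a per-pixel
# uncertainty map, then train the student only on the trusted regions.
import torch
import torch.nn.functional as F

def masked_pseudo_label_loss(student_pred, pseudo_gt, uncertainty, tau=0.1):
    """student_pred, pseudo_gt, uncertainty: (B, C, H, W) tensors.
    Pixels whose uncertainty exceeds tau are excluded from the loss."""
    trust = (uncertainty < tau).float()          # 1 = trusted, 0 = discarded
    per_pixel = F.l1_loss(student_pred, pseudo_gt, reduction="none")
    denom = trust.sum().clamp(min=1.0)           # avoid division by zero
    return (per_pixel * trust).sum() / denom

# Example with random tensors standing in for network outputs.
pred = torch.rand(2, 3, 64, 64)
pseudo = torch.rand(2, 3, 64, 64)
unc = torch.rand(2, 3, 64, 64)
loss = masked_pseudo_label_loss(pred, pseudo, unc, tau=0.5)
```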
Anqi Yang, Eunhee Kang, Wei Chen et al.
Pixels in image sensors have progressively become smaller, driven by the goal of producing higher-resolution imagery. However, ceteris paribus, a smaller pixel accumulates less light, making image quality worse. This interplay of resolution, noise, and the dynamic range of the sensor, and their impact on the eventual quality of acquired imagery, is a fundamental concept in photography. In this paper, we propose spatially-varying gain and binning to enhance the noise performance and dynamic range of image sensors. First, we show that by adapting gain spatially to local scene brightness, the read noise can be made negligible and the dynamic range of a sensor is expanded by an order of magnitude. Second, we propose a simple analysis to find the binning size that best balances resolution and noise for a given light level; this analysis predicts a spatially-varying binning strategy, again based on local scene brightness, that effectively increases the overall signal-to-noise ratio. We discuss analog and digital binning modes and, perhaps surprisingly, show that digital binning outperforms its analog counterparts when a larger gain is allowed. Finally, we demonstrate the benefits of combining spatially-varying gain and binning in various applications, including high dynamic range imaging, vignetting, and lens distortion.
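The resolution-noise trade-off behind binning can be sketched with a textbook shot-noise plus read-noise model (not the paper's analysis): analog charge binning incurs one read per bin, while digital binning accumulates the read noise of every summed pixel. The photon and read-noise values below are illustrative.

```python
# Hedged sketch: a textbook shot-noise + read-noise SNR model (not the paper's
# analysis) showing how binning b x b pixels trades resolution for SNR at a
# given light level. 'photons' is the mean signal per original pixel.
import math

def snr_binned(photons: float, b: int, read_noise_e: float, analog: bool) -> float:
    signal = photons * b * b                  # photons collected by the bin
    shot_var = signal                         # Poisson shot-noise variance
    # Analog (charge) binning incurs one read per bin; digital binning sums
    # b*b independently read pixels and accumulates their read noise.
    reads = 1 if analog else b * b
    read_var = reads * read_noise_e ** 2
    return signal / math.sqrt(shot_var + read_var)

for b in (1, 2, 4):
    print(b,
          round(snr_binned(photons=5.0, b=b, read_noise_e=2.0, analog=True), 2),
          round(snr_binned(photons=5.0, b=b, read_noise_e=2.0, analog=False), 2))
```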
Zhiqiang Shen, Peng Cao, Jinzhu Yang et al.
Due to domain shifts across diverse medical imaging modalities, learned segmentation models often suffer significant performance degradation during deployment. We posit that these domain shifts can generally be categorized into two main components: 1) "style" shifts, referring to global disparities in image properties such as illumination, contrast, and color; and 2) "content" shifts, which involve local discrepancies in anatomical structures. To address the domain shifts in medical image segmentation, we first factorize an image into style codes and content maps, explicitly modeling the "style" and "content" components. Building on this, we introduce a Style-Content decomposition-based data augmentation algorithm (StyCona), which performs augmentation on both the global style and local content of source-domain images, enabling the training of a well-generalized model for domain generalizable medical image segmentation. StyCona is a simple yet effective plug-and-play module that substantially improves model generalization without requiring additional training parameters or modifications to segmentation model architectures. Experiments on cardiac magnetic resonance imaging and fundus photography segmentation tasks, with single and multiple target domains respectively, demonstrate the effectiveness of StyCona and its superiority over state-of-the-art domain generalization methods. The code is available at https://github.com/Senyh/StyCona.
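In the spirit of the style/content split described above, the following hedged sketch perturbs per-channel statistics (AdaIN-style) so that global appearance changes while local structure is preserved; StyCona's actual style-code and content-map factorization may differ.

```python
# Hedged sketch: a generic style perturbation via per-channel statistics
# (AdaIN-style), illustrating "style" vs. "content" augmentation in spirit only;
# StyCona's actual factorization and augmentation strategy may differ.
import torch

def perturb_style(x: torch.Tensor, strength: float = 0.2) -> torch.Tensor:
    """x: (B, C, H, W). Re-normalize each channel, then apply jittered
    mean/std so global appearance changes while local structure is kept."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6
    content = (x - mu) / sigma                        # structure ("content")
    new_mu = mu * (1 + strength * torch.randn_like(mu))
    new_sigma = sigma * (1 + strength * torch.randn_like(sigma))
    return content * new_sigma + new_mu               # new "style"

augmented = perturb_style(torch.rand(4, 3, 128, 128))
```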
Barty Wardell, Benjamin Russell, Shuvrangsu Das et al.
Entanglement in fibrous materials strongly affects their mechanical performance, yet quantitative experimental characterisation of entanglement has proved elusive. The widely used hook-drop test is known to be inadequate to evaluate entanglement. Here, we directly demonstrate its shortcomings using a combination of micro-tomography and high-speed photography. These observations motivated us to propose a simpler and more reliable method—a pin insertion test, which directly relates entanglement to its mechanical effects. Application of the basic principles of fracture mechanics to this test allows a direct quantification of the length scale of entanglement. Our ideas not only improve the assessment of entanglements in fibre composites, but also open pathways for investigating and quantifying entanglement in a large class of fibrous materials.
Cristian George Fieraru, Maria Biserica, Ioana Cristina Plajer et al.
Reliable image quality assessment is essential not only in digital photography but also as a key metric for evaluating the performance of algorithms and models designed for image quality enhancement or generation. In recent years, a wide range of image quality assessment metrics, both traditional and learning-based, have been proposed, making it a challenge to select the appropriate method for a given task. This study presents a comparative analysis between five widely used traditional no-reference image quality assessment techniques and five machine learning-based approaches, evaluating their effectiveness in computing image quality scores. The evaluation is carried out comprehensively using a set of standard and advanced performance metrics. Furthermore, we analyze how characteristics of the training datasets, such as score distribution, influence model performance. The machine learning models considered vary significantly in architectural complexity, in terms of both the number of layers and parameters, and we investigate whether this variability has a considerable impact on prediction accuracy. The analysis also extends to non-photographic imagery, with a comparative evaluation of the methods on hyperspectral satellite image visualizations. For full transparency and reproducibility of the current study, all training parameters and hardware specifications are reported.
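A typical way to score such methods against subjective ratings is with rank and linear correlation (SROCC, PLCC) plus RMSE; the snippet below shows this standard evaluation step with made-up scores and does not reproduce the study's exact metric set.

```python
# Hedged sketch: the standard correlation metrics commonly used to evaluate
# no-reference IQA methods against subjective scores (SROCC, PLCC, RMSE).
# The scores below are invented placeholders, not data from the study.
import numpy as np
from scipy.stats import spearmanr, pearsonr

def evaluate_iqa(predicted: np.ndarray, subjective: np.ndarray):
    srocc, _ = spearmanr(predicted, subjective)   # rank correlation
    plcc, _ = pearsonr(predicted, subjective)     # linear correlation
    rmse = float(np.sqrt(np.mean((predicted - subjective) ** 2)))
    return {"SROCC": srocc, "PLCC": plcc, "RMSE": rmse}

scores_model = np.array([72.1, 55.4, 90.3, 61.0, 48.7])
scores_human = np.array([70.0, 58.0, 88.0, 65.0, 50.0])
print(evaluate_iqa(scores_model, scores_human))
```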
Tomohiro Hayase, Sacha Braun, Hikari Yanagawa et al.
Social VR platforms enable social, economic, and creative activities by allowing users to create and share their own virtual spaces. In social VR, photography within a VR scene is an important indicator of visitors' activities. Although automatic identification of photo spots within a VR scene can facilitate the process of creating a VR scene and enhance the visitor experience, there are challenges in quantitatively evaluating photos taken in the VR scene and efficiently exploring the large VR scene. We propose PanoTree, an automated photo-spot explorer in VR scenes. To assess the aesthetics of images captured in VR scenes, a deep scoring network is trained on a large dataset of photos collected by a social VR platform to determine whether humans are likely to take similar photos. Furthermore, we propose a Hierarchical Optimistic Optimization (HOO)-based search algorithm to efficiently explore 3D VR spaces with the reward from the scoring network. Our user study shows that the scoring network achieves human-level performance in distinguishing randomly taken images from those taken by humans. In addition, we show applications using the explored photo spots, such as automatic thumbnail generation, support for VR world creation, and visitor flow planning within a VR scene.
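For reference, here is a minimal one-dimensional Hierarchical Optimistic Optimization (HOO) loop illustrating the kind of optimistic tree search the abstract describes; the paper applies it to 3D VR spaces with the learned aesthetic-scoring network as the reward. The domain [0, 1], the constants v1 and rho, and the toy reward are illustrative assumptions.

```python
# Hedged sketch: a simplified 1D HOO loop. Each round descends the tree along
# maximal optimistic B-values, expands the reached node, samples a point in its
# region, and propagates reward statistics back up the path.
import math
import random

class Node:
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count, self.mean = 0, 0.0
        self.left = self.right = None
        self.B = float("inf")          # optimistic value; infinite until visited

def hoo_search(reward, rounds=300, v1=1.0, rho=0.5):
    root = Node(0.0, 1.0, 0)
    best_x, best_r = 0.5, -float("inf")
    for n in range(1, rounds + 1):
        # Descend along maximal B-values until an unexpanded node is reached.
        path, node = [root], root
        while node.left is not None:
            node = node.left if node.left.B >= node.right.B else node.right
            path.append(node)
        # Expand the node and sample a point inside its region.
        mid = (node.lo + node.hi) / 2.0
        node.left = Node(node.lo, mid, node.depth + 1)
        node.right = Node(mid, node.hi, node.depth + 1)
        x = random.uniform(node.lo, node.hi)
        r = reward(x)
        if r > best_r:
            best_x, best_r = x, r
        # Update means along the path, then recompute B-values bottom-up.
        for v in path:
            v.count += 1
            v.mean += (r - v.mean) / v.count
        for v in reversed(path):
            U = v.mean + math.sqrt(2.0 * math.log(n) / v.count) + v1 * rho ** v.depth
            v.B = min(U, max(v.left.B, v.right.B))
    return best_x

# Toy reward with a peak at 0.7 standing in for the photo-spot score.
print(hoo_search(lambda x: 1.0 - abs(x - 0.7)))
```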
Marcos V. Conde, Zhijun Lei, Wen Li et al.
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation and process images in under 10 ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
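The benchmark's baseline comparison can be sketched as follows: upscale with Lanczos interpolation, measure PSNR against the ground truth, and time the operation. The image sizes, random stand-in data, and use of Pillow are assumptions for illustration only.

```python
# Hedged sketch: a Lanczos upscaling baseline with PSNR fidelity and wall-clock
# timing, the kind of comparison used in such a benchmark. Random arrays stand
# in for the actual test set; the 4x 540p-to-4K sizes mirror the challenge setup.
import time
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def lanczos_x4(lr: Image.Image) -> Image.Image:
    w, h = lr.size
    return lr.resize((4 * w, 4 * h), Image.LANCZOS)

lr = Image.fromarray(np.random.randint(0, 256, (540, 960, 3), dtype=np.uint8))
hr = np.random.randint(0, 256, (2160, 3840, 3), dtype=np.uint8)  # stand-in GT

t0 = time.perf_counter()
sr = np.asarray(lanczos_x4(lr))
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"PSNR vs GT: {psnr(sr, hr):.2f} dB, runtime: {elapsed_ms:.1f} ms")
```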
Maibritt Meldgaard Arildsen, Christian Østergaard Mariager, Christoffer Vase Overgaard et al.
The aim was to establish combined H<sub>2</sub><sup>15</sup>O PET/MRI during ex vivo normothermic machine perfusion (NMP) of isolated porcine kidneys. We examined whether changes in renal arterial blood flow (RABF) are accompanied by changes of a similar magnitude in renal blood perfusion (RBP) as well as the relation between RBP and renal parenchymal oxygenation (RPO). Methods: Pig kidneys (n = 7) were connected to a NMP circuit. PET/MRI was performed at two different pump flow levels: a blood-oxygenation-level-dependent (BOLD) MRI sequence performed simultaneously with a H<sub>2</sub><sup>15</sup>O PET sequence for determination of RBP. Results: RBP was measured using H<sub>2</sub><sup>15</sup>O PET in all kidneys (flow 1: 0.42–0.76 mL/min/g, flow 2: 0.7–1.6 mL/min/g). We found a linear correlation between changes in delivered blood flow from the perfusion pump and changes in the measured RBP using PET imaging (r<sup>2</sup> = 0.87). Conclusion: Our study demonstrated the feasibility of combined H<sub>2</sub><sup>15</sup>O PET/MRI during NMP of isolated porcine kidneys with tissue oxygenation being stable over time. The introduction of H<sub>2</sub><sup>15</sup>O PET/MRI in nephrological research could be highly relevant for future pre-transplant kidney evaluation and as a tool for studying renal physiology in healthy and diseased kidneys.
Nida Mustafa, Shreeyaa Ramana, Margaret MacNeill et al.
Background: Over the past two decades, the prevalence of chronic pain has significantly increased globally, with approximately 20% of the world's population living with pain. Although quantitative measures are useful in identifying pain prevalence and severity, qualitative methods, and especially arts-based ones, are now receiving attention as a valuable means to understand lived experiences of pain. Photovoice is one such method that utilizes individuals' own photography to document their lived experiences. Aims: The current study utilized an arts-based method to explore immigrant Indian women's chronic pain experiences in Canada and aimed to enhance the understanding of those experiences by creating a visual opportunity for them to share their stories. Methods: Twelve immigrant Indian women captured photographs and participated in one-on-one interviews exploring daily experiences of chronic pain. Results: Women's photographs, and description of these photographs, provided a visual entry into their lives and pain experiences. Three themes emerged from our analysis: (1) bodies in pain, (2) traversing spaces including immigration, and (3) pain management methods. Findings revealed that women's representations of pain were shaped by a clash between culturally shaped gender role expectations and changing gender norms due to immigration processes. The use of photovoice visually contextualized and represented pain experiences, proving to be a valuable tool for self-reflection. Conclusions: This research uncovers the multifaceted nature of chronic pain and identifies the influence of immigration, gender, and social relations on the exacerbation of pain in immigrant Indian women.
Yixiao Jing, Tao Huang, Linfeng Gao et al.
Unmanned aerial vehicle insulator detection that aims to recognize defective insulators from transmission lines has made significant progress in recent years. However, it still faces challenges, such as the complex background of aerial images and the small memory of unmanned aerial vehicles. This paper proposes a refined insulator detection algorithm that integrates the attention mechanism in YOLOv8 to improve the feature extraction ability. Specifically, this paper introduces a fast vision transformers structure in the you only look once (YOLO) v8 backbone section to enhance feature extraction by capturing local and global features. Additionally, the global attention mechanism is incorporated in the neck for additional feature extraction by merging comprehensive spatial and channel information into the output. Furthermore, we amalgamate depth‐wise convolution, graph convolution, and residual operation in the global attention mechanism module. This design can mitigate the issues of gradient vanishing or exploding and meanwhile enhance the distinction between spatial attention and channel attention. The proposed model is then applied to a public dataset and a set of real images from a specific power station, and the detection results show that it outperforms many competitors in terms of accuracy, efficiency, and memory size.
Amandine Turri Hoelken
“Dialogy” is based on the independence of characters and readers, the incompleteness of dialogue and a polyphonic approach that reflects different points of view. Applied to documentary photography, this approach implies new relationships between photographer, subjects and viewers. It involves the collaboration and participation of these different players. A study of the work of four photographers will shed light on the links between dialogic documentary photography and ethnology.
Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu et al.
Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation; however, it is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged Optimal Transport (OT) theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in clinical practice, we generalized a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network. We name this scheme regularization by enhancing (RE). We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and their performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrated the superiority of our proposed framework over some state-of-the-art unsupervised competitors and a state-of-the-art supervised method.
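Regularization by denoising, which the authors generalize into regularization by enhancing, follows a well-known gradient form; the hedged sketch below uses a Gaussian blur as a stand-in for the learned OT-guided enhancer and a simple least-squares data term, with illustrative step sizes.

```python
# Hedged sketch: the classic regularization-by-denoising (RED) gradient step,
# with an enhancer plugged in as the prior. A Gaussian blur stands in for the
# learned OT-guided network; the data term and step sizes are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def red_restore(y, enhancer, lam=0.5, step=0.1, iters=50):
    """Minimize 0.5*||x - y||^2 + (lam/2) * x^T (x - D(x)) by gradient descent."""
    x = y.copy()
    for _ in range(iters):
        grad_data = x - y                       # gradient of the fidelity term
        grad_prior = lam * (x - enhancer(x))    # RED prior gradient
        x = x - step * (grad_data + grad_prior)
    return x

degraded = np.random.rand(128, 128)
restored = red_restore(degraded, enhancer=lambda z: gaussian_filter(z, sigma=1.0))
```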
Yuxi Li, Hongzhi Jiang, Huijie Zhao et al.
We present projective parallel single-pixel imaging (pPSI), a 3D photography method that provides a robust and efficient way to analyze light transport behavior and enables separation of light effects due to global illumination, thereby achieving 3D structured light scanning under global illumination. The light transport behavior is described by the light transport coefficients (LTC), a 4D data set that contains complete information for a projector-camera pair. However, capturing the LTC is generally time consuming. In pPSI, the 4D LTC is reduced to projection functions, enabling a highly efficient data capture process. We introduce a local maximum constraint, which constrains the location of candidate correspondence matching points when projections are captured. A local slice extension (LSE) method is introduced to accelerate the capture of projection functions. Optimization is conducted for pPSI under several situations: the number of projection functions required for pPSI is optimized, and the influence of the capture ratio in LSE on the accuracy of the correspondence matching points is investigated. Discussions and experiments cover two typical kinds of global illumination: inter-reflections and subsurface scattering. The proposed method is validated on several challenging scenarios and outperforms state-of-the-art methods.
Chris Careaga, Yağız Aksoy
Intrinsic decomposition is a fundamental mid-level vision problem that plays a crucial role in various inverse rendering and computational photography pipelines. Generating highly accurate intrinsic decompositions is an inherently under-constrained task that requires precisely estimating continuous-valued shading and albedo. In this work, we achieve high-resolution intrinsic decomposition by breaking the problem into two parts. First, we present a dense ordinal shading formulation using a shift- and scale-invariant loss in order to estimate ordinal shading cues without restricting the predictions to obey the intrinsic model. We then combine low- and high-resolution ordinal estimations using a second network to generate a shading estimate with both global coherency and local details. We encourage the model to learn an accurate decomposition by computing losses on the estimated shading as well as the albedo implied by the intrinsic model. We develop a straightforward method for generating dense pseudo ground truth using our model's predictions and multi-illumination data, enabling generalization to in-the-wild imagery. We present an exhaustive qualitative and quantitative analysis of our predicted intrinsic components against state-of-the-art methods. Finally, we demonstrate the real-world applicability of our estimations by performing otherwise difficult editing tasks such as recoloring and relighting.
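A shift- and scale-invariant loss of the kind mentioned above can be written compactly: align the prediction to the target with a closed-form least-squares scale and shift, then measure the error. The exact formulation used for ordinal shading may differ; this is a generic sketch.

```python
# Hedged sketch: a generic shift- and scale-invariant regression loss. The
# prediction is aligned to the target by a closed-form least-squares scale and
# shift per batch element before the mean squared error is computed.
import torch

def scale_shift_invariant_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    p = pred.flatten(1)                       # (B, N)
    t = target.flatten(1)
    ones = torch.ones_like(p)
    # Solve min_{s,b} ||s*p + b - t||^2 for each batch element.
    A = torch.stack([p, ones], dim=-1)        # (B, N, 2)
    sol = torch.linalg.lstsq(A, t.unsqueeze(-1)).solution  # (B, 2, 1)
    aligned = (A @ sol).squeeze(-1)           # s*p + b
    return torch.mean((aligned - t) ** 2)

loss = scale_shift_invariant_loss(torch.rand(2, 1, 32, 32), torch.rand(2, 1, 32, 32))
```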
Page 30 of 8546