S. Sontag
Results for "Photography"
Showing 20 of ~223,230 results · from CrossRef, arXiv, DOAJ, Semantic Scholar
Mary J. Benner, M. Tushman
M. Hirsch
G. Petschnigg, R. Szeliski, Maneesh Agrawala et al.
A. Lucieer, S. M. Jong, D. Turner
Ilya Chugunov
Over the past two decades, mobile imaging has experienced a profound transformation, with cell phones rapidly eclipsing all other forms of digital photography in popularity. Today's cell phones are equipped with a diverse range of imaging technologies - laser depth ranging, multi-focal camera arrays, and split-pixel sensors - alongside non-visual sensors such as gyroscopes, accelerometers, and magnetometers. This, combined with on-board integrated chips for image and signal processing, makes the cell phone a versatile pocket-sized computational imaging platform. Parallel to this, we have seen in recent years how neural fields - small neural networks trained to map continuous spatial input coordinates to output signals - enable the reconstruction of complex scenes without explicit data representations such as pixel arrays or point clouds. In this thesis, I demonstrate how carefully designed neural field models can compactly represent complex geometry and lighting effects, enabling applications such as depth estimation, layer separation, and image stitching directly from in-the-wild mobile photography data. These methods outperform state-of-the-art approaches without relying on complex pre-processing steps, labeled ground truth data, or machine learning priors. Instead, they leverage well-constructed, self-regularized models that tackle challenging inverse problems through stochastic gradient descent, fitting directly to raw measurements from a smartphone.
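The neural-field idea described in this abstract can be made concrete with a toy example: a small MLP, fitted by stochastic gradient descent, learns to map continuous 2D coordinates to RGB values. This is a minimal sketch of the general technique, not the thesis's actual models; the network sizes and the random stand-in image are assumptions.

```python
# Minimal coordinate-based neural field: an MLP trained with SGD to map
# 2D pixel coordinates in [0, 1]^2 to RGB values.
import torch
import torch.nn as nn

class NeuralField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # RGB output
        )

    def forward(self, coords):  # coords: (N, 2)
        return self.net(coords)

H, W = 64, 64
image = torch.rand(H * W, 3)  # random stand-in for raw measurements
ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                        torch.linspace(0, 1, W), indexing="ij")
coords = torch.stack([xs.ravel(), ys.ravel()], dim=-1)

field = NeuralField()
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    loss = ((field(coords) - image) ** 2).mean()  # plain reconstruction loss
    loss.backward()
    opt.step()
```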
Lujian Yao, Siming Zheng, Xinbin Yuan et al.
Traditional photography composition approaches are dominated by 2D cropping-based methods. However, these methods fall short when scenes contain poorly arranged subjects. Professional photographers often employ perspective adjustment as a form of 3D recomposition, modifying the projected 2D relationships between subjects while maintaining their actual spatial positions to achieve better compositional balance. Inspired by this artistic practice, we propose photography perspective composition (PPC), extending beyond traditional cropping-based methods. However, implementing PPC faces significant challenges: the scarcity of perspective transformation datasets and the lack of defined assessment criteria for perspective quality. To address these challenges, we present three key contributions: (1) an automated framework for building PPC datasets from expert photographs; (2) a video generation approach that demonstrates the transformation process from less favorable to aesthetically enhanced perspectives; (3) a perspective quality assessment (PQA) model based on human performance. Our approach is concise, requires no additional prompt instructions or camera trajectories, and guides ordinary users to enhance their composition skills.
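The perspective adjustment this abstract builds on can be illustrated with a plain 2D perspective warp. Below is a hedged sketch using OpenCV's standard homography routines; the input path and corner offsets are made-up values for demonstration, not the paper's method.

```python
# Warp an image with a homography to simulate a small viewpoint change.
import cv2
import numpy as np

img = cv2.imread("photo.jpg")  # hypothetical input path
h, w = img.shape[:2]
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])          # image corners
dst = np.float32([[40, 10], [w - 10, 0], [w, h], [0, h - 20]])  # toy offsets
M = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography
warped = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite("recomposed.jpg", warped)
```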
Mengchen Zhang, Tong Wu, Jing Tan et al.
Camera trajectory design plays a crucial role in video production, serving as a fundamental tool for conveying directorial intent and enhancing visual storytelling. In cinematography, Directors of Photography meticulously craft camera movements to achieve expressive and intentional framing. However, existing methods for camera trajectory generation remain limited: Traditional approaches rely on geometric optimization or handcrafted procedural systems, while recent learning-based methods often inherit structural biases or lack textual alignment, constraining creative synthesis. In this work, we introduce an auto-regressive model inspired by the expertise of Directors of Photography to generate artistic and expressive camera trajectories. We first introduce DataDoP, a large-scale multi-modal dataset containing 29K real-world shots with free-moving camera trajectories, depth maps, and detailed captions describing specific movements, interaction with the scene, and directorial intent. Thanks to the comprehensive and diverse database, we further train GenDoP, an auto-regressive, decoder-only Transformer for high-quality, context-aware camera movement generation based on text guidance and RGBD inputs. Extensive experiments demonstrate that compared to existing methods, GenDoP offers better controllability, finer-grained trajectory adjustments, and higher motion stability. We believe our approach establishes a new standard for learning-based cinematography, paving the way for future advancements in camera control and filmmaking. Our project website: https://kszpxxzmc.github.io/GenDoP/.
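A toy sketch of autoregressive trajectory generation in the spirit described above: a decoder-only Transformer predicts the next discretized pose token given the previous ones. The layer sizes, the 256-bin pose vocabulary, and the omission of positional encodings and text/RGBD conditioning are all simplifications, not GenDoP's actual design.

```python
# Decoder-only next-token model over discretized camera-pose tokens.
import torch
import torch.nn as nn

VOCAB = 256  # assumed number of discretized camera-pose bins

class TrajectoryDecoder(nn.Module):
    def __init__(self, d_model=128, n_head=4, n_layer=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):  # tokens: (batch, time)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.blocks(self.embed(tokens), mask=causal, is_causal=True)
        return self.head(hidden)  # (batch, time, VOCAB) next-token logits

model = TrajectoryDecoder()
seq = torch.zeros(1, 1, dtype=torch.long)  # start token
for _ in range(16):  # sample a short trajectory, one pose token at a time
    logits = model(seq)[:, -1]
    nxt = torch.multinomial(logits.softmax(dim=-1), num_samples=1)
    seq = torch.cat([seq, nxt], dim=1)
```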
Konstantinos Keremis, Eleni Vrochidou, George A. Papakostas
The ability of deep learning models to maintain consistent performance under image transformations, termed invariances, is critical for reliable deployment across diverse computer vision applications. This study presents a comprehensive empirical evaluation of modern convolutional neural networks (CNNs) and vision transformers (ViTs) with respect to four fundamental types of image invariances: blur, noise, rotation, and scale. We analyze a curated selection of thirty models across three common vision tasks: object localization, recognition, and semantic segmentation, using benchmark datasets including COCO, ImageNet, and a custom segmentation dataset. Our experimental protocol introduces controlled perturbations to test model robustness and employs task-specific metrics such as mean Intersection over Union (mIoU) and classification accuracy (Acc) to quantify performance degradation. Results indicate that while ViTs generally outperform CNNs under blur and noise corruption in recognition tasks, both model families exhibit significant vulnerabilities to rotation and extreme scale transformations. Notably, segmentation models demonstrate higher resilience to geometric variations, with SegFormer and Mask2Former emerging as the most robust architectures. These findings challenge prevailing assumptions regarding model robustness and provide actionable insights for designing vision systems capable of withstanding real-world input variability.
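The controlled-perturbation protocol this abstract describes can be sketched as follows: apply one transformation family at a time at increasing severity and record the accuracy drop. The severity scalings and the model/loader placeholders are assumptions, not the paper's exact setup.

```python
# Robustness sweep: perturb inputs with one corruption type and measure
# classification accuracy degradation.
import torch
import torchvision.transforms.functional as TF

def perturb(img, kind, severity):
    if kind == "blur":
        return TF.gaussian_blur(img, kernel_size=2 * severity + 1)
    if kind == "noise":
        return (img + 0.05 * severity * torch.randn_like(img)).clamp(0, 1)
    if kind == "rotation":
        return TF.rotate(img, angle=15.0 * severity)
    if kind == "scale":
        return TF.affine(img, angle=0, translate=[0, 0],
                         scale=1.0 / (1 + severity), shear=[0.0])
    return img

@torch.no_grad()
def accuracy_under(model, loader, kind, severity):
    correct = total = 0
    for imgs, labels in loader:  # loader yields (B, 3, H, W) float images
        preds = model(perturb(imgs, kind, severity)).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```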
Nikoleta V. Nikolaidou, Anastasios Asvestas, Agathi Anthoula Kaminari et al.
Religious panel paintings (icons) play a pivotal role in the rituals of the Eastern Orthodox Christian Church. However, their continuous use often results in physical degradation, prompting remedial interventions. Quite commonly, alterations were treated by simply applying new paint layers directly over the decayed original, while in some cases, old icons were overpainted merely as a means to renovate and modernize them. Therefore, numerous overpainted icons are currently housed in churches, museums, and private collections across Greece. This study focuses on the investigation of a post-Byzantine icon of Christ Pantokrator, which displays extensive overpainting while retaining a few visible fragments of the original composition. The objective was to assess the extent and condition of preservation of the original artwork, to identify materials and techniques used both in the initial painting and in subsequent restoration phases, and to distinguish between those phases. To achieve these aims, a fully non-invasive diagnostic methodology was implemented, including visible light photography, ultraviolet radiation imaging (UVR/UVL), hyperspectral imaging (MuSIS HS), infrared reflectography (IRRef), X-ray radiography, and macroscopic X-ray fluorescence scanning (MA-XRF). The findings confirm that the original painting remains substantially preserved and is of high artistic quality. Moreover, analysis revealed at least two distinct phases of overpainting, likely dating from the 20th century, while the results suggest that the original artwork probably dates to the first half of the 18th century. The study highlights the need to use complementary techniques to non-invasively assess complex artifacts like overpainted icons, and offers valuable insights into historical restoration practices, providing a foundation for future conservation planning.
Prashant D. Tailor, MD, Piotr K. Kopinski, MD, PhD, Haley S. D’Souza, MD et al.
Purpose: To develop and validate machine learning (ML) models to predict choroidal nevus transformation to melanoma based on multimodal imaging at initial presentation. Design: Retrospective multicenter study. Participants: Patients diagnosed with choroidal nevus on the Ocular Oncology Service at Wills Eye Hospital (2007–2017) or Mayo Clinic Rochester (2015–2023). Methods: Multimodal imaging was obtained, including fundus photography, fundus autofluorescence, spectral domain OCT, and B-scan ultrasonography. Machine learning models were created (XGBoost, LGBM, Random Forest, Extra Tree) and optimized for area under receiver operating characteristic curve (AUROC). The Wills Eye Hospital cohort was used for training and testing (80% training–20% testing) with fivefold cross validation. The Mayo Clinic cohort provided external validation. Model performance was characterized by AUROC and area under precision–recall curve (AUPRC). Models were interrogated using SHapley Additive exPlanations (SHAP) to identify the features most predictive of conversion from nevus to melanoma. Differences in AUROC and AUPRC between models were tested using 10 000 bootstrap samples with replacement. Main Outcome Measures: Area under receiver operating characteristic curve and AUPRC for each ML model. Results: There were 2870 nevi included in the study, with conversion to melanoma confirmed in 128 cases. Simple AI Nevus Transformation System (SAINTS; XGBoost) was the top-performing model in the test cohort [pooled AUROC 0.864 (95% confidence interval [CI]: 0.864–0.865), pooled AUPRC 0.244 (95% CI: 0.243–0.246)] and in the external validation cohort [pooled AUROC 0.931 (95% CI: 0.930–0.931), pooled AUPRC 0.533 (95% CI: 0.531–0.535)]. Other models also had good discriminative performance: LGBM (test set pooled AUROC 0.831, validation set pooled AUROC 0.815), Random Forest (test set pooled AUROC 0.812, validation set pooled AUROC 0.866), and Extra Tree (test set pooled AUROC 0.826, validation set pooled AUROC 0.915). A model including only nevi with at least 5 years of follow-up demonstrated the best performance in AUPRC [test: pooled 0.592 (95% CI: 0.590–0.594); validation: pooled 0.656 (95% CI: 0.655–0.657)]. The top 5 features in SAINTS by SHAP values were: tumor thickness, largest tumor basal diameter, tumor shape, distance to optic nerve, and subretinal fluid extent. Conclusions: We demonstrate accuracy and generalizability of a ML model for predicting choroidal nevus transformation to melanoma based on multimodal imaging. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
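The modeling pipeline outlined above follows a standard pattern: train a gradient-boosted classifier, evaluate it by AUROC/AUPRC, and rank features with SHAP. A hedged sketch with synthetic stand-in data; the feature count and class balance are illustrative only, not the study's dataset.

```python
# Gradient-boosted classification with AUROC/AUPRC evaluation and SHAP.
import numpy as np
import shap
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2870, 5))         # e.g. thickness, basal diameter, ...
y = rng.binomial(1, 0.045, size=2870)  # rare positives (~128/2870 converts)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
model = XGBClassifier(eval_metric="auc").fit(X_tr, y_tr)

p = model.predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, p))
print("AUPRC:", average_precision_score(y_te, p))

explainer = shap.TreeExplainer(model)   # rank features by mean |SHAP| value
shap_values = explainer.shap_values(X_te)
```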
Yankun Wu, Yuta Nakashima, Noa Garcia
Social biases in generative models have gained increasing attention. This paper proposes an automatic evaluation protocol for text-to-image generation, examining how gender bias originates and perpetuates in the generation process of Stable Diffusion. Using triplet prompts that vary only in their gender indicators, we trace representations at several stages of the generation process and explore dependencies between prompts and images. Our findings reveal that the bias persists throughout all internal stages of the generation process and manifests across entire images. For instance, differences in object presence, such as different instruments and outfit preferences, are observed across genders and extend to overall image layouts. Moreover, our experiments demonstrate that neutral prompts tend to produce images more closely aligned with those from masculine prompts than with their feminine counterparts. We also investigate prompt-image dependencies to further understand how bias is embedded in the generated content. Finally, we offer recommendations for developers and users to mitigate this effect in text-to-image generation.
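The triplet-prompt setup can be illustrated simply: the same scene template instantiated with feminine, masculine, and neutral subject indicators, all fed to the same model under a fixed seed. The templates and indicator words below are assumptions, not the paper's prompt list.

```python
# Build prompt triplets that differ only in the gender indicator.
from itertools import product

templates = ["a photo of a {} playing an instrument",
             "a photo of a {} in an office"]
indicators = {"feminine": "woman", "masculine": "man", "neutral": "person"}

triplets = [(name, t.format(word))
            for t, (name, word) in product(templates, indicators.items())]
for name, prompt in triplets:
    print(f"{name:9s} -> {prompt}")
# Each triplet would be fed to the same Stable Diffusion checkpoint with a
# fixed seed, and the resulting images compared stage by stage.
```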
Michele Zappavigna
Gabriele Berton, Alex Stoken, Barbara Caputo et al.
Astronaut photography, spanning six decades of human spaceflight, presents a unique Earth observations dataset with immense value for both scientific research and disaster response. Despite its significance, accurately localizing the geographical extent of these images, crucial for effective utilization, poses substantial challenges. Current manual localization efforts are time-consuming, motivating the need for automated solutions. We propose a novel approach, leveraging image retrieval, to address this challenge efficiently. We introduce innovative training techniques, including Year-Wise Data Augmentation and a Neutral-Aware Multi-Similarity Loss, which contribute to the development of a high-performance model, EarthLoc. We develop six evaluation datasets and perform a comprehensive benchmark comparing EarthLoc to existing methods, showcasing its superior efficiency and accuracy. Our approach marks a significant advancement in automating the localization of astronaut photography, which will help bridge a critical gap in Earth observations data. Code and datasets are available at https://github.com/gmberton/EarthLoc
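Localization by image retrieval, as used above, reduces to descriptor matching: embed the query photograph and find the most similar geo-referenced database image. A minimal sketch; the untrained ResNet backbone and random tensors are placeholders, not EarthLoc's trained components.

```python
# Retrieval-based localization: cosine similarity over image descriptors.
import torch
import torch.nn.functional as F
import torchvision.models as models

encoder = models.resnet18(weights=None)
encoder.fc = torch.nn.Identity()  # use pooled features as descriptors
encoder.eval()

@torch.no_grad()
def describe(batch):  # batch: (N, 3, 224, 224)
    return F.normalize(encoder(batch), dim=1)

db_images = torch.rand(100, 3, 224, 224)  # geo-referenced reference imagery
query = torch.rand(1, 3, 224, 224)        # astronaut photograph

db_desc = describe(db_images)
sims = describe(query) @ db_desc.T  # cosine similarities, shape (1, 100)
best = sims.argmax().item()         # index of best-matching reference
```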
Wei-Lun Huang, Minghao Xue, Zhiyou Liu et al.
Melanoma is the most deadly form of skin cancer. Tracking the evolution of nevi and detecting new lesions across the body is essential for the early detection of melanoma. Despite prior work on longitudinal tracking of skin lesions in 3D total body photography, there are still several challenges, including 1) low accuracy for finding correct lesion pairs across scans, 2) sensitivity to noisy lesion detection, and 3) lack of large-scale datasets with numerous annotated lesion pairs. We propose a framework that takes in a pair of 3D textured meshes, matches lesions in the context of total body photography, and identifies unmatchable lesions. We start by computing correspondence maps bringing the source and target meshes to a template mesh. Using these maps to define source/target signals over the template domain, we construct a flow field aligning the mapped signals. The initial correspondence maps are then refined by advecting forward/backward along the vector field. Finally, lesion assignment is performed using the refined correspondence maps. We propose the first large-scale dataset for skin lesion tracking with 25K lesion pairs across 198 subjects. The proposed method achieves a success rate of 89.9% (at 10 mm criterion) for all pairs of annotated lesions and a matching accuracy of 98.2% for subjects with more than 200 lesions.
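The final lesion-assignment step can be sketched with a standard optimal-matching routine: pair source and target lesions by minimizing total distance, then score pairs against the 10 mm criterion quoted above. The synthetic coordinates are illustrative, and the paper's correspondence-map refinement is not reproduced here.

```python
# One-to-one lesion matching via the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
src = rng.uniform(0, 500, size=(30, 3))       # lesion positions in mm
tgt = src + rng.normal(0, 3, size=src.shape)  # same lesions, slightly moved

cost = cdist(src, tgt)                     # pairwise distances (mm)
rows, cols = linear_sum_assignment(cost)   # minimum-cost assignment
success = (cost[rows, cols] < 10.0).mean() # fraction within 10 mm
print(f"success rate at 10 mm: {success:.1%}")
```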
Dennis Scheidt, Pedro A. Quinto-Su
In single-pixel photography, an image is sampled with a programmable optical element, such as a digital micromirror array or a spatial light modulator, that can project an orthogonal basis. The light reflected or diffracted is collected by a lens and measured with a photodiode (bucket detector). In this work we demonstrate that single-pixel photography using sampling bases with non-zero off-diagonal elements (e.g., Hadamard) can be susceptible to errors that emerge from the relative size of the bucket detector area compared with the spatial spread of the Fourier spectrum of the basis element with the highest spatial frequency. Experiments with a spatial light modulator and simulations using a Hadamard basis show that if the bucket detector area is smaller than roughly $50-75\%$ of the maximum area spanned by the projected spectrum of the measurement basis, the reconstructed photograph will exhibit cross-talk with the effective phase of the optical system. The phase can be encoded, or errors can be introduced in the optical system, to demonstrate this effect.
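The idealized measurement model behind single-pixel photography is easy to state in code: each basis pattern is displayed, the bucket detector records one inner product, and the image is recovered from the coefficients. A sketch with a Hadamard basis; the finite-detector-size effects the paper studies are deliberately not modeled.

```python
# Ideal single-pixel imaging with an orthogonal Hadamard basis.
import numpy as np
from scipy.linalg import hadamard

N = 64                     # number of pixels (flattened 8x8 image)
H = hadamard(N)            # +/-1 sampling basis, H @ H.T = N * I
scene = np.random.rand(N)  # flattened ground-truth image

measurements = H @ scene              # one bucket reading per pattern
recovered = (H.T @ measurements) / N  # inverse via orthogonality
assert np.allclose(recovered, scene)
```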
Yu Yuan, Xijun Wang, Yichen Sheng et al.
Image generation today can produce somewhat realistic images from text prompts. However, if one asks the generator to synthesize a specific camera setting, such as creating different fields of view using a 24mm lens versus a 70mm lens, the generator will not be able to interpret and generate scene-consistent images. This limitation not only hinders the adoption of generative tools in professional photography but also highlights the broader challenge of aligning data-driven models with real-world physical settings. In this paper, we introduce Generative Photography, a framework that allows controlling camera intrinsic settings during content generation. The core innovations of this work are the concepts of Dimensionality Lifting and Differential Camera Intrinsics Learning, enabling smooth and consistent transitions across different camera settings. Experimental results show that our method produces significantly more scene-consistent photorealistic images than state-of-the-art models such as Stable Diffusion 3 and FLUX. Our code and additional results are available at https://generative-photography.github.io/project.
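The 24mm-versus-70mm example corresponds to a concrete geometric relationship: the horizontal field of view of a lens is $2\arctan(w / 2f)$ for sensor width $w$ and focal length $f$. A quick check assuming a full-frame (36 mm wide) sensor:

```python
# Field of view from focal length for a full-frame sensor.
import math

def horizontal_fov_deg(focal_mm, sensor_width_mm=36.0):
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))

print(f"24 mm lens: {horizontal_fov_deg(24):.1f} deg")  # ~73.7 deg
print(f"70 mm lens: {horizontal_fov_deg(70):.1f} deg")  # ~28.8 deg
```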
Kailai Zhou, Lijing Cai, Yibo Wang et al.
The integration of miniaturized spectrometers into mobile devices offers new avenues for image quality enhancement and facilitates novel downstream tasks. However, the broader application of spectral sensors in mobile photography is hindered by the inherent complexity of spectral images and the constraints of spectral imaging capabilities. To overcome these challenges, we propose a joint RGB-Spectral decomposition model guided enhancement framework, which consists of two steps: joint decomposition and prior-guided enhancement. Firstly, we leverage the complementarity between RGB and Low-resolution Multi-Spectral Images (Lr-MSI) to predict shading, reflectance, and material semantic priors. Subsequently, these priors are seamlessly integrated into the established HDRNet to promote dynamic range enhancement, color mapping, and grid expert learning, respectively. Additionally, we construct a high-quality Mobile-Spec dataset to support our research, and our experiments validate the effectiveness of Lr-MSI in the tone enhancement task. This work aims to establish a solid foundation for advancing spectral vision in mobile photography. The code is available at \url{https://github.com/CalayZhou/JDM-HDRNet}.
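The two-step framework can be schematized as follows: upsample the low-resolution multispectral image, fuse it with the RGB input, and predict shading/reflectance/semantic priors for a downstream enhancement network. Every module shape below is an invented placeholder, not JDM-HDRNet's architecture.

```python
# Schematic joint RGB-spectral decomposition into guidance priors.
import torch
import torch.nn.functional as F

rgb = torch.rand(1, 3, 256, 256)    # camera RGB
lr_msi = torch.rand(1, 16, 32, 32)  # low-res multispectral (16 bands, assumed)

msi_up = F.interpolate(lr_msi, size=rgb.shape[-2:], mode="bilinear",
                       align_corners=False)
joint = torch.cat([rgb, msi_up], dim=1)  # joint decomposition input (19 ch)

# Stand-in "decomposition": a 1x1 conv predicting shading / reflectance /
# material-semantics priors that would guide the enhancement network.
decompose = torch.nn.Conv2d(19, 3 + 3 + 8, kernel_size=1)
shading, reflectance, semantics = decompose(joint).split([3, 3, 8], dim=1)
```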
Guang Li PhD, Victoria Yu PhD, Kaitlyn Ryan BS et al.
Purpose: To improve the setup reproducibility of neck curvature using real-time optical surface imaging (OSI) guidance on 2 regions of interest (ROIs) to infer cervical spine (c-spine) curvature for surface-guided radiotherapy (SGRT) of head-and-neck (HN) and c-spine cancer. Methods: A novel SGRT setup approach was designed to reproduce neck curvature with 2 ROIs: an upper-chest ROI and an open-face ROI. It was hypothesized that the neck curvature could be reproduced if both ROIs were aligned within a ±3 mm/2° tolerance. This was tested prospectively in 7 volunteers using real-time 3D-OSI guidance and lateral 2D-photography verification after the 3D and 2D references were captured from the initial conventional setup. Real-time SGRT was performed to align the chest-ROI and face-ROI, and the longitudinal distance between them was adjustable using a head-support slider. Verification of neck curvature anteriorly and posteriorly was achieved by overlaying edge-extracted lateral pictures. Retrospectively, the relationship between anterior surface and spinal canal alignment was checked in 11 patients using their simulation CT (simCT) and setup cone-beam CT (CBCT). After the anterior surface was rigidly aligned, the spinal canal alignment was checked and quantified using the mean-distance-to-agreement (MDA) and DICE similarity index, and the surface-to-spine correlation was calculated. Results: The reproducibility of neck curvature using the 2-ROI SGRT setup was verified, with a mean neck-outline-matching difference within ±2 mm in lateral photographic overlays. The chest-ROI alignment takes 110 ± 58 s and the face-ROI takes 60 ± 35 s. When the anterior body surface is aligned (MDA = 1.1 ± 0.6 mm, DICE = 0.96 ± 0.02), the internal spinal canal is also aligned (MDA = 1.0 ± 0.3 mm, DICE = 0.84 ± 0.04) in 11 patients. The surface-to-spine correlation is c = 0.90 (MDA) and c = 0.85 (DICE). Conclusion: This study demonstrates the feasibility of the novel 2-ROI SGRT setup technique to achieve reproducible neck and c-spine curvature regardless of neck visibility and availability as an ROI. Staff training is needed to adopt this unconventional SGRT technique to improve patient setup.
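The overlap metric quoted above is standard; for reference, a minimal sketch of the DICE similarity index for two binary structure masks (the masks below are synthetic):

```python
# DICE similarity index between two binary masks.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

mask_sim = np.zeros((64, 64), bool); mask_sim[20:44, 20:44] = True    # simCT
mask_cbct = np.zeros((64, 64), bool); mask_cbct[22:46, 20:44] = True  # CBCT
print(f"DICE = {dice(mask_sim, mask_cbct):.2f}")
```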
Yandong Gao, Maolin Zhou, Weilin Xu et al.
The vibration mode of the radiation surface of a transducer (or the structure of a supersaturated cavitation cloud in a thin liquid layer) is investigated experimentally by high-speed photography. A classification into saturated, supersaturated, and undersaturated cavitation clouds is proposed, and saturated and supersaturated cavitation cloud structures in thin liquid layers are compared. The characteristics and formation mechanism of the supersaturated cavitation cloud structure are investigated. Based on the close correspondence and rapid response between the distribution of supersaturated cavitation clouds and the vibration modes of the radiation surface, a new approach is proposed to measure, in real time, the vibration mode of a transducer operating at high power and large amplitude.
Page 2 of 11162