W. Lauterborn, H. Bolle
Results for "Photography"
Showing 20 of ~223,552 results · from arXiv, DOAJ, Semantic Scholar, CrossRef
E. Leith, J. Upatnieks
Hyein Kim, Sung Park
For four decades, AIED research has rested on what we term the Sedentary Assumption: the unexamined design commitment to a stationary learner seated before a screen. Mobile learning and museum guides have moved learners into physical space, and context-aware systems have delivered location-triggered content -- yet these efforts predominantly cast AI in the role of information-delivery tool rather than epistemic partner. We map this gap through a 2 x 2 matrix (AI Role x Learning Environment) and identify an undertheorized intersection: the configuration in which AI serves as an epistemic teammate during unstructured, place-bound field inquiry and learning is assessed through trajectory rather than product. To fill it, we propose Field Atlas, a framework grounded in embodied, embedded, enactive, and extended (4E) cognition, active inference, and dual coding theory that shifts AIED's guiding metaphor from instruction to sensemaking. The architecture pairs volitional photography with immediate voice reflection, constrains AI to Socratic provocation rather than answer delivery, and applies Epistemic Trajectory Modeling (ETM) to represent field learning as a continuous trajectory through conjoined physical-epistemic space. We demonstrate the framework through a museum scenario and argue that the resulting trajectories -- bound to a specific body, place, and time -- constitute process-based evidence structurally resistant to AI fabrication, offering a new assessment paradigm and reorienting AIED toward embodied, dialogic human-AI sensemaking in the wild.
Xiaolong Qian, Qi Jiang, Yao Gao et al.
Prevalent Computational Aberration Correction (CAC) methods are typically tailored to specific optical systems, leading to poor generalization and labor-intensive re-training for new lenses. Developing CAC paradigms capable of generalizing across diverse photographic lenses offers a promising solution to these challenges. However, efforts to achieve such cross-lens universality within consumer photography are still in their early stages due to the lack of a comprehensive benchmark that encompasses a sufficiently wide range of optical aberrations. Furthermore, it remains unclear which specific factors influence existing CAC methods and how these factors affect their performance. In this paper, we present comprehensive experiments and evaluations involving 24 image restoration and CAC algorithms, utilizing our newly proposed UniCAC, a large-scale benchmark for photographic cameras constructed via automatic optical design. The Optical Degradation Evaluator (ODE) is introduced as a novel framework to objectively assess the difficulty of CAC tasks, offering credible quantification of optical aberrations and enabling reliable evaluation. Drawing on our comparative analysis, we identify three key factors -- prior utilization, network architecture, and training strategy -- that most significantly influence CAC performance, and further investigate their respective effects. We believe that our benchmark, dataset, and observations contribute foundational insights to related areas and lay the groundwork for future investigations. Benchmarks, codes, and Zemax files will be available at https://github.com/XiaolongQian/UniCAC.
Xiaohui Yang, Ping Ping, Feng Xu
The widespread adoption of smartphone photography has led users to increasingly rely on cloud storage for personal photo archiving and sharing, raising critical privacy concerns. Existing deep learning-based image encryption schemes, typically built upon CNNs or GANs, often depend on traditional cryptographic algorithms and lack inherent architectural reversibility, resulting in limited recovery quality and poor robustness. Invertible neural networks (INNs) have emerged to address this issue by enabling reversible transformations, yet the first INN-based encryption scheme still relies on an auxiliary reference image and discards by-product information before decryption, leading to degraded recovery and limited practicality. To address these limitations, this paper proposes FlowCrypt, a novel flow-based image encryption framework that simultaneously achieves near-lossless recovery, high security, and lightweight model design. FlowCrypt begins by applying a key-conditioned random split to the input image, enhancing forward-process randomness and encryption strength. The resulting components are processed through a Flow-based Encryption/Decryption (FED) module composed of invertible blocks, which share parameters across encryption and decryption. Thanks to its reversible architecture and reference-free design, FlowCrypt ensures high-fidelity image recovery. Extensive experiments show that FlowCrypt achieves recovery quality of 100 dB on three datasets, produces uniformly distributed cipher images, and maintains a compact architecture with only 1M parameters, making it suitable for mobile and edge-device applications.
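The core mechanism named here, an invertible block that shares parameters across encryption and decryption, can be illustrated with a generic affine coupling layer. This is a minimal sketch of the exact-invertibility idea only; the layer sizes are placeholders, not FlowCrypt's actual FED module or key-conditioned split.

```python
# Minimal affine coupling layer: one half of the channels is transformed
# conditioned on the other half, so inversion is purely algebraic. This
# illustrates how a flow-based design recovers the input exactly.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        # Small conv net predicting per-pixel log-scale and shift for x2 from x1.
        self.net = nn.Sequential(
            nn.Conv2d(half, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)
        log_s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(log_s) + t             # transform one half
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y1, y2 = y.chunk(2, dim=1)
        log_s, t = self.net(y1).chunk(2, dim=1)    # y1 == x1, same params
        x2 = (y2 - t) * torch.exp(-log_s)          # exact algebraic inverse
        return torch.cat([y1, x2], dim=1)

block = AffineCoupling(4)
x = torch.randn(1, 4, 32, 32)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)
```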
Hansen Feng, Lizhi Wang, Yiqi Huang et al.
The rapid advancement of photography has created a growing demand for a practical blind raw image denoising method. Recently, learning-based methods have become mainstream due to their excellent performance. However, most existing learning-based methods suffer from camera-specific data dependency, resulting in performance drops when applied to data from unknown cameras. To address this challenge, we introduce a novel blind raw image denoising method named YOND, short for You Only Need a Denoiser. Trained solely on synthetic data, YOND can generalize robustly to noisy raw images captured by diverse unknown cameras. Specifically, we propose three key modules to guarantee the practicality of YOND: coarse-to-fine noise estimation (CNE), expectation-matched variance-stabilizing transform (EM-VST), and SNR-guided denoiser (SNR-Net). Firstly, we propose CNE to identify the camera noise characteristic, refining the estimated noise parameters based on the coarse denoised image. Secondly, we propose EM-VST to eliminate camera-specific data dependency, correcting the bias expectation of VST according to the noisy image. Finally, we propose SNR-Net to offer controllable raw image denoising, supporting adaptive adjustments and manual fine-tuning. Extensive experiments on unknown cameras, along with flexible solutions for challenging cases, demonstrate the superior practicality of our method. The source code will be publicly available at the project homepage: https://fenghansen.github.io/publication/YOND
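The variance-stabilizing transform at the heart of EM-VST builds on the classical generalized Anscombe transform. The sketch below shows that baseline under an assumed Poisson-Gaussian noise model var(x) = a*mu + b; YOND's expectation-matching bias correction is omitted.

```python
# Classical generalized Anscombe transform (GAT): maps Poisson-Gaussian
# raw noise (var = a*mu + b) to approximately unit variance. YOND's
# EM-VST adds an expectation-matching correction on top of this idea.
import numpy as np

def gat_forward(x: np.ndarray, a: float, b: float) -> np.ndarray:
    return (2.0 / a) * np.sqrt(np.maximum(a * x + 0.375 * a**2 + b, 0.0))

def gat_inverse(y: np.ndarray, a: float, b: float) -> np.ndarray:
    # Plain algebraic inverse; exact unbiased inverses exist in the
    # literature but are omitted for brevity.
    return ((a * y / 2.0) ** 2 - 0.375 * a**2 - b) / a

x = np.random.poisson(lam=50.0, size=100_000).astype(float)
print(gat_forward(x, a=1.0, b=0.0).std())  # close to 1.0: stabilized
```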
Haoran Wang, Zhuohang Chen, Guang Li et al.
The future of UAV interaction systems is evolving from engineer-driven to user-driven, aiming to replace traditional predefined Human-UAV Interaction (HUI) designs. This shift focuses on enabling more personalized task planning and design, thereby achieving a higher-quality interaction experience and greater flexibility, with applications in many fields such as agriculture, aerial photography, logistics, and environmental monitoring. However, due to the lack of a common language between users and UAVs, such interactions are often difficult to achieve. Large Language Models (LLMs), with their ability to understand natural language and robot (UAV) behaviors, make personalized Human-UAV Interaction possible. Recently, some LLM-based HUI frameworks have been proposed, but they commonly struggle with mixed task planning and execution, leading to low adaptability in complex scenarios. In this paper, we propose a novel dual-agent HUI framework. This framework constructs two independent LLM agents (a task planning agent and an execution agent) and applies distinct prompt engineering to each to separately handle the understanding, planning, and execution of tasks. To verify the effectiveness and performance of the framework, we built a task database covering four typical UAV application scenarios and quantified the performance of the HUI framework using three independent metrics. Different LLMs were also selected to control the UAVs and their performance compared. Our user study results demonstrate that the framework improves the smoothness of HUI and the flexibility of task execution in the task scenarios we set up, effectively meeting users' personalized needs.
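The planner/executor split described here follows a general two-agent pattern, sketched below with separate system prompts per agent. The prompts, the command set, and the llm() backend are hypothetical stand-ins, not the paper's actual framework or API.

```python
# Generic dual-agent pattern: a planning agent decomposes the request,
# an execution agent maps each step to a concrete UAV command. All
# names here are illustrative placeholders.
import json

PLANNER_PROMPT = (
    "You are a UAV task planner. Decompose the user's request into an "
    "ordered JSON list of steps, each with an 'action' and 'params'."
)
EXECUTOR_PROMPT = (
    "You are a UAV executor. Translate one plan step into a single "
    "command from: takeoff, goto(x,y,z), capture_photo, land."
)

def llm(system: str, user: str) -> str:
    # Stub: plug in any chat-completion backend here.
    raise NotImplementedError

def handle_request(request: str) -> list[str]:
    plan = json.loads(llm(PLANNER_PROMPT, request))
    # Each step goes through a second agent with its own prompt,
    # mirroring the planning/execution split described above.
    return [llm(EXECUTOR_PROMPT, json.dumps(step)) for step in plan]
```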
Sovi Guillaume Sodjinou, Amadou Tidjani Sanda Mahama, Pierre Gouton
Semantic segmentation in deep learning is a crucial area of research within computer vision, aimed at assigning specific labels to each pixel in an image. The segmentation of crops, plants, and weeds has significantly advanced the application of deep learning in precision agriculture, leading to the development of sophisticated architectures based on convolutional neural networks (CNNs). This study proposes a segmentation algorithm for identifying plants and weeds using broadband multispectral images. In the first part of this algorithm, we utilize the PIF-Net model for feature extraction and fusion. The resulting feature map is then employed to enhance an optimized U-Net model for semantic segmentation within a broadband system. Our investigation focuses specifically on scenes from the CAVIAR dataset of multispectral images. The proposed algorithm has enabled us to effectively capture complex details while regulating the learning process, achieving an impressive overall accuracy of 98.2%. The results demonstrate that our approach to semantic segmentation and the differentiation between plants and weeds yields accurate and compelling outcomes.
Jiajia Wu, Haorui Zuo, Yuxing Wei et al.
Multi-modal object tracking (MMOT) has received widespread attention for the ability to overcome single-sensor perception limitations. However, existing methods encounter several critical challenges. Representation learning and generalization capabilities of models are constrained by the inherent heterogeneity of cross-task multi-modal data and inter-modal synergy imbalance. Particularly, in dynamically changing complex scenarios, the reliability and stability of data significantly degrade, further exacerbating the difficulty in multi-modal consistent perception and aggregation. To tackle the above issues, we propose SMUTrack, a unified framework with global shared parameters integrating three downstream MMOT tasks. SMUTrack implements a batch merging-and-splitting alternating strategy, coupled with multi-task joint training, to establish latent correlations across inter- and intra-task modalities, effectively avoiding over-reliance on certain modalities. Concurrently, we design a hierarchical modality synergy and reinforcement (HMSR) module, and a gated fusion and context awareness (GFCA) module to enable progressive multi-modal information exchange and integration, yielding a more discriminative and robust multi-modal representation. More importantly, we introduce a spatial–temporal information propagation (SIP) mechanism, which synchronously learns object trajectory cues and appearance variations to effectively build contextual relationships in long-term video tracking. Experimental results definitively validate the outstanding performance of SMUTrack on mainstream MMOT datasets, exhibiting its powerful adaptability to various MMOT tasks.
Christos Ch. Andrianos, Spiros A. Kostopoulos, Ioannis K. Kalatzis et al.
Astrocytoma is the most common type of brain glioma and is classified by the World Health Organization into four grades, providing prognostic insights and guiding treatment decisions. The accurate determination of astrocytoma grade is critical for patient management, especially in high-malignancy-grade cases. This study proposes a fully optimized Convolutional Neural Network (CNN) for the classification of astrocytoma MRI slices across the three malignant grades (G2–4). The training dataset consisted of 1284 pre-operative axial 2D MRI slices from T1-weighted contrast-enhanced and FLAIR sequences derived from 69 patients. To provide the best possible model performance, extensive hyperparameter tuning was carried out through the Hyperband method, a variant of Successive Halving. Training was conducted using Repeated Hold-Out Validation across four randomized data splits, achieving a mean classification accuracy of 98.05%, low loss values, and an AUC of 0.997. Comparative evaluation against state-of-the-art pre-trained models using transfer learning demonstrated superior performance. For validation purposes, the proposed CNN trained on an altered version of the training set yielded 93.34% accuracy on unmodified slices, which confirms the model’s robustness and potential use for clinical deployment. Model interpretability was ensured through the application of two Explainable AI (XAI) techniques, SHAP and LIME, which highlighted the regions of the slices contributing to the decision-making process.
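Hyperband tuning of this kind is commonly run with KerasTuner, which implements the Hyperband/Successive-Halving method named above. The sketch below shows that standard API with an illustrative toy model and search space, which may differ from the study's actual configuration.

```python
# Hyperband search with KerasTuner; the search space and model here
# are illustrative, not the study's architecture.
import keras
import keras_tuner

def build_model(hp):
    model = keras.Sequential([
        keras.Input(shape=(224, 224, 1)),
        keras.layers.Conv2D(hp.Int("filters", 16, 128, step=16), 3,
                            activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)),
        keras.layers.Dense(3, activation="softmax"),  # grades G2-G4
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice("lr", [1e-4, 3e-4, 1e-3])),
        loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

tuner = keras_tuner.Hyperband(build_model, objective="val_accuracy",
                              max_epochs=30, factor=3)
# tuner.search(x_train, y_train, validation_split=0.2)
```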
Haolin Wang, Ming Liu, Zifei Yan et al.
When embedding objects (foreground) into images (background), considering the influence of photography conditions like illumination, it is usually necessary to perform image harmonization to make the foreground object coordinate with the background image in terms of brightness, color, etc. Although existing image harmonization methods have made continuous efforts toward visually pleasing results, they are still plagued by two main issues. Firstly, the image harmonization becomes highly ill-posed when there are no contents similar to the foreground object in the background, making the harmonization results unreliable. Secondly, even when similar contents are available, the harmonization process is often interfered with by irrelevant areas, mainly attributed to an insufficient understanding of image contents and inaccurate attention. As a remedy, we present a retrieval-augmented image harmonization (Raiha) framework, which seeks proper reference images to reduce the ill-posedness and restricts the attention to better utilize the useful information. Specifically, an efficient retrieval method is designed to find reference images that contain similar objects as the foreground while the illumination is consistent with the background. For training the Raiha framework to effectively utilize the reference information, a data augmentation strategy is carefully designed by leveraging existing non-reference image harmonization datasets. Besides, the image content priors are introduced to ensure reasonable attention. With the presented Raiha framework, the image harmonization performance is greatly boosted under both non-reference and retrieval-augmented settings. The source code and pre-trained models will be publicly available.
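The retrieval step's general shape, ranking candidate reference images by embedding similarity to the foreground object, can be sketched as below. Raiha additionally matches background illumination, which this sketch omits, and the embedding bank stands in for any pretrained image embedder.

```python
# Top-k retrieval by cosine similarity over L2-normalized embeddings.
import numpy as np

def retrieve(query_emb: np.ndarray, bank: np.ndarray, k: int = 5) -> np.ndarray:
    """query_emb: (D,); bank: (M, D); both with unit-norm rows."""
    sims = bank @ query_emb            # cosine similarity on unit vectors
    return np.argsort(-sims)[:k]       # indices of the top-k references

bank = np.random.randn(1000, 128)
bank /= np.linalg.norm(bank, axis=1, keepdims=True)
q = bank[42] + 0.01 * np.random.randn(128)   # a query near entry 42
q /= np.linalg.norm(q)
print(retrieve(q, bank)[0])                  # very likely 42
```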
Dewei Hu, Hao Li, Han Liu et al.
Deep learning has shown remarkable performance in medical image segmentation. However, despite its promise, deep learning has many challenges in practice due to its inability to effectively transition to unseen domains, caused by the inherent data distribution shift and the lack of manual annotations to guide domain adaptation. To tackle this problem, we present an unsupervised domain adaptation (UDA) method named AdaptDiff that enables a retinal vessel segmentation network trained on fundus photography (FP) to produce satisfactory results on unseen modalities (e.g., OCT-A) without any manual labels. For all our target domains, we first adopt a segmentation model trained on the source domain to create pseudo-labels. With these pseudo-labels, we train a conditional semantic diffusion probabilistic model to represent the target domain distribution. Experimentally, we show that even with low quality pseudo-labels, the diffusion model can still capture the conditional semantic information. Subsequently, we sample on the target domain with binary vessel masks from the source domain to get paired data, i.e., target domain synthetic images conditioned on the binary vessel map. Finally, we fine-tune the pre-trained segmentation network using the synthetic paired data to mitigate the domain gap. We assess the effectiveness of AdaptDiff on seven publicly available datasets across three distinct modalities. Our results demonstrate a significant improvement in segmentation performance across all unseen datasets. Our code is publicly available at https://github.com/DeweiHu/AdaptDiff.
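The first step above, generating pseudo-labels on the target domain with the source-trained segmenter, can be sketched as follows. The model interface and threshold are placeholders, and the subsequent conditional-diffusion training and fine-tuning are omitted.

```python
# Pseudo-label creation on unlabeled target-domain images with a
# source-trained segmentation model (assumed to output one logit map).
import torch

@torch.no_grad()
def make_pseudo_labels(model: torch.nn.Module, images: torch.Tensor,
                       threshold: float = 0.5) -> torch.Tensor:
    """images: (B, C, H, W) target-domain batch -> binary vessel masks."""
    model.eval()
    probs = torch.sigmoid(model(images))   # per-pixel vessel probability
    return (probs > threshold).float()     # hard pseudo-labels
```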
Zihan Nie, Muhao Xu, Zhiyong Wang et al.
Deep learning, particularly convolutional neural networks (CNNs), has revolutionized endoscopic image processing, significantly enhancing the efficiency and accuracy of disease diagnosis through its exceptional ability to extract features and classify complex patterns. This technology automates medical image analysis, alleviating the workload of physicians and enabling a more focused and personalized approach to patient care. However, despite these remarkable achievements, there are still opportunities to further optimize deep learning models for endoscopic image analysis, including addressing limitations such as the requirement for large annotated datasets and the challenge of achieving higher diagnostic precision, particularly for rare or subtle pathologies. This review comprehensively examines the profound impact of deep learning on endoscopic image processing, highlighting its current strengths and limitations. It also explores potential future directions for research and development, outlining strategies to overcome existing challenges and facilitate the integration of deep learning into clinical practice. Ultimately, the goal is to contribute to the ongoing advancement of medical imaging technologies, leading to more accurate, personalized, and optimized medical care for patients.
Jing Xu, Zhenjin Liu, Ming Fang
In recent years, research on infrared and visible image fusion has mainly focused on deep learning-based approaches, particularly deep neural networks with auto-encoder architectures. However, these approaches suffer from problems such as insufficient feature extraction capability and inefficient fusion strategies. Therefore, this paper introduces a novel image fusion network to address the limitations of infrared and visible image fusion networks with auto-encoder architectures. In the designed network, the encoder employs a multi-branch cascade structure, and these convolution branches with different kernel sizes provide the encoder with an adaptive receptive field to extract multi-scale features. In addition, the fusion layer incorporates a non-local attention module that is inspired by the self-attention mechanism. With its global receptive field, this module is used to build a non-local attention fusion network, which works together with the l1-norm spatial fusion strategy to extract, split, filter, and fuse global and local features. Comparative experiments on the TNO and MSRS datasets demonstrate that the proposed method outperforms other state-of-the-art fusion approaches.
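The multi-branch cascade idea, parallel convolutions with different kernel sizes giving the encoder an adaptive receptive field, can be sketched minimally as follows; the branch widths and kernel sizes are illustrative, not the paper's exact design.

```python
# Parallel conv branches with different kernel sizes, concatenated into
# one multi-scale feature map; matched padding keeps spatial sizes aligned.
import torch
import torch.nn as nn

class MultiBranchEncoder(nn.Module):
    def __init__(self, in_ch: int = 1, width: int = 16):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, width, k, padding=k // 2) for k in (3, 5, 7)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

feats = MultiBranchEncoder()(torch.randn(1, 1, 64, 64))
print(feats.shape)  # torch.Size([1, 48, 64, 64])
```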
Paul Virilio
Qi Li, Jiaxin Cai, Yuanlong Yu et al.
Amidst the swift advancements in photography and sensor technologies, high-definition cameras have become commonplace in the deployment of Unmanned Aerial Vehicles (UAVs) for diverse operational purposes. Within the domain of UAV imagery analysis, the segmentation of ultra-high resolution images emerges as a substantial and intricate challenge, especially when grappling with the constraints imposed by GPU memory-restricted computational devices. This paper delves into the intricate problem of achieving efficient and effective segmentation of ultra-high resolution UAV imagery, while operating under stringent GPU memory limitations. The strategy of existing approaches is to downscale the images to achieve computationally efficient segmentation. However, this strategy tends to overlook smaller, thinner, and curvilinear regions. To address this problem, we propose a GPU memory-efficient and effective framework for local inference without accessing the context beyond local patches. In particular, we introduce a novel spatial-guided high-resolution query module, which predicts pixel-wise segmentation results with high quality only by querying nearest latent embeddings with the guidance of high-resolution information. Additionally, we present an efficient memory-based interaction scheme to correct potential semantic bias of the underlying high-resolution information by associating cross-image contextual semantics. For evaluation of our approach, we perform comprehensive experiments over public benchmarks and achieve superior performance under both conditions of small and large GPU memory usage limitations. We will release the model and code in the future.
Wenbin Zou, Hongxia Gao, Tian Ye et al.
Night photography often struggles with challenges like low light and blurring, stemming from dark environments and prolonged exposures. Current methods either disregard priors and directly fit end-to-end networks, leading to inconsistent illumination, or rely on unreliable handcrafted priors to constrain the network, thereby introducing greater error into the final result. We believe in the strength of data-driven high-quality priors and strive to offer a reliable and consistent prior, circumventing the restrictions of manual priors. In this paper, we propose Clearer Night Image Restoration with Vector-Quantized Codebook (VQCNIR) to achieve remarkable and consistent restoration outcomes on real-world and synthetic benchmarks. To ensure the faithful restoration of details and illumination, we propose the incorporation of two essential modules: the Adaptive Illumination Enhancement Module (AIEM) and the Deformable Bi-directional Cross-Attention (DBCA) module. The AIEM leverages the inter-channel correlation of features to dynamically maintain illumination consistency between degraded features and high-quality codebook features. Meanwhile, the DBCA module effectively integrates texture and structural information through bi-directional cross-attention and deformable convolution, resulting in enhanced fine-grained detail and structural fidelity across parallel decoders. Extensive experiments validate the remarkable benefits of VQCNIR in enhancing image quality under low-light conditions, showcasing its state-of-the-art performance on both synthetic and real-world datasets. The code is available at https://github.com/AlexZou14/VQCNIR.
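The codebook lookup underlying a vector-quantized prior replaces each degraded feature with its nearest learned high-quality entry. The sketch below shows that quantization step with illustrative sizes, not VQCNIR's actual codebook or modules.

```python
# Nearest-neighbour codebook lookup: the quantization step behind a
# vector-quantized prior. Sizes are illustrative.
import torch

def quantize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """features: (N, D); codebook: (K, D) learned high-quality entries."""
    dists = torch.cdist(features, codebook)   # (N, K) pairwise distances
    idx = dists.argmin(dim=1)                 # nearest code per feature
    return codebook[idx]                      # quantized features, (N, D)

codebook = torch.randn(512, 64)               # K=512 codes of dimension 64
z = torch.randn(1024, 64)                      # degraded features
z_q = quantize(z, codebook)
```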
Xin Lin, Jingtong Yue, Sixian Ding et al.
Rain in the dark poses a significant challenge to deploying real-world applications such as autonomous driving, surveillance systems, and night photography. Existing low-light enhancement or deraining methods struggle to brighten low-light conditions and remove rain simultaneously. Additionally, cascade approaches like "deraining followed by low-light enhancement" or the reverse often result in problematic rain patterns or overly blurred and overexposed images. To address these challenges, we introduce an end-to-end model called L²RIRNet, designed to manage both low-light enhancement and deraining in real-world settings. Our model features two main components: a Dual Degradation Representation Network (DDR-Net) and a Restoration Network. The DDR-Net independently learns degradation representations for luminance effects in dark areas and rain patterns in light areas, employing dual degradation loss to guide the training process. The Restoration Network restores the degraded image using a Fourier Detail Guidance (FDG) module, which leverages near-rainless detailed images, focusing on texture details in frequency and spatial domains to inform the restoration process. Furthermore, we contribute a dataset containing both synthetic and real-world low-light-rainy images. Extensive experiments demonstrate that our L²RIRNet performs favorably against existing methods in both synthetic and complex real-world scenarios. All the code and dataset can be found at https://github.com/linxin0/Low_light_rainy.
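The frequency-domain view behind Fourier detail guidance can be illustrated by splitting an image into amplitude and phase spectra, where fine texture is largely carried by high-frequency content. The FDG module itself is the paper's; this sketch only shows the decomposition it operates on.

```python
# Amplitude/phase decomposition of an image spectrum and its exact
# reconstruction via the inverse FFT.
import torch

def fourier_split(img: torch.Tensor):
    """img: (B, C, H, W) -> (amplitude, phase) of its 2-D spectrum."""
    spec = torch.fft.fft2(img, norm="ortho")
    return spec.abs(), spec.angle()

def reconstruct(amplitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    spec = torch.polar(amplitude, phase)       # amplitude * exp(i * phase)
    return torch.fft.ifft2(spec, norm="ortho").real

x = torch.rand(1, 3, 64, 64)
amp, pha = fourier_split(x)
assert torch.allclose(reconstruct(amp, pha), x, atol=1e-5)
```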
Nicolas Chahine, Ana-Stefania Calarasanu, Davide Garcia-Civiero et al.
Year after year, the demand for ever-better smartphone photos continues to grow, in particular in the domain of portrait photography. Manufacturers thus use perceptual quality criteria throughout the development of smartphone cameras. This costly procedure can be partially replaced by automated learning-based methods for image quality assessment (IQA). Due to its subjective nature, it is necessary to estimate and guarantee the consistency of the IQA process, a characteristic lacking in the mean opinion scores (MOS) widely used for crowdsourcing IQA. In addition, existing blind IQA (BIQA) datasets pay little attention to the difficulty of cross-content assessment, which may degrade the quality of annotations. This paper introduces PIQ23, a portrait-specific IQA dataset of 5116 images of 50 predefined scenarios acquired by 100 smartphones, covering a high variety of brands, models, and use cases. The dataset includes individuals of various genders and ethnicities who have given explicit and informed consent for their photographs to be used in public research. It is annotated by pairwise comparisons (PWC) collected from over 30 image quality experts for three image attributes: face detail preservation, face target exposure, and overall image quality. An in-depth statistical analysis of these annotations allows us to evaluate their consistency over PIQ23. Finally, we show through an extensive comparison with existing baselines that semantic information (image context) can be used to improve IQA predictions. The dataset, along with the proposed statistical analysis and BIQA algorithms, is available at: https://github.com/DXOMARK-Research/PIQ2023
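Pairwise comparisons like PIQ23's are typically converted into scalar quality scores with a Bradley-Terry model. The sketch below uses the classic minorization-maximization updates; the paper's exact statistical pipeline may differ.

```python
# Bradley-Terry scores from a pairwise win matrix via MM updates
# (Hunter-style); wins[i, j] counts how often image i beat image j.
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    n = wins.shape[0]
    scores = np.ones(n)
    total = wins + wins.T                       # comparisons per pair
    for _ in range(iters):
        denom = (total / (scores[:, None] + scores[None, :])).sum(axis=1)
        scores = wins.sum(axis=1) / np.maximum(denom, 1e-12)
        scores /= scores.sum()                  # fix the arbitrary scale
    return np.log(scores)                       # log-scores for comparison

wins = np.array([[0, 8, 9], [2, 0, 6], [1, 4, 0]], dtype=float)
print(bradley_terry(wins))                      # image 0 ranks highest
```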
Qing-Yan Chen, Da-Zheng Feng
The goal of feature matching is to establish accurate correspondences between feature points in different images depicting the same scene. To address the polymorphism of local structures, the authors propose a mismatch removal method using bilateral local-global structural consistency. This method incorporates the problem of mismatch removal into the framework of graph matching, constructs a global affinity matrix using local structural similarity and global affine transformation consistency, and optimizes it using a constrained integer quadratic programming method. To comprehensively describe the local structure, the signature quadratic form distance (SQFD) is used to measure the consistency of the neighbourhood structure. Specifically, the weights of edges are constructed based on the SQFD of the local structure, while the matching correctness of nodes and edges between the two graphs is described using local vector similarity. Furthermore, the consistency of the global affine transformation is evaluated by assessing the consistency of the local neighbourhood affine transformation between different corresponding point pairs. In estimating the local affine transformation, a bilateral correction is performed using a total least-squares (TLS) algorithm to measure the similarity of nodes between the two different graphs. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy and effectiveness.
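The local affine estimation the consistency check relies on can be sketched with ordinary least squares over neighbouring correspondences; the paper's bilateral total-least-squares refinement is omitted here for brevity.

```python
# Fit a 2x3 local affine transform from neighbouring correspondences by
# ordinary least squares (the paper refines this with bilateral TLS).
import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """src, dst: (N, 2) matched points, N >= 3.
    Returns A (2, 3) with dst ~= [src | 1] @ A.T."""
    X = np.hstack([src, np.ones((src.shape[0], 1))])   # (N, 3)
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)        # solves X @ A = dst
    return A.T

src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
true_A = np.array([[1.1, 0.2, 3.0], [-0.1, 0.9, -2.0]])
dst = np.hstack([src, np.ones((4, 1))]) @ true_A.T
print(np.allclose(fit_affine(src, dst), true_A))       # True
```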
Page 37 of 11178