Deep learning has emerged as the predominant solution for classifying medical images. We apply these developments to ultra-widefield (UWF) retinal imaging. Since UWF images support accurate diagnosis of various retinal diseases, it is important to classify them accurately so that disease can be prevented through early treatment. However, processing images manually is time-consuming and labor-intensive, and two challenges stand in the way of automating this process. First, high performance usually requires high computational resources. Artificial intelligence medical technology is best suited to places with limited medical resources, yet deploying high-performance processing units in such environments is difficult. Second, there is the question of the accuracy of colour fundus photography (CFP) methods: the UWF method generally provides more information for retinal diagnosis than the CFP method, but most research has been conducted on CFP images. We demonstrate that these problems can be addressed efficiently on low-performance units using methods such as strategic data augmentation and model ensembles, which balance performance and computational resources while utilizing UWF images.
Image cropping is crucial for enhancing the visual appeal and narrative impact of photographs, yet existing rule-based and data-driven approaches often lack diversity or require annotated training data. We introduce ProCrop, a retrieval-based method that leverages professional photography to guide cropping decisions. By fusing features from professional photographs with those of the query image, ProCrop learns from professional compositions, significantly boosting performance. Additionally, we present a large-scale dataset of 242K weakly-annotated images, generated by out-painting professional images and iteratively refining diverse crop proposals. This composition-aware dataset generation offers diverse high-quality crop proposals guided by aesthetic principles and becomes the largest publicly available dataset for image cropping. Extensive experiments show that ProCrop significantly outperforms existing methods in both supervised and weakly-supervised settings. Notably, when trained on the new dataset, our ProCrop surpasses previous weakly-supervised methods and even matches fully supervised approaches. Both the code and dataset will be made publicly available to advance research in image aesthetics and composition analysis.
Low-light photography produces images with low signal-to-noise ratios due to limited photons. In such conditions, common approximations like the Gaussian noise model fall short, and many denoising techniques fail to remove noise effectively. Although deep-learning methods perform well, they require large datasets of paired images that are impractical to acquire. As a remedy, synthesizing realistic low-light noise has gained significant attention. In this paper, we investigate the ability of diffusion models to capture the complex distribution of low-light noise. We show that a naive application of conventional diffusion models is inadequate for this task and propose three key adaptations that enable high-precision noise generation: a two-branch architecture to better model signal-dependent and signal-independent noise, the incorporation of positional information to capture fixed-pattern noise, and a tailored diffusion noise schedule. Consequently, our model enables the generation of large datasets for training low-light denoising networks, leading to state-of-the-art performance. Through comprehensive analysis, including statistical evaluation and noise decomposition, we provide deeper insights into the characteristics of the generated data.
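The split into signal-dependent and signal-independent branches mirrors the classical Poisson-Gaussian sensor model that the paper improves upon. A minimal sampler for that baseline, with a per-row fixed-pattern term standing in for banding noise, can be sketched as follows; all parameter values here are hypothetical, and this is not the authors' diffusion model:

```python
import numpy as np

def synth_low_light_noise(clean, gain=0.01, read_sigma=2.0, row_sigma=0.5, rng=None):
    """Sample heteroscedastic sensor noise on a clean image (float photon counts).

    gain       -- conversion gain controlling the signal-dependent (shot) term
    read_sigma -- std of the signal-independent read noise
    row_sigma  -- std of a per-row fixed-pattern (banding) offset
    """
    rng = np.random.default_rng() if rng is None else rng
    # Signal-dependent branch: Poisson shot noise, scaled back by the gain.
    shot = rng.poisson(clean / gain) * gain
    # Signal-independent branch: Gaussian read noise.
    read = rng.normal(0.0, read_sigma, size=clean.shape)
    # Fixed-pattern component: one offset per row, broadcast across columns.
    rows = rng.normal(0.0, row_sigma, size=(clean.shape[0], 1))
    return shot + read + rows
```

A diffusion model trained on real dark frames would replace the hand-picked Gaussian terms above with a learned distribution; the two-branch architecture in the paper keeps the same signal-dependent/independent decomposition.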
Light Detection and Ranging (LiDAR) technology in consumer-grade mobile devices can be used as a replacement for traditional background removal and compositing techniques. Unlike approaches such as chroma keying and trained AI models, LiDAR's depth information is independent of subject lighting and performs equally well in low-light and well-lit environments. We integrate the LiDAR and color cameras on the iPhone 15 Pro Max with GPU-based image processing. We use Apple's SwiftUI and Swift frameworks for user interface and backend development, and the Metal Shading Language (MSL) for real-time image enhancement at the standard iPhone streaming frame rate of 60 frames per second. The only meaningful limitations of the technology are the streaming bandwidth of the depth data, which currently limits the depth map resolution to 320x240, and the pre-existing difficulty the LiDAR IR laser has in recovering accurate depth from some materials. If the LiDAR resolution on a mobile device like the iPhone can be improved to match the color image resolution, LiDAR could feasibly become the preeminent method of background removal for video applications and photography.
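The depth-keyed compositing at the heart of this approach is conceptually simple. A CPU sketch in Python illustrates the idea (the app presumably does the equivalent per-pixel in MSL on the GPU, after upsampling the 320x240 depth map to the color resolution); the cutoff distance here is an assumed example value:

```python
import numpy as np

def composite_by_depth(color, depth, background, cutoff_m=1.5):
    """Replace pixels farther than cutoff_m with a background image.

    color      -- (H, W, 3) camera frame
    depth      -- (H, W) depth map in meters, already resized to the color frame
    background -- (H, W, 3) replacement background
    """
    mask = depth <= cutoff_m                       # True where the subject is near
    return np.where(mask[..., None], color, background)
```

Because the mask depends only on geometry, the same code behaves identically in low-light and well-lit scenes, which is the key advantage over chroma keying.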
Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali
et al.
Image denoising is a fundamental challenge in computer vision, with applications in photography and medical imaging. While deep learning-based methods have shown remarkable success, their reliance on specific noise distributions limits generalization to unseen noise types and levels. Existing approaches attempt to address this with extensive training data and high computational resources, but they still suffer from overfitting. To address these issues, we perform image denoising with dynamically generated kernels computed via efficient operations, which helps prevent overfitting and improves resilience to unseen noise. Specifically, our method leverages a Feature Extraction Module for robust noise-invariant features, and Global Statistics and Local Correlation Modules to capture comprehensive noise characteristics and structural correlations. The Kernel Prediction Module then employs these cues to produce pixel-wise varying kernels adapted to local structures, which are applied iteratively for denoising. This ensures both efficiency and superior restoration quality. Despite being trained on single-level Gaussian noise, our compact model (~0.04 M parameters) excels across diverse noise types and levels, demonstrating the promise of iterative dynamic filtering for practical image denoising.
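The final filtering step of a kernel-prediction approach can be illustrated as follows: once a network has produced one small kernel per pixel, denoising is a per-pixel weighted sum over the local neighborhood. The shapes, reflect padding, and naive loops below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def apply_pixelwise_kernels(image, kernels):
    """Filter a grayscale image with a different k x k kernel at every pixel.

    image   -- (H, W) array
    kernels -- (H, W, k, k) array of per-pixel kernels, e.g. softmax-normalized
               outputs of a kernel-prediction module
    """
    H, W = image.shape
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(image, pad, mode="reflect")
    out = np.empty_like(image, dtype=float)
    for i in range(H):
        for j in range(W):
            # Weighted sum of the k x k patch centered on (i, j).
            patch = padded[i:i + k, j:j + k]
            out[i, j] = (patch * kernels[i, j]).sum()
    return out
```

Iterative dynamic filtering, as described in the abstract, would re-predict the kernels and reapply this step several times; in practice the double loop would be replaced by a vectorized or GPU implementation.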
Accurate information on the number of building floors, or above-ground storeys, is essential for household estimation, utility provision, risk assessment, evacuation planning, and energy modeling. Yet large-scale floor-count data are rarely available in cadastral and 3D city databases. This study proposes an end-to-end deep learning framework that infers floor numbers directly from unrestricted, crowdsourced street-level imagery, avoiding hand-crafted features and generalizing across diverse facade styles. To enable benchmarking, we release the Munich Building Floor Dataset, a public set of over 6800 geo-tagged images collected from Mapillary and targeted field photography, each paired with a verified storey label. On this dataset, the proposed classification-regression network attains 81.2% exact accuracy and predicts 97.9% of buildings within ±1 floor. The method and dataset together offer a scalable route to enrich 3D city models with vertical information and lay a foundation for future work in urban informatics, remote sensing, and geographic information science. Source code and data will be released under an open license at https://github.com/ya0-sun/Munich-SVI-Floor-Benchmark.
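The two reported figures, exact accuracy and accuracy within ±1 floor, are instances of one tolerance-based metric. A minimal, generic computation from predicted and ground-truth storey labels (not code from the released repository) looks like:

```python
def floor_accuracy(pred, true, tol=0):
    """Fraction of buildings whose predicted floor count is within +/- tol
    floors of the verified label; tol=0 gives exact accuracy."""
    assert len(pred) == len(true) and len(pred) > 0
    hits = sum(abs(p - t) <= tol for p, t in zip(pred, true))
    return hits / len(pred)
```

Evaluating the same predictions at `tol=0` and `tol=1` yields the paper's two headline numbers for its own test set.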
The need for robotic agricultural automation has been driven by global population growth and climate change. To efficiently evaluate and develop agricultural robots beyond the growing season, we developed a dynamics simulator that runs fast on 3D point-cloud models of agricultural fields. Point-cloud models have been widely used in recent agricultural research thanks to advances in aerial photography technology, so the simulator can be easily applied to many agricultural fields. To speed up the dynamics calculation on a dense point-cloud model, we developed a method to quickly detect collision points using a grid table, and a method to calculate collision forces between the points and robot meshes. The performance of the simulator was evaluated on an agri-field model ($31 \times 14$ m²) represented by $1.7 \times 10^{6}$ points. The simulation ran 8.8 times faster than real time, and its accuracy compared to actual robot movements was ~1 cm in root mean square error (RMSE). The simulator enables fast computation and accurate prediction of robot movements on centimeter-resolution agri-field point-cloud models, supporting research on agricultural robots beyond the growing season.
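The grid-table collision lookup described above is a form of uniform spatial hashing: points are binned into integer cells once, and each collision query then inspects only the cells overlapping the query region instead of all 1.7 million points. A generic sketch, with an assumed cell size and without the simulator's force calculation, follows:

```python
import numpy as np

def build_grid(points, cell=0.05):
    """Hash 3D points into a dict keyed by integer cell coordinates
    (the 'grid table'); built once for a static field model."""
    table = {}
    cells = np.floor(points / cell).astype(int)
    for idx, c in enumerate(cells):
        table.setdefault(tuple(map(int, c)), []).append(idx)
    return table

def query_near(points, table, center, radius, cell=0.05):
    """Indices of points within `radius` of `center`; only grid cells
    overlapping the query sphere are inspected."""
    center = np.asarray(center, dtype=float)
    lo = np.floor((center - radius) / cell).astype(int)
    hi = np.floor((center + radius) / cell).astype(int)
    hits = []
    for x in range(lo[0], hi[0] + 1):
        for y in range(lo[1], hi[1] + 1):
            for z in range(lo[2], hi[2] + 1):
                for idx in table.get((x, y, z), ()):
                    if np.linalg.norm(points[idx] - center) <= radius:
                        hits.append(idx)
    return hits
```

With a cell size near the point spacing, each query touches a constant number of cells, which is what makes faster-than-real-time simulation on dense clouds plausible.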
Eric Ivan Petersen, Regine Hock, Michael G. Loso
et al.
Despite the increasing availability of satellite-derived products, in situ glacier observations are pivotal to accurately monitor glacier change and to calibrate and validate glacier models. However, comprehensive multi-variable field observations are especially rare on large glaciers and on debris-covered glaciers. Here we present extensive field observations from Kennicott Glacier, a heavily debris-covered glacier in central Alaska covering more than 400 km². The multi-year data set includes point glacier mass balances, meteorological data from several weather stations on and off the glacier, debris thickness and temperature, ice cliff backwasting derived from time-lapse photography of horizontal stakes drilled into several cliffs, and bathymetry, water temperature, and water level of proglacial and supraglacial lakes. Cumulative summer melt of more than 8 m was observed at the lowest clean-ice sites. Melt rates over clean ice correlate well with elevation, while rates over debris-covered ice lack any strong elevation dependence. Melt rates drop exponentially with increasing debris thickness and tend to be much lower than for clean ice at similar elevations. Melt rates determined for ice cliffs in areas of otherwise continuous debris cover were up to 10× those for debris-covered ice, and even exceeded standard clean-ice melt rates. Debris-cover thickness measurements at 150 sites vary from <1 to 69 cm with an average of 17 ± 11 cm (± standard deviation). Debris thickens down-glacier, but with high spatial variability: thickness was observed to vary by tens of cm within a ~15 m radius. Depth-averaged thermal conductivity derived from supraglacial debris temperature profiles at 12 sites ranges from 0.53 to 1.86 W m⁻¹ K⁻¹. Interconnected proglacial lakes covered 1.61 km² in 2018, with observed water depths of more than 60 m in the two largest lakes.
The dataset can be downloaded at https://doi.org/10.5281/zenodo.14625691 (Petersen, Hock, Loso, Guo, et al., 2024) and will be useful for glaciological and glacier meteorological studies.
Objective:
This study focuses on unit-type nursing homes for older adults, investigating and proposing design methodologies for communication activity spaces that enhance social interaction among older adults.
Background:
China's aging population issue is intensifying, with the number of older adults on the rise. The total count of older adults with disabilities has surpassed 420,000, comprising 16.6% of the entire aging population. The absence of social interaction in unit-type nursing homes can adversely affect the physical health of older adults.
Methods:
On-site surveying and mapping of building plans for unit-type nursing homes; real-time photography of architectural spaces and analysis of spatial design characteristics; interviews and questionnaire surveys regarding the social needs of older adults; and environmental behavior observation, i.e., fixed-point observation of social behaviors and systematic recording of behavioral dynamics among older adults.
Results:
A comparative analysis was conducted of the architectural spatial design characteristics of nursing homes with clustered and non-clustered spatial layouts, and of the social behavior characteristics of older adults living in nursing homes of these two types of spatial composition. We summarize the advantages and disadvantages of the architectural design of the two types and, from the perspective of promoting public communication among older adults, propose suitable architectural spatial layout types and optimization strategies for communication spaces in nursing homes.
Conclusion:
The design of living units and small-scale clusters in specialized nursing homes should be tailored to the physical health needs of older adults. The architectural space of the nursing home adopts a unit-clustered spatial layout type, with the interaction spaces distributed.
This type of building is more conducive to older adults carrying out social activities. It is essential to create a communal communication space within the living unit that embodies an open, family-like atmosphere, serving as the primary venue for both residence and interaction among older adults. Furthermore, urban life functions should be integrated between living units to enhance social communication opportunities for this demographic.
The dynamic behavior of a cavitation bubble near an asymmetric hydrofoil is investigated experimentally through high-speed photography and theoretically using a Kelvin impulse model. The typical deformations arising during bubble collapse near the hydrofoil are analyzed qualitatively. The effects of the bubble position and the hydrofoil’s eccentricity angle are analyzed quantitatively. The spatial distribution of the Kelvin impulse near the hydrofoil is explored. Different morphological evolutions of a bubble near an asymmetric hydrofoil are revealed, with B-shaped, heart-shaped, and arc-shaped collapses. The velocity of the bubble interface close to the hydrofoil is significantly affected by the bubble–hydrofoil distance and the hydrofoil’s eccentricity angle, increasing as both the distance and the eccentricity angle grow. It is found that the Kelvin impulse sensitivity varies at different positions with respect to the asymmetric hydrofoil, being higher at the head and tail, and lower in the middle.
Contemporary color difference (CD) measures for photographic images typically operate by comparing co-located pixels, patches in a ``perceptually uniform'' color space, or features in a learned latent space. Consequently, these measures inadequately capture the human color perception of misaligned image pairs, which are prevalent in digital photography (e.g., the same scene captured by different smartphones). In this paper, we describe a perceptual CD measure based on the multiscale sliced Wasserstein distance, which facilitates efficient comparisons between non-local patches of similar color and structure. This aligns with the modern understanding of color perception, where color and structure are inextricably interdependent as a unitary process of perceptual organization. Meanwhile, our method is easy to implement and training-free. Experimental results indicate that our CD measure performs favorably in assessing CDs in photographic images, and consistently surpasses competing models in the presence of image misalignment. Additionally, we empirically verify that our measure functions as a metric in the mathematical sense, and show its promise as a loss function for image and video color transfer tasks. The code is available at https://github.com/real-hjq/MS-SWD.
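The core one-dimensional trick behind the sliced Wasserstein distance can be sketched in a few lines. The version below is a deliberately minimal, single-scale variant operating on unordered pixel colors (the paper's measure is multiscale and patch-based); because pixels are compared as distributions rather than co-located pairs, moderate misalignment does not inflate the distance:

```python
import numpy as np

def sliced_wasserstein_color(img_a, img_b, n_proj=64, rng=None):
    """Sliced Wasserstein-2 distance between the color distributions of two
    images, given as (N, 3) arrays of pixel colors with equal pixel counts."""
    rng = np.random.default_rng() if rng is None else rng
    dirs = rng.normal(size=(n_proj, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project pixels onto each random direction and sort: for 1D distributions,
    # the Wasserstein-2 distance is the L2 gap between sorted projections.
    pa = np.sort(img_a @ dirs.T, axis=0)
    pb = np.sort(img_b @ dirs.T, axis=0)
    return np.sqrt(np.mean((pa - pb) ** 2))
```

Shuffling the pixels of one image leaves the distance unchanged, which is exactly the robustness to misalignment that co-located pixel comparisons lack.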
Image-to-Image translation in Generative Artificial Intelligence (Generative AI) has been a central focus of research, with applications spanning healthcare, remote sensing, physics, chemistry, photography, and more. Among the numerous methodologies, Generative Adversarial Networks (GANs) with contrastive learning have been particularly successful. This study aims to demonstrate that the Kolmogorov-Arnold Network (KAN) can effectively replace the Multi-layer Perceptron (MLP) method in generative AI, particularly in the subdomain of image-to-image translation, to achieve better generative quality. Our novel approach replaces the two-layer MLP with a two-layer KAN in the existing Contrastive Unpaired Image-to-Image Translation (CUT) model, developing the KAN-CUT model. This substitution favors the generation of more informative features in low-dimensional vector representations, which contrastive learning can utilize more effectively to produce high-quality images in the target domain. Extensive experiments, detailed in the results section, demonstrate the applicability of KAN in conjunction with contrastive learning and GANs in Generative AI, particularly for image-to-image translation. This work suggests that KAN could be a valuable component in the broader generative AI domain.
Previous research on retinal vessel segmentation has targeted a specific image domain, mostly color fundus photography (CFP). In this paper we attack the more challenging task of broad-domain retinal vessel segmentation (BD-RVS), which is to develop a unified model applicable to varied domains including CFP, SLO, UWF, OCTA and FFA. To that end, we propose Dual Convolutional Prompting (DCP), which learns to extract domain-specific features by localized prompting along both the position and channel dimensions. DCP is designed as a plug-in module that can effectively turn an R2AU-Net based vessel segmentation network into a unified model, without the need to modify its network structure. For evaluation we build a broad-domain set using five public domain-specific datasets: ROSSA, FIVES, IOSTAR, PRIME-FP20 and VAMPIRE. To benchmark BD-RVS on the broad-domain dataset, we re-purpose a number of existing methods originally developed in other contexts, producing eight baseline methods in total. Extensive experiments show that the proposed method compares favorably against the baselines for BD-RVS.
The urban canopy refers to the spatial area at the average height range of urban structures. The light environment of the urban canopy not only influences the ecological conditions of the canopy layer region but also serves as an indicator of the upward light influx of artificial nighttime light in the urban environment. Previous research on the urban nighttime light environment has mainly focused on the urban surface layer and the urban night sky layer, with little attention to the urban canopy layer. This study observes the urban canopy layer with the flight and photography functions of an unmanned aerial vehicle (UAV) and combines color band remote sensing data with ground measurement data to explore the relationship between the three levels of the urban nighttime light environment. Furthermore, a three-dimensional observation method is established for urban nighttime light environments based on a combination of the three observation methods. The results indicate a good correlation between drone aerial photography data and remote sensing data (R² = 0.717), as well as between ground-measured data and remote sensing data (R² = 0.876). They also show that UAV images can serve as a new path for observing urban canopy nighttime light environments, given the accuracy and reliability of UAV aerial data. Meanwhile, the combination of UAV photography, ground measurement, and remote sensing data provides a new method for the monitoring and control of urban nighttime light pollution.
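The reported R² values are coefficients of determination between paired observations from two measurement layers. Assuming a simple least-squares line fit between the paired values (the study does not specify its regression details), the computation is:

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination R^2 of the least-squares line of y on x,
    e.g. x = UAV aerial radiance values, y = co-located remote sensing values."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    ss_res = np.sum(resid ** 2)                    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)           # total sum of squares
    return 1.0 - ss_res / ss_tot
```

An R² of 0.717 or 0.876 then means that 71.7% or 87.6% of the variance in one layer's measurements is explained by the linear relationship with the other layer.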
Objective:
Fundus examination is a vital part of routine ophthalmological examination. Over the years, we have had various techniques and advancements to make the procedure handy, cheap, and more efficient. The latest in the armory is the smartphone fundoscopy. Although most of the work using this technique has been done in Western countries, of late it has gained popularity in India, especially among the residents. Here, we describe a relatively simple technique for fundus photography in patients with dilated pupils using a smartphone and an indirect ophthalmoscopy lens.
Materials and Methods:
Various applications and devices have been developed to acquire images of the retina using the smartphone, but that only adds to the overall cost of the procedure. By applying the basic principle of indirect ophthalmoscopy and using a smartphone and a condensing lens, we can record high-definition videos of the fundus and subsequently extract high-quality images.
Results:
Using the described technique of manual smartphone fundus photography, excellent, high-quality fundus images have been captured.
Conclusion:
Fundus photography using a smartphone is an inexpensive, portable, safe, fast, and convenient method to obtain retinal images in busy outpatient departments or even in the ward for bedridden patients. These images can be used as medical records for the follow-up of different pathologies and for academic purposes.
Neural rendering has garnered substantial attention owing to its capacity for creating realistic 3D scenes. However, applying it to extensive scenes remains challenging and its effectiveness limited. In this work, we propose the Drone-NeRF framework to enhance the efficient reconstruction of unbounded large-scale scenes suited for drone oblique photography using Neural Radiance Fields (NeRF). Our approach divides the scene into uniform sub-blocks based on camera position and depth visibility. Sub-scenes are trained in parallel using NeRF, then merged into a complete scene. We refine the model by optimizing camera poses and guiding NeRF with a uniform sampler; integrating the chosen samples enhances accuracy. A hash-coded fusion MLP accelerates the density representation, yielding RGB and depth outputs. Our framework accounts for sub-scene constraints, reduces parallel-training noise, handles shadow occlusion, and merges sub-regions for a polished rendering result. Drone-NeRF thus demonstrates promising capabilities in addressing challenges related to scene complexity, rendering efficiency, and accuracy in drone-obtained imagery.
Italian architects began to adopt photomontage and collage techniques only at the end of the 1920s. In the decade before the Second World War, these techniques provided architects with a visual key to distinguish themselves from the academies’ canonical representation; to seek an affiliation with the European avant-gardes; and to be recognisable in architecture competitions. They also constituted a critical tool for investigating and designing. Their peculiar evolution is exemplified in the work of the Milanese architect Piero Bottoni. Passionate about photography and cinema, Bottoni used these techniques for different purposes, not least their latent political and subversive potential, which was already implicit in the work of the artistic avant-gardes. In this sense, the analysis of some of his photomontages, which are today preserved in the Archivio Piero Bottoni (APB) at the Milan Polytechnic, reveals both his intent to introduce an anti-academic, ironic and realistic language, as well as the importance of cinema as an original source for architectural communication. Parallel to colleagues such as Giuseppe Terragni, Figini and Pollini or Ludovico Quaroni, Bottoni explored these techniques in the political context of the fascist regime, which initially promoted any kind of original artistic research; and then gradually converged towards a reactionary, conformist and populist classicism which would isolate creative voices.
Carlos Francisco Moreno-Garcia, Francesc Serratosa
Image registration is a research field in which images must be compared and aligned independently of the point of view or camera characteristics. In some applications (such as forensic biometrics, satellite photography, or outdoor scene identification) classical image registration systems fail because one of the compared images represents only a tiny piece of the other. For instance, in forensic palmprint recognition it is common to find only a small piece of a palmprint, while the database holds the whole enrolled palmprint. The main reason for the poor behaviour of classical image registration methods is the gap between the numbers of salient points in the two images, which determines how many points must be treated as outliers. The difficulty of finding a good match usually increases when the image representing the tiny part of the scene has been drastically rotated; again in palmprint forensics, it is difficult to decide a priori the orientation of the found palmprint fragment. We present a rotation-invariant registration method that explicitly assumes the image to be matched is a small piece of a larger image. We have experimentally validated our method in two different scenarios: palmprint identification and outdoor image registration.