AceTone: Bridging Words and Colors for Conditional Image Grading
Tianren Ma, Mingxiang Liao, Xijin Zhang
et al.
Color affects how we interpret image style and emotion. Previous color grading methods rely on patch-wise recoloring or fixed filter banks, struggling to generalize across creative intents or align with human aesthetic preferences. In this study, we propose AceTone, the first approach that supports multimodal conditioned color grading within a unified framework. AceTone formulates grading as a generative color transformation task, where a model directly produces 3D-LUTs conditioned on text prompts or reference images. We develop a VQ-VAE based tokenizer which compresses a $3\times32^3$ LUT vector to 64 discrete tokens with $ΔE<2$ fidelity. We further build a large-scale dataset, AceTone-800K, and train a vision-language model to predict LUT tokens, followed by reinforcement learning to align outputs with perceptual fidelity and aesthetics. Experiments show that AceTone achieves state-of-the-art performance on both text-guided and reference-guided grading tasks, improving LPIPS by up to 50% over existing methods. Human evaluations confirm that AceTone's results are visually pleasing and stylistically coherent, demonstrating a new pathway toward language-driven, aesthetic-aligned color grading.
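To make the object being generated concrete: a 3D-LUT is a lattice of output colors indexed by input RGB, so a $3\times32^3$ LUT holds 3 x 32,768 = 98,304 values. The sketch below shows one common way to apply such a table to a pixel, using trilinear interpolation. It is illustrative only and is not taken from AceTone; the function names `identity_lut` and `apply_lut` are hypothetical, and a small lattice is used in place of size 32.

```python
from itertools import product


def identity_lut(n):
    """Build an n x n x n identity LUT: lut[r][g][b] -> (r, g, b) in [0, 1]."""
    step = 1.0 / (n - 1)
    return [[[(r * step, g * step, b * step) for b in range(n)]
             for g in range(n)] for r in range(n)]


def apply_lut(lut, rgb):
    """Map one RGB triple (floats in [0, 1]) through a 3D LUT (n >= 2)."""
    n = len(lut)
    # Continuous lattice coordinates, clamped to the valid range.
    pos = [min(max(c, 0.0), 1.0) * (n - 1) for c in rgb]
    # Lower corner of the enclosing cell and fractional offset inside it.
    lo = [min(int(p), n - 2) for p in pos]
    frac = [p - i for p, i in zip(pos, lo)]
    out = [0.0, 0.0, 0.0]
    # Trilinear interpolation: blend the 8 corners of the cell.
    for dr, dg, db in product((0, 1), repeat=3):
        weight = 1.0
        for d, f in zip((dr, dg, db), frac):
            weight *= f if d else (1.0 - f)
        entry = lut[lo[0] + dr][lo[1] + dg][lo[2] + db]
        for k in range(3):
            out[k] += weight * entry[k]
    return tuple(out)


lut = identity_lut(5)          # small lattice for illustration
graded = apply_lut(lut, (0.2, 0.55, 0.9))
```

With an identity table the output equals the input up to floating-point error; a graded look corresponds to any other choice of lattice entries.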
SemiNFT: Learning to Transfer Presets from Imitation to Appreciation via Hybrid-Sample Reinforcement Learning
Melany Yang, Yuhang Yu, Diwang Weng
et al.
Photorealistic color retouching plays a vital role in visual content creation, yet manual retouching remains inaccessible to non-experts due to its reliance on specialized expertise. Reference-based methods offer a promising alternative by transferring the preset color of a reference image to a source image. However, these approaches often operate as novice learners, performing global color mappings derived from pixel-level statistics, without a true understanding of semantic context or human aesthetics. To address this issue, we propose SemiNFT, a Diffusion Transformer (DiT)-based retouching framework that mirrors the trajectory of human artistic training: beginning with rigid imitation and evolving into intuitive creation. Specifically, SemiNFT is first taught with paired triplets to acquire basic structural preservation and color mapping skills, and then advanced to reinforcement learning (RL) on unpaired data to cultivate nuanced aesthetic perception. Crucially, during the RL stage, to prevent catastrophic forgetting of old skills, we design a hybrid online-offline reward mechanism that anchors aesthetic exploration with structural review. Extensive experiments show that SemiNFT not only outperforms state-of-the-art methods on standard preset transfer benchmarks but also demonstrates remarkable intelligence in zero-shot tasks, such as black-and-white photo colorization and cross-domain (anime-to-photo) preset transfer. These results confirm that SemiNFT transcends simple statistical matching and achieves a sophisticated level of aesthetic comprehension. Our project can be found at https://melanyyang.github.io/SemiNFT/.
Examining the human-environment interactive design approach in Lingnan garden architecture: a case study of Mo Bozhi’s Garden Restaurant artworks
Zhaoming Du, Lujing Zhong, Weicong Li
Recent landscape studies have shifted focus from the formal aesthetics of garden design to embodied behavioral experiences. While much of the literature has centered on the artistic principles of traditional garden design, limited attention has been paid to the underlying design philosophy of commercial gardens. Drawing on cognitive map theory and employing a non-participatory observation method, this study examines the experiential patterns and Human Environment Interaction (HEI) design logic within Panxi (PGR) and South Garden Restaurants (SGR), and proposes a landscape narrative model grounded in the “sensory – behavioral – memory” triadic structure. PGR demonstrates an open, socially interactive spatial configuration characterized by the synergy of multiple nodes, whereas SGR establishes immersive experiential zones anchored around focal landscapes. Lingnan garden architecture (GA) constructs spatial imagery through a four-element system of “edges, landscape, architecture, paths”. In both gardens, short-duration, high-frequency dwellings occur at nodes such as zigzag bridges, covered corridors, and spiral staircases. Elderly visitors tend to favor tranquil districts, while younger cohorts are drawn to zones with high informational density. Individuals’ perception of garden spaces is shaped not only by immediate sensory stimuli but also by structurally embedded memory and lived experience.
Architecture, Building construction
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
Weijia Mao, Hao Chen, Zhenheng Yang
et al.
A reliable reward function is essential for reinforcement learning (RL) in image generation. Most current RL approaches depend on pre-trained preference models that output scalar rewards to approximate human preferences. However, these rewards often fail to capture human perception and are vulnerable to reward hacking, where higher scores do not correspond to better images. To address this, we introduce Adv-GRPO, an RL framework with an adversarial reward that iteratively updates both the reward model and the generator. The reward model is supervised using reference images as positive samples and can largely avoid being hacked. Unlike KL regularization that constrains parameter updates, our learned reward directly guides the generator through its visual outputs, leading to higher-quality images. Moreover, while optimizing existing reward functions can alleviate reward hacking, their inherent biases remain. For instance, PickScore may degrade image quality, whereas OCR-based rewards often reduce aesthetic fidelity. To address this, we take the image itself as a reward, using reference images and vision foundation models (e.g., DINO) to provide rich visual rewards. These dense visual signals, instead of a single scalar, lead to consistent gains across image quality, aesthetics, and task-specific metrics. Finally, we show that combining reference samples with foundation-model rewards enables distribution transfer and flexible style customization. In human evaluation, our method outperforms Flow-GRPO and SD3, achieving 70.0% and 72.4% win rates in image quality and aesthetics, respectively. Code and models have been released.
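As a rough illustration of how a reference-based visual reward can feed a GRPO-style update, the sketch below scores each generated sample by its mean cosine similarity to reference features and then normalizes rewards within the sampling group. This is a simplification under stated assumptions, not the Adv-GRPO method itself: the feature vectors are hypothetical stand-ins for embeddings from a vision foundation model such as DINO, the adversarially trained reward model is omitted, and all function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reference_reward(feat, ref_feats):
    """Mean cosine similarity of one sample's features to the reference set."""
    return sum(cosine(feat, r) for r in ref_feats) / len(ref_feats)


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical 3-D features for four samples generated from one prompt.
group = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical features of two reference images.
refs = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]

rewards = [reference_reward(f, refs) for f in group]
advantages = group_advantages(rewards)
```

Samples whose features lie closer to the references receive positive advantages and the rest receive negative ones, so the update depends only on relative quality within the group.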
How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions
Manuel Brack, Sudeep Katakol, Felix Friedrich
et al.
Training data is at the core of any successful text-to-image model. The quality and descriptiveness of image text are crucial to a model's performance. Given the noisiness and inconsistency in web-scraped datasets, recent works have shifted towards synthetic training captions. While this setup is generally believed to produce more capable models, current literature does not provide any insights into its design choices. This study closes this gap by systematically investigating how different synthetic captioning strategies impact the downstream performance of text-to-image models. Our experiments demonstrate that dense, high-quality captions enhance text alignment but may introduce trade-offs in output aesthetics and diversity. Conversely, captions of randomized lengths yield balanced improvements across aesthetics and alignment without compromising sample diversity. We also demonstrate that varying caption distributions introduce significant shifts in the output bias of a trained model. Our findings underscore the importance of caption design in achieving optimal model performance and provide practical insights for more effective training data strategies in text-to-image generation.
Voxify3D: Pixel Art Meets Volumetric Rendering
Yi-Chuan Huang, Jiewen Chan, Hao-Jen Chien
et al.
Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting requirements of geometric abstraction, semantic preservation, and discrete color coherence. Existing methods either over-simplify geometry or fail to achieve the pixel-precise, palette-constrained aesthetics of voxel art. We introduce Voxify3D, a differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Our core innovation lies in the synergistic integration of three components: (1) orthographic pixel art supervision that eliminates perspective distortion for precise voxel-pixel alignment; (2) patch-based CLIP alignment that preserves semantics across discretization levels; (3) palette-constrained Gumbel-Softmax quantization enabling differentiable optimization over discrete color spaces with controllable palette strategies. This integration addresses fundamental challenges: semantic preservation under extreme discretization, pixel-art aesthetics through volumetric rendering, and end-to-end discrete optimization. Experiments show superior performance (37.12 CLIP-IQA, 77.90% user preference) across diverse characters and controllable abstraction (2-8 colors, 20x-50x resolutions). Project page: https://yichuanh.github.io/Voxify-3D/
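The idea behind palette-constrained Gumbel-Softmax quantization can be sketched in a few lines: Gumbel noise is added to per-palette-entry logits, a temperature-scaled softmax yields relaxed one-hot weights, and the output color is the weighted mix of palette colors. The code below is a minimal sketch, not the Voxify3D implementation. In particular, deriving logits from the negative squared distance to each palette color is an assumption made here for illustration, and the names `gumbel_softmax` and `quantize_color` are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)           # guard against log(0)
        gumbel = -math.log(-math.log(u))       # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                          # subtract max for stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def quantize_color(rgb, palette, tau, rng):
    """Return (weights, soft color, hard color) for one RGB value."""
    # Assumed logits: closer palette colors get larger scores.
    logits = [-sum((c - p) ** 2 for c, p in zip(rgb, color))
              for color in palette]
    weights = gumbel_softmax(logits, tau, rng)
    # Soft color: convex combination of palette entries (the relaxed output).
    soft = tuple(sum(w * color[k] for w, color in zip(weights, palette))
                 for k in range(3))
    # Hard color: the single palette entry with the largest weight.
    hard = palette[weights.index(max(weights))]
    return weights, soft, hard


palette = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.9, 0.2, 0.2), (0.2, 0.3, 0.9)]
weights, soft, hard = quantize_color((0.8, 0.25, 0.3), palette,
                                     tau=0.5, rng=random.Random(0))
```

Lowering `tau` pushes the weights toward one-hot, so the soft color approaches a single palette entry while the relaxation remains a smooth function of the logits.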
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang, Qiuyu Huang, Junjie Liu
et al.
In this paper, we present Diffusion-4K, a novel framework for direct ultra-high-resolution image synthesis using text-to-image diffusion models. The core advancements include: (1) Aesthetic-4K Benchmark: addressing the absence of a publicly available 4K image synthesis dataset, we construct Aesthetic-4K, a comprehensive benchmark for ultra-high-resolution image generation. We curated a high-quality 4K dataset with carefully selected images and captions generated by GPT-4o. Additionally, we introduce GLCM Score and Compression Ratio metrics to evaluate fine details, combined with holistic measures such as FID, Aesthetics and CLIPScore for a comprehensive assessment of ultra-high-resolution images. (2) Wavelet-based Fine-tuning: we propose a wavelet-based fine-tuning approach for direct training with photorealistic 4K images, applicable to various latent diffusion models, demonstrating its effectiveness in synthesizing highly detailed 4K images. Consequently, Diffusion-4K achieves impressive performance in high-quality image synthesis and text prompt adherence, especially when powered by modern large-scale diffusion models (e.g., SD3-2B and Flux-12B). Extensive experimental results from our benchmark demonstrate the superiority of Diffusion-4K in ultra-high-resolution image synthesis.
Exploring Stress among International College Students in China
Omogolo Omaatla Morake, Mengru Xue
Psychological stress encompasses emotional tension and pressure experienced by people, which usually arises from situations people find challenging. However, little is known about the pressures faced by international college students studying in China. The goal of this study is to investigate the various stressors that international college students in China face and the coping mechanisms they use. Twenty international students were interviewed to gather data, which was then transcribed. Thematic analysis and coding were applied to the qualitative data, revealing themes related to the causes of stress. The following themes emerge from this data: anticipatory anxiety or future stress, social and cultural challenges, financial strain, and academic pressure. These themes will help understand the various stressors international college students in China face and how they try to cope. Studying how international college students in China cope with challenges can guide the development of targeted interventions to support their mental health. Research suggests that integrating aesthetics and connectivity into design interventions can notably improve the well-being of these students. This paper presents possible future design solutions, leveraging the aesthetics of connectivity to empower students and enhance their resilience. Additionally, it aims to provide valuable insights for designers interested in creating solutions that alleviate stress and promote emotional awareness among international students.
A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization
Janak Kapuriya, Ali Hatami, Paul Buitelaar
Recent advancements in text-to-image generative models have improved narrative consistency in story visualization. However, current story visualization models often overlook cultural dimensions, resulting in visuals that lack authenticity and cultural fidelity. In this study, we conduct a comprehensive multicultural analysis of story visualization using current text-to-image models across multilingual settings on two datasets: FlintstonesSV and VIST. To assess cultural dimensions rigorously, we propose a Progressive Multicultural Evaluation Framework and introduce five story visualization metrics, Cultural Appropriateness, Visual Aesthetics, Cohesion, Semantic Consistency, and Object Presence, that are not addressed by existing metrics. We further automate assessment through an MLLM-as-Jury framework that approximates human judgment. Human evaluations show that models generate more coherent, visually appealing, and culturally appropriate stories for real-world datasets than for animated ones. The generated stories exhibit a stronger alignment with English-speaking cultures across all metrics except Cohesion, where Chinese performs better. In contrast, Hindi ranks lowest on all metrics except Visual Aesthetics, reflecting real-world cultural biases embedded in current models. This multicultural analysis provides a foundation for future research aimed at generating culturally appropriate and inclusive visual stories across diverse linguistic and cultural settings.
Resistance to cosmetic botulinum toxin A: A 15-patient case series across 12 sites
Carlos G. Wambier, MD, PhD, Fatima N. Mirza, MD, MPH, Sarah P.F. Wambier, MD, PhD
et al.
AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling
Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch
et al.
We propose Image Content Appeal Assessment (ICAA), a novel metric that quantifies the level of positive interest an image's content generates for viewers, such as the appeal of food in a photograph. This is fundamentally different from traditional Image-Aesthetics Assessment (IAA), which judges an image's artistic quality. While previous studies often confuse the concepts of "aesthetics" and "appeal," our work addresses this by being the first to study ICAA explicitly. To do this, we propose a novel system that automates dataset creation and implements algorithms to estimate and boost content appeal. We use our pipeline to generate two large-scale datasets (70K+ images each) in diverse domains (food and room interior design) to train our models, which revealed little correlation between content appeal and aesthetics. Our user study, with more than 76% of participants preferring the appeal-enhanced images, confirms that our appeal ratings accurately reflect user preferences, establishing ICAA as a unique evaluative criterion. Our code and datasets are available at https://github.com/SherryXTChen/AID-Appeal.
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
Ahmed Imtiaz Humayun, Ibtihel Amara, Cristina Vasconcelos
et al.
Deep Generative Models are frequently used to learn continuous representations of complex data distributions using a finite number of samples. For any generative model, including pre-trained foundation models with Diffusion or Transformer architectures, generation performance can significantly vary across the learned data manifold. In this paper we study the local geometry of the learned manifold and its relationship to generation outcomes for a wide range of generative models, including DDPM, Diffusion Transformer (DiT), and Stable Diffusion 1.4. Building on the theory of continuous piecewise-linear (CPWL) generators, we characterize the local geometry in terms of three geometric descriptors: scaling ($ψ$), rank ($ν$), and complexity/un-smoothness ($δ$). We provide quantitative and qualitative evidence showing that for a given latent-image pair, the local descriptors are indicative of generation aesthetics, diversity, and memorization by the generative model. Finally, we demonstrate that by training a reward model on the local scaling for Stable Diffusion, we can self-improve both generation aesthetics and diversity using 'geometry reward' based guidance during denoising.
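For intuition about a local scaling descriptor, one common way to measure how much a map stretches volume around a latent point is the log-volume change $\tfrac{1}{2}\log\det(J^\top J)$, where $J$ is the Jacobian of the generator at that point. The sketch below computes this for a toy piecewise-linear generator with a 2-D latent. Whether this quantity matches the paper's exact definition of $ψ$ is an assumption; the toy generator and all function names are hypothetical.

```python
import math


def relu(x):
    return x if x > 0.0 else 0.0


def toy_generator(z):
    """Toy continuous piecewise-linear map from a 2-D latent to 3-D output."""
    return [relu(2.0 * z[0]), relu(3.0 * z[1]), relu(z[0] + z[1])]


def jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; rows index outputs, columns index latents."""
    cols = []
    for j in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += h
        z_minus[j] -= h
        f_plus, f_minus = f(z_plus), f(z_minus)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(f_plus, f_minus)])
    return [list(row) for row in zip(*cols)]   # transpose columns into rows


def local_scaling_2d(f, z):
    """0.5 * log det(J^T J) for a 2-D latent (requires a full-rank Jacobian)."""
    J = jacobian(f, z)
    a = sum(row[0] * row[0] for row in J)      # (J^T J)[0][0]
    b = sum(row[0] * row[1] for row in J)      # (J^T J)[0][1]
    d = sum(row[1] * row[1] for row in J)      # (J^T J)[1][1]
    return 0.5 * math.log(a * d - b * b)


psi = local_scaling_2d(toy_generator, [1.0, 1.0])
```

At the latent `[1.0, 1.0]` all three ReLU units are active, so the Jacobian has rows (2, 0), (0, 3), (1, 1), giving $\det(J^\top J) = 49$ and a log-volume change of $\log 7$. In a region where fewer units are active the value changes, which is the kind of variation across the manifold that a local descriptor is meant to capture.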
Epistemological Foundations of Fashion Pedagogy: Investigating Knowledge and Emerging Issues
Emanuele Isidori, Irina Leonova, Roberta Alonzi
et al.
Fashion pedagogy, an emerging field within fashion studies, plays a critical role in contemporary culture by integrating elements of body education, expression, and socio-economic values. This paper explores the epistemological foundations of fashion pedagogy through a comprehensive review of the current scientific literature. We address the multifaceted relationship between fashion pedagogy and body education, highlighting how this discipline influences and reshapes people's perceptions of body image and self-expression within informal education. The interplay between fashion pedagogy and capitalist values is also examined, revealing how fashion education reflects and critiques prevailing economic ideologies, promoting a more conscious engagement with fashion as a form of cultural production. Furthermore, this study delves into the interdisciplinary nature of fashion pedagogy, which draws upon diverse fields such as art, aesthetics, psychology, anthropology, sociology, and business to enrich learning experiences and outcomes. The paper also reviews how fashion pedagogy can be implemented within the curricula of higher education institutions, emphasizing innovative teaching methods that foster critical thinking and creativity. Through an analysis of various educational models and practices, we identify key trends and challenges that influence the effectiveness and relevance of fashion pedagogy in today’s educational landscape. This paper argues that fashion pedagogy is significant in educating about fashion and fostering critical awareness among students about its broader social, cultural, aesthetic, and economic implications. It is a pivotal tool for empowering students to navigate and influence the evolving dynamics of contemporary culture through informed and thoughtful fashion practice.
Automatic Generation of Topology Diagrams for Strongly-Meshed Power Transmission Systems
Jingyu Wang, Jinfu Chen, Dongyuan Shi
et al.
Topology diagrams are widely seen in power system applications, but their automatic generation is often easier said than done. When facing power transmission systems with strongly-meshed structures, existing approaches can hardly produce topology diagrams that meet readers' aesthetic expectations. This paper proposes an integrated framework for generating aesthetically-pleasing topology diagrams for power transmission systems. Given a rough layout as input, the framework first conducts visibility region analysis to reduce line crossings and then solves a mixed-integer linear programming problem to optimize the arrangement of nodes. Given that the complexity of both modules is high, simplification heuristics are also proposed to enhance the efficiency of the framework. Case studies on several power transmission systems containing up to 2,046 nodes demonstrate the capability of the proposed framework in generating topology diagrams conforming to aesthetic criteria in the power system community. Compared with the widespread force-directed algorithm, the proposed framework can preserve the relative positions of nodes in the original layout to a great extent, which significantly contributes to the identification of electrical elements on the diagrams. Meanwhile, the time consumption is acceptable for practical applications.
Design Characterization for Black-and-White Textures in Visualization
Tingying He, Yuanyang Zhong, Petra Isenberg
et al.
We investigate the use of 2D black-and-white textures for the visualization of categorical data and contribute a summary of texture attributes, and the results of three experiments that elicited design strategies as well as aesthetic and effectiveness measures. Black-and-white textures are useful, for instance, as a visual channel for categorical data on low-color displays, in 2D/3D print, to achieve the aesthetic of historic visualizations, or to retain the color hue channel for other visual mappings. We specifically study how to use what we call geometric and iconic textures. Geometric textures use patterns of repeated abstract geometric shapes, while iconic textures use repeated icons that may stand for data categories. We parameterized both types of textures and developed a tool for designers to create textures on simple charts by adjusting texture parameters. 30 visualization experts used our tool and designed 66 textured bar charts, pie charts, and maps. We then had 150 participants rate these designs for aesthetics. Finally, with the top-rated geometric and iconic textures, our perceptual assessment experiment with 150 participants revealed that textured charts perform about equally well as non-textured charts, and that there are some differences depending on the type of chart.
Walking Through Everyday Life: Tensions and Disruptions within the Ordinary
Conceição Nélio
Bringing together a genealogy of authors, concepts, and aesthetic case studies, this article aims to contribute to the discussion on ordinary aesthetics by focusing on the tensions that are intrinsic to walking as a fundamental embodied action in everyday urban life. These tensions concern the movement of walking itself and its relation to one’s surroundings, but they also concern a certain complementarity between home (familiarity) and wandering. Experiencing space and thresholds that disrupt one’s relationship with home and the everyday can be understood as part of a modern “anti-home” tendency that lies at the core of several artistic and aesthetic practices. On the other hand, the study of walking and its relationship with the ordinary has also been enhanced and complexified by the mediation of images and technologies of reproduction. Approaching the paradoxes and ambiguities of everydayness from the perspective of walking allows us to better understand the ordinary as an in-between concept composed of evidence and mystery, familiarity and strangeness. Walking itself, as an ordinary element of life, is an unstable stabilisation, an unconsciousness that may become awareness, an immersive action that knows interruptions, a way of repeating paths that can also lead to detours and discoveries.
Five Looks at Emmaus: Revelation, Resonance, and the Sacramental Imagination
Anthony J. Godzieba
The intersection between religious experience and aesthetic experience has become so obvious that the current “aesthetic turn” in Christian theology no longer needs to be defended. In this essay, I discuss that intersection point from the point of view of Roman Catholicism, in order to demonstrate the bold claim that the arts and the performance they evoke from us are as important as the creed for Catholicism. The essay aims to do three things: first, to examine that intersection point and emphasize the elements of intentionality and desire; second, to analyze one expression of that intersection, namely the connection among Catholic faith claims, the visual arts, and Catholicism’s incarnational-sacramental imagination (using depictions of the post-Resurrection Emmaus story); third, to use hints from Hartmut Rosa’s recent work on “resonance” to tease out how revelation and transformation occur at this intersection.
Religions. Mythology. Rationalism
Semi-supervised Fashion Compatibility Prediction by Color Distortion Prediction
Ling Xiao, Toshihiko Yamasaki
Supervised learning methods require large-scale labeled datasets, which are difficult to obtain. This is an even more significant issue for fashion compatibility prediction because compatibility aims to capture people's perception of aesthetics, which is sparse and changing. Thus, the labeled dataset may become outdated quickly due to fast fashion. Moreover, labeling the dataset requires expert knowledge, or at least annotators with a good sense of aesthetics. However, there are limited self/semi-supervised learning techniques in this field. In this paper, we propose a general color distortion prediction task forcing the baseline to recognize low-level image information to learn more discriminative representation for fashion compatibility prediction. Specifically, we first propose to distort the image by adjusting the image color balance, contrast, sharpness, and brightness. Then, we propose adding Gaussian noise to the distorted image before passing it to the convolutional neural network (CNN) backbone to learn a probability distribution over all possible distortions. The proposed pretext task is adopted in the state-of-the-art methods in fashion compatibility and shows its effectiveness in improving these methods' ability to extract better feature representations. Applying the proposed pretext task to the baseline consistently outperforms the original baseline.
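The data side of such a pretext task can be sketched as follows: pick a distortion type at random, apply it, add Gaussian noise, and keep the distortion index as the classification label. This is a minimal sketch under stated assumptions, not the authors' pipeline. Images are represented as flat lists of RGB tuples in [0, 1], the distortions use simple blend-based definitions chosen here for illustration, sharpness is omitted because it needs a spatial filter, and the factor range and noise level are hypothetical.

```python
import random

# Label index -> distortion type (sharpness from the abstract is omitted here).
DISTORTIONS = ("color_balance", "contrast", "brightness")


def clamp(x):
    return min(max(x, 0.0), 1.0)


def gray(pixel):
    """Simple per-pixel intensity: mean of the three channels."""
    return sum(pixel) / 3.0


def distort(pixels, kind, factor):
    """Apply one distortion to a list of RGB tuples with values in [0, 1]."""
    if kind == "brightness":
        return [tuple(clamp(c * factor) for c in p) for p in pixels]
    if kind == "contrast":
        mean = sum(gray(p) for p in pixels) / len(pixels)
        return [tuple(clamp(mean + factor * (c - mean)) for c in p)
                for p in pixels]
    if kind == "color_balance":
        # Blend each pixel with its own gray value (factor < 1 desaturates).
        return [tuple(clamp(gray(p) + factor * (c - gray(p))) for c in p)
                for p in pixels]
    raise ValueError(f"unknown distortion: {kind}")


def make_pretext_sample(pixels, rng, noise_std=0.02):
    """Return (noisy distorted pixels, label) for one training example."""
    label = rng.randrange(len(DISTORTIONS))
    factor = rng.uniform(0.5, 1.5)             # assumed distortion strength range
    distorted = distort(pixels, DISTORTIONS[label], factor)
    noisy = [tuple(clamp(c + rng.gauss(0.0, noise_std)) for c in p)
             for p in distorted]
    return noisy, label


image = [(0.2, 0.4, 0.6), (0.9, 0.1, 0.3), (0.5, 0.5, 0.5)]
sample, label = make_pretext_sample(image, random.Random(0))
```

A classifier head on the backbone would then be trained with cross-entropy against `label`, which requires no human annotation.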
Visual Character of the Candi Bentar at Puru Sada Temple in Badung, Bali
I Putu Sathya Dharma, Gusti Ayu Made Suartika
Candi bentar is a gate or the main door to enter a specific area, such as a temple or palace in Bali. However, it can now be found at many entry points to various premises, including borders between areas, houses, and public facilities. Puru Sada Temple, one of the Kahyangan Jagat Temples located in Badung Regency of Bali Province, has a candi bentar which at first glance is similar to that of the Wringin Lawang Temple, a legacy of the Majapahit Kingdom of East Java. In terms of scale, however, the Puru Sada Temple’s candi bentar is smaller. The purpose of this study is to discuss the visual character of candi bentar in places of worship, taking Puru Sada Temple as its case study. The study used a descriptive qualitative approach. Its analysis is supported by relevant views offered by both Yudoseputro (2008) and Ching (1991). This study finds that intimacy is the dominant visual character, supported by the presence of sacred ornaments that are regarded as guardian figures.
Keywords: visual character; candi bentar; gate; Puru Sada Temple
Aesthetics of cities. City planning and beautifying
Tumera: Tutor of Photography Beginners
Xiaoran Wu, Jia Jia
With the popularity of photographic equipment, more and more people are starting to learn photography by themselves. Although they have easy access to photographic materials, it is not easy to obtain professional feedback or guidance that can help them improve their photography skills. Therefore, we develop an intelligent interactive system, Tumera, that provides aesthetics guidance for photography beginners. When shooting, Tumera gives timely feedback on the pictures in the viewport. After shooting, it gives scores evaluating the aesthetic quality of different aspects of the photos, along with corresponding improvement suggestions. Tumera allows users to share, rank, discuss, and learn from their works and their interaction with the system based on the scores and suggestions. In the experiment, Tumera showed good accuracy, real-time computing ability, and effective guiding performance.