Results for "Aesthetics"

Showing 20 of ~185,931 results · from arXiv, DOAJ, Semantic Scholar

arXiv Open Access 2025
Generic visuality of war? How image-generative AI models (mis)represent Russia's war against Ukraine

Mykola Makhortykh, Miglė Bareikytė

The rise of generative AI (genAI) can transform the representation of different aspects of social reality, including modern wars. While scholarship has largely focused on the military applications of AI, the growing adoption of genAI technologies may have major implications for how wars are portrayed, remembered, and interpreted. A few initial scholarly inquiries highlight the risks of genAI in this context, specifically regarding its potential to distort the representation of mass violence, particularly by sanitising and homogenising it. However, little is known about how genAI representation practices vary between different episodes of violence portrayed by Western and non-Western genAI models. Using the Russian aggression against Ukraine as a case study, we audit how two image-generative models, the US-based Midjourney and the Russia-based Kandinsky, represent both fictional and factual episodes of the war. We then analyse the models' responsiveness to the war-related prompts, together with the aesthetic and content-based aspects of the resulting images. Our findings highlight that contextual factors lead to variation in the representation of war, both between models and within the outputs of the same model. However, there are some consistent patterns of representation that may contribute to the homogenisation of war aesthetics.

en cs.CY
arXiv Open Access 2025
Beyond Universality: Cultural Diversity in Music and Its Implications for Sound Design and Sonification

Rubén García-Benito

The Audio Mostly (AM) conference has long been a platform for exploring the intersection of sound, technology, and culture. Despite growing interest in sonic cultures, discussions on the role of cultural diversity in sound design and sonification remain limited. This paper investigates the implicit biases and gaps within the discourse on music and sound aesthetics, challenging the notion of music as a 'universal language'. Through a historical and cross-cultural analysis of musicology and ethnomusicology, the profound influence of cultural context on auditory perception and aesthetic appraisal is highlighted. By drawing parallels between historical music practices and contemporary sound design, the paper advocates for a more inclusive approach that recognizes the diversity of sonic traditions. Using music as a case study, we underscore broader implications for sound design and sonification, emphasizing the need to integrate cultural perspectives into auditory design practices. A reevaluation of existing frameworks in sound design and sonification is proposed, emphasizing the necessity of culturally informed practices that resonate with global audiences. Ultimately, embracing cultural diversity in sound design is suggested to lead to richer, more meaningful auditory experiences and to foster greater inclusivity within the field.

en physics.soc-ph, cs.SD
arXiv Open Access 2025
Preserve Anything: Controllable Image Synthesis with Object Preservation

Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava et al.

We introduce Preserve Anything, a novel method for controlled image synthesis that addresses key limitations in object preservation and semantic consistency in text-to-image (T2I) generation. Existing approaches often fail to (i) preserve multiple objects with fidelity, (ii) maintain semantic alignment with prompts, or (iii) provide explicit control over scene composition. To overcome these challenges, the proposed method employs an N-channel ControlNet that integrates (i) object preservation with size and placement agnosticism, color and detail retention, and artifact elimination, (ii) high-resolution, semantically consistent backgrounds with accurate shadows, lighting, and prompt adherence, and (iii) explicit user control over background layouts and lighting conditions. Key components of our framework include object-preservation and background-guidance modules that enforce lighting consistency, and a high-frequency overlay module that retains fine details while mitigating unwanted artifacts. We introduce a benchmark dataset consisting of 240K natural images filtered for aesthetic quality and 18K 3D-rendered synthetic images with metadata such as lighting, camera angles, and object relationships. This dataset addresses the deficiencies of existing benchmarks and enables comprehensive evaluation. Empirical results demonstrate that our method achieves state-of-the-art performance, significantly improving feature-space fidelity (FID 15.26) and semantic alignment (CLIP-S 32.85) while maintaining competitive aesthetic quality. We also conducted a user study on an unseen benchmark and observed remarkable improvements of ~25%, ~19%, ~13%, and ~14% in prompt alignment, photorealism, reduction of AI artifacts, and natural aesthetics over existing works.

en cs.CV
arXiv Open Access 2025
Quantum Brush: A quantum computing-based tool for digital painting

João S. Ferreira, Arianna Crippa, Astryd Park et al.

We present Quantum Brush, an open-source digital painting tool that harnesses quantum computing to generate novel artistic expressions. The tool includes four different brushes that translate strokes into unique quantum algorithms, each highlighting a different way in which quantum effects can produce novel aesthetics. Each brush is designed to be compatible with the current noisy intermediate-scale quantum (NISQ) devices, as demonstrated by executing them on IQM's Sirius device.

en cs.GR, cs.ET
arXiv Open Access 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

Shitian Zhao, Qilong Wu, Xinyue Li et al.

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our codes, models, datasets, and demo are publicly available.
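The abstract does not define PNED beyond its name; a plausible building block is a length-normalized Levenshtein (edit) distance between the rendered and target text, sketched below. The function names and the max-length normalization are assumptions for illustration, not the paper's actual definition.

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance (two-row variant).
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

def normalized_edit_distance(pred: str, target: str) -> float:
    # Normalize to [0, 1] by the longer string's length, so scores are
    # comparable across texts of different lengths.
    if not pred and not target:
        return 0.0
    return edit_distance(pred, target) / max(len(pred), len(target))
```

For example, `normalized_edit_distance("kitten", "sitting")` is 3/7, since the raw edit distance is 3 and the longer string has 7 characters.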

en cs.CV
arXiv Open Access 2025
IPGO: Indirect Prompt Gradient Optimization for Parameter-Efficient Prompt-level Fine-Tuning on Text-to-Image Models

Jianping Ye, Michel Wedel, Kunpeng Zhang

Text-to-Image Diffusion models excel at generating images from text prompts but often exhibit suboptimal alignment with content semantics, aesthetics, and human preferences. To address these limitations, this study proposes a novel parameter-efficient framework, Indirect Prompt Gradient Optimization (IPGO), for prompt-level diffusion model fine-tuning. IPGO enhances prompt embeddings by injecting continuously differentiable embeddings at the beginning and end of the prompt embeddings, leveraging low-rank structures with the flexibility and nonlinearity of rotations. This approach enables gradient-based optimization of the injected embeddings under range, orthonormality, and conformity constraints, effectively narrowing the search space, promoting a stable solution, and ensuring alignment between the injected embeddings and the original prompt embeddings. Its extension IPGO+ adds a parameter-free cross-attention mechanism on the prompt embedding to enforce dependencies between the original prompt and the inserted embeddings. We conduct extensive evaluations through prompt-wise (IPGO) and prompt-batch (IPGO+) training using three reward models of image aesthetics, image-text alignment, and human preferences across three datasets of varying complexity. The results show that IPGO consistently outperforms SOTA benchmarks, including Stable Diffusion v1.5 with raw prompts, text-embedding-based methods (TextCraftor), training-based methods (DRaFT and DDPO), and training-free methods (DPO-Diffusion, Promptist, and ChatGPT-4o). Specifically, IPGO achieves a win rate exceeding 99% in prompt-wise learning, and IPGO+ achieves comparable, and often better, performance than current SOTA methods (a 75% win rate) in prompt-batch learning. Moreover, we illustrate IPGO's generalizability and its capability to significantly enhance image quality while requiring minimal data and resources.
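The injection step described above, wrapping the frozen prompt embeddings with small trainable prefix and suffix embeddings, can be sketched in a few lines. All dimensions, names, and the Gaussian initialization below are illustrative assumptions; real T2I text encoders use much larger embedding widths, and IPGO additionally imposes low-rank and rotation structure not shown here.

```python
import random

EMB_DIM = 8  # illustrative; real text encoders use 768+ dimensions

def make_learnable(n_tokens: int, dim: int, seed: int = 0):
    # Small random initialization for the trainable injected embeddings.
    rng = random.Random(seed)
    return [[rng.gauss(0, 0.02) for _ in range(dim)] for _ in range(n_tokens)]

def inject(prompt_emb, prefix, suffix):
    # Concatenate trainable rows before and after the frozen prompt rows;
    # during fine-tuning only the prefix/suffix would receive gradients.
    return prefix + prompt_emb + suffix

prompt_emb = [[0.0] * EMB_DIM for _ in range(5)]  # stand-in for encoder output
prefix = make_learnable(2, EMB_DIM, seed=1)
suffix = make_learnable(2, EMB_DIM, seed=2)
augmented = inject(prompt_emb, prefix, suffix)  # 2 + 5 + 2 = 9 token rows
```

The frozen prompt rows pass through unchanged, which is what makes the approach parameter-efficient: only the injected rows are optimized.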

en cs.LG
arXiv Open Access 2025
LumiCtrl : Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models

Muhammad Atif Butt, Kai Wang, Javier Vazquez-Corral et al.

Text-to-image (T2I) models have demonstrated remarkable progress in creative image generation, yet they still lack precise control over scene illuminants, a crucial factor for content designers who manipulate the visual aesthetics of generated images. In this paper, we present an illuminant personalization method named LumiCtrl that learns an illuminant prompt from a single image of an object. LumiCtrl consists of three components: given an image of the object, our method applies (a) physics-based illuminant augmentation along the Planckian locus to create fine-tuning variants under standard illuminants; (b) Edge-Guided Prompt Disentanglement using a frozen ControlNet to ensure prompts focus on illumination rather than structure; and (c) a Masked Reconstruction Loss that focuses learning on the foreground object while allowing the background to adapt contextually, enabling what we call Contextual Light Adaptation. We qualitatively and quantitatively compare LumiCtrl against other T2I customization methods. The results show that LumiCtrl achieves significantly better illuminant fidelity, aesthetic quality, and scene coherence than existing baselines. A human preference study further confirms the strong user preference for LumiCtrl generations.

en cs.CV
DOAJ Open Access 2025
Probiotics to reduce microbiota-related dental stains: A potential approach

Jing-Jie Yu, Yvonne Hernandez-Kapila, Chin-Wei Wang

Adult extrinsic black stains on teeth, caused by bacterial colonization, impact aesthetics and confidence. Conventional treatments can be abrasive and have a high recurrence rate. This pilot case study explores probiotics as an adjunctive approach. Direct application of probiotic powder over the black stains of the teeth was carried out prior to routine home care. Results showed black stain removal was possible with a toothbrush and dental floss. Saliva and biofilm samples were analyzed via 16S rRNA sequencing. Microbiome analysis revealed a noticeable reduction in Corynebacterium, a key black stain-associated bacterium, with slight shifts in major phyla such as Actinobacteriota and Firmicutes. This case study aimed to evaluate the potential of probiotics in reducing black stains on teeth and assess the associated microbiome changes.

DOAJ Open Access 2025
Deep Neural Framework With Visual Attention and Global Context for Predicting Image Aesthetics

Yifei Xu, Nuo Zhang, Pingping Wei et al.

Computational inference of aesthetics has recently become a hot topic due to its usefulness in a wide range of applications, such as image quality evaluation, image retouching, and image retrieval. Owing to the subjectivity of this problem, there is no general framework for predicting image aesthetics. In this paper, we propose a deep neural framework with a visual attention module, self-generated global features, and a hybrid loss to address this problem. Specifically, the framework can be any state-of-the-art convolutional classification network compatible with visual attention. Further, the self-generated global feature compensates for the loss of global context information during the training stage, and the hybrid loss guides the network to learn the similarity between the predicted aesthetic scores and the ground truths by fusing softmax cross-entropy and Earth Mover's Distance (EMD). With the above-mentioned improvements, the proposed deep neural framework is capable of predicting image aesthetics both effectively and efficiently. In our experiments, we release a real-world aesthetic dataset that contains 1,800 2K photos labeled by several experienced photographers, provide a thorough ablation study of the design choices to better understand the contribution of each part of our framework, and present several comparisons with state-of-the-art methods on a subset of metrics. The experimental results on two datasets demonstrate that our framework achieves favorable performance in both accuracy and efficiency.
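A hybrid loss fusing softmax cross-entropy with EMD over ordered aesthetic-score bins can be sketched as follows. For ordinal score distributions, EMD reduces to a distance between the two cumulative distributions; the r=2 form and the fusion weight alpha below are common choices assumed for illustration, not the paper's exact formulation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over raw score-bin logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p_pred, p_true, eps=1e-12):
    # Soft cross-entropy against a ground-truth score distribution.
    return -sum(t * math.log(p + eps) for p, t in zip(p_pred, p_true))

def emd_loss(p_pred, p_true, r=2):
    # EMD for ordered bins: r-norm of the running CDF difference.
    cdf_p = cdf_t = total = 0.0
    for p, t in zip(p_pred, p_true):
        cdf_p += p
        cdf_t += t
        total += abs(cdf_p - cdf_t) ** r
    return (total / len(p_pred)) ** (1.0 / r)

def hybrid_loss(logits, p_true, alpha=0.5):
    # Weighted fusion of the two terms; alpha is an illustrative assumption.
    p_pred = softmax(logits)
    return alpha * cross_entropy(p_pred, p_true) + (1 - alpha) * emd_loss(p_pred, p_true)
```

Unlike plain cross-entropy, the EMD term penalizes a prediction more when its mass sits in bins far from the ground-truth bins, which matches the ordinal nature of aesthetic scores.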

Electrical engineering. Electronics. Nuclear engineering
arXiv Open Access 2024
Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training

Wenbo Li, Guohao Li, Zhibin Lan et al.

Diffusion-based text-to-image models have demonstrated impressive achievements in diversity and aesthetics but struggle to generate images with legible visual texts. Existing backbone models have limitations such as misspelling, failing to generate texts, and lack of support for Chinese text, but their development shows promising potential. In this paper, we propose a series of methods aiming to empower backbone models to generate visual texts in English and Chinese. We first conduct a preliminary study revealing that Byte Pair Encoding (BPE) tokenization and the insufficient learning of cross-attention modules restrict the performance of the backbone models. Based on these observations, we make the following improvements: (1) We design a mixed granularity input strategy to provide more suitable text representations; (2) We propose to augment the conventional training objective with three glyph-aware training losses, which enhance the learning of cross-attention modules and encourage the model to focus on visual texts. Through experiments, we demonstrate that our methods can effectively empower backbone models to generate semantically relevant, aesthetically appealing, and accurate visual text images, while maintaining their fundamental image generation quality.

en cs.CV, cs.AI
arXiv Open Access 2024
Structure-Aware Human Body Reshaping with Adaptive Affinity-Graph Network

Qiwen Deng, Yangcen Liu, Wen Li et al.

Given a source portrait, the automatic human body reshaping task aims to edit it toward an aesthetically pleasing body shape. As the technology has been widely used in media, several methods have been proposed, mainly focusing on generating optical flow to warp the body shape. However, these previous works only consider the local transformation of different body parts (arms, torso, and legs), ignoring global affinity and limiting the capacity to ensure consistency and quality across the entire body. In this paper, we propose a novel Adaptive Affinity-Graph Network (AAGN), which extracts the global affinity between different body parts to enhance the quality of the generated optical flow. Specifically, our AAGN introduces the following designs: (1) We propose an Adaptive Affinity-Graph (AAG) Block that leverages the properties of a fully connected graph. AAG represents different body parts as nodes in an adaptive fully connected graph and captures all the affinities between nodes to obtain a global affinity map. This design better improves the consistency between body parts. (2) Because high-frequency details are crucial for photo aesthetics, a Body Shape Discriminator (BSD) is designed to extract information from both the high-frequency and spatial domains. In particular, an SRM filter is utilized to extract high-frequency details, which are combined with spatial features as input to the BSD. With this design, the BSD guides the Flow Generator (FG) to attend to various fine details rather than performing rigid pixel-level fitting. Extensive experiments conducted on the BR-5K dataset demonstrate that our framework significantly enhances the aesthetic appeal of reshaped photos, surpassing all previous work and achieving state-of-the-art results on all evaluation metrics.

en cs.CV
arXiv Open Access 2024
Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

Muxi Chen, Yi Liu, Jian Yi et al.

In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first, focusing on image qualities such as aesthetics and realism, and second, examining text conditions through concept coverage and fairness. We introduce an innovative aesthetic score prediction model that assesses the visual appeal of generated images and unveils the first dataset marked with low-quality regions in generated human images to facilitate automatic defect detection. Our exploration into concept coverage probes the model's effectiveness in interpreting and rendering text-based concepts accurately, while our analysis of fairness reveals biases in model outputs, with an emphasis on gender, race, and age. While our study is grounded in human imagery, this dual-faceted approach is designed with the flexibility to be applicable to other forms of image generation, enhancing our understanding of generative models and paving the way to the next generation of more sophisticated, contextually aware, and ethically attuned generative models. Code and data, including the dataset annotated with defective areas, are available at https://github.com/cure-lab/EvaluateAIGC.

en cs.CV, cs.AI
arXiv Open Access 2024
Beyond Imperfections: A Conditional Inpainting Approach for End-to-End Artifact Removal in VTON and Pose Transfer

Aref Tabatabaei, Zahra Dehghanian, Maryam Amirmazlaghani

Artifacts often degrade the visual quality of virtual try-on (VTON) and pose transfer applications, impacting user experience. This study introduces a novel conditional inpainting technique designed to detect and remove such distortions, improving image aesthetics. Our work is the first to present an end-to-end framework addressing this specific issue, and we developed a specialized dataset of artifacts in VTON and pose transfer tasks, complete with masks highlighting the affected areas. Experimental results show that our method not only effectively removes artifacts but also significantly enhances the visual quality of the final images, setting a new benchmark in computer vision and image processing.

en cs.CV
DOAJ Open Access 2024
Mimicking power

Florence Zivaishe Madenga

This article explores performances of satire as a form of journalism in Zimbabwe by analysing performances by satirists who mimic, in order to mock, journalistic conventions and political authorities. Through analyses of YouTube videos, the article examines the ways in which satire as journalism is visualised. Onscreen, satirists mimic the gestures, mannerisms and aesthetic objects connected to both political figures and state journalists on the state-run television station for ridicule. This paper argues that the parodying and mimicking of aesthetics of authority legitimises and professionalises satire as journalism, even as it seeks to critique notions of journalistic authority in an authoritarian state. Through analyses of three Zimbabwean satire shows on YouTube, the paper finds that, while satirists mimic journalistic practices and journalistic authority with the goal of mocking state media and the censorial state, a closer reading of their practices shows that they themselves become legitimate news tellers. This kind of satire plays with familiar broadcasting television aesthetics to signal authority, providing current news and otherwise censored critique.

Language and Literature
arXiv Open Access 2023
End-to-End Diffusion Latent Optimization Improves Classifier Guidance

Bram Wallace, Akash Gokul, Stefano Ermon et al.

Classifier guidance -- using the gradients of an image classifier to steer the generations of a diffusion model -- has the potential to dramatically expand the creative control over image generation and editing. However, classifier guidance currently requires either training new noise-aware models to obtain accurate gradients or using a one-step denoising approximation of the final generation, which leads to misaligned gradients and sub-optimal control. We highlight this approximation's shortcomings and propose a novel guidance method: Direct Optimization of Diffusion Latents (DOODL), which enables plug-and-play guidance by optimizing diffusion latents w.r.t. the gradients of a pre-trained classifier on the true generated pixels, using an invertible diffusion process to achieve memory-efficient backpropagation. Showcasing the potential of more precise guidance, DOODL outperforms one-step classifier guidance on computational and human evaluation metrics across different forms of guidance: using CLIP guidance to improve generations of complex prompts from DrawBench, using fine-grained visual classifiers to expand the vocabulary of Stable Diffusion, enabling image-conditioned generation with a CLIP visual encoder, and improving image aesthetics using an aesthetic scoring network. Code at https://github.com/salesforce/DOODL.
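The core idea above, optimizing the latent by following the gradient of a loss computed on the generated pixels, can be illustrated with a toy problem. Everything here is a simplified assumption: a fixed linear "decoder" stands in for the diffusion process, and a squared distance to a target stands in for the classifier loss; the real method backpropagates through an invertible diffusion sampler and a pretrained classifier.

```python
import random

# Toy stand-ins: a fixed linear "decoder" maps a 3-dim latent to 4 "pixels";
# the "classifier loss" is squared distance of those pixels to a preferred target.
DECODER = [[0.5, -0.2, 0.1],
           [0.3, 0.8, -0.4],
           [-0.1, 0.2, 0.9],
           [0.7, -0.3, 0.2]]
TARGET = [1.0, 0.0, 0.5, -0.5]

def decode(z):
    # "Generate pixels" from the latent (matrix-vector product).
    return [sum(w * zi for w, zi in zip(row, z)) for row in DECODER]

def loss(z):
    # Guidance loss evaluated on the generated pixels, not on the latent.
    return sum((p - t) ** 2 for p, t in zip(decode(z), TARGET))

def grad(z):
    # Analytic gradient w.r.t. the latent: 2 * D^T (D z - target).
    resid = [p - t for p, t in zip(decode(z), TARGET)]
    return [2 * sum(DECODER[i][j] * resid[i] for i in range(len(resid)))
            for j in range(len(z))]

def optimize_latent(z, lr=0.1, steps=200):
    # Plain gradient descent on the latent, the analogue of DOODL's update.
    for _ in range(steps):
        g = grad(z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z

random.seed(0)
z0 = [random.uniform(-1, 1) for _ in range(3)]
z_opt = optimize_latent(z0)
```

After optimization, the decoded "pixels" sit much closer to the target than those of the initial latent, which is exactly the effect guidance aims for, just with a differentiable sampler and classifier in place of these toys.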

en cs.CV, cs.AI
DOAJ Open Access 2023
Productive Imagination in Painting from Ricoeur's Point of View

Mohammad Sadegh Gheysari, Amir Maziar

The study of imagination has always played a significant role in the study of art and aesthetics, and imagination plays a central role in Ricoeur's philosophical system, a subject that has received comparatively little attention. It was Ricoeur's goal to bring the productive aspect of imagination back into the main text of philosophy by presenting a critical reading of the history of philosophy. By drawing on Immanuel Kant's analysis of imagination, Ricoeur revived the various dimensions of imagination in the field of perception to achieve a comprehensive theory of productive imagination. This paper attempts to address the importance of painting as a work of art and its relevance to reality by studying the path Ricoeur took to present a theory of productive imagination. In this regard, explaining the role of Iconic Augmentation, one of the latent characteristics of the productive aspect of imagination, better defines the action present in the image and the dialectical position of language and image. Ultimately, painting, as a fictional work whose reference is not merely a representation, becomes more real as its imaginativeness increases. The result is a new approach to describing the nature of painting as a work of art.

Philosophy (General)
arXiv Open Access 2022
Towards Evaluation of Autonomously Generated Musical Compositions: A Comprehensive Survey

Daniel Kvak

There are many applications that aim to create a complete model for autonomously generated composition; systems can generate muzak songs, assist singers in transcribing songs, or imitate long-dead authors. Subjective understanding of creativity or aesthetics differs not only across preferences (popular authors or genres), but also on the basis of lived experience and socio-cultural environment. So, what do we want to achieve with such an adaptation? What is the benefit of the resulting work for the author, who can no longer evaluate the composition? And in what ways should we evaluate such a composition at all?

en cs.SD, cs.CY
arXiv Open Access 2022
Utility-Oriented Underwater Image Quality Assessment Based on Transfer Learning

Weiling Chen, Rongfu Lin, Honggang Liao et al.

Widespread image applications have greatly promoted vision-based tasks, in which the Image Quality Assessment (IQA) technique has become an increasingly significant issue. For user enjoyment in multimedia systems, IQA exploits image fidelity and aesthetics to characterize user experience, while for other tasks such as popular object recognition, there exists a low correlation between utility and perception. In such cases, fidelity-based and aesthetics-based IQA methods cannot be directly applied. To address this issue, this paper proposes a utility-oriented IQA for object recognition. In particular, we initialize our research in the scenario of underwater fish detection, a critical task that has not yet been perfectly addressed. Based on this task, we build an Underwater Image Utility Database (UIUD) and a learning-based Underwater Image Utility Measure (UIUM). Inspired by the top-down design of fidelity-based IQA, we exploit deep object-recognition models and transfer their features to our UIUM. Experiments validate that the proposed transfer-learning-based UIUM achieves promising performance in the recognition task. We envision that our research will provide insights to bridge research on IQA and computer vision.

en cs.CV, eess.IV

Page 23 of 9,297