A. Elliot, Markus A. Maier
Results for "Aesthetics"
Showing 20 of ~275,324 results · from arXiv, DOAJ, Semantic Scholar, CrossRef
H. O'Brien, Elaine Toms
A. Schein, Alexandrin Popescul, L. Ungar et al.
Wen Yin, Cencen Liu, Dingrui Liu et al.
Unifying Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) in a single multimodal large language model is appealing, yet existing methods adopt a task-agnostic recipe that applies the same reasoning strategy and reward to both tasks. We show this is fundamentally misaligned: IQA relies on low-level, objective perceptual cues and benefits from concise distortion-focused reasoning, whereas IAA requires deliberative semantic judgment and is poorly served by point-wise score regression. We identify these as a reasoning mismatch and an optimization mismatch, and provide empirical evidence for both through controlled probes. Motivated by these findings, we propose TATAR (Task-Aware Thinking with Asymmetric Rewards), a unified framework that shares the visual-language backbone while conditioning post-training on each task's nature. TATAR combines three components: fast-slow task-specific reasoning construction that pairs IQA with concise perceptual rationales and IAA with deliberative aesthetic narratives; two-stage SFT+GRPO learning that establishes task-aware behavioral priors before reward-driven refinement; and asymmetric rewards that apply Gaussian score shaping for IQA and Thurstone-style completion ranking for IAA. Extensive experiments across eight benchmarks demonstrate that TATAR consistently outperforms prior unified baselines on both tasks under in-domain and cross-domain settings, remains competitive with task-specific specialized models, and yields more stable training dynamics for aesthetic assessment. Our results establish task-conditioned post-training as a principled paradigm for unified perceptual scoring. Our code is publicly available at https://github.com/yinwen2019/TATAR.
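To make the two reward shapes concrete, the Python sketch below is one plausible reading of them, not TATAR's code. It assumes the Gaussian shaping takes the form r = exp(-(ŝ - s)² / (2σ²)) for a predicted IQA score ŝ and ground truth s, and it models the Thurstone-style term with the Case V choice probability P(i ≻ j) = Φ((μ_i - μ_j) / (σ√2)). The function names, the default σ values, and the averaging over pairs are hypothetical choices made for illustration.

```python
import math
from itertools import combinations
from typing import Sequence


def gaussian_score_reward(pred: float, target: float, sigma: float = 0.5) -> float:
    """Reward in (0, 1]: exactly 1.0 when pred == target, decaying with squared error.
    sigma is a hypothetical width parameter."""
    return math.exp(-((pred - target) ** 2) / (2 * sigma ** 2))


def thurstone_prob(mu_i: float, mu_j: float, sigma: float = 1.0) -> float:
    """Thurstone Case V: P(i preferred over j) = Phi((mu_i - mu_j) / (sigma * sqrt(2)))."""
    z = (mu_i - mu_j) / (sigma * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF via erf


def ranking_reward(pred_scores: Sequence[float], true_scores: Sequence[float],
                   sigma: float = 1.0) -> float:
    """Mean probability that the predicted scores order each pair as the reference does.
    Hypothetical aggregation: average over all pairs whose reference scores differ."""
    probs = []
    for i, j in combinations(range(len(true_scores)), 2):
        if true_scores[i] == true_scores[j]:
            continue  # ties carry no ordering information
        better, worse = (i, j) if true_scores[i] > true_scores[j] else (j, i)
        probs.append(thurstone_prob(pred_scores[better], pred_scores[worse], sigma))
    return sum(probs) / len(probs) if probs else 0.0
```

The contrast is the point: the Gaussian reward scores each prediction against its own target, while the ranking reward only asks whether predictions preserve the reference ordering, mirroring the abstract's argument that IAA is poorly served by point-wise score regression.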
Kaiyuan Ji, Yixuan Gao, Lu Sun et al.
Advertising images significantly impact commercial conversion rates and brand equity, yet current evaluation methods rely on subjective judgments, lacking scalability, standardized criteria, and interpretability. To address these challenges, we present A^3 (Advertising Aesthetic Assessment), a comprehensive framework encompassing four components: a paradigm (A^3-Law), a dataset (A^3-Dataset), a multimodal large language model (A^3-Align), and a benchmark (A^3-Bench). Central to A^3 is a theory-driven paradigm, A^3-Law, comprising three hierarchical stages: (1) Perceptual Attention, evaluating perceptual image signals for their ability to attract attention; (2) Formal Interest, assessing formal composition of image color and spatial layout in evoking interest; and (3) Desire Impact, measuring desire evocation from images and their persuasive impact. Building on A^3-Law, we construct A^3-Dataset with 120K instruction-response pairs from 30K advertising images, each richly annotated with multi-dimensional labels and Chain-of-Thought (CoT) rationales. We further develop A^3-Align, trained under A^3-Law with CoT-guided learning on A^3-Dataset. Extensive experiments on A^3-Bench demonstrate that A^3-Align achieves superior alignment with A^3-Law compared to existing models, and this alignment generalizes well to quality advertisement selection and prescriptive advertisement critique, indicating its potential for broader deployment. Dataset, code, and models can be found at: https://github.com/euleryuan/A3-Align.
Wenhan Wang, Zhixiang Zhou, Zhongtian Ma et al.
The connoisseurship of antique Chinese porcelain demands extensive historical expertise, material understanding, and aesthetic sensitivity, making it difficult for non-specialists to engage with. To democratize cultural-heritage understanding and assist expert connoisseurship, we introduce CiQi-Agent, a domain-specific Porcelain Connoisseurship Agent for the intelligent analysis of antique Chinese porcelain. CiQi-Agent supports multi-image porcelain inputs, invokes vision tools, and performs multimodal retrieval-augmented generation, enabling fine-grained connoisseurship analysis across six attributes: dynasty, reign period, kiln site, glaze color, decorative motif, and vessel shape. Beyond attribute classification, it captures subtle visual details, retrieves relevant domain knowledge, and integrates visual and textual evidence to produce coherent, explainable connoisseurship descriptions. To achieve this capability, we construct a large-scale, expert-annotated dataset, CiQi-VQA, comprising 29,596 porcelain specimens, 51,553 images, and 557,940 visual question-answering pairs, and further establish a comprehensive benchmark, CiQi-Bench, aligned with the same six attributes. CiQi-Agent is trained through supervised fine-tuning, reinforcement learning, and a tool-augmented reasoning framework that integrates two categories of tools: a vision tool and multimodal retrieval tools. Experimental results show that CiQi-Agent (7B) outperforms all competitive open- and closed-source models across all six attributes on CiQi-Bench, achieving on average 12.2% higher accuracy than GPT-5. The model and dataset have been released and are publicly available at https://huggingface.co/datasets/SII-Monument-Valley/CiQi-VQA.
Chen Zheng, Yuxuan Lai, Haoyang Lu et al.
The handwriting of Chinese characters is a fundamental aspect of learning the Chinese language. Previous automated assessment methods often framed scoring as a regression problem. However, score-only feedback lacks actionable guidance, which limits its effectiveness in helping learners improve their handwriting skills. In this paper, we leverage vision-language models (VLMs) to analyze the quality of handwritten Chinese characters and generate multi-level feedback. Specifically, we investigate two feedback generation tasks: simple grade feedback (Task 1) and enriched, descriptive feedback (Task 2). We explore both low-rank adaptation (LoRA)-based fine-tuning strategies and in-context learning methods to integrate aesthetic assessment knowledge into VLMs. Experimental results show that our approach achieves state-of-the-art performance across multiple evaluation tracks of the CCL 2025 workshop on the evaluation of handwritten Chinese character quality.
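LoRA, one of the two adaptation strategies mentioned above, freezes a pretrained weight matrix W and learns only a low-rank update, so an adapted layer computes y = Wx + (α/r)·B(Ax), with A of shape r × d_in and B of shape d_out × r. The pure-Python sketch below shows that forward pass and the equivalent merged weight; the matrix sizes, α, and r used with it are arbitrary illustrative values, not the configuration used for the handwriting task.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """y = W x + (alpha / r) * B (A x). W stays frozen; only A (r x d_in) and B (d_out x r) train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Fold the low-rank update into one weight matrix: W + (alpha / r) * B A."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
```

Because B is typically initialised to zero, the adapted layer starts out identical to the frozen one, and after training the update can be merged into W so inference costs nothing extra.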
Hameed Olutoba Lawal
Dance serves as a potent metaphor in Taiye Adeola’s Ni Ile Wa, embodying cultural identity, resistance, and transformation. This study explores the symbolic and narrative functions of dance within the text and examines how movement and rhythm reflect societal tensions, personal struggles, and communal aspirations. Drawing on theories of performance, embodiment, and postcolonial aesthetics, this paper argues that dance in Ni Ile Wa is not merely an artistic expression but a language through which characters negotiate power, memory, and belonging. Ultimately, the work positions dance as a dynamic force that bridges the past and present, reinforcing cultural heritage while enabling new forms of self-expression.
Luwei Xiao, Rui Mao, Shuai Zhao et al.
Multimodal aspect-based sentiment classification (MASC), which aims to predict the sentiment polarity toward specific aspect targets (i.e., entities or attributes explicitly mentioned in text-image pairs), is an emerging task driven by the growth of user-generated multimodal content on social platforms. Despite extensive efforts and significant achievements in MASC, substantial gaps remain in understanding fine-grained visual content and the cognitive rationales derived from semantic content and impressions (cognitive interpretations of emotions evoked by image content). In this study, we present Chimera, a cognitive and aesthetic sentiment causality understanding framework that derives fine-grained holistic features of aspects and infers the fundamental drivers of sentiment expression from both semantic perspectives and affective-cognitive resonance (the synergistic effect between emotional responses and cognitive interpretations). Specifically, the framework first incorporates visual patch features for patch-word alignment. Meanwhile, it extracts coarse-grained visual features (e.g., the overall image representation) and fine-grained visual regions (e.g., aspect-related regions) and translates them into corresponding textual descriptions (e.g., facial, aesthetic). Finally, we leverage the sentiment causes and impressions generated by a large language model (LLM) to enhance the model's awareness of sentiment cues evoked by semantic content and affective-cognitive resonance. Experimental results on standard MASC datasets demonstrate the effectiveness of the proposed model, which also exhibits greater flexibility on MASC than LLMs such as GPT-4o. We have publicly released the complete implementation and dataset at https://github.com/Xillv/Chimera
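The abstract does not say how patch-word alignment is computed. As a generic illustration only, the sketch below assumes a common recipe: score each word embedding against every visual patch embedding with cosine similarity, turn the scores into attention weights with a temperature-scaled softmax, and pool the patches with those weights. The embeddings, the temperature, and the function names are hypothetical and are not taken from Chimera.

```python
import math
from typing import List, Tuple

Vec = List[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def softmax(xs: Vec, temperature: float = 1.0) -> Vec:
    """Numerically stable softmax with a temperature (lower = sharper)."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def align_words_to_patches(words: List[Vec], patches: List[Vec],
                           temperature: float = 0.1) -> List[Tuple[Vec, Vec]]:
    """For each word embedding, return (weights over patches, weighted patch feature)."""
    dim = len(patches[0])
    out = []
    for w in words:
        weights = softmax([cosine(w, p) for p in patches], temperature)
        pooled = [sum(wt * p[k] for wt, p in zip(weights, patches)) for k in range(dim)]
        out.append((weights, pooled))
    return out
```

In a trained model the word and patch embeddings would come from learned text and vision encoders projected into a shared space; here they are plain lists so the mechanics stay visible.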
Lujian Yao, Siming Zheng, Xinbin Yuan et al.
Traditional photography composition approaches are dominated by 2D cropping-based methods. However, these methods fall short when scenes contain poorly arranged subjects. Professional photographers often employ perspective adjustment as a form of 3D recomposition, modifying the projected 2D relationships between subjects while maintaining their actual spatial positions to achieve better compositional balance. Inspired by this artistic practice, we propose photography perspective composition (PPC), which extends beyond traditional cropping-based methods. However, implementing PPC faces significant challenges: the scarcity of perspective transformation datasets and the lack of defined assessment criteria for perspective quality. To address these challenges, we present three key contributions: (1) an automated framework for building PPC datasets from expert photographs; (2) a video generation approach that demonstrates the transformation process from less favorable to aesthetically enhanced perspectives; and (3) a perspective quality assessment (PQA) model built on human performance. Our approach is concise and requires no additional prompt instructions or camera trajectories, helping and guiding ordinary users to improve their composition skills.
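The claim that perspective adjustment changes projected 2D relationships while the subjects stay put can be checked with a plain pinhole camera model, u = f·X/Z. The sketch below assumes an unrotated camera looking down +Z and two hypothetical subjects at different depths; it only illustrates the geometry (parallax) behind PPC and is not part of the paper's video-generation pipeline.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def project(point: Point3D, cam_pos: Point3D, focal: float = 1.0) -> Tuple[float, float]:
    """Pinhole projection for a camera at cam_pos looking down +Z with no rotation."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z, focal * y / z)


def horizontal_gap(a: Point3D, b: Point3D, cam_pos: Point3D, focal: float = 1.0) -> float:
    """Projected horizontal distance between two subjects as seen from cam_pos."""
    return abs(project(a, cam_pos, focal)[0] - project(b, cam_pos, focal)[0])
```

For example, with a near subject at (0, 0, 2) and a far one at (1, 0, 6), sliding the camera from x = 0 to x = -0.5 closes the projected gap from 1/6 to zero, and sliding it to x = 1 widens it to 0.5, although neither subject moves.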
Weimin Wang, Jiawei Liu, Zhijie Lin et al.
The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architecture designs, MagicVideo-V2 can generate aesthetically pleasing, high-resolution videos with remarkable fidelity and smoothness. In a large-scale user evaluation, it outperforms leading text-to-video systems such as Runway, Pika 1.0, Morph, Moon Valley, and the Stable Video Diffusion model.
Shiyu Duan, Runsheng Zhang, Mengmeng Chen et al.
This paper presents a novel method for user interface (UI) generation based on the Transformer architecture, addressing the increasing demand for efficient and aesthetically pleasing UI designs in software development. Traditional UI design relies heavily on designers' expertise, which can be time-consuming and costly. Leveraging the ability of Transformers to capture complex design patterns and long-range dependencies, we propose a Transformer-based interface generation tree algorithm. This method constructs a hierarchical representation of UI components as nodes in a tree structure and uses pre-trained Transformer models for encoding and decoding. We define a markup language to describe UI components and their properties, and we train on a rich dataset of real-world web and mobile application interfaces. The experimental results demonstrate that our approach not only significantly enhances design quality and efficiency but also outperforms traditional models in user satisfaction and aesthetic appeal. We also provide a comparative analysis with existing models, illustrating the advantages of our method in terms of accuracy, user ratings, and design similarity. Overall, our study underscores the potential of the Transformer-based approach to revolutionize the UI design process, making it accessible to non-professionals while maintaining high standards of quality.
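The abstract describes UI components as nodes in a tree plus a markup language, without giving the syntax. The sketch below assumes a hypothetical bracketed format: each node becomes an opening token, its properties become key=value tokens, and a closing token ends it. A pre-order walk turns the tree into a flat sequence that a sequence model could encode or decode, and a stack rebuilds the tree. Component names, property names, and the token format are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UINode:
    kind: str  # hypothetical component type, e.g. "Screen", "Row", "Button"
    props: Dict[str, str] = field(default_factory=dict)
    children: List["UINode"] = field(default_factory=list)


def to_tokens(node: UINode) -> List[str]:
    """Pre-order linearisation: '[Kind', sorted 'key=value' tokens, children, then ']'."""
    tokens = ["[" + node.kind]
    tokens += [f"{k}={v}" for k, v in sorted(node.props.items())]
    for child in node.children:
        tokens += to_tokens(child)
    tokens.append("]")
    return tokens


def from_tokens(tokens: List[str]) -> UINode:
    """Rebuild a tree from a well-formed sequence produced by to_tokens."""
    stack: List[UINode] = []
    root: Optional[UINode] = None
    for tok in tokens:
        if tok.startswith("["):
            node = UINode(tok[1:])
            if stack:
                stack[-1].children.append(node)  # attach to the currently open parent
            stack.append(node)
        elif tok == "]":
            root = stack.pop()  # the last node closed is the root
        else:
            key, _, value = tok.partition("=")
            stack[-1].props[key] = value
    if stack or root is None:
        raise ValueError("unbalanced token sequence")
    return root
```

A generator that emits such tokens one at a time can be checked for well-formedness with the same stack logic, which is one reason tree-shaped markup pairs naturally with sequence decoders.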
Jian Chen, Ruiyi Zhang, Yufan Zhou et al.
Controllable layout generation refers to the process of creating a plausible visual arrangement of elements within a graphic design (e.g., document and web designs) under constraints that represent design intentions. Although recent diffusion-based models have achieved state-of-the-art FID scores, they tend to exhibit more pronounced misalignment than earlier transformer-based models. In this work, we propose the LAyout Constraint diffusion modEl (LACE), a unified model that handles a broad range of layout generation tasks, such as arranging elements with specified attributes and refining or completing a coarse layout design. The model is based on continuous diffusion models. Compared with existing methods that use discrete diffusion models, the continuous state-space design enables the incorporation of differentiable aesthetic constraint functions in training. For conditional generation, we introduce conditions via masked input. Extensive experimental results show that LACE produces high-quality layouts and outperforms existing state-of-the-art baselines.
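Continuous coordinates are what make a differentiable aesthetic constraint possible. As an illustration, the sketch below assumes one such constraint, the total pairwise overlap area between element boxes, and reduces it by nudging box positions with finite-difference gradient steps. LACE applies its constraints during diffusion training; the post-hoc refinement, the overlap penalty, and the step sizes here are hypothetical simplifications, and no canvas-boundary constraint is enforced.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in continuous coordinates


def overlap_area(a: Box, b: Box) -> float:
    """Intersection area of two axis-aligned boxes (piecewise linear in the coordinates)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return ix * iy


def overlap_penalty(boxes: List[Box]) -> float:
    """Hypothetical aesthetic constraint: total pairwise overlap between layout elements."""
    return sum(overlap_area(a, b) for a, b in combinations(boxes, 2))


def refine_positions(boxes: List[Box], lr: float = 0.5, steps: int = 50,
                     eps: float = 1e-4) -> List[Box]:
    """Coordinate-wise descent on each box's (x, y), estimating each partial derivative
    with a central finite difference; widths and heights stay fixed."""
    state = [list(b) for b in boxes]
    for _ in range(steps):
        for i in range(len(state)):
            for k in (0, 1):  # x, then y
                orig = state[i][k]
                state[i][k] = orig + eps
                up = overlap_penalty([tuple(b) for b in state])
                state[i][k] = orig - eps
                down = overlap_penalty([tuple(b) for b in state])
                state[i][k] = orig - lr * (up - down) / (2 * eps)
    return [tuple(b) for b in state]
```

In a training setting the same penalty would instead be differentiated analytically (for example by an autodiff framework) and added to the denoising loss; the finite differences here simply keep the sketch dependency-free.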
Esther-Maria Guggenmos
This article is a revised version of the inaugural lecture delivered on 5 October 2023, on the occasion of the author's appointment as Professor of History of Religions at Lund University. It opens by depicting fundamental changes in the study of the history of religions in the twentieth century, followed by biographical notes, including her research on lay Buddhism in urban Taiwan, the emphasis on sensual dimensions of religious practice and the aesthetics of religion, and international academic networking in the analysis of practices of prognostication between Asia and Europe. Three areas central to the author's current research are then outlined. The article points out that a focus on religion in contemporary society certainly includes a healthy awareness of current developments in the politics of religion, particularly in East Asia. In addition, the article addresses two fields of research in which the author is currently engaged: (1) The emergence of "Life Education" as a school subject in Greater China and the pedagogical shift that accompanies it. Particularly in Taiwan, this new subject is tailored to create a space for young people to develop self-reflection and life orientation in a success-oriented society, while a new trust in religious organizations leads to their active engagement in these developments. The author is especially interested in how the transforming relationship between religion and public education gains special relevance from a comparative perspective between Asia and Europe. (2) Religious change in East Asia is evident in Buddhist ritual practices that are affected by a consumer society that moulds emotionally profound experiences into marketable, distinct units, which Eva Illouz has termed "emodities". Religious practices are changing in our contemporary world as they are reshaped by a growing, globally digitalized consumer culture. Tracing these changes leads to a deeper understanding of the underlying forces that distinctly reshape contemporary religious life.
Oleg Shevchenko / Олег Константинович Шевченко, Anna Dorofeeva / Анна Андреевна Дорофеева
The article, based on a mythopoetic approach, examines several lines of wines from the Alma Valley producer. The authors undertake extensive work to identify the mythopoetic elements of the Alma terroir from Antiquity to the nineteenth century, from Scythian settlements to the Battle of Alma. On this basis, they consider it possible to give the “Scythian Gold” wine series a high mythopoetic assessment. However, the authors also note a number of inconsistencies between this wine's branding and its type. They propose strengthening the historical and mythological component of the brand, drawing clearer connections between the character of the wine and the label design, the grape variety, and the general idea of the series. Analyzing the “Seasons” (PIN AP) wine series, the authors emphasize how exceptionally well the product is positioned in the gender and age segment of the consumer market, but conclude that this move cannot be the main direction for the philosophy of Alma Valley wines. Particular attention is paid to the “Alma.X” series and to the search for the most appropriate mythopoetic line for these original wines, which are the result of bold experiments by Alma Valley specialists.
Kibeom Hong, Seogkyu Jeon, Junsoo Lee et al.
To deliver the artistic expression of the target style, recent studies exploit the attention mechanism owing to its ability to map local patches of the style image to the corresponding patches of the content image. However, because of the low semantic correspondence between arbitrary content and artworks, the attention module repeatedly overuses specific local patches from the style image, resulting in disharmonious and evident repetitive artifacts. To overcome this limitation and achieve impeccable artistic style transfer, we focus on enhancing the attention mechanism and capturing the rhythm of patterns that organize the style. In this paper, we introduce a novel metric, pattern repeatability, that quantifies the repetition of patterns in the style image. Based on pattern repeatability, we propose Aesthetic Pattern-Aware style transfer Networks (AesPA-Net), which strike a balance between local and global style expression. In addition, we propose a novel self-supervisory task to encourage the attention mechanism to learn precise and meaningful semantic correspondence. Lastly, we introduce a patch-wise style loss to transfer the elaborate rhythm of local patterns. Through qualitative and quantitative evaluations, we verify the reliability of the proposed pattern repeatability, which aligns with human perception, and demonstrate the superiority of the proposed framework.
Svetlana D. HRISTOVA-VLADI
Objectives. This study focuses on the visibility of three local festivals in Bulgaria: the Rose Festival in Kazanlak, July Morning at Kamen bryag, and the Festival of Peppers, Tomatoes, Traditional Foods, and Crafts in Kurtovo Konare. The research on festive visibility is broken down into three components of analysis: story, local imagery, and photogenicity (colors, photographic visuals). Material and methods. These include participant observation, in-depth interviews, analysis of visuals (website and media visuals as well as photographs taken by the researcher), and desktop research of scientific literature and online media outlets. Results. The researcher conducted fieldwork as participant observer, interviewer, photographer, and visual analyst of festive events. The Rose Festival was found to promote pink symbols as prevalent elements of its cultural-historical branding, encompassing Thracian heritage and rose farming. July Morning has been commodified into fragmented celebrations held around the peripheral moment of 30 June and 1 July. This has obscured the sense of community and the sense of place associated with the original phenomenon. Local farmers’ aesthetics and diligence play a central role in the publicity of Kurtovo Konare Fest: their agrarian knowledge and their will to participate actively in social life, upskill, and exchange know-how with fellow farmers. Conclusions. The three local celebrations represent collections of sensations, colors, imagined experiences, memories, visitors’ expectations, a sense of community, and an awakened sense of place. The optics of the Rose Festival in Kazanlak comprise contrasting messages: the pink aesthetics represent beauty and the traditional means of local livelihood; however, the flashy pink ambience somewhat mutes the demands of the rose farmers, as seen in pieces of critical journalism. July Morning Festival has been largely deterritorialized from its original place into dispersed celebrations that do not reproduce the initial code of conduct. In the locality of Kamen bryag, however, the scent of wild nature and sea salt still reunites a few generations of like-minded people, mostly admirers of rock music and camping. At the heart of the optics of Kurtovo Konare Fest are the village producers, eager to raise their voices in defense of their production and to generate a distinctive local ethos.
Miriam de Rosa, Andrea Mariani
This article moves from small-gauge film technology manufacturing and experimental film practices to develop a twofold exploration of film and camera, conceptually interlinked in a mutual cycle of experimentation. Italian filmmaker Ubaldo Magnaghi’s city symphonies provide an illustrative example: active in the early 1930s as an independent filmmaker, between 1930 and 1933 he produced five films sponsored by Agfa, which was expanding its market in Italy. Magnaghi’s experimental films were conceived to stress the material resistance of the camera (an Agfa Movex 30) and the film stock (Agfa Isopan reverse). The article sheds light on the affordances that the film stock’s specificities and the camera offered the filmmaker in the manufacturing process. Such a virtuous cycle connecting manufacturing and creativity is metaphorically reinforced at the level of aesthetics in Magnaghi’s film Symphony of Life and Work (1933), characterised by a reiterated circling camera movement. Inspired by this, we aim to craft a study that assembles the various components of the filmmaker’s work into a rounded film experience, underlining the nature of small-gauge cinema as a non-neutral yet empowering practice able to create a complex room for critical analysis, eliciting new ways of looking, problematising, and therefore thinking reality.
Mohammad Reza Naderi, Mohammad Hossein Givkashi, Nader Karimi et al.
Image retargeting aims to alter an image's size while preserving important content and minimizing noticeable distortions. However, previous image retargeting methods produce outputs that suffer from artifacts and distortions. Besides, most previous works attempt to retarget the background and foreground of the input image simultaneously. Simultaneous resizing of the foreground and background changes the aspect ratios of objects, which is particularly undesirable for human subjects. We propose a retargeting method that overcomes these problems. The proposed approach consists of the following steps. First, an inpainting method uses the input image and the binary mask of foreground objects to produce a background image without any foreground objects. Second, the seam carving method resizes the background image to the target size. Third, a super-resolution method enhances the input image quality, after which we extract the foreground objects. Finally, the retargeted background and the extracted super-resolved objects are fed into a particle swarm optimization (PSO) algorithm. The PSO algorithm uses aesthetic quality assessment as its objective function to find the best location and size for placing the objects in the background. We use image quality assessment and aesthetic quality assessment measures to show that our results are superior to those of popular image retargeting techniques.
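Seam carving, used above to resize the background, repeatedly removes a connected path of low-energy pixels. The sketch below is a textbook single-seam version on a grayscale image stored as a list of rows; the simple |∂I/∂x| + |∂I/∂y| energy and the list-based representation are assumptions, since the abstract does not specify the energy function or implementation.

```python
from typing import List

Image = List[List[float]]  # grayscale, indexed image[row][col]


def energy(img: Image) -> Image:
    """Gradient-magnitude energy: absolute central differences, clamped at the borders."""
    h, w = len(img), len(img[0])
    e = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            left, right = img[i][max(j - 1, 0)], img[i][min(j + 1, w - 1)]
            up, down = img[max(i - 1, 0)][j], img[min(i + 1, h - 1)][j]
            e[i][j] = abs(right - left) + abs(down - up)
    return e


def find_vertical_seam(img: Image) -> List[int]:
    """Dynamic programming over cumulative minimum energy, then backtrack one column per row."""
    cost = energy(img)
    h, w = len(cost), len(cost[0])
    for i in range(1, h):
        for j in range(w):
            # each pixel may continue a seam from the three neighbours above it
            cost[i][j] += min(cost[i - 1][k] for k in range(max(j - 1, 0), min(j + 2, w)))
    seam = [min(range(w), key=lambda j: cost[h - 1][j])]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        seam.append(min(range(max(j - 1, 0), min(j + 2, w)), key=lambda k: cost[i][k]))
    seam.reverse()
    return seam


def remove_vertical_seam(img: Image) -> Image:
    """Return a copy of img that is one column narrower."""
    seam = find_vertical_seam(img)
    return [row[:j] + row[j + 1:] for row, j in zip(img, seam)]
```

Calling remove_vertical_seam repeatedly narrows the image one column at a time; in the pipeline above this would run only on the inpainted background, leaving foreground objects to be placed afterwards by PSO.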
Yizhi Wang, Guo Pu, Wenhan Luo et al.
Text logo design relies heavily on the creativity and expertise of professional designers, and arranging element layouts is one of its most important procedures. However, little attention has been paid to this task, which must take many factors (e.g., fonts, linguistics, topics) into consideration. In this paper, we propose a content-aware layout generation network that takes glyph images and their corresponding text as input and automatically synthesizes aesthetic layouts for them. Specifically, we develop a dual-discriminator module, comprising a sequence discriminator and an image discriminator, to evaluate the character placement trajectories and the rendered shapes of synthesized text logos, respectively. Furthermore, we fuse linguistic information from texts and visual semantics from glyphs to guide layout prediction; both play important roles in professional layout design. To train and evaluate our approach, we construct a dataset named TextLogo3K, consisting of about 3,500 text logo images and their pixel-level annotations. Experimental studies on this dataset demonstrate the effectiveness of our approach for synthesizing visually pleasing text logos and verify its superiority over the state of the art.