What’s in a text-to-image prompt? The potential of stable diffusion in visual arts education
N. Dehouche, Kullathida Dehouche
Text-to-image artificial intelligence (AI) recently saw a major breakthrough with the release of Dall-E and its open-source counterpart, Stable Diffusion. These programs allow anyone to create original visual art pieces by simply providing descriptions in natural language (prompts). Using a sample of 72,980 Stable Diffusion prompts, we propose a formalization of this new medium of art creation and assess its potential for teaching the history of art, aesthetics, and technique. Our findings indicate that text-to-image AI has the potential to revolutionize the way art is taught, offering new, cost-effective possibilities for experimentation and expression. However, it also raises important questions about the ownership of artistic works. As more and more art is created using these programs, it will be crucial to establish new legal and economic models to protect the rights of artists.
131 citations
en
Computer Science, Medicine
Visual-ERM: Reward Modeling for Visual Equivalence
Ziyu Liu, Shengyuan Ding, Xinyu Fang
et al.
Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent Large Vision Language Models (LVLMs) achieve strong results via supervised fine-tuning, reinforcement learning remains challenging due to misaligned reward signals. Existing rewards either rely on textual rules or coarse visual embedding similarity, both of which fail to capture fine-grained visual discrepancies and are vulnerable to reward hacking. We propose Visual Equivalence Reward Model (Visual-ERM), a multimodal generative reward model that provides fine-grained, interpretable, and task-agnostic feedback to evaluate vision-to-code quality directly in the rendered visual space. Integrated into RL, Visual-ERM improves Qwen3-VL-8B-Instruct by +8.4 on chart-to-code and yields consistent gains on table and SVG parsing (+2.7, +4.1 on average), and further strengthens test-time scaling via reflection and revision. We also introduce VisualCritic-RewardBench (VC-RewardBench), a benchmark for judging fine-grained image-to-image discrepancies on structured visual data, where Visual-ERM at 8B decisively outperforms Qwen3-VL-235B-Instruct and approaches leading closed-source models. Our results suggest that fine-grained visual reward supervision is both necessary and sufficient for vision-to-code RL, regardless of task specificity.
The Revealed Structure. Drawing between Construction and Form
Stefano Chiarenza, Marta Salvatore
The coherence of built architecture often depends on a latent structural design whose essential role can govern architectural outcomes even when not overtly visible. Structural logic is either integrated into the building as a whole or concealed behind finished surfaces. However, this concealed structure often defines space and regulates formal relationships. Instead of viewing structure and form as opposites or as directly corresponding, it is more accurate to understand them as mutually influential, their relationship shaped by specific design choices. Every construction act creates an order, and every form manifests through its underlying structure, highlighting an ongoing interplay between these elements.
Drawing. Design. Illustration, Visual arts
Interdisciplinary Approach to Monkeypox Prevention: Integrating Nanobiosensors, Nanovaccines, Artificial Intelligence, Visual Arts, and Social Sciences
Vishal Chaudhary, Lucky Lucky, Harsh Sable
et al.
To effectively address the emergence of a new viral crisis such as monkeypox, a collective and collaborative effort among scientists, engineers, innovators, and artists of all ages, regions, and fields is required. This review explores a holistic approach to addressing the monkeypox crisis by integrating nanobiosensors, artificial intelligence, visual arts, humanities, and social sciences. Traditional diagnostic methods are often limited by time, accessibility, and accuracy, but the advancement of point‐of‐care smart nanobiosensors offers a promising shift toward rapid, precise, and accessible diagnostics. They enhance the ability to screen, diagnose, and monitor monkeypox infections efficiently, contributing to better disease management. Beyond technological innovation, the essential role of arts, humanities, and social sciences in fostering public engagement, understanding, and acceptance of new diagnostic tools is emphasized. Visual arts can illustrate scientific concepts, making them more relatable, while storytelling through various media can reduce stigma and promote preventive measures. Social sciences provide insights into cultural attitudes, behaviors, and public health challenges, ensuring that technological solutions are effectively integrated into diverse communities. By combining these disciplines, this review presents a comprehensive framework for a more resilient global health system that aligns with One Health principles, emphasizing the interconnectedness of human, animal, and environmental health.
Sketch & Paint: Stroke-by-Stroke Evolution of Visual Artworks
Jeripothula Prudviraj, Vikram Jamwal
Understanding the stroke-based evolution of visual artworks is useful for advancing artwork learning, appreciation, and interactive display. While the stroke sequence of renowned artworks remains largely unknown, formulating this sequence for near-natural image drawing processes can significantly enhance our understanding of artistic techniques. This paper introduces a novel method for approximating artwork stroke evolution through a proximity-based clustering mechanism. We first convert pixel images into vector images via parametric curves and then explore the clustering approach to determine the sequence order of extracted strokes. Our proposed algorithm demonstrates the potential to infer stroke sequences in unknown artworks. We evaluate the performance of our method using WikiArt data and qualitatively demonstrate the plausible stroke sequences. Additionally, we demonstrate the robustness of our approach to handle a wide variety of input image types such as line art, face sketches, paintings, and photographic images. By exploring stroke extraction and sequence construction, we aim to improve our understanding of the intricacies of the art development techniques and the step-by-step reconstruction process behind visual artworks, thereby enriching our understanding of the creative journey from the initial sketch to the final artwork.
Proceedings of The third international workshop on eXplainable AI for the Arts (XAIxArts)
Corey Ford, Elizabeth Wilson, Shuoyang Zheng
et al.
This third international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 17th ACM Conference on Creativity and Cognition (C&C 2025), online.
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang
et al.
Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers, which is especially useful in applications where fine-tuning data is scarce. Recent open-source work like DeepSeek-R1 demonstrates that reinforcement learning with verifiable reward is one key direction in reproducing o1. While the R1-style model has demonstrated success in language models, its application in multi-modal domains remains under-explored. This work introduces Visual Reinforcement Fine-Tuning (Visual-RFT), which further extends the application areas of RFT to visual tasks. Specifically, Visual-RFT first uses Large Vision-Language Models (LVLMs) to generate multiple responses containing reasoning tokens and final answers for each input, and then uses our proposed visual perception verifiable reward functions to update the model via a policy optimization algorithm such as Group Relative Policy Optimization (GRPO). We design different verifiable reward functions for different perception tasks, such as the Intersection over Union (IoU) reward for object detection. Experimental results on fine-grained image classification, few-shot object detection, reasoning grounding, and open-vocabulary object detection benchmarks show the competitive performance and advanced generalization ability of Visual-RFT compared with Supervised Fine-tuning (SFT). For example, Visual-RFT improves accuracy by 24.3% over the baseline in one-shot fine-grained image classification with around 100 samples. In few-shot object detection, Visual-RFT also exceeds the baseline by 21.9 on COCO's two-shot setting and 15.4 on LVIS. Our Visual-RFT represents a paradigm shift in fine-tuning LVLMs, offering a data-efficient, reward-driven approach that enhances reasoning and adaptability for domain-specific tasks.
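To make the two ingredients named in this abstract concrete, here is a minimal Python sketch of an IoU score for a pair of boxes and of the group-relative normalization used in GRPO-style training. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact reward formula, so the function names `iou` and `group_relative_advantages`, the `(x1, y1, x2, y2)` box format, and the use of raw IoU as the reward are all hypothetical choices.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards):
    """Normalize rewards within one group of sampled responses.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; a group with identical rewards yields all zeros.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


# Hypothetical usage: score several predicted boxes against one ground truth.
ground_truth = (0, 0, 2, 2)
predictions = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
rewards = [iou(p, ground_truth) for p in predictions]
advantages = group_relative_advantages(rewards)
```

A response whose box overlaps the ground truth more than the group average receives a positive advantage, and one that overlaps less receives a negative one.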
Ane Lekuona-Mariscal. Hacia una historia feminista del arte del País Vasco. Trazando nuevas genealogías 1950-1972. Granada: Comares, 2024. 268 pp.
Clara Solbes-Borja
Book review
The Recessed Arched Mihrab Design: An Identity Marker in the Formation of Art in the Zand Era
Samaneh Kakavand
In each reign, kings attempted to shape visual culture according to their intellectual and ideological foundations. In some periods these changes were explicit and evident, while in others they were implicit and gradual. The Zand era also possesses distinctive artistic characteristics, one of which is a particular type of "Mihrab" design. This design exhibits unique visual elements that, together with other visual components, serve as an iconic representation of the visual culture of the period. The art of the Zand era is full of innovative designs, among which the repetition of a type of frame resembling an altar is thought-provoking. This article is concerned with introducing that frame as an identity marker and with examining how often and in what ways it appears in the Zand heritage. The mihrab design with a corrugation, often seen in Zand art, combines the crescent arch and the pointed arch, bringing the upper part, especially the crown of the arch, to the fore in a new form. It is an innovative frame of the Zand era, with a hybrid form at the crown inspired by these two arch types. Identifying such distinctive markers calls for research in this field, because without an adequate framework for study, artworks may be misattributed to the Zand era or to the early years of the Qajar period. The Mihrab designs found on undated artifacts and handicrafts of the Zand era will serve as an identity marker of that period.
While motifs inspired by arches and muqarnas are prominent in Islamic art, the present study argues that the "Mihrab" design discussed here is a distinguishing visual characteristic of Zand-era art.
Research Objective: The primary objective is to introduce, classify, and study the form of the "Mihrab" frame in Zand-era artworks in order to define one of the era's visual elements and cultural markers. The secondary objective is to establish the "Zand Mihrab" as an identity marker.
Research Questions: How can the unique "Zand Frame" be explained as an identity marker in artworks from the second half of the 12th century and the first half of the 13th century AH? How is the identity marker of the "Zand motif" interpreted in terms of innovation in the design of the Mihrab?
Hypothesis: The "Mihrab" design is an identity marker of art in the Zand era.
Research Methodology: This research is primarily theoretical, as it aims to explore the visual phenomenon known as the "Zand Mihrab" design and to elucidate its characteristics and qualities. The research employs a descriptive-analytical approach, utilizing library research, document analysis, virtual observations, and field visits to collect information and concepts. Data were gathered from written texts and images, focusing on 19 samples of visual arts from the Zand period, including tiles of the Vakil mosque, the pavilion, and the Haft-Tanan mansion, as well as objects and artifacts such as carpets, tombstones, a wooden door located in the Qazvin Museum, and a fabric piece containing a "Mihrab" frame.
Results: The quantitative findings of this research indicate that the "Mihrab" design is more prevalent in the Vakil mosque than in other architectural structures and artistic works of the period.
The study of the chronological evolution of the "Zand Mihrab" design shows that it was invented and flourished in the second half of the 12th century AH and persisted during the first half of the 13th century AH. Another significant finding of this research is that, although the studied design carries spiritual meanings, its usage was transformed during the Zand period and it appeared in non-religious architectural structures and artifacts. The "Mihrab" design is found in works such as the Vakil mosque (religious), the pavilion and the Abdul Razzaq Khan mansion (palace), and the Vakil School (educational), as well as in the decoration of objects and tombstones. Through innovation in the design and modern visual composition of the Mihrab arch, through iterating and harmonizing the application of the motif, and ultimately through the principle of diversity, Zand-era artists transformed the "Mihrab" design into an identity element of the Zand period. Thus, in addition to its original invention, the repetition and diverse application of this form across various arts and architectures have turned the spiritual figure of the Mihrab design into a symbol of Zand identity and an integral part of the visual culture.
Drawing Time: Showcasing Archives Beyond the Museum. The case study of 10·Corso·Como “Yohji Yamamoto. Letter to the Future”
Irene Calvi
Understanding Visual Arts Experiences of Blind People
Franklin Mingzhe Li, Lotus Zhang, Maryam Bandukda
et al.
Visual arts play an important role in cultural life and provide access to social heritage and self-enrichment, but most visual arts are inaccessible to blind people. Researchers have explored different ways to enhance blind people’s access to visual arts (e.g., audio descriptions, tactile graphics). However, how blind people adopt these methods remains unknown. We conducted semi-structured interviews with 15 blind visual arts patrons to understand how they engage with visual artwork and the factors that influence their adoption of visual arts access methods. We further examined interview insights in a follow-up survey (N=220). We present: 1) current practices and challenges of accessing visual artwork in-person and online (e.g., Zoom tour), 2) motivation and cognition of perceiving visual arts (e.g., imagination), and 3) implications for designing visual arts access methods. Overall, our findings provide a roadmap for technology-based support for blind people’s visual arts experiences.
61 citations
en
Computer Science
Enhancing primary school students’ performance, flow state, and cognitive load in visual arts education through the integration of augmented reality technology in a card game
Jing Chen, N. A. M. Mokmin
27 citations
en
Computer Science
GAN Computers Generate Arts? A Survey on Visual Arts, Music, and Literary Text Generation using Generative Adversarial Network
Sakib Shahriar
"Art is the lie that enables us to realize the truth."- Pablo Picasso. For centuries, humans have dedicated themselves to producing arts to convey their imagination. The advancement in technology and deep learning in particular, has caught the attention of many researchers trying to investigate whether art generation is possible by computers and algorithms. Using generative adversarial networks (GANs), applications such as synthesizing photorealistic human faces and creating captions automatically from images were realized. This survey takes a comprehensive look at the recent works using GANs for generating visual arts, music, and literary text. A performance comparison and description of the various GAN architecture are also presented. Finally, some of the key challenges in art generation using GANs are highlighted along with recommendations for future work.
122 citations
en
Computer Science, Mathematics
EXPLORING THE IMPACT OF ARTIFICIAL INTELLIGENCE IN THE VISUAL ARTS: A COMPREHENSIVE STUDY
P. Ezhilmurugan, Y. E
Human thought first found expression through visual art. From early cave paintings to today's AI-generated images and deep learning algorithms, the world has developed. Artificial intelligence (AI) has affected the visual arts in various ways and is increasingly a transformative force in many fields. This study explains the complex link between artificial intelligence and the visual arts by analyzing its effects, outcomes, and future paths. It explores how artificial intelligence has transformed the production, interpretation, and consumption of art, and shows how artists employ AI algorithms to generate imagery. Drawing on surveys, experimental initiatives, and a range of artworks, the study explains the impact of artificial intelligence on the world of visual arts. It also describes how artificial intelligence has swayed historical mythology and its sociocultural ramifications. With an interdisciplinary approach, the study integrates perspectives from computer science, art history, and cultural studies and offers a nuanced analysis of the profound impact of AI on the visual arts. Finally, it provides insight into how artificial intelligence has influenced the visual arts and into its future technological potential, offering a deeper understanding of the increasingly complex link between creativity and technology in modern art practices.
Artificial Neural Networks and Deep Learning in the Visual Arts: a review
I. Santos, Luz Castro, Nereida Rodriguez-Fernandez
et al.
101 citations
en
Computer Science
Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation
Ivan Rinaldi, Nicola Fanelli, Giovanna Castellano
et al.
Artificial Intelligence and generative models have revolutionized music creation, with many models leveraging textual or visual prompts for guidance. However, existing image-to-music models are limited to simple images, lacking the capability to generate music from complex digitized artworks. To address this gap, we introduce Art2Mus, a novel model designed to create music from digitized artworks or text inputs. Art2Mus extends the AudioLDM 2 architecture, a text-to-audio model, and employs our newly curated datasets, created via ImageBind, which pair digitized artworks with music. Experimental results demonstrate that Art2Mus can generate music that resonates with the input stimuli. These findings suggest promising applications in multimedia art, interactive installations, and AI-driven creative tools.
An Art-centric perspective on AI-based content moderation of nudity
Piera Riccio, Georgina Curto, Thomas Hofmann
et al.
At a time when the influence of generative Artificial Intelligence on visual arts is a highly debated topic, we draw attention to a more subtle phenomenon: the algorithmic censorship of artistic nudity online. We analyze the performance of three "Not-Safe-For-Work" image classifiers on artistic nudity, and empirically uncover the existence of a gender and a stylistic bias, as well as evident technical limitations, especially when only considering visual information. Hence, we propose a multi-modal zero-shot classification approach that improves artistic nudity classification. From our research, we draw several implications that we hope will inform future research on this topic.
Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)
Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng
et al.
This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.
Understanding Visual Concepts Across Models
Brandon Trabucco, Max Gurinas, Kyle Doherty
et al.
Large multimodal models such as Stable Diffusion can generate, detect, and classify new visual concepts after fine-tuning just a single word embedding. Do models learn similar words for the same concepts (i.e. <orange-cat> = orange + cat)? We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that new word embeddings are model-specific and non-transferable. Across 4,800 new embeddings trained for 40 diverse visual concepts on four standard datasets, we find perturbations within an $ε$-ball to any prior embedding that generate, detect, and classify an arbitrary concept. When these new embeddings are spliced into new models, fine-tuning that targets the original model is lost. We show popular soft prompt-tuning approaches find these perturbative solutions when applied to visual concept learning tasks, and embeddings for visual concepts are not transferable. Code for reproducing our work is available at: https://visual-words.github.io.
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu
et al.
Visual representation learning has been a cornerstone in computer vision, involving typical forms such as visual embeddings, structural symbols, and text-based representations. Despite the success of CLIP-type visual embeddings, they often lack access to world knowledge critical for visual reasoning. In this work, we propose Visual Table, a novel form of visual representation tailored for visual reasoning. Visual tables are constructed as hierarchical descriptions of visual scenes, featuring a scene description and multiple object-centric descriptions covering categories, attributes, and knowledge. Thanks to the structural and textual formats, visual tables offer unique advantages over mere visual embeddings, such as interpretability and controllable editing. Furthermore, they deliver instance-level world knowledge and detailed attributes that are essential for visual reasoning. To create visual tables, we develop a generator trained on the dataset with collected, small-scale annotations. Extensive results on 11 visual reasoning benchmarks demonstrate that the generated visual tables significantly outperform previous structural and text-based representations. Moreover, they consistently enhance state-of-the-art multimodal large language models across diverse benchmarks, showcasing their potential for advancing visual reasoning tasks. Our code is available at https://github.com/LaVi-Lab/Visual-Table.