Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking
Yujin Park, Haejun Chung, Ikbeom Jang
Pairwise comparison labeling is gaining traction because it yields higher inter-rater reliability than conventional classification labeling, but exhaustive comparison incurs quadratic cost. We propose Dodgersort, which combines CLIP-based hierarchical pre-ordering, a neural ranking head and a probabilistic ensemble (Elo, BTL, GP), epistemic--aleatoric uncertainty decomposition, and information-theoretic pair selection. It reduces the number of human comparisons while improving the reliability of the resulting rankings. On visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort achieves an 11--16\% annotation reduction while improving inter-rater reliability. Cross-domain ablations across four datasets show that neural adaptation and ensemble uncertainty are key to this gain. On FG-NET, which has ground-truth ages, the framework extracts 5--20$\times$ more ranking information per comparison than baselines, yielding Pareto-optimal accuracy--efficiency trade-offs.
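To make the idea of uncertainty-driven pair selection concrete, the following is a minimal sketch, not the Dodgersort implementation. It assumes a plain Elo model only (no CLIP pre-ordering, BTL, GP, or neural head) and picks the next pair to show a human by maximizing the binary entropy of the predicted outcome. The function names, the K-factor of 32, and the scale of 400 are illustrative assumptions.

```python
import math
from itertools import combinations


def win_prob(r_a, r_b, scale=400.0):
    """Elo-style probability that item a is ranked above item b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(ratings, winner, loser, k=32.0):
    """Update the ratings dict in place after one human comparison."""
    p = win_prob(ratings[winner], ratings[loser])
    delta = k * (1.0 - p)  # a surprising result moves the ratings more
    ratings[winner] += delta
    ratings[loser] -= delta


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) outcome; 0 at the extremes."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_uncertain_pair(ratings):
    """Return the pair whose predicted outcome has the highest entropy."""
    return max(
        combinations(sorted(ratings), 2),
        key=lambda pair: binary_entropy(
            win_prob(ratings[pair[0]], ratings[pair[1]])
        ),
    )


if __name__ == "__main__":
    ratings = {"a": 1500.0, "b": 1500.0, "c": 1900.0}
    pair = most_uncertain_pair(ratings)
    print("ask the annotator about:", pair)
    elo_update(ratings, winner=pair[0], loser=pair[1])
    print(ratings)
```

Selecting the highest-entropy pair is only one simple information-theoretic criterion; an ensemble-based system would also take disagreement between models into account.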
Embodying Facts, Figures, and Faiths in Narrative Artistic Performances in Rural Bangladesh
Sharifa Sultana, Zinnat Sultana, Jeffrey M. Rzeszotarski
et al.
There is an increasing interest in telling serious stories with data. Designers organize information, construct narratives, and present findings to inform audiences. However, many of these practices emerge from modern information visualization rhetoric and ethical frameworks which may marginalize communities with low digital and media literacy. In a ten-month-long ethnographic study in three Bangladeshi villages, we investigated how these communities use entertainment and cultural practices, namely Puthi, Bhandari Gaan, and Pot music, to instruct, communicate traditional moral lessons and recall history. We found that these communities embrace polyvocality and multiple ethical frameworks in their performances, construct narratives combining factuality, emotionality, and aesthetics, and adapt their performances to changing technology and audience needs. Our findings provide HCI, visualization, and ethical data practitioners with implications for the design of accessible and culturally appropriate ways of presenting data narratives in data-driven systems.
SGL: A Structured Graphics Language
Jon Chapman
This paper introduces SGL, a graphics language that is aesthetically similar to SQL. As a graphical counterpart to SQL, SGL enables specification of statistical graphics within SQL query interfaces. SGL is based on a grammar of graphics that has been customized to support a SQL aesthetic. This paper presents the fundamental components of the SGL language alongside examples, and describes SGL's underlying grammar of graphics via comparison to its closest predecessor, the layered grammar of graphics.
Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics
Ali Tourani, Deniz Isinsu Avsar, Hriday Bavle
et al.
Fiducial markers are widely used in robotics for navigation, object recognition, and scene understanding. While offering significant advantages for robots and Augmented Reality (AR) applications, they often disrupt the visual aesthetics of environments, as they are visible to humans, making them unsuitable for many everyday use cases. To address this gap, this paper presents iMarkers, innovative, unobtrusive fiducial markers detectable exclusively by robots and AR devices equipped with adequate sensors and detection algorithms. These markers offer high flexibility in production, allowing customization of their visibility range and encoding algorithms to suit various demands. The paper also introduces the hardware designs and open-sourced software algorithms developed for detecting iMarkers, highlighting their adaptability and robustness in the detection and recognition stages. Numerous evaluations have demonstrated the effectiveness of iMarkers relative to conventional (printed) and blended fiducial markers and have confirmed their applicability across diverse robotics scenarios.
Central Path Art
Thor Catteau, Benjamin Glancy, Allen Holder
et al.
The central path revolutionized the study of optimization in the 1980s and 1990s due to its favorable convergence properties, and as such, it has been investigated analytically, algorithmically, and computationally. Past pursuits have primarily focused on linking iterative approximation algorithms to the central path in the design of efficient algorithms to solve large, and sometimes novel, optimization problems. This algorithmic intent has meant that the central path has rarely been celebrated as an aesthetic entity in low dimensions, with the only meager exceptions being illustrative examples in textbooks. We undertake this low-dimensional investigation and illustrate the artistic use of the central path to create aesthetic tilings and flower-like constructs in two and three dimensions, an endeavor that combines mathematical rigor and artistic sensibilities. The result is a fanciful and enticing collection of patterns that, beyond computer-generated images, supports math-aesthetic designs for novelties and museum-quality pieces of art.
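For readers unfamiliar with the object being drawn, here is a small sketch of a central path in two dimensions. It is an illustration chosen for this listing, not the construction used by the authors. It assumes the linear program of minimizing c·x over the unit square with the standard logarithmic barrier, which separates by coordinate and therefore has a closed-form minimizer for each barrier parameter mu. The function names are hypothetical.

```python
import math


def central_path_coordinate(c, mu):
    """Minimizer over 0 < x < 1 of c*x - mu*(log(x) + log(1 - x)).

    Setting the derivative to zero gives c*x**2 - (c + 2*mu)*x + mu = 0;
    the root below is the one that lies strictly inside (0, 1).
    """
    if c == 0.0:
        return 0.5
    root = math.sqrt(c * c + 4.0 * mu * mu)
    return (c + 2.0 * mu - root) / (2.0 * c)


def central_path(cost, mus):
    """Central-path points for min cost.x over the unit square.

    cost is a pair (c1, c2); mus is an iterable of positive barrier
    parameters. Large mu gives points near the analytic center
    (0.5, 0.5); mu -> 0 approaches the optimal vertex.
    """
    return [
        (central_path_coordinate(cost[0], mu),
         central_path_coordinate(cost[1], mu))
        for mu in mus
    ]


if __name__ == "__main__":
    mus = [10.0 ** e for e in range(2, -5, -1)]
    for point in central_path((1.0, -2.0), mus):
        print(point)
```

Plotting such curves for many cost vectors from the same center is one simple way to obtain the kind of petal-like families that the abstract alludes to.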
Situational Agency: The Framework for Designing Behavior in Agent-based art
Ary-Yue Huang, Varvara Guljajeva
In the context of artificial life art and agent-based art, this paper draws on Simon Penny's {\itshape Aesthetic of Behavior} theory and Sofian Audry's discussions on behavior computation to examine how artists design agent behaviors and the ensuing aesthetic experiences. We advocate for integrating the environment in which agents operate as the context for behavioral design, positing that the environment emerges through continuous interactions among agents, audiences, and other entities, forming an evolving network of meanings generated by these interactions. Artists create contexts by deploying and guiding these computational systems, audience participation, and agent behaviors through artist strategies. This framework is developed by analysing two categories of agent-based artworks, exploring the intersection of computational systems, audience participation, and artistic strategies in creating aesthetic experiences. This paper seeks to provide a contextual foundation and framework for designing agents' behaviors by conducting a comparative study focused on behavioural design strategies by the artists.
Seedream 3.0 Technical Report
Yu Gao, Lixue Gong, Qiushan Guo
et al.
We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 stem from improvements across the entire pipeline, from data construction to model deployment. At the data stratum, we double the dataset using a defect-aware training paradigm and a dual-axis collaborative data-sampling framework. Furthermore, we adopt several effective techniques such as mixed-resolution training, cross-modality RoPE, representation alignment loss, and resolution-aware timestep sampling in the pre-training phase. During the post-training stage, we utilize diversified aesthetic captions in SFT and a VLM-based reward model with scaling, thereby achieving outputs that align well with human preferences. Furthermore, Seedream 3.0 pioneers a novel acceleration paradigm. By employing consistent noise expectation and importance-aware timestep sampling, we achieve a 4 to 8 times speedup while maintaining image quality. Seedream 3.0 demonstrates significant improvements over Seedream 2.0: it enhances overall capabilities, in particular text rendering of complicated Chinese characters, which is important for professional typography generation. In addition, it provides native high-resolution output (up to 2K), allowing it to generate images with high visual quality.
Shaping Realities: Enhancing 3D Generative AI with Fabrication Constraints
Faraz Faruqi, Yingtao Tian, Vrushank Phadnis
et al.
Generative AI tools are becoming more prevalent in 3D modeling, enabling users to manipulate or create new models with text or images as inputs. This makes it easier for users to rapidly customize and iterate on their 3D designs and explore new creative ideas. These methods focus on the aesthetic quality of the 3D models, refining them to look similar to the prompts provided by the user. However, when creating 3D models intended for fabrication, designers need to trade off the aesthetic qualities of a 3D model against its intended physical properties. To be functional post-fabrication, 3D models have to satisfy structural constraints informed by physical principles. Currently, such requirements are not enforced by generative AI tools. This leads to the development of aesthetically appealing but potentially non-functional 3D geometry that would be hard to fabricate and use in the real world. This workshop paper highlights the limitations of generative AI tools in translating digital creations into the physical world and proposes new augmentations to generative AI tools for creating physically viable 3D models. We advocate for the development of tools that manipulate or generate 3D models by considering not only the aesthetic appearance but also physical properties as constraints. This exploration seeks to bridge the gap between digital creativity and real-world applicability, extending the creative potential of generative AI into the tangible domain.
No Longer Trending on Artstation: Prompt Analysis of Generative AI Art
Jon McCormack, Maria Teresa Llano, Stephen James Krol
et al.
Image generation using generative AI is rapidly becoming a major new source of visual media, with billions of AI-generated images created using diffusion models such as Stable Diffusion and Midjourney over the last few years. In this paper we collect and analyse over 3 million prompts and the images they generate. Using natural language processing, topic analysis and visualisation methods, we aim to understand collectively how people are using text prompts, the impact of these systems on artists, and more broadly on the visual cultures they promote. Our study shows that prompting focuses largely on surface aesthetics, reinforcing cultural norms, popular conventional representations and imagery. We also find that many users focus on popular topics (such as making colouring books, fantasy art, or Christmas cards), suggesting that the dominant use for the systems analysed is recreational rather than artistic.
Music-triggered fashion design: from songs to the metaverse
Martina Delgado, Marta Llopart, Eva Sarabia
et al.
The advent of ever-growing virtual realities poses unprecedented opportunities and challenges to different societies. Artistic collectives are no exception, and here we pay special attention to musicians. Compositions, lyrics and even show advertisements are constituents of a message that artists transmit about their reality. As such, artistic creations are ultimately linked to feelings and emotions, with aesthetics playing a crucial role in transmitting artists' intentions. In this context, we analyze how virtual realities can broaden the opportunities for musicians to connect with their audiences, by devising a dynamic fashion-design recommendation system inspired by sound stimuli. We present our first steps towards redefining musical experiences in the metaverse, opening up alternative opportunities for artists to connect with both real and virtual audiences (\textit{e.g.} machine-learning agents operating in the metaverse) in potentially broader ways.
Creating Aesthetic Sonifications on the Web with SIREN
Tristan Peng, Hongchan Choi, Jonathan Berger
SIREN is a flexible, extensible, and customizable web-based general-purpose interface for auditory data display (sonification). SIREN is designed as a digital audio workstation for sonification; its synthesizers, written in JavaScript using the Web Audio API, facilitate intuitive mapping of data to auditory parameters for a wide range of purposes. This paper explores the breadth of sound synthesis techniques supported by SIREN, and details the structure and definition of a SIREN synthesizer module. The paper proposes further development that will increase SIREN's utility.
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan, Ling Liu, Lei Xu
et al.
In this position paper, we argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking that draws upon insights from disciplines such as user experience research and human behavioral psychology to ensure that the experimental design and results are reliable. The conclusions from these evaluations, thus, must consider factors such as usability, aesthetics, and cognitive biases. We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert. Furthermore, the evaluation should differentiate the capabilities and weaknesses of increasingly powerful large language models -- which requires effective test sets. The scalability of human evaluation is also crucial to wider adoption. Hence, to design an effective human evaluation system in the age of generative NLP, we propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model
Hao Yang, Jianxin Yuan, Shuai Yang
et al.
In online advertising scenarios, sellers often create multiple creatives to provide comprehensive demonstrations, making it essential to present the most appealing design to maximize the Click-Through Rate (CTR). However, sellers generally struggle to account for users' preferences in creative design, leading to lower aesthetic quality and smaller quantities than Artificial Intelligence (AI)-based approaches. Traditional AI-based approaches still face the same problem of not considering user information, while having limited aesthetic knowledge from designers. In fact, by incorporating user information, the generated creatives can be made more attractive, because different users may have different preferences. To optimize the results, the creatives generated by traditional methods are then ranked by another module, the creative ranking model, which can predict the CTR score for each creative considering user features. However, these two stages are regarded as two different tasks and are optimized separately. In this paper, we propose a new automated Creative Generation pipeline for Click-Through Rate (CG4CTR) with the goal of improving CTR during the creative generation stage. Our contributions have four parts: 1) The inpainting mode of Stable Diffusion is applied for the first time to the creative generation task in the online advertising scenario, and a self-cyclic generation pipeline is proposed to ensure the convergence of training. 2) A prompt model is designed to generate individualized creatives for different user groups, which can further improve diversity and quality. 3) A reward model comprehensively considers the multimodal features of image and text to improve the effectiveness of the creative ranking task, and it is also critical in the self-cyclic pipeline. 4) The significant benefits obtained in online and offline experiments verify the significance of our proposed method.
Claiming Indigenous Sovereignty Online
Chun Chia Tai
Taiwanese Indigenous youths utilize social media to assert Indigeneity. However, while egalitarian technologies provide a platform for self-representation, fetishism and multiculturalism might misrepresent their Indigeneity. This study focuses on Ponay’s covers of Mando-pop songs on YouTube to reclaim Indigenous popular music history and challenge Han-centric aesthetics and heteronormativity.
Memory augment is All You Need for image restoration
Xiao Feng Zhang, Chao Chen Gu, Shan Ying Zhu
Image restoration is a low-level vision task. Most CNN methods are designed as black boxes, lacking transparency and internal aesthetics. Although some methods combining traditional optimization algorithms with DNNs have been proposed, they all have limitations. In this paper, we propose MemoryNet, which combines a three-granularity memory layer with contrastive learning. Specifically, samples are divided into positive, negative, and actual samples for contrastive learning; the memory layer preserves the deep features of the image, and contrastive learning drives the learned features toward balance. Experiments on Derain/Deshadow/Deblur tasks demonstrate that these methods are effective in improving restoration performance. In addition, the model obtains significant PSNR and SSIM gains on three datasets with different degradation types, which is strong evidence that the recovered images are perceptually realistic. The source code of MemoryNet can be obtained from https://github.com/zhangbaijin/MemoryNet
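As a reminder of what a PSNR gain measures, the sketch below computes the standard peak signal-to-noise ratio, 10·log10(MAX² / MSE), between a reference image and a restored image. It is not taken from the MemoryNet repository. Images are assumed to be flattened sequences of pixel intensities of equal length, and the function name and the default peak value of 255 are illustrative assumptions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    reference, restored: equal-length sequences of pixel intensities.
    max_value: largest possible pixel value (255 for 8-bit images).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum(
        (r - s) ** 2 for r, s in zip(reference, restored)
    ) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [100, 100, 100, 100]
    noisy = [110, 90, 105, 95]
    print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```

Higher values indicate a smaller mean squared error relative to the dynamic range of the image.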
A Survey for Graphic Design Intelligence
Danqing Huang, Jiaqi Guo, Shizhao Sun
et al.
Graphic design is an effective language for visual communication. Using complex compositions of visual elements (e.g., shape, color, font) guided by design principles and aesthetics, design helps produce more visually appealing content. The creation of a harmonious design requires carefully selecting and combining different visual elements, which can be challenging and time-consuming. To expedite the design process, emerging AI techniques have been proposed to automate tedious tasks and facilitate human creativity. However, most current works focus only on specific tasks targeting different scenarios, without a high-level abstraction. This paper aims to provide a systematic overview of graphic design intelligence and summarize the literature in a taxonomy of representation, understanding and generation. Specifically, we consider related works for individual visual elements as well as the overall design composition. Furthermore, we highlight some potential directions for future exploration.
Learning Triangular Distribution in Visual World
Ping Chen, Xingpeng Zhang, Chengtao Zhou
et al.
Convolutional neural networks are successful in pervasive vision tasks, including label distribution learning, which usually takes the form of learning an injection from the non-linear visual features to the well-defined labels. However, how the discrepancy between features is mapped to the label discrepancy is ambiguous, and its correctness is not guaranteed. To address these problems, we study the mathematical connection between a feature and its label, presenting a general and simple framework for label distribution learning. We propose a so-called Triangular Distribution Transform (TDT) to build an injective function between feature and label, guaranteeing that any symmetric feature discrepancy linearly reflects the difference between labels. The proposed TDT can be used as a plug-in in mainstream backbone networks to address different label distribution learning tasks. Experiments on Facial Age Recognition, Illumination Chromaticity Estimation, and Aesthetics Assessment show that TDT achieves on-par or better results than the prior art.
La ciudad participativa. Formas de trabajo colaborativo aplicadas a la planificación urbana. Los casos de las ciudades menguantes americanas: Baltimore, Detroit y Filadelfia = The participatory city. Collaborative working methods applied to urban planning. The case of American shrinking cities: Baltimore, Detroit, and Philadelphia
Gonzalo José López Garrido
The second half of the 20th century in the United States saw the implementation of a series of federal initiatives that distributed funds to suburban and infrastructure development, leaving the urban fabric of cities deteriorated and their neighborhoods decimated in population and without resources. This prompted the emergence of initiatives that have taken on the self-development of projects using a wide range of participatory approaches as methodology. This intersection between an urgent urban crisis and a methodology with great potential for urbanism forms the historical and theoretical framework for this work. Starting from the question of whether it is possible to generate urban impact by implementing participatory methodologies as a fundamental part of the urban planning and design project, these initiatives are examined, establishing a genealogy of citizen participation models that provides a common framework for analyzing participation in urban planning and design. The research seeks to provide a series of strategies for urban planners to use when working with a community, and to redefine their role as agents capable of managing participatory design processes and mediating between the community and the institutions in the realization of an urban project.
Aesthetics of cities. City planning and beautifying
IDeS Method Applied to an Innovative Motorbike—Applying Topology Optimization and Augmented Reality
Leonardo Frizziero, Christian Leon-Cardenas, Giulio Galiè
et al.
This study presents the conception of the DS700 HYBRID project through the application of the Industrial Design Structure (IDeS) method, which draws on tools from engineering and style departments, including QFD and SDE, to create the concept of a hybrid motorbike that could reach the market in the near future. SDE is an engineering approach for the design and development of industrial design projects, and it finds important applications in the automotive sector. In addition, analysis tools such as QFD, comprising benchmarking and top-flop analysis, are applied to maximize the creative process. The key characteristics of the bike and the degree of innovation are identified and outlined, the market segment is identified, and the stylistic trends that are most suitable for a naked motorbike of the future are analyzed. In the second part, the styling of each superstructure and of all the components of the vehicle is carried out. Afterwards, the aesthetic and engineering perspectives are accounted for to complete the project. This is achieved with modelling and computing tools such as 3D CAD, visual renderings, and FEM simulations, followed by virtual prototyping with augmented reality (AR) and, finally, physical prototyping with additive manufacturing (AM). The result is a product conception able to compete in the present challenging market, with a design that is technically feasible and also reaches new lightness targets for efficiency.
Engineering machinery, tools, and implements, Technological innovations. Automation
Shared memories driven by the intrinsic memorability of items
Wilma A. Bainbridge
When we experience an event, it feels like our previous experiences, our interpretations of that event (e.g., aesthetics, emotions), and our current state will determine how we will remember it. However, recent work has revealed a strong sway of the visual world itself in influencing what we remember and forget. Certain items -- including certain faces, words, images, and movements -- are intrinsically memorable or forgettable across observers, regardless of individual differences. Further, neuroimaging research has revealed that the brain is sensitive to memorability both rapidly and automatically during late perception. These strong consistencies in memory across people may reflect the broad organizational principles of our sensory environment, and may reveal how the brain prioritizes information before encoding items into memory. In this chapter, I will discuss our current state-of-the-art understanding of memorability for visual information, and what these findings imply about how we perceive and remember visual events.