MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
Junyao Gao, Sibo Liu, Jiaxing Li
et al.
In this paper, we introduce MegaStyle, a novel and scalable data curation pipeline that constructs an intra-style consistent, inter-style diverse, and high-quality style dataset. We achieve this by leveraging the consistent text-to-image style mapping capability of current large generative models, which can generate images in the same style from a given style description. Building on this foundation, we curate a diverse and balanced prompt gallery with 170K style prompts and 400K content prompts, and generate a large-scale style dataset, MegaStyle-1.4M, from content-style prompt combinations. With MegaStyle-1.4M, we propose style-supervised contrastive learning to fine-tune a style encoder, MegaStyle-Encoder, that extracts expressive, style-specific representations, and we also train a FLUX-based style transfer model, MegaStyle-FLUX. Extensive experiments demonstrate the importance of intra-style consistency, inter-style diversity, and high quality in a style dataset, as well as the effectiveness of the proposed MegaStyle-1.4M. Moreover, when trained on MegaStyle-1.4M, MegaStyle-Encoder and MegaStyle-FLUX provide reliable style similarity measurement and generalizable style transfer, making a significant contribution to the style transfer community. More results are available at our project website https://jeoyal.github.io/MegaStyle/.
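The style-supervised contrastive objective mentioned above can be illustrated with a generic supervised contrastive (SupCon-style) loss in which images generated from the same style prompt count as positives. This is a minimal sketch under that assumption, not MegaStyle-Encoder's actual training code. The function names, the temperature value, and the use of plain Python lists instead of a deep-learning framework are all illustrative choices.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings: Sequence[Sequence[float]],
                                style_labels: Sequence[int],
                                temperature: float = 0.1) -> float:
    """SupCon-style loss: samples sharing a style label are pulled together,
    all other samples in the batch act as negatives (illustrative sketch)."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and style_labels[p] == style_labels[i]]
        if not positives:
            continue  # an anchor with no same-style partner contributes nothing
        # Temperature-scaled cosine similarity from anchor i to every other sample.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        m = max(logits.values())
        # Numerically stable log-sum-exp over all non-anchor samples.
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Embeddings that cluster by style yield a lower loss than the same embeddings labeled with mismatched styles, which is the property a style encoder needs for style-similarity measurement.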
Fragmented Form and Spatiotemporal Experiences in Transnational Korean Women’s Poetry
Melanie Hyo-In Han
This paper explores the intersection of poetic form and transnational identity in contemporary women's poetry, focusing on the strategic use of fragmentation and prose poetry. By examining the works of the poets Don Mee Choi, Emily Jungmin Yoon, and Cathy Park Hong, the paper highlights how these forms enhance the exploration of spatiotemporal experiences and cultural belonging. I show how the interplay between fragmented poetry and prose poetry creates a dynamic aesthetic that reflects the layered complexity of lived experience, trauma, and resilience. Through detailed analysis, this paper demonstrates that prose poetry provides a versatile platform for narratives of confinement and oppression, while fragmented forms capture the fluidity and dislocation inherent in transnational identities. I further highlight how the integration of personal and socio-political narratives underscores the interconnectedness of global experiences, offering new perspectives on identity and belonging in a constantly shifting world.
Language. Linguistic theory. Comparative grammar, Style. Composition. Rhetoric
Can AI Recognize the Style of Art? Analyzing Aesthetics through the Lens of Style Transfer
Yunha Yeo, Daeho Um
This study investigates how artificial intelligence (AI) recognizes style through style transfer, an AI technique that generates a new image by applying the style of one image to another. Although style transfer has attracted considerable interest among researchers, most efforts have focused on improving the quality of output images through more advanced algorithms. In this paper, we approach style transfer from an aesthetic perspective, bridging AI techniques and aesthetics. We analyze two style transfer algorithms: one based on convolutional neural networks (CNNs) and the other on recent Transformer models. By comparing the images each produces, we explore the elements that constitute the style of artworks through an aesthetic analysis of the style transfer results. We then elucidate the limitations of current style transfer techniques and, based on these limitations, propose directions for future research.
Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
Nam-Gyu Kim
Recent advances in expressive text-to-speech (TTS) have introduced diverse methods based on style embeddings extracted from reference speech. However, synthesizing high-quality expressive speech remains challenging. We propose Spotlight-TTS, which exclusively emphasizes style via voiced-aware style extraction and style direction adjustment. Voiced-aware style extraction focuses on voiced regions, which are highly related to style, while maintaining continuity across different speech regions to improve expressiveness. We also adjust the direction of the extracted style for optimal integration into the TTS model, which improves speech quality. Experimental results demonstrate that Spotlight-TTS outperforms baseline models in expressiveness, overall speech quality, and style transfer capability.
Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations
Yue Li Du, Ben Alexander, Mikhail Antonenka
et al.
Retrieving semantically similar but visually distinct content is a critical capability in visual search systems. In this work, we tackle this problem with the Visual Product Graph (VPG), which leverages high-performance storage infrastructure and state-of-the-art computer vision models for image understanding. VPG is an online, real-time retrieval system that enables navigation from individual products to composite scenes containing those products, along with complementary recommendations. The system not only offers contextual insight by showing how products can be styled in context, but also recommends complementary products drawn from these inspirations. We discuss the essential components for building the Visual Product Graph, along with the core computer vision model improvements across object detection, foundational visual embeddings, and other visual signals. Our system achieves 78.8% extremely similar@1 in end-to-end human relevance evaluations and a 6% module engagement rate. The "Ways to Style It" module, powered by the Visual Product Graph technology, is deployed in production at Pinterest.
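Embedding-based retrieval of the kind VPG relies on can be sketched as cosine-similarity nearest-neighbour search, with a similar@1-style metric computed as the fraction of queries whose top-ranked result human raters judged relevant. This is an illustrative sketch only. The helper names, the in-memory dictionary index, and the boolean judgment format are hypothetical and say nothing about Pinterest's production infrastructure.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top1(query: Sequence[float], index: Dict[str, Sequence[float]]) -> str:
    """Return the id of the indexed item (e.g. a scene) most similar to the query."""
    return max(index, key=lambda item_id: cosine(query, index[item_id]))


def similar_at_1(judgments: List[bool]) -> float:
    """Fraction of queries whose top-1 result raters judged (extremely) similar."""
    return sum(judgments) / len(judgments)
```

A production system would replace the linear scan in `top1` with an approximate nearest-neighbour index, but the ranking criterion stays the same.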
“Superstition ain’t the way”. The Optimism of the Conspiracy Theorist
Marco Mazzeo, Adriano Bertollini
This article addresses conspiracy theories from a rhetorical and philosophical perspective, starting from a recent case study: Graham Hancock's documentary series The Ancient Apocalypse, produced by Netflix. In this long documentary, the author hypothesizes a conspiracy on the part of the academic representatives of archaeology: they allegedly refuse, on purpose, to acknowledge the existence of an ancient civilization that was technically highly advanced and disappeared before the last glacial period. The supposed reason for this resistance is the archaeologists' desire to maintain a position of power and prestige that they would have to give up if they accepted such a paradigm shift. Hancock's discourse serves as a textual corpus, analyzed from a rhetorical standpoint in order to examine the philosophical hypothesis that conspiracy theories can be understood as a form of superstition (as distinguished from magic, § 1). To this end, we examine each of the technical proofs used for persuasion. First, ethos (§ 2), which allows the speaker to appear as a marginal figure who is credible because he is disturbing. We then analyze logos (§ 3), characterized by fallacies, an ambivalent logic, and a mythological narrative that serves as historical proof. Finally (§ 4), we focus on pathos, which displays a form of detachment and encourages a disposition toward inaction.
Style. Composition. Rhetoric
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation
Haofan Wang, Peng Xing, Renyuan Huang
et al.
Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. Although diffusion models have demonstrated impressive generative power in personalized subject-driven or style-driven applications, existing state-of-the-art methods still struggle to balance content preservation and style enhancement. For example, amplifying the style's influence can often undermine the structural integrity of the content. To address these challenges, we deconstruct the style transfer task into three core elements: 1) Style, focusing on the image's aesthetic characteristics; 2) Spatial Structure, concerning the geometric arrangement and composition of visual elements; and 3) Semantic Content, which captures the conceptual meaning of the image. Guided by these principles, we introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style. Specifically, our method performs style injection through an efficient, lightweight process built on the cutting-edge InstantStyle framework. To reinforce content preservation, we initialize the process with an inverted content latent noise and use a versatile plug-and-play tile ControlNet to preserve the original image's intrinsic layout. We also incorporate a global semantic adapter to enhance the fidelity of the semantic content. To safeguard against the dilution of style information, a style extractor is employed as a discriminator to provide supplementary style guidance. Code will be available at https://github.com/instantX-research/InstantStyle-Plus.
Z-STAR+: A Zero-shot Style Transfer Method via Adjusting Style Distribution
Yingying Deng, Xiangyu He, Fan Tang
et al.
Style transfer presents a significant challenge, primarily centered on identifying an appropriate style representation. Conventional methods employ style loss, derived from second-order statistics or contrastive learning, to constrain style representation in the stylized result. However, these pre-defined style representations often limit stylistic expression, leading to artifacts. In contrast to existing approaches, we have discovered that latent features in vanilla diffusion models inherently contain natural style and content distributions. This allows for direct extraction of style information and seamless integration of generative priors into the content image without necessitating retraining. Our method adopts dual denoising paths to represent content and style references in latent space, subsequently guiding the content image denoising process with style latent codes. We introduce a Cross-attention Reweighting module that utilizes local content features to query style image information best suited to the input patch, thereby aligning the style distribution of the stylized results with that of the style image. Furthermore, we design a scaled adaptive instance normalization to mitigate inconsistencies in color distribution between style and stylized images on a global scale. Through theoretical analysis and extensive experimentation, we demonstrate the effectiveness and superiority of our diffusion-based zero-shot style transfer via adjusting style distribution, termed Z-STAR+.
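Adaptive instance normalization (AdaIN), on which the scaled variant above builds, re-normalizes each content feature channel so that its mean and standard deviation match those of the corresponding style channel. The sketch below shows plain AdaIN on features stored as lists of channels. The `alpha` interpolation weight is a common stylization-strength knob added here for illustration. It is an assumption, not the scaling that Z-STAR+ actually uses.

```python
import math
from typing import List, Sequence, Tuple


def mean_std(x: Sequence[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Per-channel mean and (epsilon-stabilized) population standard deviation."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return mu, math.sqrt(var + eps)


def adain(content: Sequence[Sequence[float]],
          style: Sequence[Sequence[float]],
          alpha: float = 1.0) -> List[List[float]]:
    """Plain AdaIN: each channel is a flat list of spatial activations.

    alpha (hypothetical knob) blends stylized and original content features:
    alpha=1.0 gives full AdaIN, alpha=0.0 returns the content unchanged.
    """
    out = []
    for c, s in zip(content, style):
        mc, sc = mean_std(c)
        ms, ss = mean_std(s)
        # Whiten with content statistics, then re-color with style statistics.
        stylized = [ss * (v - mc) / sc + ms for v in c]
        out.append([alpha * y + (1.0 - alpha) * v for y, v in zip(stylized, c)])
    return out
```

Matching first- and second-order channel statistics is what aligns the global color distribution of the stylized result with the style image.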
Between Author and Speakers, the Textual Enunciator: A Useless Concept or a Key Figure?
Michèle Monte
This article distinguishes the concept of the text enunciator from that of the author: whereas the author's position results from his/her work, but also from the discourse he/she produces and the discourse that circulates about him/her in the media, the text enunciator is the first speaker/enunciator, who produces the text and establishes a different relationship with the diegesis depending on the enunciative situation chosen by the author. The article compares these different situations and studies the relations that the text enunciator maintains with the second speakers, focusing on its traces in theatre texts and first-person novels, from which it seems to be absent. The article argues that this concept, rooted in Ducrot's and Rabatel's work on polyphony, can account for the functioning of literary works independently of their genre.
Style. Composition. Rhetoric
Outside the Canon: Translation and Works by Early Modern Women Writers
Helena Aguilà Ruzola
Introduction to the supplement “Volti del tradurre”, edited by Helena Aguilà Ruzola and Donatella Siviero, which contains an anthology of articles collected under the title Fuori dai canoni: traduzione e opere di autrici della prima età moderna.
Language. Linguistic theory. Comparative grammar, Style. Composition. Rhetoric
Guérin, Charles, Jean-Marc Leblanc, Jordi Pià-Comella and Guillaume Soulez (eds.). 2022. L’Èthos de Rupture. De Diogène à Donald Trump (Paris: Presses Sorbonne Nouvelle).
Roselyne Koren
Style. Composition. Rhetoric
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement
Wei Song, Yanghao Yue, Ya-jie Zhang
et al.
Disentangling a speaker's timbre from their style is very important for style transfer in multi-speaker multi-style text-to-speech (TTS) scenarios. With timbre and style disentangled, a TTS system can synthesize expressive speech for a given speaker in any style seen in the training corpus. However, current research on timbre and style disentanglement still has shortcomings. Existing methods either require single-speaker multi-style recordings, which are difficult and expensive to collect, or rely on complex networks and complicated training procedures that make the style transfer behavior difficult to reproduce and control. To improve the disentanglement of timbre and style, and to remove the reliance on single-speaker multi-style corpora, this paper proposes a simple but effective timbre and style disentanglement method. The FastSpeech2 network serves as the backbone, with explicit duration, pitch, and energy trajectories representing the style. Each speaker's data is treated as a separate, isolated style, and a speaker embedding and a style embedding are added to the FastSpeech2 network to learn disentangled representations. Utterance-level pitch and energy normalization is used to improve the decoupling effect. Experimental results demonstrate that the proposed model can synthesize speech in any style seen during training with high style similarity while maintaining very high speaker similarity.
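The utterance-level normalization of pitch and energy described above can be sketched as a per-utterance z-score. Each contour is shifted and scaled by its own mean and standard deviation. This removes utterance-wide offsets in level and range, much of which plausibly reflects the speaker, while keeping the relative shape of the trajectory. This is a minimal sketch under that assumption. The paper's exact procedure (for example, how unvoiced frames with zero pitch are handled) is not specified here, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize_utterance(track: Sequence[float]) -> Tuple[List[float], float, float]:
    """Z-score one utterance's pitch or energy contour using its own statistics.

    Returns the normalized contour plus the mean and standard deviation,
    so the original scale can be restored later if needed.
    """
    mu = sum(track) / len(track)
    sd = math.sqrt(sum((v - mu) ** 2 for v in track) / len(track))
    if sd == 0.0:
        sd = 1.0  # flat contour: avoid dividing by zero
    return [(v - mu) / sd for v in track], mu, sd
```

Computing the statistics per utterance rather than over the whole corpus is what makes the normalization utterance-level.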
Style Matters! Investigating Linguistic Style in Online Communities
Osama Khalid, Padmini Srinivasan
Content has historically been the primary lens used to study language in online communities. This paper instead focuses on the linguistic style of communities. While we know that individuals have distinguishable styles, here we ask whether communities have distinguishable styles. Additionally, while prior work has relied on a narrow definition of style, we employ a broad definition involving 262 features to analyze the linguistic style of 9 online communities from 3 social media platforms discussing politics, television and travel. We find that communities indeed have distinct styles. Also, style is an excellent predictor of group membership (F-score 0.952 and Accuracy 96.09%). While on average it is statistically equivalent to predictions using content alone, it is more resilient to reductions in training data.
Name Your Style: An Arbitrary Artist-aware Image Style Transfer
Zhi-Song Liu, Li-Wen Wang, Wan-Chi Siu
et al.
Image style transfer has attracted widespread attention in the past few years. Despite its remarkable results, it requires style images as references, which makes it inflexible and inconvenient. Text is the most natural way to describe a style; more importantly, text can describe implicit, abstract styles, such as those of specific artists or art movements. In this paper, we propose a text-driven image style transfer method (TxST) that leverages advanced image-text encoders to control arbitrary style transfer. We introduce a contrastive training strategy to effectively extract style descriptions from the image-text model (i.e., CLIP), aligning stylization with the text description. We also propose a novel and efficient attention module that uses cross-attention to fuse style and content features. Finally, we achieve arbitrary artist-aware image style transfer that learns and transfers specific artistic characteristics, such as those of Picasso, oil painting, or a rough sketch. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on both image and textual styles. Moreover, it can mimic the styles of one or many artists to achieve attractive results, highlighting a promising direction in image style transfer.
Time-of-Day Neural Style Transfer for Architectural Photographs
Yingshu Chen, Tuan-Anh Vu, Ka-Chun Shum
et al.
Architectural photography is a genre of photography that focuses on capturing a building or structure in the foreground with dramatic lighting in the background. Inspired by recent successes in image-to-image translation, we aim to perform style transfer for architectural photographs. However, the special composition of architectural photographs poses great challenges for style transfer. Existing neural style transfer methods treat an architectural image as a single entity, which produces mismatched chrominance and destroys the geometric features of the original architecture, yielding unrealistic lighting, wrong color rendition, and visual artifacts such as ghosting, appearance distortion, or color mismatching. In this paper, we specialize a neural style transfer method for architectural photography. Our method handles the foreground and background of an architectural photograph with a two-branch neural network that performs style transfer on each separately. It comprises a segmentation module, a learning-based image-to-image translation module, and an image blending optimization module. We trained our image-to-image translation network on a new dataset of unconstrained outdoor architectural photographs captured at different magic times of day, using additional semantic information for better chrominance matching and geometry preservation. Our experiments show that our method produces photorealistic lighting and color rendition in both the foreground and the background, and outperforms general image-to-image translation and arbitrary style transfer baselines quantitatively and qualitatively. Our code and data are available at https://github.com/hkust-vgd/architectural_style_transfer.
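The final recombination of the two branches can be pictured, in its simplest form, as mask-based alpha compositing: the segmentation mask decides, pixel by pixel, how much of the restyled foreground versus the restyled background appears in the output. This plain weighted blend is a stand-in for illustration only. The paper uses a dedicated blending optimization module, and the function name and grayscale list-of-lists image format here are assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def composite(foreground: Image, background: Image, mask: Image) -> Image:
    """Blend per pixel: mask=1 keeps the foreground, mask=0 keeps the background,
    and fractional values (e.g. soft segmentation edges) mix the two."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]
```

A naive blend like this can leave visible seams where the two branches' colors disagree, which is one motivation for replacing it with an optimization-based blending step.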
Radio Tourism and Tourist Imaginaries: When Sounds Make Places, Practices, and Actors Perceptible
Séverine Equoy Hutin
This article presents an argumentative semio-discursive analysis of the radio tourism magazine Et si on partait?, broadcast on the radio station Europe 1 during the summer of 2020. It reveals the discursive processes and mechanisms that “make tourism” and render territories, practices, and actors perceptible to the listener. The point is not to treat radio as a medium without images, but to take into account its mediativity and the specific features of the sound space in order to analyze radio productions dedicated to tourism and the tourist imaginaries they convey and build.
Style. Composition. Rhetoric
Heterogeneous Information Network-based Interest Composition with Graph Neural Network for Recommendation
Dengcheng Yan, Wenxin Xie, Yiwen Zhang
Heterogeneous information networks (HINs) are widely applied in recommendation systems because they can model various kinds of auxiliary information through meta-paths. However, existing HIN-based recommendation models usually fuse the information from different meta-paths with a simple weighted sum or concatenation, which limits performance because it cannot capture interest compositions among meta-paths. In this article, we propose an HIN-based Interest Composition model for Recommendation (HicRec). Specifically, user and item representations are learned with a graph neural network over both the graph structure and the features in each meta-path, and a parameter-sharing mechanism ensures that user and item representations lie in the same latent space. Users' interests in each item are then computed for each pair of related meta-paths by combining the user and item representations. The composed user interests, obtained from the single interests both within and across meta-paths, are used for recommendation. Extensive experiments on three real-world datasets show that HicRec outperforms the baselines.
Style-Aware Normalized Loss for Improving Arbitrary Style Transfer
Jiaxin Cheng, Ayush Jaiswal, Yue Wu
et al.
Neural Style Transfer (NST) has quickly evolved from single-style to infinite-style models, also known as Arbitrary Style Transfer (AST). Although appealing results have been widely reported in the literature, our empirical studies on four well-known AST approaches (GoogleMagenta, AdaIN, LinearTransfer, and SANet) show that more than 50% of the time, AST stylized images are not acceptable to human users, typically due to under- or over-stylization. We systematically study the cause of this imbalanced style transferability (IST) and propose a simple yet effective solution to mitigate it. Our studies show that the IST issue is related to the conventional AST style loss, and reveal that the root cause is the equal weighting of training samples irrespective of the properties of their corresponding style images, which biases the model towards certain styles. By investigating the theoretical bounds of the AST style loss, we propose a new loss that largely overcomes IST. Theoretical analysis and experimental results validate the effectiveness of our loss, with over 80% relative improvement in style deception rate and 98% relatively higher preference in human evaluation.
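The conventional AST style loss compares Gram matrices of feature maps. The sketch below computes that loss, plus an optional per-sample normalization by the squared Frobenius norm of the style image's Gram matrix, so that style images with large feature magnitudes do not dominate training. The normalization term is an illustrative assumption in the spirit of the paper's style-aware weighting, not the exact loss derived from its theoretical bounds. All names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def gram(features: Sequence[Sequence[float]]) -> Matrix:
    """C channels of N spatial activations -> C x C Gram matrix (averaged over N)."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features] for fi in features]


def frob_sq(m: Matrix) -> float:
    """Squared Frobenius norm of a matrix."""
    return sum(x * x for row in m for x in row)


def style_loss(output_feats: Sequence[Sequence[float]],
               style_feats: Sequence[Sequence[float]],
               normalize: bool = False,
               eps: float = 1e-8) -> float:
    """Gram-matrix style loss; normalize=True divides by the style Gram norm
    (hypothetical style-aware weighting, for illustration only)."""
    g_out, g_sty = gram(output_feats), gram(style_feats)
    diff = frob_sq([[a - b for a, b in zip(ro, rs)] for ro, rs in zip(g_out, g_sty)])
    if normalize:
        diff /= frob_sq(g_sty) + eps
    return diff
```

Scaling both feature maps by the same factor multiplies the raw loss by the fourth power of that factor but leaves the normalized loss essentially unchanged. This shows how equal weighting of raw losses can let some style images dominate the training signal.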
Style Pooling: Automatic Text Style Obfuscation for Improved Classification Fairness
Fatemehsadat Mireshghallah, Taylor Berg-Kirkpatrick
Text style can reveal sensitive attributes of the author (e.g., race or age) to the reader, which can in turn lead to privacy violations and bias in both human and algorithmic decisions based on text. For example, the writing style of a job application might reveal protected attributes of the candidate, which could lead to biased hiring decisions regardless of whether those decisions are made algorithmically or by humans. We propose a VAE-based framework that obfuscates stylistic features of human-generated text through style transfer, by automatically rewriting the text itself. Our framework operationalizes obfuscated style flexibly, enabling two distinct notions: (1) a minimal notion that effectively intersects the various styles seen in training, and (2) a maximal notion that obfuscates by adding stylistic features of all sensitive attributes to the text, in effect computing a union of styles. Our style-obfuscation framework can be used for multiple purposes; here, we demonstrate its effectiveness in improving the fairness of downstream classifiers. We also conduct a comprehensive study of style pooling's effect on fluency, semantic consistency, and attribute removal in two- and three-domain style obfuscation.
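The two notions of obfuscated style can be illustrated with a deliberately simplified set view that treats each sensitive group's style as a set of stylistic markers. The minimal notion keeps only the markers shared by all groups (intersection), and the maximal notion includes every group's markers (union). This toy sketch only conveys the set-theoretic intuition. The actual framework is a VAE that rewrites text, and the marker names and function are hypothetical.

```python
from typing import Dict, Set


def pooled_style(group_styles: Dict[str, Set[str]], mode: str) -> Set[str]:
    """Combine per-group stylistic markers into one obfuscated target style."""
    styles = list(group_styles.values())
    if not styles:
        raise ValueError("need at least one group")
    if mode == "minimal":
        return set.intersection(*styles)  # only markers every group shares
    if mode == "maximal":
        return set.union(*styles)  # markers from all groups combined
    raise ValueError(f"unknown mode: {mode!r}")
```

Under either target, a reader or classifier can no longer tell the groups apart from these markers alone. The minimal target achieves this by removing group-specific features, and the maximal one by mixing all of them in.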
The Self and Others in Raboni's Poetry
Andrea Maletto
This article investigates the relationship between the self and others in Raboni's first two collections, Le case della Vetra and Cadenza d'inganno, against the backdrop of the general weakening of the poetic subject that took place around the 1960s. The aim is to show how the Milanese poet progressively opens the space of his texts to the words of others, ultimately arriving at a new way of understanding lyric poetry, in which the self shares with others the enunciative centrality that had long been its exclusive prerogative.
Language. Linguistic theory. Comparative grammar, Style. Composition. Rhetoric