Stand-up, a very popular form of comic performance today, combines the pragmatic aim of amusing the audience with the overt desire to present the humorous speaker’s standpoint on current affairs, which corresponds to an explicit argumentative aim. This is a challenging exercise, however, both because of the fictional (non-serious) frame of the comedy show and because current affairs, whose axiological evaluation is not yet stabilized, could introduce thematic and argumentative divisions within a heterogeneous audience, thereby canceling out the expected euphoric effect. An analysis of some of the argumentative strategies used in the sketches under study highlights the importance of the “shown” and “said” ethos and of the interactive dimension of stand-up. It also highlights the crucial role of the enunciative pact that governs this form of performance, granting the stand-up comedian, among other things, the systematic posture of “over-enunciation” (i.e. a high-up perspective) in relation to the other points of view represented in his discourse.
Recent advances in style and appearance transfer are impressive, but most methods treat global style and local appearance transfer in isolation, neglecting semantic correspondence. Moreover, image and video tasks are typically handled separately, with little focus on integrating them for video transfer. To address these limitations, we introduce a novel task, Semantic Style Transfer, which involves transferring style and appearance features from a reference image to target visual content based on semantic correspondence. We then propose a training-free method, Semantix, an energy-guided sampler designed for Semantic Style Transfer that simultaneously guides both style and appearance transfer based on the semantic understanding capacity of pre-trained diffusion models. In addition, as a sampler, Semantix can be seamlessly applied to both image and video models, making semantic style transfer generic across various visual media. Specifically, after inverting both the reference and context images or videos into noise space via SDEs, Semantix utilizes a meticulously crafted energy function to guide the sampling process, comprising three key components: Style Feature Guidance, Spatial Feature Guidance, and Semantic Distance as a regularisation term. Experimental results demonstrate that Semantix not only effectively accomplishes semantic style transfer across images and videos, but also surpasses existing state-of-the-art solutions in both fields. The project website is available at https://huiang-he.github.io/semantix/
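To make the energy-guided sampling idea above more concrete, here is a minimal sketch, not the authors' implementation: a hypothetical energy combining a style term, a spatial term, and a semantic-distance regulariser steers each denoising step through its gradient. All function names, weights, and feature shapes are assumptions for illustration.

```python
# Minimal sketch of energy-guided sampling (hypothetical, not Semantix's code).
import torch

def energy(x, style_feats, spatial_feats, extract_features,
           w_style=1.0, w_spatial=1.0, w_reg=0.1):
    # `extract_features` is an assumed differentiable feature extractor.
    feats = extract_features(x)
    e_style = ((feats.mean(dim=(-2, -1)) - style_feats.mean(dim=(-2, -1))) ** 2).sum()
    e_spatial = ((feats - spatial_feats) ** 2).mean()   # keep the target's layout
    e_reg = torch.norm(feats - spatial_feats, p=1)      # stand-in "semantic distance"
    return w_style * e_style + w_spatial * e_spatial + w_reg * e_reg

def guided_step(x_t, denoiser, t, style_feats, spatial_feats, extract_features,
                guidance_scale=1.0):
    """One reverse-diffusion step nudged by the energy gradient."""
    x_t = x_t.detach().requires_grad_(True)
    e = energy(x_t, style_feats, spatial_feats, extract_features)
    grad = torch.autograd.grad(e, x_t)[0]
    x_prev = denoiser(x_t, t)                           # the model's usual update
    return x_prev - guidance_scale * grad               # steer toward lower energy
```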
Does the search for perlocutionary effectiveness (getting people to agree with a claim, to buy a product, etc.) without truth or morality playing a decisive role constitute what some call manipulation? Precisely because he is interested in the mechanisms of persuasion, Steve Oswald has woven links between cognitive science and rhetoric around this notion. Drawing on theories of cognitive pragmatics and work in psychology, this researcher goes off the beaten track of rhetoric by opening what is for many a black box: the question of the effects of discursive strategies. He has approached this question first by attempting to explain the mechanism theoretically, but more recently also experimentally. Examining the question of rhetorical effects in this way makes it possible not only to revitalise the discipline by documenting the intuitions or empirical findings of the old rhetoricians, but also to build bridges between approaches that sometimes ignore each other: informal logic, cognitive psychology, cognitive pragmatics, and discourse analysis.
3D human motion style transfer is a fundamental problem in computer graphics and animation processing. Existing AdaIN-based methods necessitate datasets with a balanced style distribution and content/style labels to train a clustered latent space. In practical scenarios, however, we may encounter only a single unseen style example, which is not sufficient to constitute a style cluster for AdaIN-based methods. Therefore, in this paper we propose a novel two-stage framework for few-shot style transfer learning based on the diffusion model. Specifically, in the first stage, we pre-train a diffusion-based text-to-motion model as a generative prior so that it can cope with various content motion inputs. In the second stage, based on the single style example, we fine-tune the pre-trained diffusion model in a few-shot manner to make it capable of style transfer. The key idea is to regard the reverse diffusion process as a motion-style translation process, since motion styles can be viewed as special motion variations. During fine-tuning for style transfer, a simple yet effective semantic-guided style transfer loss, coordinated with a style example reconstruction loss, is introduced to supervise the style transfer in CLIP semantic space. Qualitative and quantitative evaluations demonstrate that our method achieves state-of-the-art performance and has practical applications.
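For reference, the AdaIN operation that the AdaIN-based baselines above rely on is sketched below: content features are re-normalised to the per-channel mean and standard deviation of the style features, which is why such methods need enough style samples to form meaningful style statistics or clusters. This is the standard formulation, not code from the paper.

```python
# Standard AdaIN: re-normalise content features to the style features' statistics.
import torch

def adain(content, style, eps=1e-5):
    """content, style: (N, C, ...) feature tensors; returns stylised content features."""
    dims = tuple(range(2, content.dim()))             # normalise over all but N, C
    c_mean = content.mean(dim=dims, keepdim=True)
    c_std = content.std(dim=dims, keepdim=True) + eps
    s_mean = style.mean(dim=dims, keepdim=True)
    s_std = style.std(dim=dims, keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```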
We propose PARASOL, a multi-modal synthesis model that enables disentangled, parametric control of the visual style of an image by jointly conditioning synthesis on both content and a fine-grained visual style embedding. We train a latent diffusion model (LDM) using specific losses for each modality and adapt classifier-free guidance to encourage disentangled control over independent content and style modalities at inference time. We leverage auxiliary semantic and style-based search to create training triplets for supervision of the LDM, ensuring complementarity of content and style cues. PARASOL shows promise for enabling nuanced control over visual style in diffusion models for image creation and stylization, as well as for generative search, where text-based search results may be adapted to more closely match user intent by interpolating both content and style descriptors.
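One common way to realise the kind of disentangled, per-modality control described above is classifier-free guidance with separate guidance scales for the content and style conditions. The sketch below is an assumed illustration of that idea, not PARASOL's actual implementation; the model signature and scales are hypothetical.

```python
# Hypothetical dual-condition classifier-free guidance sketch.
import torch

def dual_cfg_noise(model, x_t, t, content_cond, style_cond, null_cond,
                   s_content=3.0, s_style=3.0):
    eps_uncond = model(x_t, t, null_cond, null_cond)
    eps_content = model(x_t, t, content_cond, null_cond)
    eps_style = model(x_t, t, null_cond, style_cond)
    # Each modality pushes the prediction away from the unconditional estimate,
    # with its own scale controlling how strongly it is enforced.
    return (eps_uncond
            + s_content * (eps_content - eps_uncond)
            + s_style * (eps_style - eps_uncond))
```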
Audio-driven talking head animation is a challenging research topic with many real-world applications. Recent works have focused on creating photo-realistic 2D animation, while learning different talking or singing styles remains an open problem. In this paper, we present a new method to generate talking head animation with learnable style references. Given a set of style reference frames, our framework can reconstruct 2D talking head animation based on a single input image and an audio stream. Our method first produces facial landmark motion from the audio stream and constructs intermediate style patterns from the style reference images. We then feed both outputs into a style-aware image generator to produce photo-realistic, high-fidelity 2D animation. In practice, our framework can extract the style information of a specific character and transfer it to any new static image for talking head animation. Extensive experimental results show that our method achieves better results than recent state-of-the-art approaches both qualitatively and quantitatively.
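The dataflow described above can be summarised in a toy sketch: audio drives landmark motion, reference frames yield a style code, and a style-aware generator combines them with the input image. All module choices, names, and shapes below are hypothetical stand-ins, not the authors' architecture.

```python
# Toy pipeline sketch with placeholder modules (hypothetical, heavily simplified).
import torch
import torch.nn as nn

class TalkingHeadSketch(nn.Module):
    def __init__(self, audio_dim=80, lm_dim=68 * 2, style_dim=128):
        super().__init__()
        self.audio2landmarks = nn.GRU(audio_dim, lm_dim, batch_first=True)
        self.style_encoder = nn.Sequential(nn.Flatten(1), nn.LazyLinear(style_dim))
        self.generator = nn.LazyLinear(3 * 64 * 64)    # placeholder image decoder

    def forward(self, image, audio, style_refs):
        landmarks, _ = self.audio2landmarks(audio)             # (B, T, lm_dim)
        style = self.style_encoder(style_refs)                 # (B, style_dim)
        cond = torch.cat([image.flatten(1),
                          landmarks.mean(dim=1), style], dim=1)
        return self.generator(cond).view(-1, 3, 64, 64)        # one output frame
```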
Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech such as a smile), posing challenges to the modeling and prediction of spontaneous style. Moreover, the scarcity of high-quality spontaneous data constrains spontaneous speech generation for speakers without spontaneous data. To address these problems, we propose SponTTS, a two-stage approach based on neural bottleneck (BN) features to model and transfer spontaneous style for TTS. In the first stage, we adopt a Conditional Variational Autoencoder (CVAE) to capture spontaneous prosody from BN features and incorporate spontaneous phenomena through the constraint of a spontaneous-phenomena embedding prediction loss. In addition, we introduce a flow-based predictor to predict a latent spontaneous style representation from the text, which enriches the prosody and context-specific spontaneous phenomena during inference. In the second stage, we adopt a VITS-like module to transfer the spontaneous style learned in the first stage to the target speakers. Experiments demonstrate that SponTTS is effective in modeling spontaneous style and transferring it to target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity. A zero-shot spontaneous style TTS test further verifies the generalization and robustness of SponTTS in generating spontaneous speech for unseen speakers.
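A hedged sketch of how the first-stage objective described above might be composed: a CVAE reconstruction term, a KL term toward the prior, and the spontaneous-phenomena embedding prediction constraint. Loss names, weights, and the choice of L1/MSE are assumptions, not the paper's exact losses.

```python
# Hypothetical composition of the first-stage training objective.
import torch
import torch.nn.functional as F

def stage_one_loss(bn_recon, bn_target, mu, logvar,
                   phen_pred, phen_embedding, kl_weight=1e-2, phen_weight=1.0):
    recon = F.l1_loss(bn_recon, bn_target)                         # BN feature reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # CVAE prior (KL) term
    phen = F.mse_loss(phen_pred, phen_embedding)                   # phenomena embedding constraint
    return recon + kl_weight * kl + phen_weight * phen
```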
Style is an important concept in today's challenges in natural language generation. After the success of image style transfer, the task of text style transfer became timely and attractive. Researchers are also interested in style reproduction in poetic text generation. Evaluating style reproduction in poetry generation remains a problem. I used three character-based LSTM models to study the assessment of style reproduction. All three models were trained on a corpus of texts by famous Russian-speaking poets. Samples were shown to assessors, who were offered four answer options indicating which poet's style each sample reproduces. In addition, the assessors were asked how familiar they were with the work of the poet they had named. The assessors were students of literary history, and 94 answers were received. It turned out that the accuracy of style identification increases if the assessor can quote the poet by heart. Each model showed at least 0.7 macro-average accuracy. The experiment showed that it is better to involve a professional rather than a naive reader in the evaluation of style for poetry generation tasks, while LSTM models are good at reproducing the style of Russian poets even on a limited training corpus.
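For context, a character-based LSTM of the kind used above can be sketched as follows: it is trained to predict the next character of a poet's corpus and then sampled to produce text in that poet's style. The hyperparameters here are illustrative, not the ones from the experiment.

```python
# Minimal character-level LSTM language model with sampling (illustrative).
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, emb=128, hidden=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state                  # next-character logits

    @torch.no_grad()
    def sample(self, start_ids, length=200, temperature=0.8):
        ids, state = list(start_ids), None
        x = torch.tensor([ids])
        for _ in range(length):
            logits, state = self(x, state)
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            nxt = torch.multinomial(probs, 1).item()
            ids.append(nxt)
            x = torch.tensor([[nxt]])
        return ids                                     # generated character ids
```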
The conversational human voice (CHV) is an extensively studied and adopted communication style in online brand communication. However, in previous research the way in which CHV is operationalized differs considerably: the type and number of linguistic elements used to establish a sense of CHV in online brand messages vary. Moreover, it is still unknown how CHV operationalizations contribute to consumers’ perceptions of CHV, which consequently could affect their evaluation of the message and the brand. In this paper, we address these issues by conducting an integrative literature review and a perception experiment, and we present a taxonomy of linguistic elements related to message personalization, informal speech, and invitational rhetoric that can be used to operationalize CHV systematically in future studies of online brand communication. Directions for future research and managerial implications are discussed.
The essay examines several contributions published by the poet Durs Grünbein on the occasion of the seventh centenary of Dante's death. Particular attention is devoted to the tercet of Paradiso XXXIII, vv. 94-96, which has occupied the German poet since at least 2009. Starting from the terminology Grünbein uses to describe his Dante transpositions in terms of rendering, the essay investigates the influence of the great Florentine on the poetics of the Dresden author in light of the global-digital era, tracing the sources of poetic inspiration, where Grünbein rereads Dante alongside Descartes and Mandel'štàm.
Nicholas Kolkin, Michal Kucera, Sylvain Paris, et al.
We propose Neural Neighbor Style Transfer (NNST), a pipeline that offers state-of-the-art quality, generalization, and competitive efficiency for artistic style transfer. Our approach is based on explicitly replacing neural features extracted from the content input (to be stylized) with those from a style exemplar, then synthesizing the final output based on these rearranged features. While the spirit of our approach is similar to prior work, we show that our design decisions dramatically improve the final visual quality.
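The core replacement step described above can be sketched as follows (simplified; the full NNST pipeline is multi-scale and adds further refinements): each content feature vector is swapped for its most similar style feature vector under cosine similarity.

```python
# Simplified nearest-neighbor feature replacement in the spirit of NNST.
import torch
import torch.nn.functional as F

def nearest_neighbor_replace(content_feats, style_feats):
    """content_feats: (C, Hc, Wc), style_feats: (C, Hs, Ws) -> (C, Hc, Wc)."""
    C, Hc, Wc = content_feats.shape
    c = F.normalize(content_feats.reshape(C, -1), dim=0)   # (C, Hc*Wc), unit-norm columns
    s_flat = style_feats.reshape(C, -1)                    # (C, Hs*Ws)
    s = F.normalize(s_flat, dim=0)
    sim = c.t() @ s                                        # (Hc*Wc, Hs*Ws) cosine similarities
    idx = sim.argmax(dim=1)                                # best style match per content location
    return s_flat[:, idx].reshape(C, Hc, Wc)               # rearranged style features
```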
Image or video appearance features (e.g., color, texture, tone, and illumination) reflect one's visual perception and direct impression of an image or video. Given a source image (video) and a target image (video), the color transfer technique aims to process the color of the source image or video (note that the source is also referred to as the reference image or video in some literature) to make it look like that of the target, i.e., transferring the appearance of the target image or video to the source image or video, thereby changing one's perception of the source. As an extension of color transfer, style transfer refers to rendering the content of a target image or video in the style of an artist, with either a single style sample or a set of images, through a style transfer model. As an emerging field, the study of style transfer has attracted the attention of a large number of researchers. After decades of development, it has become a highly interdisciplinary research area in which a variety of artistic expression styles can be achieved. This paper provides an overview of color transfer and style transfer methods from the past years.
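A classic statistical baseline for the color transfer family surveyed above is per-channel mean/std matching in the spirit of Reinhard et al.; in practice it is usually applied in a decorrelated color space such as Lab rather than RGB. The sketch below is that generic baseline, not any particular method from the survey, and follows the source/target convention defined above.

```python
# Per-channel statistical color transfer: re-normalise the source image to the
# target's channel statistics (Reinhard-style baseline, shown here in RGB).
import numpy as np

def color_transfer(source, target, eps=1e-6):
    """source, target: float arrays of shape (H, W, 3) in [0, 1]; returns recolored source."""
    src_mean, src_std = source.mean(axis=(0, 1)), source.std(axis=(0, 1)) + eps
    tgt_mean, tgt_std = target.mean(axis=(0, 1)), target.std(axis=(0, 1)) + eps
    out = (source - src_mean) / src_std * tgt_std + tgt_mean
    return np.clip(out, 0.0, 1.0)
```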
Domain generalization (DG) approaches intend to extract domain-invariant features that lead to a more robust deep learning model. In this regard, style augmentation is a strong DG method that takes advantage of instance-specific feature statistics, which contain informative style characteristics, to synthesize novel domains. While it is one of the state-of-the-art methods, prior work on style augmentation has either disregarded the interdependence among distinct feature channels or constrained style augmentation to linear interpolation. To address these research gaps, we introduce a novel augmentation approach, named Correlated Style Uncertainty (CSU), which surpasses the limitations of linear interpolation in style statistic space while preserving vital correlation information. Our method's efficacy is established through extensive experimentation on diverse cross-domain computer vision and medical imaging classification tasks (the PACS, Office-Home, and Camelyon17 datasets) and the Duke-Market1501 instance retrieval task. The results show a remarkable improvement margin over existing state-of-the-art techniques. The source code is available at https://github.com/freshman97/CSU.
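For orientation, the linear-interpolation style augmentation that CSU is said to move beyond can be sketched in a MixStyle-like form: per-channel feature statistics of two instances are mixed and re-applied. This baseline ignores cross-channel correlation, which is precisely the information CSU additionally models; the code below is that baseline, not CSU itself.

```python
# MixStyle-like linear interpolation of per-channel feature statistics (baseline sketch).
import torch

def mix_style(x, alpha=0.1, eps=1e-6):
    """x: (N, C, H, W) features; mixes each instance's stats with a shuffled peer's."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sig = x.std(dim=(2, 3), keepdim=True) + eps
    x_norm = (x - mu) / sig
    perm = torch.randperm(x.size(0))
    lam = torch.distributions.Beta(alpha, alpha).sample((x.size(0), 1, 1, 1))
    mu_mix = lam * mu + (1 - lam) * mu[perm]       # interpolated means
    sig_mix = lam * sig + (1 - lam) * sig[perm]    # interpolated standard deviations
    return x_norm * sig_mix + mu_mix
```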
On July 26, 2019, the WHO affirmed in its annual report that the electronic cigarette is "unquestionably harmful" to health. Reactions on Web 2.0 platforms were immediate: institutional actors, scientists and consumers spoke for or against the WHO's position, and the exchanges soon turned into a genuine 2.0 polemical debate involving different authorities expressing an opinion on the issue. In this study, I focus on the construction of the discourse of authority of vape influencers and on the analysis of the debate triggered by the WHO's decision across different Web 2.0 platforms (Twitter, Facebook and the comment sections of online newspapers), in order to show some specificities of the circulation of authority in these digital spaces and, in particular, that this polemic rests on conflicts of authority and on the variety of technodiscursive tools used by Internet users to construct or deconstruct authority in discourse.
The aim of this research is to reveal exhaustively the characteristics of a poetics of the plea in the poem "España, aparta de mí este cáliz". To this end, drawing on the theoretical proposals developed in Oswald Ducrot's Theory of Enunciation and the Argumentation Theory of Ruth Amossy and Dominique Maingueneau, we focus on the interplay of the interlocutors, in which the speakers address the addressees with the supreme aim of persuading them to join the libertarian struggle of the Republican people. The paper also details how the persuasive purpose evident in the lyric discourse reaches its enunciative consummation thanks to the great performance of ethos and pathos.
The paper proposes a Dynamic ResBlock Generative Adversarial Network (DRB-GAN) for artistic style transfer. The style code is modeled as the shared parameters of Dynamic ResBlocks connecting the style encoding network and the style transfer network. In the style encoding network, a style class-aware attention mechanism is used to attend to the style feature representation when generating the style codes. In the style transfer network, multiple Dynamic ResBlocks are designed to integrate the style code and the extracted CNN semantic features, which are then fed into the spatial window Layer-Instance Normalization (SW-LIN) decoder, enabling high-quality synthesis of images with artistic style transfer. Moreover, a style collection conditional discriminator is designed to equip our DRB-GAN model with the ability to perform both arbitrary style transfer and collection style transfer during the training stage. For both arbitrary and collection style transfer, extensive experiments demonstrate that our proposed DRB-GAN outperforms state-of-the-art methods in terms of visual quality and efficiency. Our source code is available at https://github.com/xuwenju123/DRB-GAN.
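As background for the SW-LIN decoder mentioned above, here is a sketch of plain Layer-Instance Normalization, i.e. a learned blend of instance norm and layer norm; the spatial-window variant applies this idea within local windows and is not reproduced here. The module below is a generic formulation, not the paper's code.

```python
# Plain Layer-Instance Normalization: learned per-channel blend of IN and LN.
import torch
import torch.nn as nn

class LIN(nn.Module):
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.rho = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))   # IN/LN mixing weight
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x):
        # Instance norm: statistics per sample and channel.
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        # Layer norm: statistics per sample across channels and space.
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        rho = self.rho.clamp(0.0, 1.0)
        return self.gamma * (rho * x_in + (1 - rho) * x_ln) + self.beta
```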
Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, et al.
Controllable generative sequence models with the capability to extract and replicate the style of specific examples enable many applications, including narrating audiobooks in different voices, auto-completing and auto-correcting written handwriting, and generating missing training samples for downstream recognition tasks. However, in an unsupervised style setting, typical training algorithms for controllable sequence generative models suffer from a training-inference mismatch, where the same sample is used as both content and style input during training but unpaired samples are given during inference. In this paper, we tackle the training-inference mismatch encountered during unsupervised learning of controllable generative sequence models. The proposed method is simple yet effective: we use a style transformation module to transfer target style information into an unrelated style input. This enables training with unpaired content and style samples and thereby mitigates the training-inference mismatch. We apply style equalization to text-to-speech and text-to-handwriting synthesis on three datasets. We conduct a thorough evaluation, including both quantitative and qualitative user studies. Our results show that by mitigating the training-inference mismatch with the proposed style equalization, we achieve style replication scores comparable to real data in our user studies.
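Conceptually, the training step described above can be sketched as follows: instead of feeding the same sample as both content and style, a style-transformation module imprints the target sample's style onto an unrelated sample, so content and style inputs are effectively unpaired during training, matching the inference condition. The callables, signatures, and loss below are hypothetical stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of a style-equalized training step.
import torch
import torch.nn.functional as F

def training_step(model, style_transform, target_seq, target_text, unrelated_seq):
    # Equalized style input: an unrelated sample carrying the target's style.
    style_input = style_transform(unrelated_seq, target_seq)
    pred = model(content=target_text, style=style_input)
    loss = F.mse_loss(pred, target_seq)    # reconstruct the target sequence
    loss.backward()
    return loss
```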
We take the first step towards multilingual style transfer by creating and releasing XFORMAL, a benchmark of multiple formal reformulations of informal text in Brazilian Portuguese, French, and Italian. Results on XFORMAL suggest that state-of-the-art style transfer approaches perform close to simple baselines, indicating that style transfer becomes even more challenging when moving to a multilingual setting.