ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
Jiayi Gao, Zijin Yin, Changcheng Hua
et al.
The development of Text-to-Video (T2V) generation has made motion transfer possible, enabling the control of video motion based on existing footage. However, current methods have two limitations: 1) they struggle to handle multi-subject videos, failing to transfer the motion of a specific subject; 2) they struggle to preserve the diversity and accuracy of motion when transferring to subjects with varying shapes. To overcome these, we introduce ConMo, a zero-shot framework that disentangles and recomposes the motions of subjects and camera movements. ConMo isolates individual subject and background motion cues from complex trajectories in source videos using only subject masks, and reassembles them for target video generation. This approach enables more accurate motion control across diverse subjects and improves performance in multi-subject scenarios. Additionally, we propose soft guidance in the recomposition stage, which controls how much of the original motion is retained in order to adjust shape constraints, aiding subject shape adaptation and semantic transformation. Unlike previous methods, ConMo unlocks a wide range of applications, including subject size and position editing, subject removal, semantic modifications, and camera motion simulation. Extensive experiments demonstrate that ConMo significantly outperforms state-of-the-art methods in motion fidelity and semantic consistency. The code is available at https://github.com/Andyplus1/ConMo.
Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer
Zhimin Zhang, Bi'an Du, Caoyuan Ma
et al.
Animal motion embodies species-specific behavioral habits, making the transfer of motion across categories a critical yet complex task for applications in animation and virtual reality. Existing motion transfer methods, primarily focused on human motion, emphasize skeletal alignment (motion retargeting) or stylistic consistency (motion style transfer), often neglecting the preservation of distinct habitual behaviors in animals. To bridge this gap, we propose a novel habit-preserved motion transfer framework for cross-category animal motion. Built upon a generative framework, our model introduces a habit-preservation module with a category-specific habit encoder, allowing it to learn motion priors that capture distinctive habitual characteristics. Furthermore, we integrate a large language model (LLM) to facilitate motion transfer to previously unobserved species. To evaluate the effectiveness of our approach, we introduce the DeformingThings4D-skl dataset, a quadruped dataset with skeletal bindings, and conduct extensive experiments and quantitative analyses, which validate the superiority of our proposed model.
A Respiratory Motion Analysis for Guiding Stereotactic Arrhythmia Radiotherapy Motion Management
Yuhao Wang, Yao Hao, Hongyu An
et al.
Stereotactic Arrhythmia Radiotherapy (STAR) treats ventricular tachycardia (VT) but requires internal target volume (ITV) expansions to compensate for cardiorespiratory motion. Current clinical r4DCT imaging methods are limited, and the reconstructed r4DCTs suffer from unmanaged cardiac motion artifacts that affect the quantitative assessment of respiratory motion. A groupwise surface-to-surface deformable image registration (DIR) algorithm, named gCGF, was developed. A novel principal component filtering (PCF) mechanism and a spatial smoothing mechanism were developed and incorporated into gCGF to iteratively register heart contours from an average respiratory-phase CT to ten r4DCT phases while removing random cardiac motion from the cyclic respiratory motion. The performance of the groupwise DIR was quantitatively validated using 8 digital phantoms with simulated cardiac artifacts. An ablation study was conducted to compare gCGF to a comparable state-of-the-art groupwise DIR method. gCGF was applied to r4DCTs of 20 STAR patients to analyze the respiratory motion of the heart. Validation on digital phantoms showed that gCGF achieved a mean target registration error of 0.63 ± 0.51 mm while successfully achieving phase smoothness and reducing cardiac motion artifacts. Among all STAR patients, the heart's maximum and mean respiratory motion magnitudes ranged from 3.6 to 7.9 mm and from 1.0 to 2.6 mm, respectively, and the peak-to-peak motion range was from 6.2 to 14.7 mm. For VT targets, the maximum and mean motion magnitude ranges were 3.0 to 6.7 mm and 0.8 to 2.9 mm, respectively, and the peak-to-peak range was from 4.7 to 11.8 mm. Significant dominance of the first principal component of the motion direction was observed (p = 0).
Response to Critical Views of Phenomenology of Film
Shawn Loht
This article responds to critical views of John Rhym, Martin Rossouw, Ludo de Roo, and Annie Sandrussi on my 2017 book Phenomenology of Film: A Heideggerian Account of the Film Experience. The article also takes up positive footholds from the analyses of Chiara Quaranta and Jason Wirth. The main topics addressed include Martin Heidegger’s ontic-ontological distinction; the notion of film-as-philosophy; being-in-the-world read as being-in-the-film-world; and questions surrounding the facticity and identity of the film viewer.
Motion pictures, Philosophy (General)
Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space
Jiaxu Zhang, Xin Chen, Gang Yu
et al.
Stylized motion breathes life into characters. However, fixed skeleton structures and style representations hinder existing data-driven motion synthesis methods from generating stylized motion for various characters. In this work, we propose a generative motion stylization pipeline, named MotionS, for synthesizing diverse and stylized motion on cross-structure characters using cross-modality style prompts. Our key insight is to embed motion style into a cross-modality latent space and perceive the cross-structure skeleton topologies, allowing for motion stylization within a canonical motion space. Specifically, the large-scale Contrastive Language-Image Pre-training (CLIP) model is leveraged to construct the cross-modality latent space, enabling flexible style representation within it. Additionally, two topology-encoded tokens are learned to capture the canonical and specific skeleton topologies, facilitating cross-structure topology shifting. Subsequently, the topology-shifted stylization diffusion is designed to generate motion content for the particular skeleton and stylize it in the shifted canonical motion space using multi-modality style descriptions. Through an extensive set of examples, we demonstrate the flexibility and generalizability of our pipeline across various characters and style descriptions. Qualitative and quantitative comparisons show the superiority of our pipeline over state-of-the-art methods, consistently delivering high-quality stylized motion across a broad spectrum of skeletal structures.
Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation
Ke Fan, Jiangning Zhang, Ran Yi
et al.
Text-to-motion generation is a crucial task in computer vision, which generates the target 3D motion by the given text. The existing annotated datasets are limited in scale, resulting in most existing methods overfitting to the small datasets and unable to generalize to the motions of the open domain. Some methods attempt to solve the open-vocabulary motion generation problem by aligning to the CLIP space or using the Pretrain-then-Finetuning paradigm. However, the current annotated dataset's limited scale only allows them to achieve mapping from sub-text-space to sub-motion-space, instead of mapping between full-text-space and full-motion-space (full mapping), which is the key to attaining open-vocabulary motion generation. To this end, this paper proposes to leverage the atomic motion (simple body part motions over a short time period) as an intermediate representation, and leverage two orderly coupled steps, i.e., Textual Decomposition and Sub-motion-space Scattering, to address the full mapping problem. For Textual Decomposition, we design a fine-grained description conversion algorithm, and combine it with the generalization ability of a large language model to convert any given motion text into atomic texts. Sub-motion-space Scattering learns the compositional process from atomic motions to the target motions, to make the learned sub-motion-space scattered to form the full-motion-space. For a given motion of the open domain, it transforms the extrapolation into interpolation and thereby significantly improves generalization. Our network, DSO-Net, combines textual Decomposition and Sub-motion-space scattering to solve Open-vocabulary motion generation. Extensive experiments demonstrate that our DSO-Net achieves significant improvements over the state-of-the-art methods on open-vocabulary motion generation. Code is available at https://vankouf.github.io/DSONet/.
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin, Hao Li, Zesen Cheng
et al.
Text-to-motion generation requires not only grounding local actions in language but also seamlessly blending these individual actions to synthesize diverse and realistic global motions. However, existing motion generation methods primarily focus on the direct synthesis of global motions while neglecting the importance of generating and controlling local actions. In this paper, we propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals. Specifically, we provide an automated method for reference local action sampling and leverage graph attention networks to assess the guiding weight of each local action in the overall motion synthesis. During the diffusion process for synthesizing global motion, we calculate the local-action gradient to provide conditional guidance. This local-to-global paradigm reduces the complexity associated with direct global motion generation and promotes motion diversity via sampling diverse actions as conditions. Extensive experiments on two human motion datasets, i.e., HumanML3D and KIT, demonstrate the effectiveness of our method. Furthermore, our method provides flexibility in seamlessly combining various local actions and continuous guiding weight adjustment, accommodating diverse user preferences, which may hold potential significance for the community. The project page is available at https://jpthu17.github.io/GuidedMotion-project/.
Sacha Polak's Dirty God and the Politics of Authenticity
Suzannah Biernoff
Dutch director Sacha Polak’s Dirty God (2019) is the first narrative film with a female lead whose scars are real, and arguably the first to tackle the assumption that scars (especially on a woman’s body) are shameful or tragic. Vicky Knight, who plays Jade, a young woman rebuilding her life after an acid attack, has talked about the revelation of seeing her body on screen after enduring years of abuse because of her appearance. Polak ‘saved my life’ she says, by enabling her to see her scarred body as beautiful, ‘a piece of art.’ Like any art form, film has the potential to be transformative, and in interviews both Knight and Polak have repeatedly spoken of their work in those terms. This article uses Dirty God to think about what is at stake in the dismantling of stereotypes and the reclamation of beauty — a goal shared by many disability rights campaigners. Made at a time when escalating cases of acid violence in London were making headlines around the world, Polak’s film prompts comparisons with Katie Piper’s Beautiful (2011) and other survivor memoirs. Privileging imperfection over repair and fragility over strength, it challenges existing portrayals of disfigurement and, in the process, offers a more radical understanding of beauty and authenticity.
HumanMAC: Masked Motion Completion for Human Motion Prediction
Ling-Hao Chen, Jiawei Zhang, Yewen Li
et al.
Human motion prediction is a classical problem in computer vision and computer graphics, which has a wide range of practical applications. Previous efforts achieve great empirical performance based on an encoding-decoding style. The methods of this style work by first encoding previous motions to latent representations and then decoding the latent representations into predicted motions. However, in practice, they are still unsatisfactory due to several issues, including complicated loss constraints, cumbersome training processes, and a limited ability to switch between different categories of motion in prediction. In this paper, to address the above issues, we jump out of the foregoing style and propose a novel framework from a new perspective. Specifically, our framework works in a masked completion fashion. In the training stage, we learn a motion diffusion model that generates motions from random noise. In the inference stage, with a denoising procedure, we make motion prediction conditioning on observed motions to output more continuous and controllable predictions. The proposed framework enjoys promising algorithmic properties: it needs only one loss in optimization and is trained in an end-to-end manner. Additionally, it effectively accomplishes switching between different categories of motion, which is significant in realistic tasks, e.g., the animation task. Comprehensive experiments on benchmarks confirm the superiority of the proposed framework. The project page is available at https://lhchen.top/Human-MAC.
Motion Matters: Neural Motion Transfer for Better Camera Physiological Measurement
Akshay Paruchuri, Xin Liu, Yulu Pan
et al.
Machine learning models for camera-based physiological measurement can have weak generalization due to a lack of representative training data. Body motion is one of the most significant sources of noise when attempting to recover the subtle cardiac pulse from a video. We explore motion transfer as a form of data augmentation to introduce motion variation while preserving physiological changes of interest. We adapt a neural video synthesis approach to augment videos for the task of remote photoplethysmography (rPPG) and study the effects of motion augmentation with respect to 1) the magnitude and 2) the type of motion. After training on motion-augmented versions of publicly available datasets, we demonstrate a 47% improvement over existing inter-dataset results using various state-of-the-art methods on the PURE dataset. We also present inter-dataset results on five benchmark datasets to show improvements of up to 79% using TS-CAN, a neural rPPG estimation method. Our findings illustrate the usefulness of motion transfer as a data augmentation technique for improving the generalization of models for camera-based physiological sensing. We release our code for using motion transfer as a data augmentation technique on three publicly available datasets, UBFC-rPPG, PURE, and SCAMPS, and models pre-trained on motion-augmented data here: https://motion-matters.github.io/
Conditional Motion In-betweening
Jihoon Kim, Taehyun Byun, Seungyoun Shin
et al.
Motion in-betweening (MIB) is the process of generating intermediate skeletal movement between given start and target poses while preserving the naturalness of the motion, such as periodic footstep motion while walking. Although state-of-the-art MIB methods are capable of producing plausible motions given sparse key-poses, they often lack the controllability to generate motions satisfying the semantic contexts required in practical applications. We focus on a method that can handle pose- or semantic-conditioned MIB tasks using a unified model. We also present a motion augmentation method to improve the quality of pose-conditioned motion generation by defining a distribution over smooth trajectories. Our proposed method outperforms the existing state-of-the-art MIB method in pose prediction errors while providing additional controllability.
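To make the task's input/output concrete, here is a minimal baseline sketch of motion in-betweening that linearly interpolates joint positions between a start and a target pose. The joint count and array layout are illustrative assumptions; learned, condition-aware models such as the one described above replace this trivial blend.

```python
import numpy as np

def lerp_inbetween(start: np.ndarray, target: np.ndarray, n_frames: int) -> np.ndarray:
    """Return n_frames poses blending linearly from start to target (inclusive)."""
    alphas = np.linspace(0.0, 1.0, n_frames)[:, None, None]  # one weight per frame
    return (1 - alphas) * start + alphas * target

# Hypothetical skeleton: 22 joints with xyz coordinates per joint.
start = np.zeros((22, 3))
target = np.ones((22, 3))
motion = lerp_inbetween(start, target, n_frames=8)
print(motion.shape)  # (8, 22, 3)
```

The first and last generated frames reproduce the key-poses exactly; everything a real MIB model adds (naturalness, periodic footsteps, semantic control) lives in how the intermediate frames deviate from this straight-line baseline.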
Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation
Peirong Liu, Rui Wang, Xuefei Cao
et al.
Image animation is the task of transferring the motion of a driving video to a given object in a source image. While great progress has recently been made in unsupervised motion transfer, requiring no labeled data or domain priors, many current unsupervised approaches still struggle to capture the motion deformations when large motion/view discrepancies occur between the source and driving domains. Under such conditions, there is simply not enough information to capture the motion field properly. We introduce DiME (Differential Motion Evolution), an end-to-end unsupervised motion transfer framework integrating differential refinement for motion estimation. Key findings are twofold: (1) modeling the motion transfer with an ordinary differential equation (ODE) helps regularize the motion field, and (2) by utilizing the source image itself, we are able to inpaint occluded/missing regions arising from large motion changes. Additionally, we propose a natural extension of the ODE idea: DiME can easily leverage multiple different views of the source object whenever they are available by modeling an ODE per view. Extensive experiments across 9 benchmarks show DiME outperforms the state of the art by a significant margin and generalizes much better to unseen objects.
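The ODE view described above can be illustrated with a toy sketch: rather than predicting a displacement field in one shot, the field is evolved in small steps, dx/dt = f(x). The velocity function below is a stand-in for a learned network and the field dimensions are invented for illustration; this is not DiME's actual implementation.

```python
import numpy as np

def evolve_motion(x0: np.ndarray, velocity_fn, n_steps: int = 10, dt: float = 0.1) -> np.ndarray:
    """Euler-integrate a motion field from x0: x <- x + dt * f(x), n_steps times."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x + dt * velocity_fn(x)  # one small refinement per step
    return x

# Toy velocity that pulls every flow vector toward a target field.
target = np.full((4, 4, 2), 2.0)          # dense 2D motion field (H, W, uv)
velocity_fn = lambda x: target - x

refined = evolve_motion(np.zeros((4, 4, 2)), velocity_fn)
```

Because each Euler step only nudges the field a little toward the target, the trajectory of intermediate fields stays smooth, which is the regularizing effect the ODE formulation is credited with.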
The formation of drops by nozzles and the breakup of liquid jets
W. Ohnesorge
45 citations
en
Materials Science
International Conference Histories of Tacit Cinematic Knowledge | Frankfurt am Main and worldwide (September 24–26, 2020) Organisers: Rebecca Boguska, Guilherme da Silva Machado, Rebecca Puchta, Marin Reljic and Philipp Röding
Laura Teixeira
A Motion Taxonomy for Manipulation Embedding
David Paulius, Nicholas Eales, Yu Sun
To represent motions from a mechanical point of view, this paper explores motion embedding using the motion taxonomy. With this taxonomy, manipulations can be described and represented as binary strings called motion codes. Motion codes capture mechanical properties, such as contact type and trajectory, that should be used to define suitable distance metrics between motions or loss functions for deep learning and reinforcement learning. Motion codes can also be used to consolidate aliases or cluster motion types that share similar properties. Using existing data sets as a reference, we discuss how motion codes can be created and assigned to actions that are commonly seen in activities of daily living based on intuition as well as real data. Motion codes are compared to vectors from pre-trained Word2Vec models, and we show that motion codes maintain distances that closely match the reality of manipulation.
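The idea of motion codes as binary strings with a distance metric can be sketched as follows. The flag names and example codes here are illustrative assumptions, not the paper's actual taxonomy; the point is that aliases collapse to distance zero while mechanically different motions sit far apart.

```python
def hamming_distance(code_a: str, code_b: str) -> int:
    """Count differing bits between two equal-length motion codes."""
    assert len(code_a) == len(code_b), "motion codes must share a length"
    return sum(a != b for a, b in zip(code_a, code_b))

# Hypothetical 4-bit codes: [contact? | rigid object? | prismatic | revolute]
motion_codes = {
    "cut":   "1110",
    "slice": "1110",  # alias of "cut": identical mechanical properties
    "pour":  "0001",
    "stir":  "1101",
}

print(hamming_distance(motion_codes["cut"], motion_codes["slice"]))  # 0
print(hamming_distance(motion_codes["cut"], motion_codes["pour"]))   # 4
```

A distance defined this way reflects mechanical similarity directly, which is the property the paper contrasts with purely semantic embeddings such as Word2Vec.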
History Repeats Itself: Human Motion Prediction via Motion Attention
Wei Mao, Miaomiao Liu, Mathieu Salzmann
Human motion prediction aims to forecast future human poses given a past motion. Whether based on recurrent or feed-forward neural networks, existing methods fail to model the observation that human motion tends to repeat itself, even for complex sports actions and cooking activities. Here, we introduce an attention-based feed-forward network that explicitly leverages this observation. In particular, instead of modeling frame-wise attention via pose similarity, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences. Aggregating the relevant past motions and processing the result with a graph convolutional network allows us to effectively exploit motion patterns from the long-term history to predict the future poses. Our experiments on Human3.6M, AMASS and 3DPW evidence the benefits of our approach for both periodical and non-periodical actions. Thanks to our attention model, it yields state-of-the-art results on all three datasets. Our code is available at https://github.com/wei-mao-2019/HisRepItself.
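The key idea of attending over motion sub-sequences rather than individual poses can be sketched as below. The dot-product score, window shapes, and dimensions are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T_hist, T_ctx, D = 50, 10, 16                 # history length, context length, pose dim
history = rng.normal(size=(T_hist, D))
context = history[-T_ctx:]                    # last observed motion = the query

# Slide a window over the earlier history to form candidate sub-sequences.
past = history[:-T_ctx]                       # exclude the query context itself
windows = np.stack([past[t:t + T_ctx] for t in range(len(past) - T_ctx + 1)])

# Score each sub-sequence against the context, softmax-normalize, aggregate.
scores = windows.reshape(len(windows), -1) @ context.reshape(-1)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
aggregated = (weights[:, None, None] * windows).sum(axis=0)  # (T_ctx, D)
print(aggregated.shape)  # (10, 16)
```

The aggregated sub-sequence summarizes the historical motions most similar to the current context; in the full model this summary would feed a graph convolutional network to produce the future poses.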
Flexible coherent control of plasmonic spin-Hall effect
Shiyi Xiao, F. Zhong, Hui Liu
et al.
The surface plasmon polariton is an emerging candidate for miniaturizing optoelectronic circuits. Recent demonstrations of polarization-dependent splitting using metasurfaces, including focal-spot shifting and unidirectional propagation, allow us to exploit the spin degree of freedom in plasmonics. However, further progress has been hampered by the inability to generate more complicated and independent surface plasmon profiles for two incident spins, which work coherently together for more flexible and tunable functionalities. Here, by matching the geometric phases of the nano-slots on silver to specific superimpositions of the inward and outward surface plasmon profiles for the two spins, arbitrary spin-dependent orbitals can be generated in a slot-free region. Furthermore, motion pictures with a series of picture frames can be assembled and played by varying the linear polarization angle of incident light. This spin-enabled control of orbitals is potentially useful for tip-free near-field scanning microscopy, holographic data storage, tunable plasmonic tweezers, and integrated optical components.

Conventional methods to control surface plasmon polaritons with light offer limited tunability or complex design parameters. Here, Xiao et al. demonstrate coherent and independent control of surface plasmon polariton orbitals for two opposite spins using multiple rings of nano-slots on a metasurface.
134 citations
en
Medicine, Physics
Box office forecasting using machine learning algorithms based on SNS data
Taegu Kim, Jungsik Hong, Pilsung Kang
104 citations
en
Computer Science
10 Years of Alma Suburbana: An Analysis of Suburban History and Identities through the Documentary
Luiz Claudio MOTTA LIMA, Rafael MATTOSO
This work seeks to reflect on suburban identities through the documentary Alma Suburbana, conceived, produced, and first screened in 2007. From the beginning of the film's production, the intention was to give voice to the residents of Rio de Janeiro's suburbs, who often contest the disparaging way suburban neighborhoods are portrayed in the media at large. The crew was formed by a group of students from the municipal school system, and the aim was to show how the suburbs are actually seen by their residents, as well as by those who seek to problematize the subject even without living in a suburb. The interviewees were chosen for the relevance of their cultural activities in music, film, poetry, dance, and other suburban pursuits. The chosen aesthetic was that of interview-based documentary cinema inspired by the filmmaker Eduardo Coutinho, who worked to keep his documentaries lean. Drawing on his influence, we, the film's authors, also placed ourselves as characters in the documentary, interrupting the interviewees, appearing in scenes of the film, and, above all, claiming our place in the city.
Visual arts, Motion pictures
Viewing lip forms: cortical dynamics.
N. Nishitani, R. Hari
380 citations
en
Psychology, Medicine