Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors
Pengfei Zhou, Xiangyue Zhang, Xukun Shen
et al.
Masked generative models have become a strong paradigm for text-to-motion synthesis, but they still treat motion frames too uniformly during masking, attention, and decoding. This is a poor match for motion, where local dynamic complexity varies sharply over time. We show that current masked motion generators degrade disproportionately on dynamically complex motions, and that frame-wise generation error is strongly correlated with motion dynamics. Motivated by this mismatch, we introduce the Motion Spectral Descriptor (MSD), a simple and parameter-free measure of local dynamic complexity computed from the short-time spectrum of motion velocity. Unlike learned difficulty predictors, MSD is deterministic, interpretable, and derived directly from the motion signal itself. We use MSD to make masked motion generation complexity-aware. In particular, MSD guides content-focused masking during training, provides a spectral similarity prior for self-attention, and can additionally modulate token-level sampling during iterative decoding. Built on top of masked motion generators, our method, DynMask, improves motion generation most clearly on dynamically complex motions while also yielding stronger overall FID on HumanML3D and KIT-ML. These results suggest that respecting local motion complexity is a useful design principle for masked motion generation. Project page: https://xiangyue-zhang.github.io/DynMask
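The abstract specifies the MSD only as a parameter-free measure computed from the short-time spectrum of motion velocity. A minimal sketch of one such descriptor (a frequency-weighted spectral centroid of windowed velocity spectra; the function name, window size, and pooling over joints are assumptions, not the paper's exact formulation) could look like:

```python
import numpy as np

def motion_spectral_descriptor(positions, win=16, fps=30.0):
    """Per-frame local dynamic complexity from the short-time spectrum
    of motion velocity (illustrative sketch, not the paper's exact MSD).
    `positions` has shape (T, ...) with joint coordinates per frame."""
    # velocity by finite differences over flattened joint coordinates
    vel = np.diff(positions.reshape(positions.shape[0], -1), axis=0)
    T = vel.shape[0]
    half = win // 2
    # reflect-pad so every frame has a full analysis window
    padded = np.pad(vel, ((half, half), (0, 0)), mode="reflect")
    freqs = np.fft.rfftfreq(win, d=1.0 / fps)
    msd = np.empty(T)
    for t in range(T):
        window = padded[t:t + win] * np.hanning(win)[:, None]
        spec = np.abs(np.fft.rfft(window, axis=0))  # (F, D)
        energy = spec.sum(axis=1)                   # pool over joints
        # spectral centroid: more high-frequency energy => more complex
        msd[t] = (freqs * energy).sum() / (energy.sum() + 1e-8)
    return msd
```

A descriptor of this kind is deterministic and parameter-free in the sense the abstract describes: it involves no learned weights, only the signal itself.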
PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation
Sihan Zhao, Zixuan Wang, Tianyu Luan
et al.
Human motion generation has found widespread applications in AR/VR, film, sports, and medical rehabilitation, offering a cost-effective alternative to traditional motion capture systems. However, evaluating the fidelity of such generated motions is a crucial, multifaceted task. Although previous approaches have attempted motion fidelity evaluation using human perception or physical constraints, there remains an inherent gap between human-perceived fidelity and physical feasibility. Moreover, the subjective and coarse binary labeling of human perception further undermines the development of a robust data-driven metric. We address these issues by introducing a physical labeling method that evaluates motion fidelity by calculating the minimum modifications needed for a motion to align with physical laws. With this approach, we are able to produce fine-grained, continuous physical alignment annotations that serve as objective ground truth. With these annotations, we propose PP-Motion, a novel data-driven metric that evaluates both the physical and perceptual fidelity of human motion. To effectively capture the underlying physical priors, we employ a Pearson correlation loss for training our metric. Additionally, by incorporating a human-based perceptual fidelity loss, our metric captures fidelity that simultaneously considers both human perception and physical alignment. Experimental results demonstrate that our metric, PP-Motion, not only aligns with physical laws but also aligns better with human perception of motion fidelity than previous work.
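The Pearson correlation loss named in the abstract has a standard closed form: it rewards preserving the linear trend of the continuous physical-alignment labels rather than their absolute scale. A minimal numpy sketch (the function name is an assumption; the actual metric is a learned network trained with this loss):

```python
import numpy as np

def pearson_correlation_loss(pred, target, eps=1e-8):
    """1 - Pearson r between predicted scores and annotations.
    Minimizing this encourages the predictions to track the *ranking
    and linear trend* of the labels, independent of offset and scale."""
    p = pred - pred.mean()
    t = target - target.mean()
    r = (p * t).sum() / (np.sqrt((p ** 2).sum() * (t ** 2).sum()) + eps)
    return 1.0 - r
```

The loss is 0 for perfectly correlated predictions and 2 for perfectly anti-correlated ones, regardless of affine rescaling of either argument.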
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
Haidong Xu, Guangwei Xu, Zhedong Zheng
et al.
This paper introduces VimoRAG, a novel video-based retrieval-augmented motion generation framework for motion large language models (LLMs). Since motion LLMs face severe out-of-domain/out-of-vocabulary issues due to limited annotated data, VimoRAG leverages large-scale in-the-wild video databases to enhance 3D motion generation by retrieving relevant 2D human motion signals. Video-based motion RAG is nontrivial, and we address its two key bottlenecks: (1) developing an effective motion-centered video retrieval model that distinguishes human poses and actions, and (2) mitigating the error propagation caused by suboptimal retrieval results. To this end, we design the Gemini Motion Video Retriever mechanism and the Motion-centric Dual-alignment DPO Trainer, enabling effective retrieval and generation. Experimental results show that VimoRAG significantly boosts the performance of motion LLMs constrained to text-only input. All resources are available at https://walkermitty.github.io/VimoRAG/
Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
Yue Ma, Yulong Liu, Qiyuan Zhu
et al.
Recently, breakthroughs in video diffusion transformers have shown remarkable capabilities in diverse motion generation. For the motion-transfer task, current methods mainly rely on two-stage Low-Rank Adaptation (LoRA) finetuning to obtain better performance. However, existing adaptation-based motion transfer still suffers from motion inconsistency and tuning inefficiency when applied to large video diffusion transformers. Naive two-stage LoRA tuning struggles to maintain motion consistency between generated and input videos due to the inherent spatial-temporal coupling in the 3D attention operator, and it requires time-consuming fine-tuning in both stages. To tackle these issues, we propose Follow-Your-Motion, an efficient two-stage video motion transfer framework that finetunes a powerful video diffusion transformer to synthesize complex motion. Specifically, we propose a spatial-temporal decoupled LoRA that decouples the attention architecture into spatial appearance and temporal motion processing. In the second training stage, we design sparse motion sampling and adaptive RoPE to accelerate tuning. To address the lack of a benchmark for this field, we introduce MotionBench, a comprehensive benchmark comprising diverse motion, including creative camera motion, single-object motion, multiple-object motion, and complex human motion. Extensive evaluations on MotionBench verify the superiority of Follow-Your-Motion.
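The decoupling idea can be sketched at the level of a single projection: a frozen weight is augmented by two independent low-rank branches, one intended for spatial appearance and one for temporal motion. All names and the simple additive combination below are illustrative assumptions; the paper applies its branches inside 3D attention, not to one linear layer:

```python
import numpy as np

def lora_attention_update(x, W, A_s, B_s, A_t, B_t, alpha=1.0):
    """Decoupled LoRA sketch: y = x W + alpha * (x A_s B_s + x A_t B_t).
    W is the frozen pretrained projection; (A_s, B_s) is a rank-r
    'spatial' adapter and (A_t, B_t) a rank-r 'temporal' adapter."""
    delta_spatial = (x @ A_s) @ B_s   # low-rank update, spatial branch
    delta_temporal = (x @ A_t) @ B_t  # low-rank update, temporal branch
    return x @ W + alpha * (delta_spatial + delta_temporal)
```

With both adapters zero-initialized the layer reproduces the frozen model exactly, which is the usual LoRA starting point.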
Improving Sparse IMU-based Motion Capture with Motion Label Smoothing
Zhaorui Meng, Lu Yin, Yangqing Hou
et al.
Sparse Inertial Measurement Unit (IMU)-based human motion capture has gained significant momentum, driven by the adaptation of fundamental AI tools such as recurrent neural networks (RNNs) and transformers tailored for temporal and spatial modeling. Despite these achievements, current research predominantly focuses on pipeline and architectural designs, with comparatively little attention given to regularization methods, highlighting a critical gap in developing a comprehensive AI toolkit for this task. To bridge this gap, we propose motion label smoothing, a novel method that adapts the classic label smoothing strategy from classification to the sparse IMU-based motion capture task. Specifically, we first demonstrate that a naive adaptation of label smoothing, such as simply blending in a uniform vector or a "uniform" motion representation (e.g., the dataset-average motion or a canonical T-pose), is suboptimal, and argue that a proper adaptation requires increasing the entropy of the smoothed labels. Second, we conduct a thorough analysis of human motion labels, identifying three critical properties: 1) temporal smoothness, 2) joint correlation, and 3) low-frequency dominance, and show that conventional approaches to entropy enhancement (e.g., blending in Gaussian noise) are ineffective because they disrupt these properties. Finally, we propose blending in a novel skeleton-based Perlin noise for motion label smoothing, designed to raise label entropy while satisfying these motion properties. Extensive experiments applying our motion label smoothing to three state-of-the-art methods across four real-world IMU datasets demonstrate its effectiveness and robust plug-and-play generalization.
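The blending step can be sketched with a simple stand-in: low-frequency (temporally smoothed) noise mixed into the labels, which raises entropy without adding the high-frequency content that plain Gaussian noise would. Note this uses Gaussian-smoothed noise purely for illustration, not the paper's skeleton-based Perlin noise, and all names and defaults are assumptions:

```python
import numpy as np

def smooth_motion_labels(labels, eps=0.05, kernel=9, seed=0):
    """Blend low-frequency noise into motion labels of shape (T, J).
    Temporal smoothing of the noise keeps the perturbation consistent
    with temporal smoothness and low-frequency dominance."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(labels.shape)
    # Gaussian smoothing kernel along the time axis
    t = np.arange(kernel) - kernel // 2
    g = np.exp(-0.5 * (t / (kernel / 4.0)) ** 2)
    g /= g.sum()
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, g, mode="same"), 0, noise)
    # convex blend, as in classic label smoothing
    return (1.0 - eps) * labels + eps * smoothed
```

A skeleton-aware noise would additionally correlate the perturbation across joints along the kinematic tree, which this sketch deliberately omits.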
OMRA: Online Motion Resolution Adaptation to Remedy Domain Shift in Learned Hierarchical B-frame Coding
Zong-Lin Gao, Sang NguyenQuang, Wen-Hsiao Peng
et al.
Learned hierarchical B-frame coding aims to leverage bi-directional reference frames for better coding efficiency. However, the domain shift between training and test scenarios due to dataset limitations poses a challenge: the codec is trained with small groups of pictures (GOPs) but tested on large GOPs. Specifically, the motion estimation network, when trained on small GOPs, is unable to handle the large motion encountered at test time, which hurts compression performance. To mitigate this domain shift, we present an online motion resolution adaptation (OMRA) method that adapts the spatial resolution of video frames on a per-frame basis to suit the capability of the motion estimation network in a pre-trained B-frame codec. OMRA is an online, inference-time technique: it requires no re-training of the codec and is readily applicable to existing B-frame codecs that adopt hierarchical bi-directional prediction. Experimental results show that OMRA significantly enhances the compression performance of two state-of-the-art learned B-frame codecs on commonly used datasets.
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
Zeyu Zhang, Yiran Wang, Biao Wu
et al.
In recent years, there has been significant interest in creating 3D avatars and motions, driven by their diverse applications in areas such as film-making, video games, AR/VR, and human-robot interaction. However, current efforts primarily concentrate on either generating the 3D avatar mesh alone or producing motion sequences, and integrating the two remains a persistent challenge. Additionally, while avatar and motion generation predominantly target humans, extending these techniques to animals remains difficult due to inadequate training data and methods. To bridge these gaps, this paper presents three key contributions. First, we propose a novel agent-based approach named Motion Avatar, which allows the automatic generation of high-quality, customizable human and animal avatars with motion through text queries, significantly advancing dynamic 3D character generation. Second, we introduce an LLM planner that coordinates both motion and avatar generation, recasting discriminative planning as customizable question answering. Lastly, we present an animal motion dataset named Zoo-300K, comprising approximately 300,000 text-motion pairs across 65 animal categories, together with its construction pipeline, ZooGen, which serves as a valuable resource for the community. See the project website: https://steve-zeyu-zhang.github.io/MotionAvatar/
Nietzschean Themes in Béla Tarr's The Turin Horse
Paolo Stellino
Béla Tarr's last feature film The Turin Horse (2011) begins with a prologue that narrates Friedrich Nietzsche's mental breakdown in Turin in 1889, which was allegedly prompted by his witnessing a cab driver brutally whipping his horse. Nietzsche's name is not mentioned again in the film, and the viewer is left wondering what connection, if any, exists between the Nietzsche story and the film's narrative. Scholars often refer to one or another aspect of Nietzsche's philosophy when analysing Tarr's film. Yet, to date, no comprehensive study has been devoted to exploring the connection between The Turin Horse and Nietzsche's philosophy. This article seeks to fill this gap in the literature. The first section examines the connection between the prologue and the film. In the second section, attention is given to the most evident connection between Tarr's film and Nietzsche's philosophy, namely the use of circularity and repetition in The Turin Horse and Nietzsche's idea of the eternal recurrence of the same. The third section interprets the neighbour's monologue in light of Nietzsche's death of God. Finally, the fourth section is devoted to nihilism.
MotionGPT: Human Motion as a Foreign Language
Biao Jiang, Xin Chen, Wen Liu
et al.
Although pre-trained large language models continue to advance, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and largely untouched. Fortunately, human motion displays a semantic coupling akin to human language and is often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that enhances the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model that handles multiple motion-relevant tasks. Specifically, we employ discrete vector quantization for human motion and convert 3D motion into motion tokens, analogous to the generation of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performance on multiple motion tasks, including text-driven motion generation, motion captioning, motion prediction, and motion in-betweening.
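The vector-quantization step that turns continuous motion into a "motion vocabulary" is a nearest-codebook lookup. A minimal sketch (function name assumed; MotionGPT learns its codebook with a VQ-VAE rather than fixing it):

```python
import numpy as np

def quantize_motion(features, codebook):
    """Map per-frame motion features (T, d) to discrete token ids by
    nearest codebook entry (K, d). Returns the token sequence and the
    de-quantized reconstruction used by the decoder."""
    # (T, K) squared distances between each frame and each code
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d2.argmin(axis=1)   # one discrete token per frame
    recon = codebook[tokens]     # nearest-code reconstruction
    return tokens, recon
```

Once motion is a token sequence, the same autoregressive language modeling machinery applies to motion and text alike, which is the premise of the unified model.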
Mitigating Motion Sickness with Optimization-based Motion Planning
Yanggu Zheng, Barys Shyrokau, Tamas Keviczky
Motion sickness poses a potential threat to the acceptance of automated driving, as it hinders passengers' willingness to perform secondary activities. To mitigate motion sickness in automated vehicles, we propose an optimization-based motion planning algorithm that minimizes the acceleration energy within the frequency range found to be the most nauseogenic. The algorithm is formulated in integral and receding-horizon variants and compared with a commonly used alternative that minimizes accelerations in general. The proposed approach can reduce frequency-weighted acceleration by up to 11.3% compared with not considering the frequency sensitivity, at the price of reduced overall acceleration comfort. Our simulation studies also reveal a loss of performance of the receding-horizon approach relative to the integral approach when varying the preview time and nominal sampling time. The computation time of the receding-horizon planner is around or below the real-time threshold when using a longer sampling time, without causing significant performance loss. We also present experiments measuring the performance of human drivers on the public road section on which the simulated scenario is based. The proposed method achieves a 19% improvement in general acceleration comfort or a 32% reduction in squared motion sickness dose value over the best-performing participant. These results demonstrate considerable potential for improving motion comfort and mitigating motion sickness in automated vehicles using our approach.
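The quantity the planner penalizes, acceleration energy inside a nauseogenic frequency band, can be computed from the spectrum of an acceleration trace. A sketch follows; the 0.1-0.5 Hz band is a commonly cited nauseogenic range used here as an assumption, not necessarily the exact weighting of the paper:

```python
import numpy as np

def band_acceleration_energy(acc, fs, f_lo=0.1, f_hi=0.5):
    """Energy of an acceleration signal within [f_lo, f_hi] Hz,
    computed from the one-sided spectrum (Parseval's theorem)."""
    spec = np.fft.rfft(acc)
    freqs = np.fft.rfftfreq(len(acc), d=1.0 / fs)
    power = np.abs(spec) ** 2 / len(acc)
    power[1:] *= 2.0  # double interior bins of the one-sided spectrum
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return power[mask].sum()
```

A planner minimizing this band energy trades some overall acceleration comfort for less energy at the frequencies most associated with sickness, which matches the trade-off the abstract reports.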
Langevin picture of subdiffusion in nonuniformly expanding medium
Xudong Wang, Yao Chen, Wanli Wang
Anomalous diffusion phenomena have been observed in many complex physical and biological systems. One significant recent advance is the extension of a particle's motion in a static medium to uniformly (and even nonuniformly) expanding media. The dynamic mechanism of a particle's motion in a nonuniformly expanding medium has so far been investigated only in the framework of the continuous-time random walk. To study more physical observables and supplement the theory of expanding-medium problems, we characterize the nonuniformly expanding medium with a spatial-temporal dependent scale factor $a(x,t)$ and build the Langevin picture describing the particle's motion in such a medium. By introducing a new coordinate, besides the existing comoving and physical coordinates, we establish the relation between the nonuniformly expanding medium and the uniformly expanding one, and further obtain the moments of the comoving and physical coordinates. Both exponential and power-law scale factors are considered to uncover the combined effects of the particle's intrinsic diffusion and the nonuniform expansion of the medium. Our detailed theoretical analyses and simulations provide a foundation for studying further expanding-medium problems.
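For intuition about the uniform baseline that the nonuniform case is mapped onto, consider the standard picture in which the particle diffuses ordinarily in the comoving coordinate x while the physical coordinate is stretched by the scale factor, y(t) = a(t) x(t). A minimal simulation under these assumptions (exponential a(t), simple Brownian comoving motion; names and defaults are illustrative, and this is not the paper's nonuniform a(x,t) construction):

```python
import numpy as np

def simulate_expanding_medium(n=20000, steps=400, dt=0.01,
                              D=1.0, H=0.5, seed=0):
    """Brownian motion in the comoving frame of a uniformly expanding
    medium with a(t) = exp(H t); the physical coordinate is y = a x.
    Returns empirical <x^2>, <y^2>, the theory value 2 D t, and a(t)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for _ in range(steps):
        x += np.sqrt(2 * D * dt) * rng.standard_normal(n)  # comoving steps
    t = steps * dt
    a = np.exp(H * t)
    y = a * x  # physical coordinate is the stretched comoving one
    return (x ** 2).mean(), (y ** 2).mean(), 2 * D * t, a
```

In this baseline the comoving moment follows ordinary diffusion, ⟨x²⟩ = 2Dt, while the physical moment picks up the factor a(t)², separating the intrinsic diffusion from the expansion effect.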
Constants of motion network
Muhammad Firmansyah Kasim, Yi Heng Lim
The beauty of physics is that there is usually a conserved quantity in an always-changing system, known as a constant of motion. Finding constants of motion is important for understanding the dynamics of a system, but typically requires mathematical proficiency and manual analytical work. In this paper, we present a neural network that can simultaneously learn the dynamics of a system and its constants of motion from data. By exploiting the discovered constants of motion, it produces better predictions of the dynamics and works on a wider range of systems than Hamiltonian-based neural networks. In addition, the training progress of our method can be used as an indicator of the number of constants of motion in a system, which could be useful in studying a novel physical system.
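A concrete example of what the abstract means by a constant of motion: for a 1D harmonic oscillator the energy E = p²/2m + kq²/2 stays fixed along any trajectory. A short sketch (a hand-coded integrator and analytic invariant, purely to illustrate the concept, not the paper's learned network):

```python
import numpy as np

def leapfrog(q, p, dt, steps, k=1.0, m=1.0):
    """Integrate a 1D harmonic oscillator with the leapfrog scheme and
    return the (q, p) trajectory with p synchronized to q at each step."""
    traj = []
    p = p - 0.5 * dt * k * q              # initial half kick
    for _ in range(steps):
        q = q + dt * p / m                # drift
        p = p - dt * k * q                # full kick
        traj.append((q, p + 0.5 * dt * k * q))  # re-synchronize p
    return np.array(traj)

def energy(q, p, k=1.0, m=1.0):
    """The constant of motion of the oscillator."""
    return 0.5 * p ** 2 / m + 0.5 * k * q ** 2
```

A learned constant of motion can be validated the same way: evaluate the learned scalar function along simulated trajectories and check that it stays (nearly) constant.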
Constrained motion design with distinct actuators and motion stabilization
Renate Sachse, Florian Geiger, Manfred Bischoff
The design of adaptive structures is one way to improve the sustainability of buildings. Adaptive structures can adapt to different loading and environmental conditions or to changing requirements through either small or large shape changes. In the latter case, the mechanics and properties of the deformation process also play a role in the structure's energy efficiency. The method of variational motion design, previously developed in the authors' group, identifies deformation paths between two given geometric configurations that are optimal with respect to a defined quality function. In a preliminary, academic setting, this method assumes that every single degree of freedom is accessible to arbitrary external actuation forces that realize the optimized motion; these (nodal) forces can be recovered a posteriori. The present contribution extends the method of motion design with the constraint that the motion is to be realized by a predefined set of actuation forces. These can be either external forces or prescribed length changes of discrete, internal actuator elements. As an additional constraint, static stability of each intermediate configuration during the motion is taken into account, which can be accomplished by enforcing a positive determinant of the stiffness matrix.
Learning a Generative Motion Model from Image Sequences based on a Latent Motion Matrix
Julian Krebs, Hervé Delingette, Nicholas Ayache
et al.
We propose to learn a probabilistic motion model from a sequence of images for spatio-temporal registration. Our model encodes motion in a low-dimensional probabilistic space - the motion matrix - which enables various motion analysis tasks, such as simulation and interpolation of realistic motion patterns, allowing for faster data acquisition and data augmentation. More precisely, the motion matrix makes it possible to transport the recovered motion from one subject to another, simulating, for example, a pathological motion in a healthy subject without the need for inter-subject registration. The method is based on a conditional latent variable model trained using amortized variational inference. This unsupervised generative model follows a novel multivariate Gaussian process prior and is applied within a temporal convolutional network, which leads to a diffeomorphic motion model. Temporal consistency and generalizability are further improved by a temporal dropout training scheme. Applied to cardiac cine-MRI sequences, we show improved registration accuracy and spatio-temporally smoother deformations compared to three state-of-the-art registration algorithms. Moreover, we demonstrate the model's applicability to motion analysis, simulation, and super-resolution through improved motion reconstruction from sequences with missing frames compared to linear and cubic interpolation.
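The role of a Gaussian process prior in such a model is to make sampled latent motion trajectories temporally smooth. A minimal sketch of sampling from a GP prior with an RBF kernel (a simplified analogue of the multivariate GP prior named above; the function name and kernel choice are assumptions):

```python
import numpy as np

def sample_gp_motion(T=50, dim=3, length_scale=5.0, seed=0):
    """Draw `dim` temporally smooth latent trajectories of length T
    from a zero-mean Gaussian process with an RBF covariance."""
    t = np.arange(T, dtype=float)[:, None]
    K = np.exp(-0.5 * ((t - t.T) / length_scale) ** 2)   # RBF kernel
    L = np.linalg.cholesky(K + 1e-6 * np.eye(T))         # jittered factor
    rng = np.random.default_rng(seed)
    return L @ rng.standard_normal((T, dim))             # correlated draws
```

Because neighboring time points are strongly correlated under the kernel, consecutive latent codes change slowly, which is what yields smooth interpolations and simulations of motion.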
Real-time colour hologram generation based on ray-sampling plane with multi-GPU acceleration
H. Sato, T. Kakue, Y. Ichihashi
et al.
Although electro-holography can reconstruct three-dimensional (3D) motion pictures, its computational cost is too heavy to allow real-time reconstruction of 3D motion pictures. This study explores accelerating colour hologram generation using light-ray information on a ray-sampling (RS) plane with a graphics processing unit (GPU) to realise a real-time holographic display system. We refer to an image corresponding to light-ray information as an RS image. Colour holograms were generated from three RS images with resolutions of 2,048 × 2,048; 3,072 × 3,072 and 4,096 × 4,096 pixels. The computational results indicate that generating the colour holograms using multiple GPUs (NVIDIA GeForce GTX 1080) was approximately 300–500 times faster than generation using a central processing unit. In addition, the results demonstrate that 3D motion pictures were successfully reconstructed from RS images of 3,072 × 3,072 pixels at approximately 15 frames per second using an electro-holographic reconstruction system in which colour holograms were generated from RS images in real time.
Moving up in taste: Enhanced projected taste and freshness of moving food products
Yaniv Gvili, Aner Tal, Moty Amar
et al.
Poetry (sacred territory) and Cinema (devoted acolyte)
Maria João SEIXAS
Between Poetry, as a "sacred territory", and Cinema, its "devoted acolyte", the aim was to bring into discussion a difficult and improbable relationship. Taking Sergei Paradjanov's A cor da Romã (The Color of Pomegranates, 1968) as her starting point, Maria João Seixas set out to contribute to a reflection on the relationship between Cinema and the literary territory, namely Poetry.
Multidimensional Electronic Spectroscopy of Photochemical Reactions.
P. Nuernberger, Stefan Ruetzel, T. Brixner
Closely overlapping responses to tools and hands in left lateral occipitotemporal cortex.
S. Bracci, C. Cavina-Pratesi, M. Ietswaart
et al.
Non ou a vã glória de mandar: an identity-based and geopolitical portrait of Portugal
Mariana Veiga Copertino F. da SILVA
One of Manoel de Oliveira's best-known films, Non ou a vã glória de mandar, presents a group of soldiers facing the war in Africa in 1974. In a kind of Os Lusíadas in reverse, Oliveira retells the history of Portugal through the bitter defeats suffered inside and outside Lusitanian territory, always stemming from the "vain glory of command". Each episode depicted shows failure in social and territorial disputes, ranging from the historical figure of Viriathus, leader of the Lusitanian tribes in the resistance against the Romans, to the infamous battle of Alcácer-Quibir and the resulting designs of Sebastianism. By assembling a series of episodes that trace the history of Portugal, Manoel de Oliveira seeks to reflect on the human and cultural dimension of the Portuguese people, taking the Lusitanian space as a motif for understanding this relationship. This article analyses, in the film Non ou a vã glória de mandar, the relations established between cinema and the concept of territory, which serve as a guiding thread for reflecting on the formation of a nation, and also considers Oliveira's cinema as a space for expressing a portrait of Portuguese social identity.