Recent advances in humanoid locomotion have enabled dynamic behaviors such as dancing, martial arts, and parkour, yet these capabilities are predominantly demonstrated in open, flat, and obstacle-free settings. In contrast, real-world environments such as homes, offices, and public spaces are densely cluttered, three-dimensional, and geometrically constrained, requiring scene-aware whole-body coordination, precise balance control, and reasoning over spatial constraints imposed by furniture and household objects. However, humanoid locomotion in cluttered 3D environments remains underexplored, and no public dataset systematically couples full-body human locomotion with the scene geometry that shapes it. To address this gap, we present Moving Through Clutter (MTC), an open-source Virtual Reality (VR)-based data collection and evaluation framework for scene-aware humanoid locomotion in cluttered environments. Our system procedurally generates scenes with controllable clutter levels and captures embodiment-consistent, whole-body human motion through immersive VR navigation, which is then automatically retargeted to a humanoid robot model. We further introduce benchmarks that quantify environment clutter level and locomotion performance, including stability and collision safety. Using this framework, we compile a dataset of 348 trajectories across 145 diverse 3D cluttered scenes. The dataset provides a foundation for studying geometry-induced adaptation in humanoid locomotion and developing scene-aware planning and control methods.
Despite significant advances in quadrupedal robotics, a critical gap persists in foundational motion resources that holistically integrate diverse locomotion, emotionally expressive behaviors, and rich language semantics, all essential for agile, intuitive human-robot interaction. Current quadruped motion datasets are limited to a few mocap primitives (e.g., walk, trot, sit) and lack diverse behaviors with rich language grounding. To bridge this gap, we introduce Quadruped Foundational Motion (QuadFM), the first large-scale, ultra-high-fidelity dataset designed for text-to-motion generation and general motion control. QuadFM contains 11,784 curated motion clips spanning locomotion, interactive, and emotion-expressive behaviors (e.g., dancing, stretching, peeing), each with three layers of annotation (fine-grained action labels, interaction scenarios, and natural language commands), totaling 35,352 descriptions to support language-conditioned understanding and command execution. We further propose Gen2Control RL, a unified framework that jointly trains a general motion controller and a text-to-motion generator, enabling efficient end-to-end inference on edge hardware. On a real quadruped robot with an NVIDIA Orin, our system achieves real-time motion synthesis (<500 ms latency). Simulation and real-world results show realistic, diverse motions while maintaining robust physical interaction. The dataset will be released at https://github.com/GaoLii/QuadFM.
Motion in-betweening is the problem of synthesizing movement between keyposes. Traditional research has focused primarily on single characters. Extending these methods to densely interacting characters is highly challenging, as it demands precise spatial-temporal correspondence between the characters to maintain the interaction while creating natural transitions toward predefined keyposes. In this research, we present a method for long-horizon interaction in-betweening that enables two characters to engage and respond to one another naturally. To effectively represent and synthesize interactions, we propose a novel solution called Cross-Space In-Betweening, which models the interactions of each character across different conditioning representation spaces. We further observe that the significantly increased constraints in interacting characters heavily limit the solution space, leading to degraded motion quality and diminished interaction over time. To enable long-horizon synthesis, we present two solutions that maintain long-term interaction and motion quality, thereby keeping synthesis in the stable region of the solution space. We first sustain interaction quality by identifying periodic interaction patterns through adversarial learning. We then maintain motion quality by learning to refine the drifted latent space and prevent pose error accumulation. We demonstrate that our approach produces realistic, controllable, and long-horizon in-between motions of two characters with dynamic boxing and dancing actions across multiple keyposes, supported by extensive quantitative evaluations and user studies.
With the rapid advancement of game and film production, generating interactive motion from text has garnered significant attention due to its potential to revolutionize content creation processes. In many practical applications, there is a need to impose strict constraints on the motion range or trajectory of virtual characters. However, existing methods that rely solely on textual input face substantial challenges in accurately capturing the user's intent, particularly in specifying the desired trajectory. As a result, the generated motions often lack plausibility and accuracy. Moreover, existing trajectory-based methods for customized motion generation rely on retraining for single-actor scenarios, which limits flexibility, adaptability to different datasets, and interactivity in two-actor motions. To generate interactive motion following specified trajectories, this paper decouples complex motion into a Leader-Follower dynamic, inspired by role allocation in partner dancing. Based on this framework, this paper explores the motion range refinement process in interactive motion generation and proposes a training-free approach integrating a Pace Controller and a Kinematic Synchronization Adapter. The framework enhances the ability of existing models to generate motion that adheres to a specified trajectory by controlling the leader's movement and correcting the follower's motion to align with the leader. Experimental results show that the proposed approach, by better leveraging trajectory information, outperforms existing methods in both realism and accuracy.
Hitoshi Suda, Junya Koguchi, Shunsuke Yoshida
et al.
Japanese idol groups, comprising performers known as "idols," are an indispensable part of Japanese pop culture. They frequently appear in live concerts and television programs, entertaining audiences with their singing and dancing. Similar to other J-pop songs, idol group music covers a wide range of styles, with various types of chord progressions and instrumental arrangements. These tracks often feature numerous instruments and employ complex mastering techniques, resulting in high signal loudness. Additionally, most songs include a song division (utawari) structure, in which members alternate between singing solos and performing together. Hence, these songs are well-suited for benchmarking various music information processing techniques such as singer diarization, music source separation, and automatic chord estimation under challenging conditions. Focusing on these characteristics, we constructed a song corpus titled IdolSongsJp by commissioning professional composers to create 15 tracks in the style of Japanese idol groups. This corpus includes not only mastered audio tracks but also stems for music source separation, dry vocal tracks, and chord annotations. This paper provides a detailed description of the corpus, demonstrates its diversity through comparisons with real-world idol group songs, and presents its application in evaluating several music information processing techniques.
Research on integrating emerging technologies, such as robots, into K-12 education has been growing because of their benefits in creating engaging learning environments and preparing children for appropriate human-robot interactions in the future. However, most studies have focused on the impact of robots in formal educational settings, leaving their effectiveness in informal settings, such as afterschool programs, unclear. The present study developed a 9-week afterschool program in an elementary school to promote STEAM (STEM + Art) education for elementary school students. The program incorporated four modules (Acting, Dancing, Music & Sounds, and Drawing), each with specific learning objectives, and concluded with a theater play. This program facilitated hands-on activities with social robots to create engaging learning experiences for children. A total of 38 students, aged 6–10 years, participated in the afterschool program. Among these students, 21 took part in research activities, which included answering questions about their perceptions of robots compared to other entities (i.e., babies and beetles), learning interest and curiosity, and their opinions about robots. In addition, four teachers and staff participated in interviews, sharing their reflections on children’s learning experiences with robots and their perceptions of the program. Our results showed that 1) children perceived robots as having limited affective and social capabilities but gained a more realistic understanding of their physiological senses and agentic capabilities; 2) children were enthusiastic about interacting with robots and learning about robot-related technologies; and 3) teachers recognized the importance of embodied learning and the benefits of using robots in the afterschool program; however, they also expressed concerns that robots could be potential distractions and negatively impact students’ interpersonal relationships with peers in educational settings.
These findings suggest how robots can shape children’s perceptions of robots and their learning experiences in informal education, providing design guidelines for future educational programs that incorporate social robots for young learners.
The remarkable athletic intelligence displayed by humans in complex dynamic movements such as dancing and gymnastics suggests that the balance mechanism in biological beings is decoupled from specific movement patterns. This decoupling allows for the execution of both learned and unlearned movements under certain constraints while maintaining balance through minor whole-body coordination. To replicate this balance ability and body agility, this paper proposes a versatile controller for bipedal robots. This controller achieves ankle and body trajectory tracking across a wide range of gaits using a single small-scale neural network, which is based on a model-based IK solver and reinforcement learning. We consider a single step as the smallest control unit and design a universally applicable control input form suitable for any single-step variation. Highly flexible gait control can be achieved by combining these minimal control units with high-level policy through our extensible control interface. To enhance the trajectory-tracking capability of our controller, we utilize a three-stage training curriculum. After training, the robot can move freely between target footholds at varying distances and heights. The robot can also maintain static balance without repeated stepping to adjust posture. Finally, we evaluate the tracking accuracy of our controller on various bipedal tasks, and the effectiveness of our control framework is verified in the simulation environment.
Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design and suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework to learn whole-body humanoid control with sequential contacts by naturally decomposing the tasks into separate contact stages. Such decomposition facilitates simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrate that end-to-end RL-based controllers trained with WoCoCo enable four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoids by applying it to 22-DoF dinosaur robot loco-manipulation tasks.
The reasoning segmentation task, which demands a nuanced comprehension of intricate queries to accurately pinpoint object regions, is attracting increasing attention. However, Multi-modal Large Language Models (MLLMs) often find it difficult to accurately localize the objects described in complex reasoning contexts. We believe that the act of reasoning segmentation should mirror the cognitive stages of human visual search, where each step is a progressive refinement of thought toward the final object. Thus we introduce the Chains of Reasoning and Segmenting (CoReS) and find this top-down visual hierarchy indeed enhances the visual search process. Specifically, we propose a dual-chain structure that generates multi-modal, chain-like outputs to aid the segmentation process. Furthermore, to steer the MLLM's outputs into this intended hierarchy, we incorporate in-context inputs as guidance. Extensive experiments demonstrate the superior performance of our CoReS, which surpasses the state-of-the-art method by 6.5% on the ReasonSeg dataset. Project: https://chain-of-reasoning-and-segmentation.github.io/.
Rawal Khirodkar, Jyun-Ting Song, Jinkun Cao
et al.
Understanding how humans interact with each other is key to building realistic multi-human virtual reality systems. This area remains relatively unexplored due to the lack of large-scale datasets. Recent datasets focusing on this issue mainly consist of activities captured entirely in controlled indoor environments with choreographed actions, significantly affecting their diversity. To address this, we introduce Harmony4D, a multi-view video dataset for human-human interaction featuring in-the-wild activities such as wrestling, dancing, MMA, and more. We use a flexible multi-view capture system to record these dynamic activities and provide annotations for human detection, tracking, 2D/3D pose estimation, and mesh recovery for closely interacting subjects. We propose a novel markerless algorithm to track 3D human poses in severe occlusion and close interaction to obtain our annotations with minimal manual intervention. Harmony4D consists of 1.66 million images and 3.32 million human instances from more than 20 synchronized cameras with 208 video sequences spanning diverse environments and 24 unique subjects. We rigorously evaluate existing state-of-the-art methods for mesh recovery and highlight their significant limitations in modeling close interaction scenarios. Additionally, we fine-tune a pre-trained HMR2.0 model on Harmony4D and demonstrate an improved performance of 54.8% PVE in scenes with severe occlusion and contact. Code and data are available at https://jyuntins.github.io/harmony4d/.
This paper deals with the concept of liveness and digital performance during Covid-19 through Dangdut performance. The paper originated because, after the spread of the Covid-19 virus in March 2020, Dangdut could not be performed on its regular stages and several performances were cancelled. Covid-19 affected many kinds of performance, but Dangdut is highlighted because the performance stimulates the audience to respond by dancing on the stage and giving tips to the singer (sawer). Live performance creates ambiance, intimacy, and interactivity among the singers, musicians, and audiences. In short, liveness is essential in Dangdut performance. How do musicians sustain live, staged performances of Dangdut during Covid-19? What efforts did the musicians make? How does digital technology assist in the production of liveness for Dangdut? This paper discusses what Dangdut actors have done in media streaming. To articulate the phenomenon, I refer to the debate about the liveness of live performance and live streaming from a performance studies perspective. The paper discusses how performers such as Ndarboy Genk, Guyon Waton, Denny Caknan, and OM Wawes tried to solve the liveness problem. These performances were intended to illuminate the experience of liveness in digital performance, and they proved the resilience of Dangdut agents during Covid-19. This paper will enrich the perspectives of performance studies and popular music studies.
This article examines liveness and digital performance during Covid-19 through Dangdut performance. It was prompted by the spread of the Covid-19 virus in March 2020, after which Dangdut could not be staged regularly and many performances were cancelled. Covid-19 affected many kinds of performance, but I highlight Dangdut because the Dangdut stage stimulates the audience to respond, for instance by dancing on the stage and tipping the singer (sawer). Live performance creates ambiance, intimacy, and interactivity among singers, musicians, and audiences. In short, I recognize that liveness is essential to Dangdut performance. On that basis, how did musicians survive and stage performances throughout Covid-19? What efforts did they make? How did digital technology help produce liveness? This article investigates what Dangdut agents have done in streamed media. To examine this phenomenon, I draw on the debate about liveness in live and streamed performance from a Performance Studies perspective. The article further discusses how musicians such as Ndarboy Genk, Guyon Waton, Denny Caknan, and OM Wawes attempted to resolve the problem of liveness in Dangdut. These performances were intended to illuminate the experience of liveness in digital performance, and they proved the resilience of Dangdut agents during Covid-19. By unpacking this, the article enriches the perspectives of Performance Studies and popular music studies.
Tianyu Li, Hyunyoung Jung, Matthew Gombolay
et al.
Human motion driven control (HMDC) is an effective approach for generating natural and compelling robot motions while preserving high-level semantics. However, establishing the correspondence between humans and robots with different body structures is not straightforward due to mismatches in kinematic and dynamic properties, which introduce intrinsic ambiguity into the problem. Many previous algorithms approach this motion retargeting problem with unsupervised learning, which requires a prerequisite set of robot skills. However, learning all of these skills without understanding the given human motions is extremely costly, particularly for high-dimensional robots. In this work, we introduce CrossLoco, a guided unsupervised reinforcement learning framework that simultaneously learns robot skills and their correspondence to human motions. Our key innovation is a cycle-consistency-based reward term designed to maximize the mutual information between human motions and robot states. We demonstrate that the proposed framework can generate compelling robot motions by translating diverse human motions, such as running, hopping, and dancing. We quantitatively compare CrossLoco against manually engineered and unsupervised baseline algorithms, along with ablated versions of our framework, and demonstrate that our method translates human motions with better accuracy, diversity, and user preference. We also showcase its utility in other applications, such as synthesizing robot movements from language input and enabling interactive robot control.
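A cycle-consistency reward of the kind described above can be sketched as follows. This is a minimal illustration, not CrossLoco's implementation: the linear maps stand in for learned networks, and the dimensions and exponential shaping are assumptions.

```python
import numpy as np

# Sketch of a cycle-consistency reward: map a human pose feature to a
# robot state and back; the reward is high when the round trip
# reconstructs the original human pose. Dimensions are assumptions.
rng = np.random.default_rng(0)
H_DIM, R_DIM = 12, 18  # assumed human-feature / robot-state sizes

# Hypothetical linear "translators" standing in for learned networks.
W_h2r = rng.normal(size=(R_DIM, H_DIM)) * 0.1   # human -> robot
W_r2h = np.linalg.pinv(W_h2r)                   # robot -> human (exact cycle here)

def cycle_consistency_reward(human_pose: np.ndarray) -> float:
    """Return exp(-||human_pose - reconstruct(map(human_pose))||^2), in (0, 1]."""
    robot_state = W_h2r @ human_pose       # forward translation
    reconstructed = W_r2h @ robot_state    # backward translation
    error = np.sum((human_pose - reconstructed) ** 2)
    return float(np.exp(-error))           # 1.0 means a perfect cycle

pose = rng.normal(size=H_DIM)
print(cycle_consistency_reward(pose))  # near 1.0, since pinv closes the cycle
```

In an actual learning setup both translators would be trained networks and the reward would be added to the task reward at every RL step, pushing the policy toward robot states from which the human motion remains recoverable.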
The history of choreomania recounts how a dancing crowd in the streets has consistently been viewed with suspicion. Ecstatic explosions of relentless dances, sudden spasmodic movements, bodily convulsions, and uncontainable gestures have recurrently swept up groups of people in public spaces, provoking religious condemnation, moral disapproval, political maneuvers of control, and pathologization driven by medical discourse. Choreographer and researcher Mette Ingvartsen devoted a substantial period of investigation to this topic, culminating in The Dancing Public, a performance that invites spectators to experience dancing together and to dwell within the sympathetic vibration collectively produced. The essay analyzes the writing of body and voice conceived by Ingvartsen in the aftermath of forced confinement and the biomedical controls of the Covid-19 anti-pandemic agenda, revealing a biopolitical unease rooted in the present that retroactively engages with history through a choreography of affections.
Charlotte Svendler Nielsen, Tone Pernille Østern, Kristine Høeg Karlsen
et al.
This article seeks to create an overview of existing structures for dance education in the public educational systems, and of cross-sectoral collaborations, in the Nordic countries Denmark, Norway and Finland. A case study methodology applied to the field of dance education in each country is used for an analysis that seeks to better understand the different kinds of structures found in these countries. We trace ways of organising, dividing, and defining the field based on different types of documents such as policy documents, white papers, webpages, reports, research articles, and curricula. The analyses of the case descriptions yield insights into the opportunities, or lack of opportunities, that these structures give for children and young people’s long-term engagement with dance as an arts educational practice, and into how well the systems for educating teachers support dance in education; looking to dance education in New Zealand, the article then discusses possible ways forward to strengthen the field in the Nordic countries.
Tari Barong Wadon is a newly created dance choreographed by Tantin Hermawati, inspired by the Barongan performing art that lives and develops in Pati Regency. As its name suggests, Tari Barong Wadon is performed by female dancers, yet the movements used are the vigorous (gagahan) movements of male dance. The aim of this study is to describe the form of the dance and to trace the process of its creation. The research used a descriptive qualitative method with an ethnochoreological approach. Data were collected through observation, interviews, and documentation, and analyzed through data reduction, presentation, and conclusion drawing. The results show that Tari Barong Wadon is a created dance whose creation process considered aspects of dance form and passed through the choreographic stages of exploration, improvisation, and composition. The form of Tari Barong Wadon comprises movement, accompaniment, theme, costume, makeup, properties, dancers, and performance venue. Beyond being danced by women, the uniqueness of Tari Barong Wadon lies in its movement character, a blend of vigorous gagahan movements and coquettish feminine movements, and in its use of the Barongan mask as a property, giving it a distinctive appeal.
Eating disorders among adolescent girls are a public health concern. Adolescent girls who participate in aesthetic sports, such as dance, are of particular concern, as they experience the highest rates of clinical eating disorders. The purpose of this study is to explore the experiences of young girls in the world of competitive dance and to examine how these experiences shape their relationship with the body; feminist poststructural discourse analysis was employed to critically explore this relationship. Interviews were conducted across Canada with twelve young girls in competitive dance (14–18 years of age) to better understand how the dominant discourses in the world of competitive dance constitute beliefs, values and practices about the body and body image. Environment, parents, coaches, and peers emerged as the largest influencers shaping the young dancers’ relationship with their bodies. These influencers were found to generate and perpetuate body image discourses that reinforce the ideal dancer’s body and negative body image.
Isabella Graßl, Katharina Geldreich, Gordon Fraser
Block-based programming environments such as Scratch are an essential entry point to computer science. In order to create an effective learning environment that has the potential to address the gender imbalance in computer science, it is essential to better understand gender-specific differences in how children use such programming environments. In this paper, we explore gender differences and similarities in Scratch programs along two dimensions: To understand what motivates girls and boys to use Scratch, we apply topic analysis using unsupervised machine learning, for the first time, to Scratch programs, using a dataset of 317 programs created by girls and boys aged 8–10 years. To understand how they program for these topics, we apply automated program analysis to the code implemented in these projects. We find that, in line with common stereotypes, girls prefer topics that revolve around unicorns, celebrating, dancing and music, while boys tend to prefer gloomy topics with bats and ghouls, or competitive ones such as soccer or basketball. Girls prefer animations and stories, resulting in simpler control structures, while boys create games with more loops and conditional statements, resulting in more complex programs. Considering these differences can help to improve learning outcomes and the resulting computing-related self-concepts, which are prerequisites for developing a longer-term interest in computer science.