MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
Abstract
Conventional methods for human motion synthesis have either been deterministic or have struggled with the trade-off between motion diversity and motion quality. In response to these limitations, we introduce MoFusion, a new denoising-diffusion-based framework for high-quality conditional human motion synthesis that can synthesise long, temporally plausible, and semantically accurate motions based on a range of conditioning contexts (such as music and text). We also present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework through our scheduled weighting strategy. The learned latent space can be used for several interactive motion-editing applications such as in-betweening, seed-conditioning, and text-based editing, thus providing crucial abilities for virtual-character animation and robotics. Through comprehensive quantitative evaluations and a perceptual user study, we demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature. We urge the reader to watch our supplementary video.
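To make the scheduled weighting concrete, below is a minimal sketch of a diffusion training step that anneals a kinematic (joint-velocity) loss with the noise level. The denoiser interface, tensor shapes, and the choice of the cumulative schedule ᾱ_t as the weight are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch

def training_step(model, x0, cond, alphas_cumprod, lambda_kin=0.1):
    """One diffusion training step with a scheduled kinematic loss (sketch).

    x0:   clean motion, shape (B, T, J, 3)
    cond: conditioning features (e.g. music or text embeddings)
    alphas_cumprod: 1-D tensor of cumulative noise-schedule products
    """
    B = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)

    # Forward diffusion: corrupt the clean motion with Gaussian noise.
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # Denoiser predicts the clean motion directly (x0 parameterisation).
    x0_pred = model(x_t, t, cond)

    # Standard diffusion reconstruction loss.
    loss_diff = ((x0_pred - x0) ** 2).mean()

    # Kinematic (joint-velocity) loss, annealed by the noise schedule:
    # a_bar is close to 1 at low noise levels, so the kinematic term acts
    # mainly on samples whose motion is nearly clean, where it is meaningful.
    vel_pred = x0_pred[:, 1:] - x0_pred[:, :-1]
    vel_gt = x0[:, 1:] - x0[:, :-1]
    per_sample = ((vel_pred - vel_gt) ** 2).mean(dim=(1, 2, 3))
    loss_kin = (a_bar.view(B) * per_sample).mean()

    return loss_diff + lambda_kin * loss_kin
```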
Video
Download Video: HD (MP4, 68 MB)
Reverse Diffusion Process for Human Motion Synthesis
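The video visualises how a motion sequence emerges from Gaussian noise through iterative denoising. For intuition, here is a minimal sketch of DDPM-style ancestral sampling with an x0-predicting denoiser; the function signature, tensor shapes, and conditioning interface are assumptions for illustration rather than the paper's implementation.

```python
import torch

@torch.no_grad()
def sample_motion(model, cond, shape, betas):
    """Ancestral DDPM sampling sketch: start from pure noise and iteratively
    denoise it into a motion sequence of the given shape, e.g. (1, T, J, 3).
    Assumes `model(x_t, t, cond)` predicts the clean motion x0.
    """
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from Gaussian noise

    for t in reversed(range(len(betas))):
        a_t, a_bar = alphas[t], alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        t_batch = torch.full((shape[0],), t, dtype=torch.long)

        x0_pred = model(x, t_batch, cond)  # predicted clean motion

        # Mean of the posterior q(x_{t-1} | x_t, x0) for an
        # x0-parameterised denoiser (Ho et al., 2020, Eq. 7).
        coef_x0 = a_bar_prev.sqrt() * betas[t] / (1.0 - a_bar)
        coef_xt = a_t.sqrt() * (1.0 - a_bar_prev) / (1.0 - a_bar)
        mean = coef_x0 * x0_pred + coef_xt * x

        if t > 0:
            var = betas[t] * (1.0 - a_bar_prev) / (1.0 - a_bar)
            x = mean + var.sqrt() * torch.randn_like(x)
        else:
            x = mean  # final step is deterministic
    return x
```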
Music-to-Dance Generation
Results
Please unmute the audio to hear the corresponding music.
Seed-Conditioned Motion Forecasting
Quality Comparison with the State of the Art
For music-to-dance generation on unseen music, we observe better perceptual quality than both the ground-truth data and the state of the art, despite a higher FID score. As the examples below show, a lower FID does not necessarily correspond to better synthesis quality.
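For context, FID compares only the first- and second-order statistics (mean and covariance) of feature distributions, which is one reason it can disagree with perceived quality. A minimal sketch of the computation on motion features follows; the pretrained feature extractor that produces the (N, D) arrays is an assumption, not specified here.

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Frechet distance between two sets of motion features, each an
    (N, D) numpy array from some pretrained motion encoder (assumed).

    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2})
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c_r = np.cov(feats_real, rowvar=False)
    c_g = np.cov(feats_gen, rowvar=False)

    covmean = linalg.sqrtm(c_r @ c_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm

    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(c_r + c_g - 2.0 * covmean))
```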
Please unmute the audio to hear the corresponding music.
Citation
@InProceedings{dabral2022mofusion,
  title     = {MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis},
  author    = {Rishabh Dabral and Muhammad Hamza Mughal and Vladislav Golyanik and Christian Theobalt},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023}
}
Contact
For questions or clarifications, please get in touch with:
Rishabh Dabral
rdabral@mpi-inf.mpg.de
Vladislav Golyanik
golyanik@mpi-inf.mpg.de