A Motion Matching-based Framework for Controllable Gesture Synthesis from Speech



Recent deep learning-based approaches have shown promising results for synthesizing plausible 3D human gestures from speech input. However, these approaches typically offer limited freedom to incorporate user control. Furthermore, training such models in a supervised manner often fails to capture the multi-modal nature of the data, in particular the fact that the same audio input can correspond to different gesture outputs. To address these problems, we present an approach for generating controllable 3D gestures that combines the advantages of database matching and deep generative modeling. Our method predicts 3D body motion by sequentially searching for the most plausible audio-gesture clips from a database using a k-Nearest Neighbors (k-NN) algorithm that considers the similarity to both the input audio and the previous body pose information. To further improve the synthesis quality, we propose a conditional Generative Adversarial Network (cGAN) model that provides a data-driven refinement of the k-NN result by comparing its plausibility against the ground truth audio-gesture pairs. Our novel approach enables direct and more varied control that is not possible with prior learning-based counterparts. Our experiments show that our proposed approach outperforms recent models on control-based synthesis tasks using high-level signals such as motion statistics, while enabling flexible and effective user control for lower-level signals.
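The sequential k-NN search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature layout, the weighted Euclidean cost, the `audio_weight` parameter, and the names `knn_candidates` and `synthesize` are all assumptions made for illustration, and the cGAN refinement step is omitted.

```python
import numpy as np


def knn_candidates(audio_query, pose_query, db_audio, db_pose, k=3, audio_weight=0.5):
    """Return indices of the k database clips closest to the query.

    Assumption: each clip is summarized by one audio feature vector and
    one starting-pose vector, and the two distances are blended linearly.
    """
    audio_dist = np.linalg.norm(db_audio - audio_query, axis=1)
    pose_dist = np.linalg.norm(db_pose - pose_query, axis=1)
    cost = audio_weight * audio_dist + (1.0 - audio_weight) * pose_dist
    return np.argsort(cost)[:k]


def synthesize(audio_feats, db_audio, db_start_pose, db_end_pose, init_pose,
               audio_weight=0.5):
    """Greedily pick one clip per audio window, carrying the pose forward."""
    pose = np.asarray(init_pose, dtype=float)
    chosen = []
    for audio_query in audio_feats:
        best = knn_candidates(audio_query, pose, db_audio, db_start_pose,
                              k=1, audio_weight=audio_weight)[0]
        chosen.append(int(best))
        # The end pose of the selected clip becomes the "previous pose"
        # that the next search step must match.
        pose = db_end_pose[best]
    return chosen


# Toy database of three clips (hypothetical values).
db_audio = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
db_start_pose = np.array([[0.0], [1.0], [10.0]])
db_end_pose = np.array([[1.0], [10.0], [0.0]])

audio_feats = np.array([[0.0, 0.0], [1.0, 1.0]])
clip_ids = synthesize(audio_feats, db_audio, db_start_pose, db_end_pose, [0.0])
print(clip_ids)
```

In this sketch, raising `audio_weight` favors clips whose audio matches the input, while lowering it favors continuity with the previous pose. Returning more than one candidate (`k > 1`) is one way the same audio could map to different gestures.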



BibTeX

Author = {Habibie, Ikhsanul and Elgharib, Mohamed and Sarkar, Kripasindhu and Abdullah, Ahsan and Nyatsanga, Simbarashe and Neff, Michael and Theobalt, Christian},
Title = {A Motion Matching-based Framework for Controllable Gesture Synthesis from Speech},
Booktitle = {SIGGRAPH ’22 Conference Proceedings},
Year = {2022}


This work was supported by the ERC Consolidator Grant 4DRepLy (770784).


For questions or clarifications, please get in touch with:
Ikhsanul Habibie
