ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Ghosh, Anindita; Dabral, Rishabh; Golyanik, Vladislav; Theobalt, Christian; Slusallek, Philipp

Anindita Ghosh¹,²,³, Rishabh Dabral²,³, Vladislav Golyanik²,³, Christian Theobalt²,³, Philipp Slusallek¹,³

¹German Research Center for Artificial Intelligence (DFKI),
²Max Planck Institute for Informatics,
³Saarland Informatics Campus

European Conference on Computer Vision, 2024

ReMoCap Dataset (Released) | Code

Paper | Appendix | Supplementary Video

Visualizations of reactive 3D motion sequences synthesized with the proposed ReMoS approach. Given the 3D motion of the acting person (red), we synthesize the 3D full-body motion of the reacting person (blue) with meaningful interactions between the two (Ninjutsu practice in the top left and Lindy Hop dancing in the top right). The synthesized hand interactions are enlarged and highlighted with circles. The bottom row shows an application of our results in virtual character animation: the red arrow indicates the input actor and the blue arrow indicates the synthesized reactor.

Abstract

Current approaches for 3D human motion synthesis generate high-quality animations of digital humans performing a wide variety of actions and gestures. However, a notable technological gap exists in addressing the complex dynamics of multi-human interactions within this paradigm. In this work, we present ReMoS, a denoising diffusion-based model that synthesizes full-body reactive motion of a person in a two-person interaction scenario. Given the motion of one person, we employ a combined spatio-temporal cross-attention mechanism to synthesize the reactive body and hand motion of the second person, thereby completing the interactions between the two. We demonstrate ReMoS across challenging two-person scenarios such as pair-dancing, Ninjutsu, kickboxing, and acrobatics, where one person’s movements have complex and diverse influences on the other. We also contribute the ReMoCap dataset for two-person interactions containing full-body and finger motions. We evaluate ReMoS through multiple quantitative metrics, qualitative visualizations, and a user study, and also indicate usability in interactive motion editing applications.

Approach

ReMoS Framework. Given the full-body sequence of the actor (in red, left), we input noisy body and hand samples (from below) in a cascaded fashion. We synthesize the body samples first, and use them for hand-interaction-aware attention masking (top-center) to synthesize the denoised hand samples (top-right). The full-body reactive motion is a concatenation of the denoised body and hand samples (in blue, right).
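The cascaded flow described in the caption above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the released ReMoS code: the function names (`hand_interaction_mask`, `synthesize_reaction`, `wrists_of`), the wrist-distance rule with a 0.2 threshold, and the toy denoisers are all hypothetical stand-ins. In particular, each stage is a single call here, whereas a diffusion model would run many reverse denoising steps with trained networks.

```python
import numpy as np

def hand_interaction_mask(actor_wrists, reactor_wrists, threshold=0.2):
    """Flag frames where any actor wrist is within `threshold` of any reactor wrist.

    Both inputs have shape (T, 2, 3): frames x {left, right} x xyz.
    The distance rule and the threshold are assumptions for illustration.
    """
    # Pairwise offsets between the two actor wrists and the two reactor wrists.
    diff = actor_wrists[:, :, None, :] - reactor_wrists[:, None, :, :]  # (T, 2, 2, 3)
    dist = np.linalg.norm(diff, axis=-1)                                # (T, 2, 2)
    return dist.reshape(dist.shape[0], -1).min(axis=1) < threshold      # (T,)

def synthesize_reaction(actor, noisy_body, noisy_hands,
                        body_denoiser, hand_denoiser, wrists_of, threshold=0.2):
    """Cascade: body first, then hands guided by a mask derived from the body."""
    # Stage 1: denoise the reactor's body, conditioned on the actor's motion.
    body = body_denoiser(noisy_body, actor)
    # Use the synthesized body to decide where hand interaction is likely.
    mask = hand_interaction_mask(wrists_of(actor), wrists_of(body), threshold)
    # Stage 2: denoise the hands, passing the mask to the hand stage.
    hands = hand_denoiser(noisy_hands, actor, mask)
    # The full-body reaction is the concatenation of body and hand features.
    return np.concatenate([body, hands], axis=-1), mask

# --- Toy setup (hypothetical feature layout: 6 body dims = two wrists in xyz) ---
T = 4
actor = np.zeros((T, 6))  # actor wrists fixed at the origin

def wrists_of(motion):
    return motion[:, :6].reshape(-1, 2, 3)

def toy_body_denoiser(noisy_body, actor_motion):
    # Stand-in for a trained network: reactor wrists near the actor in
    # frames 0-1 and far away in frames 2-3.
    offsets = np.array([0.1, 0.1, 1.0, 1.0])[:, None]
    return actor_motion[:, :6] + offsets

def toy_hand_denoiser(noisy_hands, actor_motion, mask):
    # Stand-in: keep hand features only in frames flagged as interacting.
    return noisy_hands * mask[:, None]

rng = np.random.default_rng(0)
full, mask = synthesize_reaction(
    actor, rng.normal(size=(T, 6)), rng.normal(size=(T, 4)),
    toy_body_denoiser, toy_hand_denoiser, wrists_of)
print(mask, full.shape)
```

The point of the sketch is the ordering: the hand stage cannot run until the body stage has produced wrist positions, because the mask is computed from the synthesized body rather than from the noisy input.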

ReMoCap Dataset

We propose the ReMoCap dataset for two-person interactions, consisting of full-body and hand motions. The dataset captures interactive, challenging two-person motions in two scenarios: the fast-paced swing-style Lindy Hop dance and the martial art of Ninjutsu.

Dataset download link: Remocap.zip

Samples from the ReMoCap dataset.

Results

Generalizability of ReMoS in various two-person scenarios.

Applications of ReMoS in character animation and motion editing.

Quantitative Evaluation

Citation

@InProceedings{ghosh2024remos,
title={ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions},
author={Ghosh, Anindita and Dabral, Rishabh and Golyanik, Vladislav and Theobalt, Christian and Slusallek, Philipp},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}

Contact

For questions or clarifications, please get in touch with:
Anindita Ghosh
anghosh@mpi-inf.mpg.de
