EVA generates high-fidelity, real-time human renderings from arbitrary camera views, skeletal motion, and expression parameters. It leverages an expressive template geometry as a geometric proxy for its 3D Gaussian appearance model, enabling disentangled control over the body, hands, and face.


Abstract

With recent advancements in neural rendering and motion capture algorithms, remarkable progress has been made in photorealistic human avatar modeling, unlocking immense potential for applications in virtual reality, augmented reality, remote communication, and industries such as gaming, film, and medicine. However, existing methods fail to provide complete, faithful, and expressive control over human avatars due to their entangled representation of facial expressions and body movements. In this work, we introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework that achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures. Specifically, we design the human avatar as a two-layer model: an expressive template geometry layer and a 3D Gaussian appearance layer. First, we present an expressive template tracking algorithm that leverages coarse-to-fine optimization to accurately recover body motions, facial expressions, and non-rigid deformation parameters from multi-view videos. Next, we propose a novel decoupled 3D Gaussian appearance model designed to effectively disentangle body and facial appearance. Unlike unified Gaussian estimation approaches, our method employs two specialized and independent modules to model the body and face separately. Experimental results demonstrate that EVA surpasses state-of-the-art methods in terms of rendering quality and expressiveness, validating its effectiveness in creating full-body avatars. This work represents a significant advancement towards fully drivable digital human models, enabling the creation of lifelike digital avatars that faithfully replicate human geometry and appearance.
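
To make the decoupled appearance idea concrete, the following is a minimal PyTorch sketch of two independent branches that predict 3D Gaussian parameters for the body and the face before the outputs are merged for rendering. The module names, layer sizes, input features, and the exact Gaussian parameterization are illustrative assumptions, not EVA's actual implementation.

# Minimal sketch, assuming per-Gaussian feature vectors (e.g., sampled from
# motion-aware textures) as input. Names and dimensions are hypothetical.
import torch
import torch.nn as nn

GAUSSIAN_DIM = 3 + 3 + 4 + 1 + 3  # position offset, scale, rotation quat, opacity, color

class DecoupledGaussianAppearance(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Two specialized, independent branches keep facial-expression detail
        # from being entangled with body-pose-driven appearance.
        self.body_net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, GAUSSIAN_DIM))
        self.face_net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, GAUSSIAN_DIM))

    def forward(self, body_feats, face_feats):
        # body_feats: (Nb, feat_dim) per-Gaussian body features
        # face_feats: (Nf, feat_dim) per-Gaussian face features
        body_gaussians = self.body_net(body_feats)
        face_gaussians = self.face_net(face_feats)
        # Concatenate into a single Gaussian set for downstream rendering.
        return torch.cat([body_gaussians, face_gaussians], dim=0)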

Main Video

Method

EVA generates high-fidelity renderings from a virtual viewpoint, skeletal motion, and expression parameters. Using a personalized head avatar and a deformable character model, we control body movements and facial expressions to drive an actor-specific mesh. This mesh generates motion-aware textures, and separate modules independently predict the Gaussian parameters for the face and body. The 3D Gaussians are combined, UV-mapped, and warped from canonical to posed space via dual quaternion skinning. Finally, they are splatted to render the final photorealistic image.
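
For the canonical-to-posed warping step, below is a minimal PyTorch sketch of dual quaternion skinning applied to Gaussian centers. The function names, tensor shapes, and the (w, x, y, z) quaternion convention are assumptions for illustration; EVA's actual implementation may differ.

# Minimal sketch, assuming per-bone unit dual quaternions and per-Gaussian
# skinning weights. Hypothetical interface, not EVA's code.
import torch

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a.unbind(-1)
    bw, bx, by, bz = b.unbind(-1)
    return torch.stack((
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ), dim=-1)

def dual_quaternion_skinning(points, weights, dq_real, dq_dual):
    """
    points:  (N, 3) canonical Gaussian centers
    weights: (N, J) skinning weights (rows sum to 1)
    dq_real: (J, 4) real parts of per-bone unit dual quaternions
    dq_dual: (J, 4) dual (translation-carrying) parts
    returns: (N, 3) posed Gaussian centers
    """
    # Blend per-bone dual quaternions with the skinning weights.
    b_real = weights @ dq_real                               # (N, 4)
    b_dual = weights @ dq_dual                               # (N, 4)
    # Normalize by the magnitude of the real part.
    norm = b_real.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    b_real, b_dual = b_real / norm, b_dual / norm
    # Rotate each point: p' = q * (0, p) * conj(q) for unit q.
    conj = b_real * b_real.new_tensor([1.0, -1.0, -1.0, -1.0])
    p_quat = torch.cat([torch.zeros_like(points[..., :1]), points], dim=-1)
    rotated = quat_mul(quat_mul(b_real, p_quat), conj)[..., 1:]
    # Extract the blended translation: t = 2 * vec(dual * conj(real)).
    trans = 2.0 * quat_mul(b_dual, conj)[..., 1:]
    return rotated + trans

The warped centers, together with the predicted scales, rotations, opacities, and colors, would then be passed to a standard 3D Gaussian splatting rasterizer to produce the final image.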

Citation

@inproceedings{junkawitsch2025eva,
  title     = {EVA: Expressive Virtual Avatars from Multi-view Videos},
  author    = {Junkawitsch, Hendrik and Sun, Guoxing and Zhu, Heming and Theobalt, Christian and Habermann, Marc},
  booktitle = {SIGGRAPH 2025 Conference Papers},
  pages     = {1--11},
  year      = {2025}
}