ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

^†Joint first authors.

^‡Corresponding author.

Figure 1. ASH takes an arbitrary 3D skeletal pose and a virtual camera view as input, both of which can be controlled by the user, and generates a photorealistic rendering of the human in real time. To achieve this, we propose an efficient and animatable Gaussian representation, which is parameterized on the surface of a deformable template mesh.

Overview

Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an Animatable Gaussian Splatting approach for photorealistic rendering of dynamic Humans in real time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians to a deformable character model and learn their parameters in 2D texture space, which allows us to leverage efficient 2D convolutional architectures that scale easily with the required number of Gaussians. We benchmark ASH against competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.
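To make the texture-space parameterization concrete, here is a minimal NumPy sketch of how a 2D parameter map could be unpacked into per-Gaussian attributes, one Gaussian per valid texel. The 14-channel layout, the activation functions, the plain RGB color, and the function name `texels_to_gaussians` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def texels_to_gaussians(param_map, mask):
    """Unpack a (H, W, 14) texture of raw network outputs into Gaussian attributes.

    Assumed (hypothetical) channel layout per texel:
      0:3   position offset from the template surface
      3:6   log-scale
      6:10  rotation quaternion (unnormalized)
      10    opacity logit
      11:14 RGB color
    mask is a (H, W) boolean array marking texels covered by the UV layout.
    """
    p = param_map[mask]                                   # (N, 14), one row per Gaussian
    quat = p[:, 6:10]
    return {
        "offset": p[:, 0:3],
        "scale": np.exp(p[:, 3:6]),                       # exp keeps scales positive
        "rotation": quat / np.linalg.norm(quat, axis=1, keepdims=True),
        "opacity": 1.0 / (1.0 + np.exp(-p[:, 10])),       # sigmoid maps to (0, 1)
        "color": p[:, 11:14],
    }

# Usage: a tiny 4x4 texture in which only five texels lie on the mesh.
rng = np.random.default_rng(0)
param_map = rng.normal(size=(4, 4, 14))
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = mask[2, 1] = mask[3, 3] = mask[2, 2] = True
gaussians = texels_to_gaussians(param_map, mask)
```

Because the number of Gaussians equals the number of valid texels, increasing the texture resolution increases the Gaussian count without changing the 2D network architecture.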

Pipeline

Figure 2. ASH generates high-fidelity renderings given a skeletal motion and a virtual camera view. A motion-dependent, canonicalized template mesh is generated with a learned deformation network. From the canonical template mesh, we render motion-aware textures, which are then used to predict the Gaussian splat parameters as texels in the 2D texture space with two 2D convolutional networks, i.e., the Geometry Decoder and the Appearance Decoder. Through UV mapping and dual quaternion (DQ) skinning, we warp the Gaussian splats from the canonical space to the posed space. Finally, the posed Gaussian splats are rendered via splatting.
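The canonical-to-posed warp in the caption can be illustrated with a generic dual quaternion skinning routine. The sketch below is a textbook-style NumPy version that blends per-joint rigid transforms and applies the result to Gaussian centers; it is not the authors' implementation. The function names, the (w, x, y, z) quaternion convention, and the choice to align quaternion signs to the first joint are assumptions made for illustration.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a[..., 0], a[..., 1], a[..., 2], a[..., 3]
    bw, bx, by, bz = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
    return np.stack([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ], axis=-1)

def quat_conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def dq_skin(points, weights, rot_q, trans):
    """Warp canonical points to posed space with dual quaternion skinning.

    points:  (N, 3) canonical positions, e.g. Gaussian centers
    weights: (N, J) skinning weights, rows summing to 1
    rot_q:   (J, 4) unit rotation quaternions per joint
    trans:   (J, 3) translations per joint
    """
    # Dual part of each joint transform: 0.5 * (0, t) * q
    t_quat = np.concatenate([np.zeros((len(trans), 1)), trans], axis=1)
    real, dual = rot_q, 0.5 * quat_mul(t_quat, rot_q)
    # q and -q encode the same rotation; align signs to joint 0 (a simplification).
    sign = np.where(real @ real[0] < 0.0, -1.0, 1.0)[:, None]
    real, dual = real * sign, dual * sign
    # Linear blend of dual quaternions, then normalize by the real part.
    b_real, b_dual = weights @ real, weights @ dual
    norm = np.linalg.norm(b_real, axis=1, keepdims=True)
    b_real, b_dual = b_real / norm, b_dual / norm
    # Recover the translation and rotate the points: p' = q p q* + t
    t = 2.0 * quat_mul(b_dual, quat_conj(b_real))[:, 1:]
    p_quat = np.concatenate([np.zeros((len(points), 1)), points], axis=1)
    rotated = quat_mul(quat_mul(b_real, p_quat), quat_conj(b_real))[:, 1:]
    return rotated + t

# Usage: one joint rotating 90 degrees about z, then translating by (1, 0, 0).
c = np.sqrt(0.5)
posed = dq_skin(
    points=np.array([[1.0, 0.0, 0.0]]),
    weights=np.array([[1.0]]),
    rot_q=np.array([[c, 0.0, 0.0, c]]),
    trans=np.array([[1.0, 0.0, 0.0]]),
)
```

A full pipeline would also rotate each Gaussian's orientation by the blended rotation; the sketch only moves the centers.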

Main Video

Novel View Rendering

Free Viewpoint Rendering

Motion Retargeting

ASH can retarget the motion (first column) from the source character (second column) to the target character (third column) while preserving photoreal clothing wrinkles and plausible dynamics.

Citation

							
@InProceedings{Pang_2024_CVPR,
   author    = {Pang, Haokai and Zhu, Heming and Kortylewski, Adam and Theobalt, Christian and Habermann, Marc},
   title     = {ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering},
   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   month     = {June},
   year      = {2024},
   pages     = {1165-1175}
}