UMA: Ultra-detailed Human Avatars via Multi-level Surface Alignment
‡Corresponding author.
Figure 1. Given skeletal poses and a virtual camera, UMA renders ultra-detailed clothed human appearance and synthesizes high-fidelity geometry. Notably, UMA enables users to digitally zoom in, allowing close inspection of texture details or even fine yarn-level patterns. Additionally, we introduce a new dataset featuring multi-view 6K video recordings, capturing subjects wearing clothing with challenging texture patterns and rich dynamics. The fidelity of the reconstructed avatars makes them particularly suitable for virtual and mixed reality, where users can closely observe fine-grained appearance details.
Figure 2. UMA takes skeletal motion and the camera view as input and generates high-fidelity geometry and appearance. For avatar representation, to address the stochasticity of the clothing dynamics that cannot be modeled by the skeletal motions, we inject a learnable latent code $\mathbf{z}_f$ (zero latent $\mathbf{z}_{0}$ for testing) into the drivable template $\mathbf{V}_f$. A texel super-resolution module $\mathcal{E}_\mathrm{sr}$ is adopted to densify the animatable Gaussian textures. For multi-level surface alignment, we supervise the surface geometry at both the vertex and texel levels using novel supervision derived from a foundational 2D point tracker. Specifically, the 2D point tracks $\mathbf{P}_{f,c,i}$ between the rasterized and ground-truth images obtained from the tracker are lifted and aggregated into 3D correspondences $\tilde{\mathbf{P}}_{f,i}$ across multiple views using the drivable template $\mathbf{V}_f$.
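The caption does not spell out how the per-view tracks are fused into a single 3D correspondence. As a minimal sketch, assume each view $c$ has already lifted its 2D track to a 3D point (for instance by reading the template surface position under the tracked pixel) and that a per-view confidence is available. A confidence-weighted mean is then one simple way to aggregate them. The function name `aggregate_lifted_tracks`, the `min_views` threshold, and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
from typing import Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def aggregate_lifted_tracks(
    lifted: Sequence[Optional[Point3]],
    confidence: Sequence[float],
    min_views: int = 2,
) -> Optional[Point3]:
    """Fuse per-view 3D lifts of one track into a single 3D correspondence.

    lifted[c] is the 3D point obtained from view c, or None if the point is
    occluded or the tracker failed in that view (hypothetical convention).
    confidence[c] is a non-negative weight for view c.
    Returns None when fewer than `min_views` views contribute.
    """
    # Keep only views that actually observed the point with positive weight.
    valid = [(p, w) for p, w in zip(lifted, confidence) if p is not None and w > 0.0]
    if len(valid) < min_views:
        return None
    total = sum(w for _, w in valid)
    # Confidence-weighted mean, computed per coordinate.
    return tuple(sum(w * p[k] for p, w in valid) / total for k in range(3))


# Three cameras; the second one does not see the point.
views = [(0.0, 1.0, 2.0), None, (0.2, 1.0, 2.0)]
weights = [1.0, 0.9, 3.0]
print(aggregate_lifted_tracks(views, weights))
```

Requiring at least two contributing views is a design choice in this sketch: it discards correspondences that a single, possibly erroneous, track would otherwise determine alone.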
UMA generates ultra-detailed geometry that captures fine clothing wrinkles. The reconstructed meshes share the same triangulation and remain in correspondence over time.
UMA allows retargeting motion from the source character (first column) to the target characters (second and third columns), while preserving photorealistic clothing wrinkles and plausible dynamics. Please enable fullscreen mode for a better viewing experience.
Thanks to the texel-aligned consistent geometry, UMA enables consistent texture editing. Notably, the inserted texture deforms seamlessly with the clothing wrinkles and remains consistently anchored to the characters' original texture.
@article{zhu2025ultra,
title={UMA: Ultra-detailed Human Avatars via Multi-level Surface Alignment},
author={Zhu, Heming and Sun, Guoxing and Theobalt, Christian and Habermann, Marc},
journal={arXiv preprint arXiv:2506.01802},
year={2025}
}