Overview
Figure 1: Given skeletal poses and a virtual camera as inputs, MUA produces photorealistic renderings and detailed geometry of animatable clothed humans. By distilling the ultra-high-quality teacher avatar model UMA into a compact student representation, MUA preserves large-scale clothing dynamics together with fine geometric and appearance details, while reducing computation by three orders of magnitude and achieving over 180 FPS on a personal computer. Moreover, MUA enables real-time on-device inference at 24 FPS on a standalone Meta Quest 3 headset, advancing the practical deployment of highly detailed animatable avatars on VR headsets and other computation-constrained mobile platforms.