TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis

1Max Planck Institute for Informatics, Saarland Informatics Campus,

2Saarbrücken Research Center for Visual Computing, Interaction and AI

Corresponding author.

Figure 1: TriHuman renders photorealistic images of the virtual human and generates high-fidelity, topology-consistent clothed human geometry given a skeletal motion and virtual camera view as input. Importantly, our method runs in real time due to our efficient human representation and can be supervised solely on multi-view imagery during training.


Creating controllable, photorealistic, and geometrically detailed digital doubles of real humans solely from video data is a key challenge in Computer Graphics and Vision, especially when real-time performance is required. Recent methods attach a neural radiance field (NeRF) to an articulated structure, e.g., a body model or a skeleton, to map points into a pose-canonical space while conditioning the NeRF on the skeletal pose. These approaches typically parameterize the neural field with a multi-layer perceptron (MLP), leading to slow runtimes. To address this drawback, we propose TriHuman, a novel human-tailored, deformable, and efficient tri-plane representation, which achieves real-time performance, state-of-the-art pose-controllable geometry synthesis, and photorealistic rendering quality. At its core, we non-rigidly warp global ray samples into our undeformed tri-plane texture space, which effectively addresses the problem of distinct global points being mapped to the same tri-plane locations. We then show how such a tri-plane feature representation can be conditioned on the skeletal motion to account for dynamic appearance and geometry changes. Our results demonstrate a clear step towards higher quality in terms of geometry and appearance modeling of humans as well as runtime performance.
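To make the tri-plane idea concrete, here is a minimal sketch of tri-plane feature sampling: a 3D point (assumed already warped into the normalized [0, 1]³ texture cube) is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the features are summed. All names and shapes here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (H, W, C) feature plane at continuous (u, v) in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0]
            + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0]
            + wx * wy * plane[y1, x1])

def triplane_feature(planes, p):
    """Sum features from the XY, XZ, and YZ planes at point p in [0, 1]^3."""
    xy, xz, yz = planes
    x, y, z = p
    return (bilinear_sample(xy, x, y)
            + bilinear_sample(xz, x, z)
            + bilinear_sample(yz, y, z))

# Three hypothetical 32x32 planes with 16 feature channels each.
planes = [np.random.default_rng(s).standard_normal((32, 32, 16)) for s in range(3)]
feat = triplane_feature(planes, np.array([0.25, 0.5, 0.75]))
print(feat.shape)  # (16,)
```

In practice the sampled feature would then be fed to a small MLP predicting color and density, which is what makes the tri-plane approach faster than a single large MLP queried per sample.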


Figure 2: Given a skeletal motion and virtual camera view as input, our method generates highly realistic renderings of the human under the specified pose and view. To this end, a rough motion-dependent, deforming human mesh is first regressed. From the deformed mesh, we extract several motion features in texture space, which are passed through a 3D-aware convolutional architecture to generate a motion-conditioned feature tri-plane. Ray samples in global space are mapped into a 3D texture cube, which is then used to sample a feature from the tri-plane. This feature is passed to a small MLP predicting color and density. Finally, volume rendering and our proposed mesh optimization generate the geometry and images. Our method is solely supervised on multi-view imagery.
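The final volume-rendering step in the pipeline above can be sketched as standard NeRF-style alpha compositing along a ray: per-sample densities are converted to opacities, accumulated transmittance weights each sample's color, and the weighted colors are summed. This is a generic toy version under those assumptions, not the paper's code.

```python
import numpy as np

def volume_render(colors, densities, deltas):
    """Composite N samples along one ray.

    colors:    (N, 3) per-sample RGB
    densities: (N,)   per-sample volume density (sigma)
    deltas:    (N,)   distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-densities * deltas)                        # per-segment opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))    # transmittance to each sample
    weights = trans * alphas                                          # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Two samples: a faint red one in front of a nearly opaque green one.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
densities = np.array([0.5, 100.0])
deltas = np.array([0.1, 0.1])
rgb, weights = volume_render(colors, densities, deltas)
```

The accumulated weights also yield the expected ray termination depth, which is how a surface can be extracted and supervised from the same representation.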


Table 1: Statistics of the subjects in our dataset. Hover over the subjects' names to see their appearances. Note that for each subject, we provide a separate testing sequence to validate the generalization ability of the model.

Name         Length (Train)  Length (Test)  Cameras  Type   Rigged  Masks  GT Meshes  Pose  Hand
Subject0000  19000           6900           54       tight
Subject0003  19000           7000           101      tight
Subject0005  19000           7000           94       loose
Subject0010  33000           7000           116      loose
Subject0021  33000           7000           116      loose
Subject0028  27000           7000           116      tight

Main Video

Motion Retargeting

TriHuman allows retargeting the motion of the source character (first column) to the target characters (remaining columns), while preserving photoreal wrinkles and vivid dynamics. If the videos are not synchronized, please reload the page (F5).

Novel View Rendering

Free Viewpoint Rendering

Training Motion Geometry Synthesis

Testing Motion Geometry Synthesis

Consistent Texture Editing


@article{zhu2023trihuman,
    title={TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis},
    author={Heming Zhu and Fangneng Zhan and Christian Theobalt and Marc Habermann},
}