We propose Double Unprojected Textures (DUT), a new method to synthesize photorealistic 4K novel-view renderings in real-time. Our method consistently outperforms baseline approaches in rendering quality and inference speed. Moreover, it generalizes to both in-distribution (IND) motions, e.g., dancing, and out-of-distribution (OOD) motions, e.g., standing long jump.
Real-time free-view human rendering from sparse-view RGB inputs is a challenging task due to sensor scarcity and the tight time budget. To ensure efficiency, recent methods leverage 2D CNNs operating in texture space to learn rendering primitives. However, they either jointly learn geometry and appearance, or completely ignore sparse image information for geometry estimation, significantly harming visual quality and robustness to unseen body poses. To address these issues, we present Double Unprojected Textures, which, at its core, disentangles coarse geometric deformation estimation from appearance synthesis, enabling robust and photorealistic 4K rendering in real-time. Specifically, we first introduce a novel image-conditioned template deformation network, which estimates the coarse deformation of the human template from a first unprojected texture. This updated geometry is then used to apply a second and more accurate texture unprojection. The resulting texture map has fewer artifacts and better alignment with the input views, which benefits our learning of finer-level geometry and appearance represented by Gaussian splats. We validate the effectiveness and efficiency of the proposed method in quantitative and qualitative experiments, where it significantly surpasses other state-of-the-art methods.
Given sparse-view images and the corresponding motion, DUT predicts coarse template geometry and fine-grained 3D Gaussians. We first unproject the images onto the posed template to obtain a texture map, which is fed into GeoNet to estimate deformations of the template in canonical pose. We then unproject the images again onto the posed and deformed template to obtain a less-distorted texture map, which serves as input to our GauNet for estimating 3D Gaussian parameters; the predicted Gaussians undergo scale refinement before splatting.
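To make the two-stage flow concrete, here is a minimal PyTorch-style sketch of the inference path described above. All module and helper names (the GeoNet/GauNet interfaces, unproject_to_texture, lbs_pose, refine_scales, splat_gaussians) are placeholders we introduce for illustration and do not correspond to a released implementation; a sketch of the unprojection helper follows further below.

```python
import torch

# Hypothetical helpers, named for illustration only:
#   unproject_to_texture: samples the input views onto the template's UV map
#   lbs_pose:             poses the (possibly deformed) canonical template via skinning
#   refine_scales:        post-hoc refinement of the predicted Gaussian scales
#   splat_gaussians:      rasterizes 3D Gaussians into a target camera view

@torch.no_grad()
def dut_inference(images, cameras, motion, template, geo_net, gau_net, target_cam):
    # 1) Pose the undeformed canonical template with the tracked motion.
    posed_template = lbs_pose(template, motion)

    # 2) First unprojection: texture map from the undeformed, posed template.
    tex_coarse = unproject_to_texture(images, cameras, posed_template)

    # 3) GeoNet: estimate per-vertex deformation of the canonical template
    #    from the (distorted) first texture map.
    deformation = geo_net(tex_coarse, motion)
    deformed_template = template + deformation

    # 4) Pose the deformed template and unproject a second, better-aligned texture.
    posed_deformed = lbs_pose(deformed_template, motion)
    tex_fine = unproject_to_texture(images, cameras, posed_deformed)

    # 5) GauNet: predict per-texel 3D Gaussian parameters, then refine scales.
    gaussians = gau_net(tex_fine, posed_deformed)
    gaussians = refine_scales(gaussians)

    # 6) Splat the Gaussians into the requested novel view.
    return splat_gaussians(gaussians, target_cam)
```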
Under the same body pose, the degree of deformation is reflected in the distortions of the undeformed (first) texture map, which offers additional information to resolve the one-to-many mapping issue in motion-driven deformation methods.
Performing a second texture unprojection using the deformed template leads to fewer ghosting artifacts and better geometric alignment.
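Below is a simplified, hedged sketch of what such a texture unprojection step can look like, assuming per-texel 3D positions and normals on the (posed) template and standard pinhole projection matrices. It uses a normal-based visibility proxy and omits the occlusion handling and blending details a full implementation would need; the signature is our own illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def unproject_to_texture(images, proj, texel_pos, texel_nrm, cam_centers):
    """Sketch of multi-view texture unprojection (simplified, occlusion-unaware).

    images:      (V, 3, H, W)  input views
    proj:        (V, 3, 4)     camera projection matrices (world -> pixel)
    texel_pos:   (T, 3)        3D position of each texel on the (posed) template
    texel_nrm:   (T, 3)        texel normals in world space
    cam_centers: (V, 3)        camera centers in world space
    returns:     (T, 3)        blended texel colors
    """
    V, _, H, W = images.shape

    # Project every texel into every view: (V, T, 3) homogeneous pixel coords.
    pos_h = torch.cat([texel_pos, torch.ones_like(texel_pos[:, :1])], dim=-1)
    pix = torch.einsum('vij,tj->vti', proj, pos_h)
    uv = pix[..., :2] / pix[..., 2:3].clamp(min=1e-6)

    # Normalize to [-1, 1] and sample per-view colors at the projected texels.
    grid = torch.stack([uv[..., 0] / (W - 1) * 2 - 1,
                        uv[..., 1] / (H - 1) * 2 - 1], dim=-1)        # (V, T, 2)
    colors = F.grid_sample(images, grid.unsqueeze(2), align_corners=True)
    colors = colors.squeeze(-1).permute(0, 2, 1)                      # (V, T, 3)

    # Visibility proxy: cosine between texel normal and direction to the camera.
    view_dir = F.normalize(cam_centers[:, None, :] - texel_pos[None], dim=-1)
    weight = (texel_nrm[None] * view_dir).sum(-1).clamp(min=0.0)      # (V, T)
    weight = weight / weight.sum(0, keepdim=True).clamp(min=1e-6)

    # View-weighted blend of the sampled colors.
    return (colors * weight.unsqueeze(-1)).sum(0)                     # (T, 3)
```

With the deformed template, the projected texel positions land closer to the true surface in each view, so the per-view samples agree better and the blended texture shows fewer ghosting artifacts.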
The runtime consists of model inference and novel-view rendering.
[1] Drivable Volumetric Avatars using Texel-Aligned Features (Remelli et al. 2022)
[2] Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras (Shetty et al. 2024)
Our method supports texture editing of garments in texel space.
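Since appearance is parameterized in texel space, a garment edit amounts to compositing on the unprojected texture map before it is passed to the appearance network. A minimal sketch under that assumption, with hypothetical inputs (a UV-space garment mask and a user-provided edit texture):

```python
import torch

def edit_garment_texture(unprojected_tex, edit_tex, garment_mask):
    """Composite an edited garment texture into the texel-space texture.

    unprojected_tex: (3, H, W) texture map from the second unprojection
    edit_tex:        (3, H, W) user-provided texture (e.g., a logo or pattern)
    garment_mask:    (1, H, W) UV-space mask selecting the garment's texels
    """
    return unprojected_tex * (1 - garment_mask) + edit_tex * garment_mask
```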
Real-time, end-to-end, and interactively controllable. Note that all operations are executed online after the raw data inputs are received. Multi-view RGB streams/videos with paired motions ⇒ Dynamic textures ⇒ 3D Gaussians ⇒ Free-viewpoint renderings.
@inproceedings{sun2025real,
  title     = {Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures},
  author    = {Sun, Guoxing and Dabral, Rishabh and Zhu, Heming and Fua, Pascal and Theobalt, Christian and Habermann, Marc},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2025},
}
We thank Oleksandr Sotnychenko and Pranay Raj Kamuni for their help in data collection and processing, and Kunwar Maheep Singh for his help in segmentation.