3DV 2026
1Max Planck Institute for Informatics, Saarland Informatics Campus
2Saarbrücken Research Center for Visual Computing, Interaction and AI
3Google, Switzerland
GIGA creates texel-aligned 3D Gaussians from sparse (1-4) input views and a body template. It computes RGB texture and canonical position maps as character-specific inputs. Separate appearance and geometry encoders process these inputs. Both encoders use cross-attention for conditioning on the observed character pose embedding, with the motion embedding serving as context and the combined encoder outputs as query. Multiple decoders generate the final texel-aligned 3D Gaussian avatar, taking into account intermediate feature maps from the encoders, propagated through skip connections (colored dashed lines). The final representation is articulated with linear blend skinning.
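The caption's last step, articulating the avatar with linear blend skinning, can be sketched in a few lines. This is a generic, minimal LBS implementation, not the paper's code; the array shapes and the assumption of row-normalized skinning weights are illustrative choices.

```python
import numpy as np

def lbs(verts, weights, joint_transforms):
    """Linear blend skinning: each vertex is deformed by a convex
    combination of joint transforms, weighted by its skinning weights.

    verts:            (V, 3) template vertex positions
    weights:          (V, J) per-vertex skinning weights, rows sum to 1
    joint_transforms: (J, 4, 4) homogeneous joint transforms
    """
    # Blend the 4x4 joint transforms per vertex -> (V, 4, 4)
    blended = np.einsum("vj,jab->vab", weights, joint_transforms)
    # Apply each blended transform to its homogeneous vertex coordinate
    verts_h = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)
    posed_h = np.einsum("vab,vb->va", blended, verts_h)
    return posed_h[:, :3]
```

With identity joint transforms the template is returned unchanged; translating a joint moves exactly the vertices weighted to it, which is why the Gaussians attached to the texel grid follow the body pose.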
Driving a high-quality and photorealistic full-body virtual human from a few RGB cameras is a challenging problem that has become increasingly relevant with emerging virtual reality technologies. A promising solution to democratize such technology would be a generalizable method that takes sparse multi-view images of any person and then generates photoreal free-view renderings of them. However, the state-of-the-art approaches are not scalable to very large datasets and, thus, lack diversity and photorealism. To address this problem, we propose GIGA, a novel, generalizable full-body model for rendering photoreal humans in free viewpoint, driven by a single-view or sparse multi-view video. Notably, GIGA can scale training to a few thousand subjects while maintaining high photorealism and synthesizing dynamic appearance. At the core, we introduce a MultiHeadUNet architecture, which takes an approximate RGB texture accumulated from a single or multiple sparse views and predicts 3D Gaussian primitives represented as 2D texels on top of a human body mesh. At test time, our method performs novel view synthesis of a virtual 3D Gaussian-based human from 1 to 4 input views and a tracked body template for unseen identities. Our method excels over prior works by a significant margin in terms of identity generalization capability and photorealism.
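The abstract's core idea of predicting 3D Gaussian primitives as 2D texels can be illustrated by how a decoder's UV-space output map might be split into per-texel Gaussian attributes. The channel layout, channel counts, and activation functions below are assumptions for illustration; GIGA's actual decoder heads may differ.

```python
import numpy as np

# Hypothetical channel layout for per-texel Gaussian parameters
# (position offset, rotation quaternion, scale, opacity, color).
CH = {"offset": 3, "rotation": 4, "scale": 3, "opacity": 1, "color": 3}

def split_texel_gaussians(param_map):
    """param_map: (H, W, 14) decoder output on the UV texel grid.
    Returns per-texel Gaussian attributes with common activations
    applied: exp for positive scales, sigmoid for opacity and color,
    and normalization for the rotation quaternion."""
    flat = param_map.reshape(-1, sum(CH.values()))
    splits = np.cumsum(list(CH.values()))[:-1]
    offset, rot, scale, opacity, color = np.split(flat, splits, axis=1)
    rot = rot / np.linalg.norm(rot, axis=1, keepdims=True)  # unit quaternions
    scale = np.exp(scale)                                   # strictly positive
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return {"offset": offset, "rotation": rot, "scale": scale,
            "opacity": sigmoid(opacity), "color": sigmoid(color)}
```

Because every texel of the body-mesh UV map yields one Gaussian, the primitives stay aligned with the surface and can be driven directly by the skinned template.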
@article{zubekhin2025giga,
title={GIGA: Generalizable Sparse Image-driven Gaussian Humans},
author={Zubekhin, Anton and Zhu, Heming and Gotardo, Paulo and Beeler, Thabo and Habermann, Marc and Theobalt, Christian},
year={2025},
journal={arXiv},
eprint={2504.07144},
}