We present Relightable Holoported Characters (RHC), a novel person-specific method for free-view rendering and relighting of full-body, highly dynamic humans observed solely from sparse-view RGB videos at inference. In contrast to classical one-light-at-a-time (OLAT)-based human relighting, our transformer-based RelightNet predicts relit appearance in a single network pass, avoiding costly OLAT-basis capture and generation. To train such a model, we introduce a new capture strategy and dataset recorded in a multi-view lightstage, where we alternate frames lit by random environment maps with uniformly lit tracking frames, simultaneously enabling accurate motion tracking and coverage of diverse illumination and dynamics. Inspired by the rendering equation, we derive physics-informed features that encode geometry, albedo, shading, and the virtual camera view from a coarse human mesh proxy and the input views. Our RelightNet takes these features as input, cross-attends them with a novel lighting condition, and regresses the relit appearance in the form of texel-aligned 3D Gaussian splats attached to the coarse mesh proxy. Consequently, RelightNet implicitly learns to efficiently evaluate the rendering equation for novel lighting conditions within a single feed-forward pass. Experiments demonstrate our method’s superior visual fidelity and lighting reproduction compared to state-of-the-art approaches.
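For context, the rendering equation referred to above is the standard formulation below; the split into an albedo factor and a shading (irradiance) integral is only an illustrative Lambertian approximation indicating which quantities the physics-informed features encode, not the exact factorization used by RelightNet.

\[
L_o(\mathbf{x}, \boldsymbol{\omega}_o)
= \int_{\Omega} f_r(\mathbf{x}, \boldsymbol{\omega}_i, \boldsymbol{\omega}_o)\, L_i(\mathbf{x}, \boldsymbol{\omega}_i)\,\big(\mathbf{n}(\mathbf{x}) \cdot \boldsymbol{\omega}_i\big)\, \mathrm{d}\boldsymbol{\omega}_i
\;\approx\;
\underbrace{\frac{\rho(\mathbf{x})}{\pi}}_{\text{albedo}}\,
\underbrace{\int_{\Omega} L_i(\mathbf{x}, \boldsymbol{\omega}_i)\,\big(\mathbf{n}(\mathbf{x}) \cdot \boldsymbol{\omega}_i\big)\, \mathrm{d}\boldsymbol{\omega}_i}_{\text{shading}}
\]

Here \(\rho\) denotes the per-texel albedo and \(\mathbf{n}\) the surface normal of the coarse mesh proxy; plausibly, the geometry and view features correspond to \(\mathbf{x}\), \(\mathbf{n}\), and \(\boldsymbol{\omega}_o\).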
To learn a relightable full-body avatar, we capture multi-view video sequences that alternate uniformly lit tracking frames with relit frames obtained by projecting random environment maps onto the lightstage LEDs.
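A minimal sketch of how such an interleaved capture could be indexed at training time is shown below. The alternation period of two and the tracking/relit naming are assumptions for illustration, not the released data layout.

# Sketch: assign captured frames to the tracking set (uniform lighting) or the
# relit set (random environment map). The period-2 alternation is an assumption.
def split_frames(num_frames: int, period: int = 2):
    tracking_ids = [f for f in range(num_frames) if f % period == 0]  # uniformly lit -> motion tracking
    relit_ids = [f for f in range(num_frames) if f % period != 0]     # random env map -> relighting supervision
    return tracking_ids, relit_ids

tracking_ids, relit_ids = split_frames(num_frames=10)
print(tracking_ids)  # [0, 2, 4, 6, 8]
print(relit_ids)     # [1, 3, 5, 7, 9]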
Given four input views under flat lighting, the skeleton pose, an environment map, and camera parameters, our method generates photorealistic relighting. First, a mesh-based avatar is animated using the skeleton pose. Physics-informed features (geometry, albedo, shading, and view features) are then extracted from the sparse-view images and the tracked mesh and fed into RelightNet, which conditions on the environment map via cross-attention. RelightNet predicts per-texel Gaussian parameters, which are placed on the mesh and splatted into the target camera view.
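The following is a minimal, illustrative sketch of this data flow using PyTorch. Tensor shapes, channel counts, and the per-texel Gaussian parameterization (3 offset + 3 scale + 4 rotation + 1 opacity + 3 color = 14 values) are assumptions; the actual RelightNet is the authors' trained transformer, not this toy module.

import torch
import torch.nn as nn

class RelightNetSketch(nn.Module):
    def __init__(self, feat_dim=64, embed_dim=256, num_heads=8, gauss_dim=14):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, embed_dim)    # physics-informed texel features -> tokens
        self.light_proj = nn.Linear(3, embed_dim)          # flattened env-map RGB samples -> lighting tokens
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, gauss_dim)        # per-texel Gaussian parameters

    def forward(self, texel_feats, env_map):
        # texel_feats: (B, T, feat_dim) geometry/albedo/shading/view features per texel
        # env_map:     (B, L, 3)        environment-map samples as a token sequence
        q = self.feat_proj(texel_feats)
        kv = self.light_proj(env_map)
        relit, _ = self.cross_attn(q, kv, kv)  # texel tokens attend to the lighting condition
        return self.head(relit)                # (B, T, gauss_dim): offsets, scales, rotation, opacity, color

net = RelightNetSketch()
gaussians = net(torch.randn(2, 1024, 64), torch.randn(2, 16 * 32, 3))
print(gaussians.shape)  # torch.Size([2, 1024, 14])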
Our method enables natural telepresence by relighting multiple avatars according to their shared digital environment, so that they appear in place and consistent with each other.
We compare our method under out-of-distribution lighting conditions, i.e., OLAT environment maps. Notably, the model never saw OLAT environment maps during training. Nonetheless, it generates plausible results, while competing methods either produce blurry renderings or fail completely.
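For clarity, an OLAT-style test condition can be represented as a lat-long environment map that is black except for a single bright texel, i.e. a single point light; the sketch below illustrates this. Resolution and intensity are illustrative choices, not the evaluation settings of the paper.

import numpy as np

# Build an OLAT-style environment map: all energy concentrated in one direction.
def olat_env_map(height=16, width=32, lit_row=4, lit_col=10, intensity=50.0):
    env = np.zeros((height, width, 3), dtype=np.float32)
    env[lit_row, lit_col] = intensity  # single lit texel = one point light
    return env

env = olat_env_map()
print(env.sum(), np.count_nonzero(env))  # 150.0 3 -> only one texel carries light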
@article{singh2025rhc,
  title={Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views},
  author={Singh, Kunwar Maheep and Chen, Jianchun and Golyanik, Vladislav and Garbin, Stephan J. and Beeler, Thabo and Dabral, Rishabh and Habermann, Marc and Theobalt, Christian},
  journal={arXiv preprint arXiv:2512.00255},
  year={2025}
}