Overview

In recent years, significant progress has been made in creating photorealistic and drivable 3D avatars solely from videos of real humans. However, a core remaining challenge is the fine-grained, user-friendly editing of clothing styles via textual descriptions. To this end, we present TEDRA, the first method allowing text-based edits of an avatar that maintain the avatar's high fidelity, space-time coherency, and dynamics, while enabling skeletal pose and view control. We begin by training a model to create a controllable and high-fidelity digital replica of the real actor. Next, we personalize a pre-trained generative diffusion model by fine-tuning it on various frames of the real character captured from different camera angles, ensuring the digital representation faithfully captures the dynamics and movements of the real person. This two-stage process lays the foundation for our approach to dynamic human avatar editing. Utilizing this personalized diffusion model, we modify the dynamic avatar based on a provided text prompt using our Personalized Normal Aligned Score Distillation Sampling (PNA-SDS) within a model-based guidance framework. Additionally, we propose a timestep annealing strategy to ensure high-quality edits. Our results demonstrate a clear improvement over prior work in both functionality and visual quality.
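
To make the personalization stage concrete, below is a minimal sketch of fine-tuning a pre-trained diffusion model on captured frames of the actor. The model interfaces (`unet`, `vae_encode`, `text_encode`), the identity prompt, and all hyperparameters are illustrative assumptions, not TEDRA's exact training setup.

```python
# Hypothetical sketch: personalizing a pre-trained text-to-image diffusion model
# on multi-view frames of the captured actor. Model interfaces and the identity
# prompt are placeholders, not TEDRA's exact configuration.
import torch
import torch.nn.functional as F

def personalization_step(unet, vae_encode, text_encode, frames, prompt,
                         alphas_cumprod, optimizer):
    """One denoising fine-tuning step on real frames paired with an identity prompt."""
    latents = vae_encode(frames)                        # encode captured frames
    noise = torch.randn_like(latents)
    t = torch.randint(0, alphas_cumprod.shape[0], (latents.shape[0],))
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1 - a_t).sqrt() * noise
    eps_pred = unet(noisy, t, text_encode(prompt))      # predict the added noise
    loss = F.mse_loss(eps_pred, noise)                  # standard diffusion objective
    loss.backward()
    optimizer.step(); optimizer.zero_grad()
    return loss.item()

# e.g. prompt = "a photo of a <actor> person", iterated over frames sampled
# from different camera views and time steps of the capture.
```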

Main Video (With Narration)

Method

Our approach starts from a pre-trained TriHuman model as the human representation. We then leverage a fine-tuned, personalized diffusion model in conjunction with our proposed Personalized Normal Aligned Score Distillation Sampling (PNA-SDS). The PNA-SDS loss optimizes the human representation toward the desired edit prompt while preserving the subject's characteristics. The method is further enhanced by a timestep annealing strategy, which gradually refines the edit.
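
The following is a minimal, hedged sketch of how a score distillation update with an annealed timestep schedule could look. The renderer, the personalized diffusion model, the classifier-free guidance setup, and the linear annealing schedule are assumptions for illustration; the normal-alignment component of PNA-SDS is omitted here.

```python
# Hypothetical sketch of an SDS-style update with annealed timestep sampling.
# The renderer, personalized diffusion model, and all hyperparameters stand in
# for TEDRA's actual components; PNA-SDS details are simplified.
import torch

def annealed_timestep(step, max_steps, t_max=0.98, t_min=0.02, num_train_timesteps=1000):
    """Linearly shrink the upper bound of the sampled noise level as editing progresses."""
    frac = step / max(max_steps - 1, 1)
    upper = t_max - frac * (t_max - t_min)
    t = torch.empty(1).uniform_(t_min, upper)
    return (t * num_train_timesteps).long().clamp(1, num_train_timesteps - 1)

def sds_gradient(diffusion, latents, text_emb, t, alphas_cumprod, guidance_scale=7.5):
    """Classifier-free-guided score distillation gradient on rendered latents."""
    noise = torch.randn_like(latents)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1 - a_t).sqrt() * noise
    # Two forward passes: conditional (edit prompt) and unconditional.
    eps_cond = diffusion(noisy, t, text_emb["cond"])
    eps_uncond = diffusion(noisy, t, text_emb["uncond"])
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    w = 1 - a_t                      # a common SDS weighting choice
    return w * (eps - noise)         # gradient w.r.t. the rendered latents

# Usage inside an editing loop (renderer and avatar parameters are assumptions):
# for step in range(max_steps):
#     latents = encode(render(avatar_params, random_pose, random_view))
#     t = annealed_timestep(step, max_steps)
#     grad = sds_gradient(personalized_unet, latents, prompt_embeddings, t, alphas_cumprod)
#     latents.backward(gradient=grad)   # back-propagate through the renderer
#     optimizer.step(); optimizer.zero_grad()
```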

Novel View Rendering

Free-Viewpoint Rendering

Motion Retargeting and Animation

Citation

@misc{sunagad2024tedratextbasededitingdynamic,
  title={TEDRA: Text-based Editing of Dynamic and Photoreal Actors}, 
  author={Basavaraj Sunagad and Heming Zhu and Mohit Mendiratta and Adam Kortylewski and Christian Theobalt and Marc Habermann},
  year={2024},
  eprint={2408.15995},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.15995}, 
}