Style and Pose Control for Image Synthesis of Humans from a Single Monocular View

arXiv 2021
[Paper] [Results on DeepFashion]


Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of body parts and garments, and severe changes of textures. We therefore propose StylePoseGAN, a new method for synthesizing photo-realistic human images with explicit control over pose and part-based appearance, in which we extend a non-controllable generator to accept conditioning on pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single-image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods, and performs convincingly in a comprehensive user study.


High Resolution Results


Results on Pose-Transfer

For the pose-transfer experiment, we used the train/test pairs of the DeepFashion dataset that were also used in existing works such as PoseGAN, DPT and CBI. Specifically, our training and testing pairs were generated with the publicly available code of PoseGAN. On this page, we provide our results for the 163 testing pairs (a subset of the full set of testing pairs) that were used in the paper for the quantitative results. Please find our results in the Downloads section.


  • Paper

  • Results (256x256)
    163 testing pairs
    ~1 MB

  • Results (256x256)
    entire test set (8554 testing pairs)
    ~60 MB

  • Results (512x512)
    entire test set (8554 testing pairs)
    ~153 MB


BibTeX, 1 KB

      title={Style and Pose Control for Image Synthesis of Humans from a Single Monocular View}, 
      author={Kripasindhu Sarkar and Vladislav Golyanik and Lingjie Liu and Christian Theobalt},


This work was supported by the ERC Consolidator Grant 4DReply (770784).


For questions and clarifications please get in touch with:
Kripasindhu Sarkar
