High-Fidelity Monocular Face Reconstruction based on an Unsupervised Model-based Face Autoencoder

In this work, we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as the decoder. The core innovation is the differentiable parametric decoder, which encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance, and scene illumination. Because CNN-based and model-based face reconstruction are combined in this way, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which makes training on very large (unlabeled) real-world datasets feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation. This work is an extended version of the paper "MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction". In addition, we present a stochastic vertex sampling technique for faster training of our networks, and we propose and evaluate analysis-by-synthesis and shape-from-shading refinement approaches to achieve high-fidelity reconstruction.
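To make the idea of a semantically structured code vector concrete, the following is a minimal sketch, not the authors' implementation. It assumes a flat code vector that is sliced into named parameter groups and a linear face model that is evaluated only at a random subset of vertices, in the spirit of stochastic vertex sampling. All names, group sizes, the vertex count, and the random model bases are hypothetical and chosen for illustration only.

```python
import numpy as np

# Hypothetical group sizes and vertex count, for illustration only.
DIMS = {"shape": 80, "expression": 64, "reflectance": 80,
        "illumination": 27, "pose": 6}
N_VERTICES = 500


def split_code(code):
    """Slice a flat code vector into named semantic parameter groups."""
    parts, start = {}, 0
    for name, size in DIMS.items():
        parts[name] = code[start:start + size]
        start += size
    return parts


def decode_geometry(parts, mean, shape_basis, expr_basis, vertex_ids):
    """Evaluate a linear face model only at the sampled vertices.

    mean:        (N_VERTICES, 3) average face geometry
    shape_basis: (N_VERTICES, 3, K_shape) identity basis
    expr_basis:  (N_VERTICES, 3, K_expr) expression basis
    vertex_ids:  indices of the vertices to reconstruct
    """
    return (mean[vertex_ids]
            + shape_basis[vertex_ids] @ parts["shape"]
            + expr_basis[vertex_ids] @ parts["expression"])


rng = np.random.default_rng(0)

# Stand-ins for the encoder output and for a real face model.
code = rng.standard_normal(sum(DIMS.values()))
mean = rng.standard_normal((N_VERTICES, 3))
shape_basis = rng.standard_normal((N_VERTICES, 3, DIMS["shape"]))
expr_basis = rng.standard_normal((N_VERTICES, 3, DIMS["expression"]))

parts = split_code(code)

# Draw a random vertex subset; a training loop would redraw it every step.
vertex_ids = rng.choice(N_VERTICES, size=100, replace=False)
sampled_vertices = decode_geometry(parts, mean, shape_basis, expr_basis,
                                   vertex_ids)

# Reference: decode all vertices, then pick the same subset.
all_vertices = decode_geometry(parts, mean, shape_basis, expr_basis,
                               np.arange(N_VERTICES))
```

Because the model is linear per vertex, decoding a subset gives the same positions as decoding the full mesh and selecting those vertices afterwards, while touching only a fraction of the basis. In this sketch that is what makes per-step sampling cheaper; a loss would then be computed on the sampled vertices only.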

Paper

Bibtex

@ARTICLE{8496850,
  author={Tewari, Ayush and Zollh{\"o}fer, Michael and Bernard, Florian and Garrido, Pablo and Kim, Hyeongwoo and P{\'e}rez, Patrick and Theobalt, Christian},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={High-Fidelity Monocular Face Reconstruction based on an Unsupervised Model-based Face Autoencoder},
  year={2018},
  volume={},
  number={},
  pages={1-1},
  keywords={Face;Image reconstruction;Three-dimensional displays;Training;Decoding;Shape;Lighting},
  doi={10.1109/TPAMI.2018.2876842},
  ISSN={0162-8828},
  month={}
}