Model-based Deep Convolutional Face Autoencoder
for Unsupervised Monocular Reconstruction

ICCV 2017 (Oral)

A. Tewari 1 M. Zollhöfer 1 H. Kim 1 P. Garrido 1 F. Bernard 1,2 P. Perez 3 C.Theobalt 1
1MPI Informatics 2Luxembourg Centre for Systems Biomedicine 3Technicolor


In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as decoder. The core innovation is the differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance and scene illumination. Due to this new way of combining CNN-based with model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation.

Paper Supplemental Video Talk



title = {{MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction}},
author = {Tewari, Ayush and Zoll{\"o}fer, Michael and Kim, Hyeongwoo and Garrido, Pablo and Bernard, Florian and Perez, Patrick and Theobalt Christian},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
year = {2017}