Corrective 3D Reconstruction of Lips from Monocular Video

In facial animation, the accurate shape and motion of the lips of virtual humans are of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes, along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between the inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We show quantitatively and qualitatively that our monocular approach reconstructs higher-quality lip shapes than previous monocular approaches, even for complex shapes like a kiss or lip rolling. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
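
The core idea of learning a correction from coarse to accurate shapes can be illustrated with a toy sketch. This is not the paper's method: it swaps the gradient domain regressor for a plain per-coordinate least-squares fit, ignores the 2D lip contours, and uses made-up numbers in place of captured data. The names `fit_corrective_regressor` and `apply_correction` are hypothetical.

```python
from statistics import mean

def fit_corrective_regressor(coarse, accurate):
    """Per coordinate, fit residual = a * coarse + b by ordinary least squares.

    Assumption: a simple linear stand-in for the learned corrective regressor.
    """
    dims = len(coarse[0])
    params = []
    for d in range(dims):
        x = [frame[d] for frame in coarse]
        # Residual between the accurate shape and the coarse reconstruction.
        r = [acc[d] - frame[d] for frame, acc in zip(coarse, accurate)]
        mx, mr = mean(x), mean(r)
        var = sum((xi - mx) ** 2 for xi in x)
        cov = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r))
        a = cov / var if var > 0 else 0.0
        params.append((a, mr - a * mx))
    return params

def apply_correction(params, coarse_frame):
    """Add the predicted residual to a coarse shape vector."""
    return [c + a * c + b for c, (a, b) in zip(coarse_frame, params)]

# Synthetic stand-in data (hypothetical): four frames, two shape coordinates.
coarse = [[0.0, 1.0], [1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
accurate = [[1.2 * x + 0.1, 0.8 * y - 0.05] for x, y in coarse]

params = fit_corrective_regressor(coarse, accurate)
corrected = [apply_correction(params, frame) for frame in coarse]
```

Predicting the residual rather than the accurate shape directly means the regressor only has to model what the coarse reconstruction gets wrong, which is the "corrective" aspect described above.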


Paper pdf (5.6M) / (67.7M)
Supplementary Material pdf (29.3M)
Supplementary Video mp4 (149.6M)
Presentation pptx (198.7M)

Videos

Supplementary video to the paper

Additional video showing our training data, some validations and more results

Additional result obtained on an internet video

Bibtex

@article{GZWBPBT16,
  author    = {Pablo Garrido and Michael Zollhoefer and Chenglei Wu and Derek Bradley and Patrick Perez and Thabo Beeler and Christian Theobalt},
  title     = {Corrective 3D Reconstruction of Lips from Monocular Video},
  journal   = {{ACM} Trans. Graph. (Proc. SIGGRAPH Asia 2016)},
  volume    = {35},
  number    = {6},
  pages     = {219:1--219:11},
  year      = {2016}
}