VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track
Eurographics 2015
Abstract
In many countries, foreign movies and TV productions are dubbed,
i.e., the original voice of an actor is replaced with a
translation that is spoken by a dubbing actor in the country's
own language. Dubbing is a complex process that requires
specific translations and accurately timed recitations such that
the new audio at least coarsely adheres to the mouth motion in
the video. However, since the sequences of phonemes and visemes
in the original and the dubbing language differ, the
video-to-audio match is never perfect, which is a major source
of visual discomfort. In this paper, we propose a system to
alter the mouth motion of an actor in a video, so that it
matches the new audio track. Our paper builds on high-quality
monocular 3D facial performance, lighting and albedo capture of
the dubbing and target actors, and uses audio analysis in
combination with a space-time retrieval method to synthesize a
new photo-realistically rendered and highly detailed 3D shape
model of the mouth region to replace the target performance. We
demonstrate plausible visual quality of our results compared to
footage that has been professionally dubbed in the traditional
way, both qualitatively and through a user study.
Videos
Supplementary video to the paper
Results obtained by traditional dubbing
Bibtex
@article{GVSSVPT15,
  author  = {Pablo Garrido and Levi Valgaerts and Hamid Sarmadi and Ingmar Steiner and Kiran Varanasi and Patrick Perez and Christian Theobalt},
  title   = {{VDub}: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track},
  journal = {Comput. Graph. Forum (Proc. Eurographics)},
  volume  = {34},
  number  = {2},
  pages   = {193--204},
  year    = {2015}
}