Researchers from the University of Washington created a system that converts audio clips into lip-synced videos of the speaker. The system first analyzes roughly 14 hours of existing footage of the person speaking, using a neural network to learn which mouth shapes accompany which speech sounds. The researchers hope to reduce that requirement from 14 hours to one.
The system is then given a video in which the person says anything at all, along with an audio file of them speaking the desired words, and it pairs the two together. It drops the video's original audio, replaces it with the desired audio, and composites a computer-animated version of the speaker's mouth over the real mouth in the video. The result is footage of the person speaking the desired words, with mouth movements to match. There's obvious potential for deception here, and you can see and hear the system in action in the following video.
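As a rough illustration of the pipeline described above, here is a minimal sketch in Python. All names are hypothetical, and a trivial nearest-neighbor lookup stands in for the researchers' neural network: the "model" maps an audio feature for each frame to a learned mouth shape, which would then be composited onto the video.

```python
# Illustrative sketch only -- a nearest-neighbor lookup stands in for the
# neural network that the UW system trains on hours of footage.

def predict_mouth_shape(model, feature):
    # Pick the training example whose audio feature is closest to the input.
    return min(model, key=lambda pair: abs(pair[0] - feature))[1]

def lip_sync(model, desired_audio_features):
    # The original video's audio is discarded; only the desired audio's
    # features drive the output: one mouth shape per video frame.
    return [predict_mouth_shape(model, f) for f in desired_audio_features]

# Tiny "learned" mapping from audio features to mouth shapes (hypothetical).
model = [(0.1, "closed"), (0.5, "mid-open"), (0.9, "open")]

print(lip_sync(model, [0.12, 0.88, 0.45]))
# -> ['closed', 'open', 'mid-open']
```

In the real system these per-frame mouth shapes would be rendered and blended into the video frames; the sketch stops at the shape-prediction step.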
Source: University of Washington