New neural network enables editing of announcer speech

news | Jun 19, 2019 | Tech & Security

Adobe Research has joined forces with scientists from Stanford University and the Max Planck Institute to develop a program that can alter speech in recorded video. The development is expected to help reduce the cost of recording video clips.

Unsuccessful takes are the bane of the movie industry and news agencies: every retake means paying camera operators and other film crew members again, not to mention the wasted time. A new neural network from Adobe Research will make it possible to edit an actor’s or narrator’s speech after the fact, so a flawed take can be corrected without being reshot.

For the artificial intelligence to correct a narrator’s mistakes, it needs at least 40 minutes of video footage of that person. The neural network uses this footage to learn the speaker’s facial expressions and to match their movements to their words. The program then generates a sequence of gestures matching the updated text, creates the necessary textures and adds them to the video.

This is made possible by a machine learning technology called neural rendering, which produces photo-realistic images. A VoCo module (or a similar service) is used to generate the sound.

The creators of the technology believe that it will benefit humanity. However, they do not rule out the possibility that it will be used for information warfare and for creating fake news and compromising materials featuring famous people.