Speech generator from Google imitates the natural human voice

news | Jan 29, 2018 | Tech & Security |

Google has developed a speech generator that reads text aloud in a voice indistinguishable from a human's.

The program, called Tacotron 2, is built on two neural networks. The first generates a spectrogram from the input text. The second, called WaveNet, synthesizes the sound itself from that spectrogram.
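The two-stage structure can be illustrated with a toy sketch. This is not Google's code and involves no neural networks: the function names, the single-frequency "frames" and the sine-tone rendering are all assumptions made up for illustration. It only shows the shape of the pipeline, where one stage turns text into an intermediate acoustic representation and a second stage turns that representation into audio samples.

```python
import math

SAMPLE_RATE = 16000        # samples per second (assumed for this toy)
SAMPLES_PER_FRAME = 800    # 50 ms per frame at 16 kHz
PAUSE_FRAMES = 4           # silent frames inserted for a comma


def text_to_frames(text):
    """Stage 1 stand-in: map text to a list of frames.

    A real system predicts a spectrogram; here each frame is just one
    frequency in Hz, with 0.0 meaning silence.
    """
    frames = []
    for ch in text:
        if ch == ",":
            frames.extend([0.0] * PAUSE_FRAMES)  # a comma becomes a pause
        elif ch.isalpha():
            frames.append(200.0 + 10.0 * (ord(ch.lower()) - ord("a")))
        else:
            frames.append(0.0)  # spaces and other symbols: one silent frame
    return frames


def frames_to_waveform(frames):
    """Stage 2 stand-in: render each frame as a short sine tone.

    This plays the role WaveNet plays in the article, in a trivial way.
    """
    wave = []
    for freq in frames:
        for n in range(SAMPLES_PER_FRAME):
            if freq == 0.0:
                wave.append(0.0)
            else:
                wave.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return wave


def synthesize(text):
    """Full toy pipeline: text -> frames -> audio samples."""
    return frames_to_waveform(text_to_frames(text))
```

Calling `synthesize("This is your personal assistant, Google Home.")` yields a longer sample list than the same sentence without the comma, because the first stage inserts silent frames where the comma appears.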

Like a human speaker, Tacotron 2 changes its intonation depending on the punctuation and can stress words that are capitalized or written entirely in capital letters. For now, Tacotron 2 can speak in only one voice, a female one.

If a sentence contains a comma, Tacotron 2 pauses at it:

This is your personal assistant, Google Home.

This is your personal assistant Google Home.

The algorithm can also read text that contains gross spelling errors:

Thisss isrealy awhsome.

The company will probably bring the technology to its products soon, and devices such as the Google Home smart speaker or ordinary Android smartphones will talk to their owners in speech that sounds natural to the human ear.

The Canadian startup Lyrebird might compete with Google. Algorithms developed by Lyrebird's programmers can synthesize the voice of any person: a voice recording of about a minute is enough for the neural network to learn how to copy it. The developers claim that the algorithm can not only copy a person's voice, but also add emotion to it.

Many IT companies are developing their own deep learning algorithms. For instance, Nvidia has taught a neural network to create fake videos based on real ones, and it is very hard to tell the fake from the original by eye.

Such algorithms may soon create serious problems, since they can produce fake video and audio recordings featuring prominent people, and it will become increasingly difficult to distinguish these from real ones. Neural networks of the same kind, but trained to recognize fake content, will probably help.