November 15, 2019·Eva Popílková·1 min read·Archive 2019

New AI Clones Your Voice from Just 5 Seconds of Recorded Audio!

New research presents an AI system that converts text to speech (TTS) in the voice of a given speaker. As is usual for this kind of system, it is built on neural networks. On closer inspection, it consists of three main components:

A speaker encoder network, trained on thousands of speakers (this is how the system learns what a human voice sounds like).
A sequence-to-sequence synthesis network based on Tacotron 2, which generates a spectrogram from text.
An auto-regressive vocoder based on WaveNet, which converts the spectrogram into a sequence of waveform samples.
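
The way the three components hand data to one another can be sketched in Python as follows. This is only a toy illustration of the data flow, not the method from the paper: every function here (speaker_encoder, synthesizer, vocoder, clone_voice) is a hypothetical stand-in that uses simple arithmetic instead of a trained neural network, and the constants (embedding size, number of spectrogram channels, 16 kHz sample rate) are assumptions chosen for readability.

```python
import math
import random
from typing import List

EMBED_DIM = 8   # assumed toy size of the speaker embedding
N_MELS = 4      # assumed toy number of spectrogram channels
HOP = 10        # assumed number of waveform samples per spectrogram frame


def speaker_encoder(reference_audio: List[float]) -> List[float]:
    """Stand-in for the speaker encoder: audio of any length -> fixed-size unit vector."""
    chunk = max(1, len(reference_audio) // EMBED_DIM)
    raw = [
        sum(abs(s) for s in reference_audio[i * chunk:(i + 1) * chunk])
        for i in range(EMBED_DIM)
    ]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]


def synthesizer(text: str, embedding: List[float]) -> List[List[float]]:
    """Stand-in for the Tacotron 2 based network: text + embedding -> spectrogram frames."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0
        # Each frame depends on both the text and the speaker embedding.
        frames.append([base + embedding[m % EMBED_DIM] for m in range(N_MELS)])
    return frames


def vocoder(spectrogram: List[List[float]]) -> List[float]:
    """Stand-in for the WaveNet based vocoder: spectrogram -> waveform samples."""
    samples = []
    prev = 0.0
    for frame in spectrogram:
        level = sum(frame) / len(frame)
        for _ in range(HOP):
            # Auto-regressive: each sample is computed from the previous one.
            prev = 0.5 * prev + 0.5 * math.tanh(level)
            samples.append(prev)
    return samples


def clone_voice(text: str, reference_audio: List[float]) -> List[float]:
    embedding = speaker_encoder(reference_audio)   # 1. who is speaking
    spectrogram = synthesizer(text, embedding)     # 2. what is said, in that voice
    return vocoder(spectrogram)                    # 3. audible waveform


random.seed(0)
# Five seconds of fake reference audio at an assumed 16 kHz sample rate.
reference = [random.uniform(-1.0, 1.0) for _ in range(5 * 16000)]
waveform = clone_voice("hello", reference)
print(len(waveform))  # 5 characters * HOP samples per frame = 50
```

The point of the sketch is the separation of concerns: the reference recording is used only to compute the embedding, so the same short clip could be reused to speak any new text.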

For more information, check the links.

Paper: https://arxiv.org/abs/1806.04558

Original source: WordPress
