Eva Popílková · 1 min read · Archive 2019

New AI Clones Your Voice from Just 5 Seconds of Audio Recording!


New research presents an AI system that converts text to speech (TTS). Like most modern TTS systems, it is built on neural networks. It consists of three main components:

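  1. A speaker encoder network, trained on a speaker-verification task using recordings from thousands of speakers. This is how the system learns what a human voice sounds like: from a few seconds of reference audio, it computes a fixed-size embedding vector that characterizes the speaker's voice.

  2. A sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding.

  3. An autoregressive vocoder based on WaveNet, which converts the mel spectrogram into a sequence of time-domain waveform samples.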
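To make the data flow between the three components concrete, here is a minimal sketch in Python. It is not the paper's implementation: every model is a toy stand-in with random weights (an assumption made purely for illustration). A linear projection averaged over time replaces the speaker encoder, a per-character projection replaces Tacotron 2's attention-based decoder, and a one-sample recurrence replaces WaveNet. The names `speaker_encoder`, `synthesizer`, `vocoder`, and `clone_voice`, and all sizes, are hypothetical. The sketch produces no intelligible speech; it only shows how the speaker embedding conditions the synthesizer and how the vocoder emits samples one at a time.

```python
import math
import random

# All sizes are hypothetical toy values, far smaller than in the real models.
EMBED_DIM = 8   # speaker-embedding size
N_MELS = 4      # mel channels per spectrogram frame
CHAR_DIM = 6    # per-character text embedding size
HOP = 5         # waveform samples generated per spectrogram frame

rng = random.Random(0)  # fixed seed so the random "weights" are reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# 1) Speaker encoder (toy stand-in): mel frames of reference audio
#    -> fixed-size, L2-normalized speaker embedding.
W_ENC = rand_matrix(EMBED_DIM, N_MELS)


def speaker_encoder(ref_frames):
    if not ref_frames:
        raise ValueError("need at least one reference frame")
    projected = [matvec(W_ENC, f) for f in ref_frames]
    mean = [sum(col) / len(projected) for col in zip(*projected)]  # average over time
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # guard against a zero vector
    return [x / norm for x in mean]


# 2) Synthesizer (toy stand-in for Tacotron 2): text + speaker embedding -> mel frames.
#    The embedding is concatenated to every text position; here each character
#    yields exactly one frame, whereas Tacotron 2 uses attention.
W_SYN = rand_matrix(N_MELS, CHAR_DIM + EMBED_DIM)


def char_embedding(c):
    r = random.Random(ord(c))  # deterministic pseudo-embedding per character
    return [r.uniform(-1, 1) for _ in range(CHAR_DIM)]


def synthesizer(text, speaker_emb):
    frames = []
    for c in text:
        x = char_embedding(c) + speaker_emb  # list concatenation
        frames.append([math.tanh(v) for v in matvec(W_SYN, x)])
    return frames


# 3) Vocoder (toy stand-in for WaveNet): mel frames -> waveform samples, generated
#    autoregressively. Each sample depends on the previous sample and the current
#    frame; WaveNet instead looks back over many samples with dilated convolutions.
W_VOC = [rng.uniform(-0.5, 0.5) for _ in range(N_MELS)]
A_PREV = 0.6  # hypothetical weight on the previous sample


def vocoder(mel_frames):
    samples, prev = [], 0.0
    for frame in mel_frames:
        cond = sum(w * m for w, m in zip(W_VOC, frame))
        for _ in range(HOP):
            prev = math.tanh(A_PREV * prev + cond)
            samples.append(prev)
    return samples


def clone_voice(text, ref_frames):
    emb = speaker_encoder(ref_frames)   # who is speaking
    mel = synthesizer(text, emb)        # what is said, in that voice
    return vocoder(mel)                 # the audible waveform


if __name__ == "__main__":
    reference = [[rng.uniform(0, 1) for _ in range(N_MELS)] for _ in range(20)]
    audio = clone_voice("hello", reference)
    print(len(audio))  # 5 characters x HOP samples = 25
```

The structural point the sketch tries to preserve is that the speaker encoder is separate from the synthesizer: a different reference recording changes only the embedding, which is why a new voice can be imitated from a few seconds of audio without retraining the other networks.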

For more information, see the link below.


Paper: https://arxiv.org/abs/1806.04558

Original source: WordPress
