Jan Tyl · 1 min read · Archive 2019

New AI Clones Your Voice from Just 5 Seconds of Audio Recording!

New research presents an AI system for text-to-speech (TTS) that can speak in a target voice after hearing only about 5 seconds of that speaker's audio. As is usual for this kind of work, the algorithm is built on neural networks. On closer inspection, it consists of 3 main components:

  1. A speaker encoder network, trained on speech from thousands of speakers, which is how the system learns what distinguishes one human voice from another. It turns a few seconds of reference audio into a fixed-size embedding of the target voice.

  2. A sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on that speaker embedding.

  3. An autoregressive vocoder based on WaveNet, which converts the mel spectrogram into a sequence of waveform samples.
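The way the three components hand data to each other can be sketched roughly as follows. This is only a toy illustration of the data flow, not the networks from the paper: the functions `speaker_encoder`, `synthesizer`, `vocoder` and `clone_voice`, the constants `EMBED_DIM`, `N_MELS` and `HOP`, and the 16 kHz sample rate are all hypothetical stand-ins chosen for this example, and the "models" are simple arithmetic rather than trained neural networks.

```python
import math
import random
from typing import List

# All sizes below are assumptions for this toy sketch, not values from the paper.
EMBED_DIM = 8   # length of the speaker embedding
N_MELS = 4      # number of channels per spectrogram frame
HOP = 16        # waveform samples produced per spectrogram frame


def speaker_encoder(reference_audio: List[float]) -> List[float]:
    """Stand-in for component 1: audio of any length -> fixed-size unit vector."""
    chunk = max(1, len(reference_audio) // EMBED_DIM)
    emb = [
        sum(abs(s) for s in reference_audio[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(EMBED_DIM)
    ]
    norm = math.sqrt(sum(v * v for v in emb)) or 1.0
    return [v / norm for v in emb]


def synthesizer(text: str, embedding: List[float]) -> List[List[float]]:
    """Stand-in for component 2: text + speaker embedding -> spectrogram frames."""
    bias = sum(embedding) / len(embedding)  # the voice shifts every frame
    return [
        [((ord(ch) * (m + 1)) % 17) / 17.0 + bias for m in range(N_MELS)]
        for ch in text  # one frame per character, purely for illustration
    ]


def vocoder(spectrogram: List[List[float]]) -> List[float]:
    """Stand-in for component 3: spectrogram frames -> waveform samples.

    Autoregressive in the loose sense that each sample depends on the previous one.
    """
    samples: List[float] = []
    prev = 0.0
    for frame in spectrogram:
        energy = sum(frame) / len(frame)
        for _ in range(HOP):
            prev = 0.9 * prev + 0.1 * energy
            samples.append(prev)
    return samples


def clone_voice(text: str, reference_audio: List[float]) -> List[float]:
    """Chain the three stages: reference audio + text -> waveform."""
    embedding = speaker_encoder(reference_audio)
    spectrogram = synthesizer(text, embedding)
    return vocoder(spectrogram)


random.seed(0)
# 5 seconds of fake reference audio at an assumed 16 kHz sample rate.
reference = [random.uniform(-1.0, 1.0) for _ in range(5 * 16000)]
waveform = clone_voice("Hello", reference)
print(len(waveform))  # 5 frames * 16 samples per frame = 80
```

The point of the sketch is the interface: the reference recording is used only once, to produce the embedding, and everything after that depends on the text plus that fixed-size vector.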

More information can be found in the links below.

Demonstration and basic explanation: https://www.youtube.com/watch?v=0sR1rU3gLzQ&fbclid=IwAR0cXA2E6gt0YWusREZpj9K5k2o91Ecvsgki7NhnPfMfWV7Sjll66R0T-q0

Paper: https://arxiv.org/abs/1806.04558

Originally published on Facebook.
