Whisper – Converts Spoken Word to Text

I am pleased to announce that OpenAI has released another product from its remarkable workshop. It is called "Whisper". Whisper is a family of universal, end-to-end, weakly supervised ASR (Automatic Speech Recognition) models based on transformers. Simply put, it converts spoken words to text. It does so in a "general-purpose" manner: in addition to multilingual speech recognition (transcription), it can also perform tasks such as voice activity detection, language identification, and translation of speech into English.
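To try the models locally, the open-source `openai-whisper` package from the GitHub repository listed below exposes a small Python API. The following is a minimal sketch, assuming that package and ffmpeg are installed. `transcribe_file` is a hypothetical helper name, and the audio file name in the comments is a placeholder. It covers transcription with automatic language identification and the `task="translate"` option, which translates the speech into English:

```python
def transcribe_file(path, model_name="base", task="transcribe", language=None):
    """Run a Whisper model on an audio file (hypothetical helper, minimal sketch).

    Assumes the open-source `openai-whisper` package and ffmpeg are installed.
    model_name: one of "tiny", "base", "small", "medium", "large".
    task: "transcribe" keeps the spoken language; "translate" outputs English.
    language: e.g. "cs" for Czech; None lets Whisper identify the language.
    Returns a (detected_language, text) tuple.
    """
    # Imported inside the function so this module can be loaded
    # even on a machine where the package is not installed yet.
    import whisper

    # Downloads the checkpoint on first use, then loads it into memory.
    model = whisper.load_model(model_name)
    result = model.transcribe(path, task=task, language=language)
    # The result is a dict with "text" (full transcript), "segments", "language".
    return result["language"], result["text"]


# Example usage ("recording.mp3" is a placeholder file name):
#   lang, text = transcribe_file("recording.mp3", language="cs")
#   _, english = transcribe_file("recording.mp3", task="translate")
```

Loading the model inside the helper keeps the sketch short; when processing many files, it is cheaper to call `whisper.load_model` once and reuse the model object.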

A whole family of models is being released, in five sizes. From smallest to largest by parameter count: Tiny (39M), Base (74M), Small (244M), Medium (769M), and Large (1.55B). The great news is that it is being released as open source! Anyone interested can easily try out the online demo and test how the model handles Czech (or, if more curious and adept, download it directly from GitHub). A little gem to conclude: the models were trained on roughly 77 years' worth of speech sourced from the web, which I believe is the largest dataset of its kind.
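For context, the Whisper paper reports about 680,000 hours of training audio (a figure taken from the paper, not stated in this post). Converting it to years, ignoring leap years, reproduces the figure quoted above:

```latex
\frac{680\,000\ \text{h}}{24\ \text{h/day} \times 365\ \text{days/yr}}
  = \frac{680\,000}{8\,760}\ \text{yr} \approx 77.6\ \text{yr}
```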
Resources:
– Demo on Hugging Face: https://huggingface.co/spaces/openai/whisper
– OpenAI blog: https://openai.com/blog/whisper/
– Paper: https://cdn.openai.com/papers/whisper.pdf
– GitHub: https://github.com/openai/whisper
– Colab: https://colab.research.google.com/…/LibriSpeech.ipynb
– Medium: https://towardsdatascience.com/openai-whisper-holds-the-key-to-gpt-4-a7f922a7dad9
Original source: WordPress