August 14, 2019·Eva Popílková·1 min read·Archive 2019

Nvidia Announces It Has Trained the Largest Language Model in the World, GPT-2 8B!


The model has 8.3 billion parameters, making it 24 times larger than BERT-Large and more than 5 times larger than OpenAI's GPT-2, which was previously the largest. Nvidia used model parallelism, splitting the neural network into pieces that each fit into the memory of a single GPU.
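To illustrate the idea of dividing a network so that each piece fits on one GPU, here is a minimal sketch in plain Python. It splits the weight matrix of a single linear layer column-wise into shards, lets each shard compute its slice of the output, and joins the slices. The function names (`split_columns`, `parallel_linear`) and the tiny matrices are made up for illustration, and this is only one possible way of splitting a layer, not necessarily the exact scheme Nvidia used.

```python
def matmul(x, w):
    """Multiply x (rows x inner) by w (inner x cols); both are lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_parts):
    """Split weight matrix w column-wise into num_parts equally wide shards."""
    cols = len(w[0])
    assert cols % num_parts == 0, "column count must divide evenly"
    width = cols // num_parts
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_parts)]


def parallel_linear(x, shards):
    """Compute x @ w shard by shard and concatenate the output slices per row."""
    # In a real setup each shard would live on its own GPU; here they run in turn.
    partials = [matmul(x, shard) for shard in shards]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


# Toy input and weights (hypothetical values, chosen only for the demo).
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

shards = split_columns(w, num_parts=2)

# The sharded computation gives the same result as the unsplit layer.
assert parallel_linear(x, shards) == matmul(x, w)
```

Each shard holds only half of the weights, so a layer that is too large for one device can still be evaluated as long as every shard fits on its own device.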

Nvidia also announced the fastest training time yet for BERT. Using optimised PyTorch software and a DGX SuperPOD with 1,472 V100 GPUs, it trained the BERT-Large model in a record-breaking 53 minutes! Only earlier this year, we were still counting this kind of training time in weeks!
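For a rough sense of scale, the two figures above can be turned into total GPU time. The short sketch below uses only the GPU count and the 53 minutes from the announcement. The conversion to "days on one GPU" assumes perfect linear scaling, which real clusters do not achieve, so treat it as a back-of-the-envelope estimate only.

```python
NUM_GPUS = 1472          # V100 GPUs in the DGX SuperPOD run
TRAINING_MINUTES = 53    # reported wall-clock training time

# Total compute spent, expressed in GPU-hours.
gpu_hours = NUM_GPUS * TRAINING_MINUTES / 60

# Idealised equivalent on a single GPU (assumes perfect linear scaling).
single_gpu_days = gpu_hours / 24

print(f"{gpu_hours:.0f} GPU-hours")
print(f"about {single_gpu_days:.0f} days on a single GPU")
```

That works out to about 1,300 GPU-hours, or roughly 54 days on a single GPU under the idealised assumption, which is in line with training times being counted in weeks.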

Source: https://devblogs.nvidia.com/training-bert-with-gpus/
GitHub: https://github.com/nvidia/megatron-lm

Original source: wordpress
