Eva Popílková · 1 min read · Archive 2021


A New Champion on the Scene?

The Switch Transformer language model from Google dwarfs GPT-3: with 1.6 trillion parameters, it has roughly nine times more than GPT-3. Google has optimised computational costs using the Mixture of Experts (MoE) algorithm and effectively combined data, model, and expert parallelism. As a result, pre-training was four times faster than with the older T5-XXL model (Google's previous champion).
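The cost saving comes from the Switch Transformer's routing trick: a classic MoE layer mixes the outputs of several experts per token, while switch routing sends each token to exactly one expert, so only a small fraction of the 1.6 trillion parameters is active for any given token. Here is a minimal NumPy sketch of that idea; all names, dimensions, and weights are illustrative, not taken from Google's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff, n_experts = 8, 16, 4

# Each "expert" is a tiny feed-forward net: x -> relu(x @ W1) @ W2
W1 = rng.normal(size=(n_experts, d_model, d_ff)) * 0.1
W2 = rng.normal(size=(n_experts, d_ff, d_model)) * 0.1

# The router is a learned linear layer that scores every expert per token
W_router = rng.normal(size=(d_model, n_experts)) * 0.1

def switch_layer(x):
    """Switch routing: each token is dispatched to its single best
    (top-1) expert, instead of mixing the top-k experts as in classic MoE."""
    logits = x @ W_router                              # (tokens, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    expert = probs.argmax(-1)                          # chosen expert per token
    gate = probs[np.arange(len(x)), expert]            # gate value scales output
    out = np.zeros_like(x)
    for e in range(n_experts):                         # in practice: parallel dispatch
        idx = np.where(expert == e)[0]
        if idx.size:
            h = np.maximum(x[idx] @ W1[e], 0.0)        # this expert's hidden layer
            out[idx] = gate[idx, None] * (h @ W2[e])
    return out, expert

tokens = rng.normal(size=(5, d_model))
y, assignment = switch_layer(tokens)
print(y.shape, assignment)
```

Because only one expert runs per token, compute per token stays roughly constant no matter how many experts (and hence parameters) the model holds, which is how the parameter count can grow to trillions without a matching growth in training cost.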

Are you curious how good this new super-large model actually is? The largest variant achieves an accuracy of 88.6% on SQuAD (the Stanford Question Answering Dataset), one of the standard benchmarks for reading comprehension. That is better than BERT, but slightly below BART and RoBERTa. On the SuperGLUE benchmark for overall language understanding, it scored 84.7 points, well above GPT-3, which sits around 71.8, roughly on par with RoBERTa but below DeBERTa. These models each have different objectives, however, so take these results as merely indicative.

According to some researchers, this model is less refined for text generation than GPT-3. Training GPT-3 cost OpenAI approximately 100 million Czech crowns (for the computation alone, not the supercomputer). It is anticipated that GPT-4 will have around 20 trillion parameters. If algorithms like MoE can significantly speed up and reduce the cost of such computations, this certainly represents a remarkable advance.

Sources:

Paper: https://arxiv.org/pdf/2101.03961.pdf

Git: https://github.com/…/mesh_tensorflow/transformer/moe.py

https://thenextweb.com/…/googles-new-trillion…/

https://syncedreview.com/…/google-brains-switch…/

Original source: wordpress
