A New Champion on the Scene? The Switch Transformer Language Model from Google

A new champion on the scene? The Switch Transformer language model from Google is nearly six times larger than GPT-3! The Switch Transformer boasts 9x more parameters, amounting to 1.6 trillion. Google has kept computational costs in check using the Mixture of Experts (MoE) algorithm, effectively combining data, model, and expert parallelism. This enabled the model to be pre-trained up to four times faster than the older T5-XXL model (Google's previous champion).
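The core MoE trick in the Switch Transformer is top-1 routing: a small router network sends each token to exactly one expert, so compute per token stays roughly constant even as the total parameter count grows. The following is a minimal NumPy sketch of that idea, not the actual implementation; the function and variable names (`switch_layer`, `router_w`, `expert_ws`) are illustrative assumptions, and each expert is reduced to a single weight matrix for brevity.

```python
import numpy as np

def switch_layer(tokens, router_w, expert_ws):
    """Sketch of Switch Transformer top-1 routing (illustrative, not the real code).

    tokens:    (n_tokens, d_model) input activations
    router_w:  (d_model, n_experts) router weights
    expert_ws: list of (d_model, d_model) matrices, one per expert
    """
    logits = tokens @ router_w                       # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)        # softmax over experts
    choice = probs.argmax(axis=1)                    # top-1 expert per token

    out = np.empty_like(tokens)
    for e, w in enumerate(expert_ws):
        mask = choice == e
        # Each token is processed by exactly one expert; scaling by the
        # router probability keeps the router differentiable in training.
        out[mask] = (tokens[mask] @ w) * probs[mask, e:e + 1]
    return out, choice

# Example: 8 tokens, model width 4, 2 experts
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
router_w = rng.normal(size=(4, 2))
experts = [rng.normal(size=(4, 4)) for _ in range(2)]
out, choice = switch_layer(tokens, router_w, experts)
```

Because only the chosen expert's weights are touched per token, adding experts grows the parameter count (toward the trillions) without a matching growth in per-token FLOPs; the real model adds capacity limits and a load-balancing loss on top of this routing step.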
Are you curious about how good this new super-large model is? The largest variant achieves an impressive 88.6% accuracy on SQuAD (Stanford Question Answering Dataset), one of the standard benchmarks for reading comprehension. This surpasses models like BERT, though it falls slightly short of BART and RoBERTa. In the SuperGLUE test of overall language understanding, it scored 84.7 points, significantly higher than GPT-3, which sits around 71.8, and comparable to RoBERTa, but below DeBERTa. However, each of these models was built with a different objective, so we should take these results as indicative rather than definitive.
According to some researchers, this model is less refined for text generation than GPT-3. The development of GPT-3 cost OpenAI approximately CZK 100 million (excluding supercomputer expenses). It is anticipated that GPT-4 will have around 20 trillion parameters. If algorithms like MoE can significantly accelerate and reduce the cost of computation, this certainly represents a remarkable advancement.
Sources:
Paper: https://arxiv.org/pdf/2101.03961.pdf
Git: https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/moe.py
https://thenextweb.com/neural/2021/01/13/googles-new-trillion-parameter-ai-language-model-is-almost-6-times-bigger-than-gpt-3/
https://syncedreview.com/2021/01/14/google-brains-switch-transformer-language-model-packs-1-6-trillion-parameters/
Originally published on Facebook.