Jan Tyl · 2 min read · Archive 2019


Among the Best Artificial Intelligence Models for Natural Language Processing (BERT, RoBERTa, GPT-2, or Megatron), Another Player Has Entered the Field: ALBERT

Among the best artificial intelligence models for natural language processing (BERT, RoBERTa, GPT-2, or Megatron), another player has entered the field: ALBERT! ALBERT is brought to us by Google Research and the Toyota Technological Institute. What's interesting is not just that the model delivers fantastic results on classic benchmarks like GLUE, RACE, or SQuAD, but also that it is much smaller than its predecessors! For instance, the old BERT x-large has approximately 1.27 billion parameters, compared to ALBERT x-large with a mere 59 million parameters.

How did the authors manage to increase accuracy while simultaneously reducing the number of "brain cells"?

There are three reasons for this:

1 — Factorized Embedding Parameterization
In other words, a more efficient use of parameters. Instead of one large embedding layer, ALBERT uses two smaller ones: the one-hot vocabulary vector is first projected into a low-dimensional embedding space, and only then projected up to the hidden size of the network.
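The saving is easy to see with a bit of arithmetic. A minimal sketch, using figures in the spirit of the ALBERT paper (a vocabulary of 30,000 tokens, a hidden size of 4,096 as in the x-large model, and an embedding size of 128 — the exact values are illustrative assumptions here):

```python
# Embedding parameter counts: direct projection (BERT-style) versus
# factorized projection (ALBERT-style).
V = 30_000  # vocabulary size (assumed for illustration)
H = 4_096   # hidden size of the x-large model (assumed)
E = 128     # small intermediate embedding size (assumed)

# BERT-style: one-hot vocabulary vector projected straight to hidden size.
direct_params = V * H

# ALBERT-style: project to the small embedding space first, then up to H.
factorized_params = V * E + E * H

print(direct_params)      # 122880000  (~123M parameters)
print(factorized_params)  # 4364288    (~4.4M parameters)
```

The factorization shrinks the embedding table by more than an order of magnitude while leaving the rest of the network unchanged.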

2 — Cross Layer Parameter Sharing
ALBERT further saves parameters by sharing them (both the feed-forward network and the attention weights) across all layers. Simply put, imagine that the new little brain has its individual brain centres better interconnected.
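A minimal sketch of the idea (not the real ALBERT code): instead of stacking twelve independently parameterised layers, one set of weights is reused at every depth, so the parameter count no longer grows with the number of layers.

```python
# Toy illustration of cross-layer parameter sharing: one "layer"
# (one set of parameters) is applied repeatedly instead of stacking
# separately parameterised layers.
class SharedLayerEncoder:
    def __init__(self, num_layers, layer_fn):
        self.num_layers = num_layers
        self.layer = layer_fn  # a single shared layer (one set of weights)

    def forward(self, x):
        for _ in range(self.num_layers):
            x = self.layer(x)  # the same parameters are used at every depth
        return x

# Usage: the "layer" here is just a stand-in function for demonstration.
encoder = SharedLayerEncoder(num_layers=12, layer_fn=lambda x: x + 1)
print(encoder.forward(0))  # 12
```

In a real transformer the shared layer would be a full attention-plus-feed-forward block, but the principle is the same: depth without extra parameters.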

3 — SOP (Sentence Order Prediction) algorithm replaces NSP (Next Sentence Prediction)
The authors of RoBERTa had already noticed that the NSP algorithm was not very effective. The authors of ALBERT therefore introduced their own improved algorithm, SOP. In NSP, the positive pair consists of two consecutive sentences from the same document, while the negative pair takes its second sentence from a different document. In SOP, both sentences always come from the same document: the positive pair is in the correct order, and the negative pair has the order swapped. This prevents ALBERT from cheating by merely predicting the topic and forces it to learn a more nuanced relationship between individual sentences.
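The contrast between the two objectives can be sketched in how the training pairs are built. A simplified illustration (the real pipelines segment whole documents; function names here are my own):

```python
import random

def make_nsp_pair(doc, other_doc):
    """NSP-style pair: the negative takes its second sentence
    from a *different* document, so topic alone can give it away."""
    if random.random() < 0.5:
        return (doc[0], doc[1], 1)                    # positive: consecutive
    return (doc[0], random.choice(other_doc), 0)      # negative: other document

def make_sop_pair(doc):
    """SOP-style pair: the negative is the *same* two sentences
    with their order swapped, so topic gives nothing away."""
    if random.random() < 0.5:
        return (doc[0], doc[1], 1)                    # positive: correct order
    return (doc[1], doc[0], 0)                        # negative: swapped order

doc = ["ALBERT shares parameters.", "It also factorises embeddings."]
other = ["Cats sleep most of the day."]
print(make_nsp_pair(doc, other))
print(make_sop_pair(doc))
```

Because an SOP negative contains exactly the same sentences as the positive, the model can only succeed by understanding inter-sentence coherence, not by spotting a topic change.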

In summary, a new set of models for text processing has emerged, which is highly accurate while occupying less space.

Sources:
https://medium.com/@lessw/meet-albert-a-new-lite-bert-from-google-toyota-with-state-of-the-art-nlp-performance-and-18x-df8f7b58fa28

https://arxiv.org/abs/1909.11942v1

Originally published on Facebook.
