Pre-trained language models
The authors of "A Survey of Large Language Models" distinguish "pre-trained language models" (PLMs) from "large language models" (LLMs).
ELMo and BERT would fall into the first category, somewhat as pioneers of the LLMs.
As an early attempt, ELMo was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network according to specific downstream tasks.
Further, based on the highly parallelizable Transformer architecture [22] with self-attention mechanisms, BERT [23] was proposed by pre-training bidirectional language models with specially designed pre-training tasks on large-scale unlabeled corpora.
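To make the pre-training/fine-tuning paradigm concrete, here is a minimal sketch of adapting a pre-trained BERT model to a downstream classification task. It assumes the Hugging Face Transformers library, PyTorch, and a tiny hypothetical sentiment task; none of these choices come from the survey itself.

<syntaxhighlight lang="python">
# Minimal sketch of the "pre-train, then fine-tune" workflow described above.
# Assumptions (not from the survey): Hugging Face Transformers, PyTorch,
# and a tiny hypothetical sentiment task (0 = negative, 1 = positive).
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a BERT model already pre-trained on large-scale unlabeled corpora,
# plus a freshly initialised classification head for the downstream task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical labeled examples for the downstream task.
texts = ["This film was wonderful.", "A complete waste of time."]
labels = torch.tensor([1, 0])

# Tokenize and run a few fine-tuning steps; only labeled downstream data is
# needed here, because the contextual representations were already learned
# during pre-training.
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a real setup would loop over a DataLoader for several epochs
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
</syntaxhighlight>

Only the small classification head is trained from scratch in this sketch; the contextual representations come from pre-training, which is what distinguishes this workflow from training a model entirely on the downstream data.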
The difference between PLMs and LLMs lies in the increase in data and model size, which significantly improves performance on downstream tasks.
Researchers find that scaling PLMs (e.g., scaling model size or data size) often leads to an improved model capacity on downstream tasks.

== References ==
* [https://arxiv.org/pdf/1802.05365.pdf Deep contextualized word representations]
* [https://aclanthology.org/N19-1423.pdf BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]