<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="fr">
	<id>http://wiki.backprop.fr/index.php?action=history&amp;feed=atom&amp;title=Pre-trained_language_models</id>
	<title>Pre-trained language models - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.backprop.fr/index.php?action=history&amp;feed=atom&amp;title=Pre-trained_language_models"/>
	<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;action=history"/>
	<updated>2026-05-09T14:20:44Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.4</generator>
	<entry>
		<id>http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=43&amp;oldid=prev</id>
		<title>Jboscher on 27 April 2023 at 14:53</title>
		<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=43&amp;oldid=prev"/>
		<updated>2023-04-27T14:53:16Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;fr&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision of 27 April 2023 at 14:53&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l10&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Researchers find that scaling PLM (e.g., scaling model size or data size) often leads to an improved model capacity on downstream tasks&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Researchers find that scaling PLM (e.g., scaling model size or data size) often leads to an improved model capacity on downstream tasks&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== References ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [https://arxiv.org/pdf/1802.05365.pdf] Deep contextualized word representations&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [https://aclanthology.org/N19-1423.pdf] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;/table&gt;</summary>
		<author><name>Jboscher</name></author>
	</entry>
	<entry>
		<id>http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=42&amp;oldid=prev</id>
		<title>Jboscher: Page created with « The authors of &quot;A Survey of Large Language Models&quot; distinguish &quot;Pre-trained language models (PLM)&quot; from Large language models (LLM).  ELMo and BERT would belong to the first category, a bit like the pioneers of LLMs.   As an early attempt, ELMo was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network acco... »</title>
		<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=42&amp;oldid=prev"/>
		<updated>2023-04-27T14:44:33Z</updated>

		<summary type="html">&lt;p&gt;Page créée avec « Les auteurs de &amp;quot;A Survey of Large Language Models&amp;quot; distinguent les &amp;quot;Pre-trained language models (PLM)&amp;quot; des Large language models (LLM).  ELMo et BERT appartiendraient à la 1ère catégorie, un peu comme les pionniers des LLM.   As an early attempt, ELMo was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network acco... »&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Nouvelle page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Les auteurs de &amp;quot;A Survey of Large Language Models&amp;quot; distinguent les &amp;quot;Pre-trained language models (PLM)&amp;quot; des Large language models (LLM).&lt;br /&gt;
&lt;br /&gt;
ELMo and BERT would belong to the first category, a bit like the pioneers of LLMs.&lt;br /&gt;
 &lt;br /&gt;
As an early attempt, ELMo was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network according to specific downstream tasks. &lt;br /&gt;
&lt;br /&gt;
Further, based on the highly parallelizable Transformer architecture [22] with self-attention mechanisms, BERT [23] was proposed by pre-training bidirectional language models with specially designed pre-training tasks on large-scale unlabeled corpora.&lt;br /&gt;
&lt;br /&gt;
The difference between PLMs and LLMs is the increase in data and model size, which significantly improves performance on tasks.&lt;br /&gt;
&lt;br /&gt;
Researchers find that scaling PLM (e.g., scaling model size or data size) often leads to an improved model capacity on downstream tasks&lt;/div&gt;</summary>
		<author><name>Jboscher</name></author>
	</entry>
</feed>