<?xml version="1.0"?>
	<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://wiki.backprop.fr/index.php?action=history&amp;feed=atom&amp;title=Emergent_Abilities_of_Large_Language_Models</id>
	<title>Emergent Abilities of Large Language Models - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.backprop.fr/index.php?action=history&amp;feed=atom&amp;title=Emergent_Abilities_of_Large_Language_Models"/>
	<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Emergent_Abilities_of_Large_Language_Models&amp;action=history"/>
	<updated>2026-05-09T14:20:23Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.4</generator>
	<entry>
		<id>http://wiki.backprop.fr/index.php?title=Emergent_Abilities_of_Large_Language_Models&amp;diff=46&amp;oldid=prev</id>
		<title>Jboscher on 27 April 2023 at 16:23</title>
		<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Emergent_Abilities_of_Large_Language_Models&amp;diff=46&amp;oldid=prev"/>
		<updated>2023-04-27T16:23:12Z</updated>

		<summary type="html">&lt;p&gt;&lt;b&gt;← Previous revision | Revision of 27 April 2023 at 16:23 (line 5)&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Context: “[1] We will consider the following general definition of emergence, adapted from Steinhardt (2022) and rooted in a 1972 essay called “More Is Different” by Nobel prize-winning physicist Philip Anderson”&lt;/p&gt;
&lt;p&gt;Added: “[1] Emergence is when quantitative changes in a system result in qualitative changes in behavior.”&lt;/p&gt;
&lt;p&gt;Added: “The dimensions along which models are scaled are: the size of the dataset, the number of model parameters, and the compute required to train the model (the latter two being often correlated).”&lt;/p&gt;
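&lt;p&gt;A rough sketch of why those last two dimensions track each other, assuming the widely used approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × training tokens); the scale points below are illustrative, not taken from the page:&lt;/p&gt;
&lt;pre&gt;
# Rough rule of thumb linking parameter count and training compute,
# assuming C = 6 * N * D (FLOPs = 6 x parameters x training tokens).
# The (name, params, tokens) scale points are illustrative assumptions.

def train_flops(n_params: float, n_tokens: float) -&gt; float:
    return 6.0 * n_params * n_tokens

for name, n_params, n_tokens in [
    ("GPT-2-scale model", 1.5e9, 40e9),
    ("GPT-3-scale model", 175e9, 300e9),
]:
    print(f"{name}: ~{train_flops(n_params, n_tokens):.1e} FLOPs")
&lt;/pre&gt;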
&lt;p&gt;Context: “[2] Although scaling is mainly conducted in model size (with similar architectures and pre-training tasks), these large-sized PLMs display different behaviors from smaller PLMs (e.g., 330M-parameter BERT and 1.5B-parameter GPT-2) and show surprising abilities (called emergent abilities) in solving a series of complex tasks.”&lt;/p&gt;</summary>
		<author><name>Jboscher</name></author>
	</entry>
	<entry>
		<id>http://wiki.backprop.fr/index.php?title=Emergent_Abilities_of_Large_Language_Models&amp;diff=45&amp;oldid=prev</id>
		<title>Jboscher on 27 April 2023 at 15:58</title>
		<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Emergent_Abilities_of_Large_Language_Models&amp;diff=45&amp;oldid=prev"/>
		<updated>2023-04-27T15:58:23Z</updated>

		<summary type="html">&lt;p&gt;&lt;b&gt;← Previous revision | Revision of 27 April 2023 at 15:58 (line 3)&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Context: “[1] We consider an ability to be emergent if it is not present in smaller models but is present in larger models.”&lt;/p&gt;
&lt;p&gt;Added: “[1] We will consider the following general definition of emergence, adapted from Steinhardt (2022) and rooted in a 1972 essay called “More Is Different” by Nobel prize-winning physicist Philip Anderson”&lt;/p&gt;
&lt;p&gt;Changed (now numbered “[2]”): “Although scaling is mainly conducted in model size (with similar architectures and pre-training tasks), these large-sized PLMs display different behaviors from smaller PLMs (e.g., 330M-parameter BERT and 1.5B-parameter GPT-2) and show surprising abilities (called emergent abilities) in solving a series of complex tasks.”&lt;/p&gt;
&lt;p&gt;Changed (now numbered “[2]”): “For example, GPT-3 can solve few-shot tasks through in-context learning, whereas GPT-2 cannot do well.”&lt;/p&gt;
&lt;p&gt;Removed: “Thus, the research community coins the term “large language models (LLM)” for these large-sized PLMs [32–35]. A remarkable application of LLMs is ChatGPT, which adapts the LLMs from the GPT series for dialogue and presents an amazing conversation ability with humans.”&lt;/p&gt;
&lt;p&gt;Context: “== References ==”, “* [https://openreview.net/pdf?id=yzkSU5zdwD] Emergent Abilities of Large Language Models”&lt;/p&gt;
&lt;p&gt;Added: “* [https://arxiv.org/pdf/2303.18223.pdf] A Survey of Large Language Models”&lt;/p&gt;</summary>
		<author><name>Jboscher</name></author>
	</entry>
	<entry>
		<id>http://wiki.backprop.fr/index.php?title=Emergent_Abilities_of_Large_Language_Models&amp;diff=44&amp;oldid=prev</id>
		<title>Jboscher: Page created with "“Emergent Abilities of Large Language Models” refers to a capability that is present in an LLM but not found in a similar, smaller model. This also means that this new capability cannot be predicted (extrapolated) solely from the capabilities of a smaller model. [1] We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Although scaling is mainly conducted in mode..."</title>
		<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Emergent_Abilities_of_Large_Language_Models&amp;diff=44&amp;oldid=prev"/>
		<updated>2023-04-27T15:39:06Z</updated>

		<summary type="html">&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;“Emergent Abilities of Large Language Models” refers to a capability that is present in an LLM but not found in a similar, smaller model. This also means that this new capability cannot be predicted (extrapolated) solely from the capabilities of a smaller model.&lt;br /&gt;
&lt;br /&gt;
[1] We consider an ability to be emergent if it is not present in smaller models but is present in larger models.&lt;br /&gt;
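One way to picture this definition: performance stays near the random baseline below some scale and then climbs sharply, so the ability of a larger model cannot be extrapolated from smaller-model trends. A toy sketch (all numbers invented for illustration):&lt;br /&gt;
&lt;pre&gt;
# Toy picture of an emergent ability: near-random accuracy below a scale
# threshold, then a sharp jump. Every number here is an invented example.
import math

def toy_accuracy(n_params: float, threshold: float = 1e10) -&gt; float:
    x = math.log10(n_params / threshold)
    return 0.25 + 0.65 / (1.0 + math.exp(-8.0 * x))  # 25% = random baseline

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params: {toy_accuracy(n):.0%}")
&lt;/pre&gt;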
&lt;br /&gt;
&lt;br /&gt;
Although scaling is mainly conducted in model size (with similar architectures and pre-training tasks), these large-sized PLMs display different behaviors from smaller PLMs (e.g., 330M-parameter BERT and 1.5B-parameter GPT-2) and show surprising abilities (called emergent abilities) in solving a series of complex tasks.&lt;br /&gt;
&lt;br /&gt;
For example, GPT-3 can solve few-shot tasks through in-context learning, whereas GPT-2 cannot do well. &lt;br /&gt;
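A minimal sketch of what few-shot in-context learning looks like: the task is specified entirely by demonstrations placed in the prompt, with no weight update. The prompt wording and the model call below are illustrative assumptions, not taken from the cited papers:&lt;br /&gt;
&lt;pre&gt;
# Few-shot in-context learning: the task (English to French) is conveyed
# only through examples inside the prompt; the weights are never updated.
# The prompt text and `some_llm.generate` are illustrative assumptions.
prompt = (
    "Translate English to French:\n"
    "sea otter =&gt; loutre de mer\n"
    "peppermint =&gt; menthe poivrée\n"
    "cheese =&gt; "   # a sufficiently large model is expected to answer "fromage"
)
# completion = some_llm.generate(prompt)
&lt;/pre&gt;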
&lt;br /&gt;
Thus, the research community coins the term “large language models (LLM)” for these large-sized PLMs [32–35]. A remarkable application of LLMs is ChatGPT, which adapts the LLMs from the GPT series for dialogue and presents an amazing conversation ability with humans.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
* [https://openreview.net/pdf?id=yzkSU5zdwD] Emergent Abilities of Large Language Models&lt;/div&gt;</summary>
		<author><name>Jboscher</name></author>
	</entry>
</feed>