<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="fr">
	<id>http://wiki.backprop.fr/index.php?action=history&amp;feed=atom&amp;title=Pre-trained_language_models</id>
	<title>Pre-trained language models - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.backprop.fr/index.php?action=history&amp;feed=atom&amp;title=Pre-trained_language_models"/>
	<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;action=history"/>
	<updated>2026-05-09T14:20:44Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.4</generator>
	<entry>
		<id>http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=43&amp;oldid=prev</id>
		<title>Jboscher on 27 April 2023 at 14:53</title>
		<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=43&amp;oldid=prev"/>
		<updated>2023-04-27T14:53:16Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;fr&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision of 27 April 2023 at 14:53&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l10&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Researchers find that scaling PLM (e.g., scaling model size or data size) often leads to an improved model capacity on downstream tasks&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Researchers find that scaling PLM (e.g., scaling model size or data size) often leads to an improved model capacity on downstream tasks&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== References ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [https://arxiv.org/pdf/1802.05365.pdf] Deep contextualized word representations&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [https://aclanthology.org/N19-1423.pdf] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;/table&gt;</summary>
		<author><name>Jboscher</name></author>
	</entry>
	<entry>
		<id>http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=42&amp;oldid=prev</id>
		<title>Jboscher: Page created with « The authors of &quot;A Survey of Large Language Models&quot; distinguish &quot;Pre-trained language models (PLM)&quot; from Large language models (LLM).  ELMo and BERT would belong to the first category, a bit like the pioneers of LLMs.   As an early attempt, ELMo was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network acco... »</title>
		<link rel="alternate" type="text/html" href="http://wiki.backprop.fr/index.php?title=Pre-trained_language_models&amp;diff=42&amp;oldid=prev"/>
		<updated>2023-04-27T14:44:33Z</updated>

		<summary type="html">&lt;p&gt;Page créée avec « Les auteurs de &amp;quot;A Survey of Large Language Models&amp;quot; distinguent les &amp;quot;Pre-trained language models (PLM)&amp;quot; des Large language models (LLM).  ELMo et BERT appartiendraient à la 1ère catégorie, un peu comme les pionniers des LLM.   As an early attempt, ELMo was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network acco... »&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Nouvelle page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Les auteurs de &amp;quot;A Survey of Large Language Models&amp;quot; distinguent les &amp;quot;Pre-trained language models (PLM)&amp;quot; des Large language models (LLM).&lt;br /&gt;
&lt;br /&gt;
ELMo and BERT would belong to the first category, a bit like the pioneers of LLMs.&lt;br /&gt;
 &lt;br /&gt;
As an early attempt, ELMo was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network according to specific downstream tasks. &lt;br /&gt;
&lt;br /&gt;
Further, based on the highly parallelizable Transformer architecture [22] with self-attention mechanisms, BERT [23] was proposed by pre-training bidirectional language models with specially designed pre-training tasks on large-scale unlabeled corpora.&lt;br /&gt;
&lt;br /&gt;
The difference between PLMs and LLMs is the increase in data and model size, which significantly improves performance on tasks.&lt;br /&gt;
&lt;br /&gt;
Researchers find that scaling PLM (e.g., scaling model size or data size) often leads to an improved model capacity on downstream tasks&lt;/div&gt;</summary>
		<author><name>Jboscher</name></author>
	</entry>
</feed>