SFT

From Wiki BackProp
Revision as of 1 August 2023, 12:10, by Jboscher

Supervised Fine-Tuning (SFT): The model is trained on a dataset of instructions and responses. Training adjusts the weights of the LLM to minimize the difference between its generated answers and the ground-truth responses, which act as labels.[1]
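
As a rough illustration of "minimizing the difference" between generated and ground-truth responses, the sketch below computes a token-level cross-entropy, assuming that is the training objective (the usual choice for language models, though the text above does not name it). The function names, the three-token vocabulary, and the logit values are all made up for the example; a real setup would take the logits from the LLM and use a deep-learning framework.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sft_loss(logits_per_position, label_ids):
    """Mean negative log-likelihood of the ground-truth response tokens.

    logits_per_position: one list of vocabulary logits per response token.
    label_ids: the ground-truth token id at each of those positions.
    """
    total = 0.0
    for logits, label in zip(logits_per_position, label_ids):
        total -= log_softmax(logits)[label]
    return total / len(label_ids)

# Hypothetical toy case: vocabulary of 3 tokens, response of 2 tokens.
labels = [0, 1]
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # model has no preference
confident = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]    # model favors the labels

print(sft_loss(uniform, labels))
print(sft_loss(confident, labels))
```

With uniform logits the loss equals log(3), about 1.0986. It shrinks as the model puts more probability on the labeled tokens, which is the direction the weight updates push.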

In some cases, updating the knowledge of the model is not enough and you want to modify the behavior of the LLM. In these situations, you need a supervised fine-tuning (SFT) dataset, which is a collection of prompts and their corresponding responses. SFT datasets can be manually curated by users or generated by other LLMs. Supervised fine-tuning is especially important for LLMs such as ChatGPT, which have been designed to follow user instructions and stay on a specific task across long stretches of text. This specific type of fine-tuning is also referred to as instruction fine-tuning.[2]
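
To make "a collection of prompts and their corresponding responses" concrete, here is a minimal sketch of how such a dataset might be represented and serialized. The field names `prompt` and `response`, the two sample records, and the `### Instruction:` / `### Response:` template are illustrative assumptions, not a format prescribed by the cited guides; actual projects use whatever template their model and training library expect.

```python
import json

# Hypothetical SFT records: each pairs a prompt with its ground-truth response.
sft_dataset = [
    {"prompt": "Translate to French: Good morning", "response": "Bonjour"},
    {"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat."},
]

def format_example(example):
    """Join a prompt and its response into one training string.

    The section markers are an assumed template, chosen only for illustration.
    """
    return (
        "### Instruction:\n" + example["prompt"]
        + "\n\n### Response:\n" + example["response"]
    )

def to_jsonl(records):
    """Serialize records as JSON Lines, one example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

print(format_example(sft_dataset[0]))
print(to_jsonl(sft_dataset))
```

Keeping the prompt and response as separate fields, rather than storing only the joined string, makes it easy to change the template later or to compute the loss on the response part alone.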

References

  • [1] Fine-Tune Your Own Llama 2 Model in a Colab Notebook
  • [2] The complete guide to LLM fine-tuning