Textual Inversion

From Wiki BackProp

Textual Inversion is defined as follows [1]:

"We learn to generate specific concepts, like personal objects or artistic styles, by describing them using new "words" in the embedding space of pre-trained text-to-image models. These can be used in new sentences, just like any other word."
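To make the idea of a "new word" in the embedding space concrete, here is a minimal sketch. It assumes a toy whitespace tokenizer and a 2-dimensional embedding table; real text encoders use subword tokenizers and far larger vectors. A pseudo-word, written `S*` as in [1], is simply one more row in the table, so it can be placed in any sentence like an ordinary word. The names `vocab`, `embeddings`, `add_pseudo_word` and `embed`, and all numbers, are hypothetical.

```python
# Toy illustration (hypothetical names and values): a pseudo-word is one more
# row in the text model's embedding table, so once it exists it can appear in
# any sentence exactly like an ordinary word.

vocab = {"a": 0, "photo": 1, "of": 2, "painting": 3, "in": 4, "the": 5, "snow": 6}
# Frozen embedding table: one tiny 2-d vector per existing token.
embeddings = [[0.1, 0.0], [0.3, 0.2], [0.0, 0.1], [0.5, 0.4],
              [0.2, 0.2], [0.1, 0.1], [0.9, 0.8]]


def add_pseudo_word(word, vector):
    """Register a new "word" as an extra embedding row and return its token id."""
    vocab[word] = len(embeddings)
    embeddings.append(list(vector))
    return vocab[word]


def embed(sentence):
    """Map a whitespace-tokenized sentence to its sequence of embedding vectors."""
    return [embeddings[vocab[tok]] for tok in sentence.split()]


# In Textual Inversion this vector would be learned from a few images;
# here it is an arbitrary placeholder value.
sid = add_pseudo_word("S*", [0.7, -0.3])
print(embed("a photo of S*"))
print(embed("a painting of S* in the snow"))
```

Both sentences receive the same `S*` vector at the pseudo-word's position, while every other token keeps its original embedding.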

Textual inversion is a process that quickly "teaches" the text model a new word and trains that word's embedding to lie close to some visual representation. This is achieved by adding a new token to the vocabulary, freezing the weights of all the models (the text encoder included, apart from the single embedding vector of the new token), and training on a few representative images (typically 3 to 5).
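The training step can be illustrated with a deliberately tiny, pure-Python toy. In [1] the new embedding v is found by minimizing the latent diffusion denoising loss, v* = argmin_v E[ ||ε − ε_θ(z_t, t, c_θ(y))||² ], with the denoiser ε_θ and the text encoder c_θ kept frozen. The sketch below swaps that loss for a plain squared error between a frozen linear "generator" `W` and hand-written image feature vectors, and uses the mean of the token embeddings as a stand-in text encoder; all names (`encode`, `loss`, `W`, ...) and numbers are hypothetical. What it does reproduce is the mechanism: a new row is appended to the embedding table and gradient descent updates that row alone.

```python
# Toy sketch of the Textual Inversion training loop (hypothetical stand-ins,
# not the real models): every weight is frozen except the embedding row of
# the newly added pseudo-word "S*".

# Frozen "text encoder" pieces: a tiny vocabulary and its 3-d embedding table.
vocab = {"a": 0, "photo": 1, "of": 2}
embeddings = [[0.1, 0.2, 0.0], [0.0, 0.3, 0.1], [0.2, 0.0, 0.1]]
# Frozen "generator": a fixed linear map from text features to image features.
W = [[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.0]]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(tokens):
    """Toy frozen text encoder: the mean of the token embeddings."""
    rows = [embeddings[vocab[t]] for t in tokens]
    return [sum(col) / len(rows) for col in zip(*rows)]


def loss(tokens, images):
    """Mean squared error between the generator output and each image's features."""
    out = matvec(W, encode(tokens))
    return sum(sum((o - x) ** 2 for o, x in zip(out, img)) for img in images) / len(images)


# 1. Add the new token to the vocabulary with a fresh embedding row.
vocab["S*"] = len(embeddings)
embeddings.append([0.0, 0.0, 0.0])
frozen_rows = [row[:] for row in embeddings[:-1]]  # snapshot of the pre-trained rows

# 2. A few representative "images", given directly as feature vectors.
images = [[0.9, -0.2, 0.4], [1.0, -0.1, 0.5], [0.8, -0.3, 0.3]]
target = [sum(col) / len(images) for col in zip(*images)]  # mean image features
prompt = ["a", "photo", "of", "S*"]
W_T = [list(col) for col in zip(*W)]

# 3. Gradient descent on the S* row only: dL/dv = (2/T) * W^T (W h - mean image),
#    where h is the encoded prompt and T the number of prompt tokens.
initial = loss(prompt, images)
lr = 0.5
for _ in range(500):
    residual = [o - m for o, m in zip(matvec(W, encode(prompt)), target)]
    grad = [2.0 / len(prompt) * g for g in matvec(W_T, residual)]
    s_row = embeddings[vocab["S*"]]
    for k in range(len(s_row)):
        s_row[k] -= lr * grad[k]

final = loss(prompt, images)
print(f"loss: {initial:.4f} -> {final:.4f}")
assert embeddings[:-1] == frozen_rows  # the pre-trained rows were never touched
```

Because a single vector has to account for all the images at once, the loss cannot reach zero: it bottoms out at the spread of the image features around their mean (0.02 with these numbers), and the learned `S*` row settles where the frozen generator reproduces the average image.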


References

  • [1] An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion