Top K

From Wiki BackProp

First, there are different models you can choose from. Each model is tuned to perform well on specific tasks. You can also specify the temperature, top P, and top K. These parameters all adjust the randomness of responses by controlling how the output tokens are selected. When you send a prompt to the model, it produces an array of probabilities over the words that could come next. From this array, we need some strategy to decide what to return. A simple strategy is to select the most likely word at every timestep.
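As a minimal sketch of that simple strategy, greedy decoding just takes the argmax of the distribution at each step. The word list and probability values here are hypothetical, chosen to match the "flowers … bugs" example used later on this page:

```python
# Toy next-word distribution the model might return (hypothetical values).
probs = {"flowers": 0.40, "trees": 0.30, "grass": 0.15, "rocks": 0.10, "bugs": 0.05}

def greedy_pick(probs):
    """Greedy decoding: always return the single most likely word."""
    return max(probs, key=probs.get)

print(greedy_pick(probs))  # always "flowers", no matter how often you call it
```

Because the argmax never changes for a given distribution, this is the deterministic, and sometimes repetitive, behavior the next paragraph describes.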

But this method can result in uninteresting and sometimes repetitive answers. Conversely, if you randomly sample over the full distribution returned by the model, you might get some unlikely responses.

By controlling the degree of randomness, you can get more unexpected, and some might say creative, responses. Back to the model parameters, temperature is a number used to tune the degree of randomness.

Low temperature: select the words that are highly probable and more predictable. In this case, those are "flowers" and the other words located at the beginning of the list. This setting is generally better for tasks like Q&A and summarization, where you expect a more "predictable" answer with less variation.

High temperature: select words that have low probability and are more unusual. In this case, those are "bugs" and the other words located at the end of the list. This setting is good if you want to generate more "creative" or unexpected content.
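One common way temperature is applied (a sketch, not necessarily how any particular model implements it) is to divide the model's raw scores, or logits, by the temperature before the softmax. The logit values below are made up for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax.

    Low temperature sharpens the distribution (mass concentrates on the
    top words); high temperature flattens it (unusual words become more
    likely to be sampled).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for: flowers, trees, grass, rocks, bugs
logits = [2.0, 1.5, 0.5, -0.5, -1.0]
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)
# cold puts far more probability on "flowers" than hot does,
# so sampling at low temperature is much more predictable.
```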

In addition to adjusting the temperature, top K lets the model randomly return a word from the K most likely words. For example, top 2 means you get a random word from the 2 most probable words, in this case "flowers" and "trees". This approach gives other high-scoring words a chance of being selected. However, if the probability distribution of the words is highly skewed, with one word that is very likely and everything else very unlikely, this approach can produce some strange responses. The difficulty of selecting the best top-K value leads to another popular approach, top P, which dynamically sets the size of the shortlist of words.


https://www.cloudskillsboost.google/course_sessions/3264154/video/381925