LLM Temperature and Top-P Explained
Author: Venkata Sudhakar
When an LLM generates text, it does not just pick the single most probable next word every time - that would produce repetitive, predictable output. Instead, it samples from a probability distribution over its entire vocabulary. Temperature and Top-P are the two parameters that control how that sampling works: how much randomness to introduce, and how wide a pool of tokens to sample from. Getting these right is the difference between a model that gives deterministic, factual answers and one that writes creative, varied content.

Temperature scales the probability distribution before sampling. A temperature of 0 makes the model always pick the single highest-probability token - completely deterministic and repetitive. A temperature of 1.0 uses the raw distribution as the model learned it. Above 1.0, the distribution flattens and low-probability tokens become more likely - output becomes more creative but can turn incoherent. For most production use cases, a temperature between 0 and 0.7 gives reliable results.

Top-P (nucleus sampling) samples only from the smallest set of tokens whose cumulative probability reaches P. A Top-P of 0.9 means only the tokens that together account for 90% of the probability mass are considered - rare, unlikely tokens are excluded entirely.

The example below calls the same prompt with different temperature values using the OpenAI API, demonstrating the practical effect on output variety and predictability.
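A sketch of such a comparison, assuming the official openai Python SDK (v1+) with an API key in the environment; the model name and exact prompt are illustrative, not from the original:

```python
# Sketch: same prompt, two temperatures, three calls each.
# Assumes the official `openai` SDK (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()
prompt = "Suggest one use case for a vector database, in a single sentence."

for temp in (0.0, 1.2):
    print(f"temperature={temp}:")
    for _ in range(3):  # repeat the same call to show (non-)variation
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            temperature=temp,
            max_tokens=40,
        )
        print(response.choices[0].message.content.strip())
```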
It gives the following output:
temperature=0.0 (deterministic):
Semantic search for a document retrieval system.
Semantic search for a document retrieval system.
Semantic search for a document retrieval system.
temperature=1.2 (creative/random):
Powering recommendation engines for e-commerce personalization.
Building a knowledge base search tool for enterprise documentation.
Storing facial recognition embeddings for identity verification systems.
# temperature=0: identical output every time - use for factual Q&A, classification
# temperature=1.2: varied output each time - use for creative writing, brainstorming
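The second example pairs each task with a temperature suited to it. Again a sketch under the same SDK assumptions; the prompts and temperature choices are illustrative:

```python
# Sketch: task-matched temperatures - deterministic for facts, moderate for chat,
# high for fiction. Assumes the `openai` SDK (v1+) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

tasks = [
    ("Factual", "What is the capital of France? Answer with one word.", 0.0),
    ("Chat", "Hi, how are you?", 0.7),
    ("Creative", "Write the opening line of a short story about a lighthouse.", 1.1),
]

for label, prompt, temp in tasks:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
        max_tokens=60,
    )
    print(f"{label}: {response.choices[0].message.content.strip()}")
```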
It gives the following output:
Factual: Paris
Chat: I am doing well, thank you for asking! How can I help you today?
Creative: The last lighthouse keeper wound up the clockwork sun each morning
not knowing it was the only reason the world had not gone dark.
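Mechanically, temperature works by dividing the model's logits before the softmax: a low T sharpens the distribution toward the top token, a high T flattens it. A self-contained sketch with toy logit values (no API needed) makes the effect visible:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T, then apply softmax.
    # T -> 0 approaches argmax; T > 1 flattens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy logits for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T=0.2 nearly all the probability mass sits on the first token; at T=2.0 the three tokens are much closer together, which is exactly why high temperatures produce more varied (and eventually incoherent) text.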
Quick reference:
temperature=0 - extraction, classification, and anything requiring a definite correct answer.
temperature=0.3-0.7 - conversational AI, code generation, and summaries where some variation is acceptable.
temperature=0.8-1.2 - creative writing, brainstorming, and idea generation.
Do not set both temperature and top_p to non-default values simultaneously - the OpenAI documentation recommends changing only one at a time, since they both control randomness and stacking them makes behaviour harder to predict.
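Top-P itself is just as easy to sketch: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches P, renormalise, and sample only from that nucleus. A toy illustration (the token probabilities are made up):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, then renormalise."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

# Toy distribution over four candidate next tokens
probs = {"the": 0.50, "a": 0.30, "cat": 0.15, "xylo": 0.05}
print(top_p_filter(probs, 0.9))  # "xylo" falls outside the 90% nucleus
```

With p=0.9, the first three tokens already cover 95% of the mass, so the rare token is dropped entirely before sampling - this is how Top-P cuts off the long tail without touching the shape of the head of the distribution.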