Sampling techniques control how language models choose the next word (more precisely, the next token) during text generation. The model assigns a probability to each possible word, and the sampling method determines which one is picked. By adjusting these methods, you can balance creativity and accuracy in generated responses.
- Temperature controls randomness in predictions
- Top-K limits choices to the most probable tokens
- Top-P selects tokens based on cumulative probability
- Together, these settings tune output diversity and coherence
Temperature Sampling in LLMs
Temperature controls how random the model’s output is and typically ranges from 0 to 2.
- Low temperature: Safer, more predictable text
- High temperature: More creative and varied text

Figure: With low temperature there is one clear choice (car); with high temperature there are many possible choices.
How it works
Before choosing the next word, the model divides its raw scores (logits) by the temperature and then applies softmax, which reshapes the word probabilities.
Low temperature:
- Strongly favours high-probability words
- Produces stable and predictable text
High temperature:
- Flattens the probability distribution
- Allows less likely words to appear more often
- Increases creativity but may reduce accuracy
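The effect described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code a library uses internally: the function name `softmax_with_temperature` and the three logits are made up for the example, and the sketch rejects a temperature of 0 because it would divide by zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide raw scores by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate words (assumed values)
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```

At T = 0.2 almost all of the probability lands on the first word, while at T = 1.5 the three probabilities move closer together, which is the flattening described in the list above.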
Example
- Temperature = 0.2: factual, low creativity
- Temperature = 1.0: balanced output
- Temperature = 1.5: creative but less reliable
Implementation
- Loads a pre-trained GPT-2 tokenizer and language model
- Takes a text prompt and converts it into tokens
- Generates text by predicting the next words step by step
- Applies temperature (0.7) to control creativity and uses max_length=50 to cap the total length (prompt plus generated tokens) at 50 tokens
- Converts the output back to readable text and prints it
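from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the pre-trained GPT-2 tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
# Convert the prompt into token IDs (PyTorch tensors)
inputs = tokenizer("Explain AI in simple terms", return_tensors="pt")
# Sample with temperature 0.7; max_length counts the prompt tokens too
output = model.generate(
    **inputs,
    max_length=50,
    temperature=0.7,
    do_sample=True,
)
# Convert the generated token IDs back to readable text
print(tokenizer.decode(output[0], skip_special_tokens=True))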
Output:
Explain AI in simple terms
It's not too hard to learn to use artificial intelligence. The problem is that what you see here is not a single AI system. You'll see that the AI systems are different from the ones we've seen
Advantages
- Allows control over the balance between accuracy and creativity
- Produces reliable outputs when accuracy is needed
- Encourages diverse ideas when creativity is preferred
Top-K Sampling in LLMs
Top-K limits the model to choosing the next word from only the K most likely options, ignoring all other possibilities. This helps control randomness by keeping the selection focused on higher-probability words.

Only the top K tokens are considered; one is sampled from them.
How it works
- The model ranks all possible next words by probability
- Keeps only the top K most likely words
- Renormalizes the probabilities of those K words and samples one word from this limited set
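The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `top_k_filter` and `sample` and the word probabilities are hypothetical, and real implementations work on tensors of logits rather than dictionaries.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable words and renormalize so they sum to 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}

def sample(dist, rng=random):
    """Draw one word, weighted by its probability."""
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical next-word probabilities (assumed values)
probs = {"car": 0.5, "bike": 0.25, "bus": 0.15, "boat": 0.07, "rocket": 0.03}

filtered = top_k_filter(probs, k=3)
print(filtered)          # only car, bike and bus remain
print(sample(filtered))  # one of the three, chosen at random
```

Note that the draw is weighted: within the top K, more probable words are still picked more often.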
Example
- Top-K = 50: selects from the 50 most likely words
- Smaller K: safer output
- Larger K: more variety
Implementation
- Loads a pre-trained GPT-2 tokenizer and model
- Converts the input text into tokens
- Generates text by predicting the next words
- Uses Top-K sampling (K = 50) to limit word choices and reduce unlikely outputs
- Decodes and prints the generated text
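from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the pre-trained GPT-2 tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
# Convert the prompt into token IDs (PyTorch tensors)
inputs = tokenizer("Explain AI in simple terms", return_tensors="pt")
# Sample only from the 50 most likely tokens at each step
output = model.generate(
    **inputs,
    max_length=50,
    top_k=50,
    do_sample=True,
)
# Convert the generated token IDs back to readable text
print(tokenizer.decode(output[0], skip_special_tokens=True))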
Output:
Explain AI in simple terms.
You may find yourself with the following problems. You have a basic understanding of your current behaviour. If you've tried it out, you'll find that when you go through the troubleshooting, every problem is
Advantages
- Removes very unlikely words from consideration
- Reduces strange or incorrect outputs
- Helps produce cleaner and more reliable text
Top-P (Nucleus) Sampling in LLMs
Top-P selects the next word based on cumulative probability instead of a fixed number of options, allowing the set of possible choices to grow or shrink depending on how confident the model is.

How it works
- Words are ranked by their probability
- Starting from the most likely word, words are added one by one
- The model stops when the combined probability reaches or exceeds P
- One word is then sampled from this group after its probabilities are renormalized
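The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper name `top_p_filter` and the word probabilities are hypothetical, and production code applies the same idea to sorted tensors of logits.

```python
import random

def top_p_filter(probs, p):
    """Keep the smallest set of top-ranked words whose cumulative
    probability reaches p, then renormalize so they sum to 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:  # stop once the nucleus covers p
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}

# Hypothetical next-word probabilities (assumed values)
probs = {"car": 0.55, "bike": 0.25, "bus": 0.12, "boat": 0.05, "rocket": 0.03}

nucleus = top_p_filter(probs, p=0.9)
print(nucleus)  # car, bike and bus: 0.55 + 0.25 + 0.12 = 0.92 >= 0.9

# Weighted draw from the nucleus
words = list(nucleus)
print(random.choices(words, weights=[nucleus[w] for w in words], k=1)[0])
```

With these numbers, two words cover only 0.80 of the probability, so a third is added before the threshold of 0.9 is reached.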
Example
- Top-P = 0.9: selects from words that together account for 90% of the total probability
- Lower P: fewer choices, safer output
- Higher P: more choices, more creative output
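Implementation
- Loads a pre-trained GPT-2 tokenizer and model
- Converts the input text into tokens
- Generates text by predicting the next words
- Uses Top-P sampling (P = 0.9) to select from words covering 90% probability
- Sets top_k=0 so that no Top-K cutoff is applied alongside Top-P (in many transformers releases, generate() otherwise applies a default top_k of 50)
- Decodes and prints the generated text
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the pre-trained GPT-2 tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
# Convert the prompt into token IDs (PyTorch tensors)
inputs = tokenizer("Explain AI in simple terms", return_tensors="pt")
# Sample from the smallest token set covering 90% of the probability
output = model.generate(
    **inputs,
    max_length=50,
    top_p=0.9,
    top_k=0,  # disable Top-K (assumed default of 50) so only Top-P filters
    do_sample=True,
)
# Convert the generated token IDs back to readable text
print(tokenizer.decode(output[0], skip_special_tokens=True))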
Output:
Explain AI in simple terms:
- Give AI a set value.
- Add it to your AI.
- Add AI to the game.
- Put AI into game to improve AI performance.
- Have
Advantages
- The number of available choices adapts to the model’s confidence
- Fewer options when the model is confident, leading to safer output
- More options when confidence is lower, allowing greater creativity
- More flexible than Top-K sampling
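To make the adaptive behaviour concrete, the sketch below counts how many words each method keeps for a confident distribution and for an uncertain one. The helper names `top_k_count` and `top_p_count` and both distributions are hypothetical values chosen for illustration.

```python
def top_k_count(probs, k):
    """Top-K always keeps k words (or all of them if fewer exist)."""
    return min(k, len(probs))

def top_p_count(probs, p):
    """Top-P keeps as many top-ranked words as needed to reach p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

# Hypothetical distributions (assumed values)
confident = [0.95, 0.02, 0.01, 0.01, 0.01]  # one word dominates
uncertain = [0.22, 0.21, 0.20, 0.19, 0.18]  # nearly uniform

print(top_k_count(confident, 3), top_k_count(uncertain, 3))      # 3 3
print(top_p_count(confident, 0.9), top_p_count(uncertain, 0.9))  # 1 5
```

Top-K keeps three words in both cases, while Top-P keeps one word when the model is confident and all five when it is not.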
Temperature vs Top-K vs Top-P
Now let's see the key differences in how these sampling methods control creativity and reliability:
| Factor | Temperature | Top-K | Top-P (Nucleus) |
|---|---|---|---|
| What it controls | Randomness of output | Number of allowed words | Total probability mass |
| How it works | Rescales word probabilities | Keeps top K words | Keeps words until probability ≥ P |
| Main purpose | Balance creativity vs accuracy | Remove unlikely words | Adaptive, confidence-based sampling |
| Effect on creativity | Higher means more creative | Higher K means more variety | Higher P means more creativity |
| Typical values | 0.2 – 1.5 | 10 – 100 | 0.8 – 0.95 |
| Best used when | You want control over randomness | You want strict limits | You want flexible control |