Temperature and Seed Parameters: Controlling LLM Randomness

Temperature and seed are the foundational controls for LLM reproducibility. Temperature scales the probability distribution; seed locks the random number generator. This article teaches you exactly when and how to use each, with concrete API examples and real-world decision trees to guide your choices.

The key insight is this: temperature and seed work together. Temperature defines what range of outputs is possible; seed determines which output the RNG will pick from that range. Without a seed, the same nonzero temperature can produce a different output on every call. Without controlling temperature, a seed can't improve output quality; it just ensures consistent mediocrity.

Temperature: The Core Lever

Temperature is a divisor applied to the logits (raw model scores) before softmax normalization. Formally, adjusted logit = original logit / temperature, so values below 1.0 sharpen the distribution and values above 1.0 flatten it.
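Written out for the whole distribution, this is the standard temperature-scaled softmax. The symbols are chosen here for illustration: z_i is the logit of token i, T is the temperature, and p_i(T) is the resulting probability.

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}, \qquad T > 0
```

As T approaches 0, the probability mass concentrates on the token with the largest logit; as T grows, the distribution approaches uniform.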

Temperature = 0.0: Greedy decoding. Always pick the highest-probability token. Deterministic in principle without a seed, although hosted APIs can still show small run-to-run differences from floating-point and batching effects. Output is highly predictable and often stilted. Use for objective tasks: extracting structured data, answering factual questions, following step-by-step instructions.

Temperature = 0.5 to 1.0: The sweet spot for most applications. Balances coherence and diversity: below 1.0 the distribution is sharpened toward high-probability tokens, but lower-probability options remain possible. Requires a seed for reproducibility.

Temperature = 1.0: The model's default. Preserves the original probability distribution. Neither conservative nor creative.

Temperature = 1.5 to 2.0: Higher creativity and diversity. Low-probability tokens become more likely. Useful for brainstorming, creative writing, or generating multiple diverse outputs. Set and record a seed for every request if you may need to reproduce a particular output later.

Temperature > 2.0: Rarely recommended. Distribution becomes too flat; outputs become incoherent. Many APIs reject such values outright: OpenAI accepts 0 to 2, and Anthropic accepts 0 to 1.
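The difference between greedy decoding and sampling can be sketched with a toy sampler. This is a minimal illustration assuming NumPy; `sample_token` is a hypothetical helper written for this article, not part of any LLM library, and real inference stacks differ in detail.

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Pick one token index from logits at the given temperature."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0.0:
        # Greedy decoding: dividing by zero is undefined, so take the argmax directly.
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = [3.0, 1.5, 0.5]

# At T=0 the RNG is never consulted, so every seed yields the same token.
greedy = [sample_token(logits, 0.0, np.random.default_rng(s)) for s in range(5)]
print(greedy)  # [0, 0, 0, 0, 0]

# At T=1.0 each choice depends on the RNG state.
rng = np.random.default_rng(42)
sampled = [sample_token(logits, 1.0, rng) for _ in range(10)]
print(sampled)  # ten indices drawn from {0, 1, 2}
```

The explicit branch for temperature 0 is a design choice of this sketch: it avoids a division by zero and makes it visible that the seed plays no role in greedy decoding.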

Here's how temperature transforms probabilities for a simple three-token distribution:

import numpy as np

logits = np.array([3.0, 1.5, 0.5])

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = logits / temperature
    # Subtract the max before exponentiating for numerical stability.
    exp_scores = np.exp(scaled - np.max(scaled))
    return exp_scores / np.sum(exp_scores)

print("Logits:", logits)
print("T=0.5:", softmax(logits, 0.5).round(3))  # [0.946 0.047 0.006]
print("T=1.0:", softmax(logits, 1.0).round(3))  # [0.766 0.171 0.063]
print("T=2.0:", softmax(logits, 2.0).round(3))  # [0.569 0.269 0.163]

Notice: at T=0.5, token 0 dominates (94.6%). At T=2.0, the distribution flattens, giving lower-probability tokens a real chance.
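One way to put a number on this flattening is Shannon entropy, which rises as the distribution spreads out. The sketch below assumes NumPy; `entropy_bits` is a hypothetical helper name, and the three-token logits are the same toy values used above.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    scaled = np.asarray(logits, dtype=float) / temperature
    exp_scores = np.exp(scaled - np.max(scaled))
    return exp_scores / np.sum(exp_scores)

def entropy_bits(probs):
    """Shannon entropy in bits; higher means a flatter distribution."""
    probs = probs[probs > 0]  # skip zero entries to avoid log2(0)
    return float(-np.sum(probs * np.log2(probs)))

logits = np.array([3.0, 1.5, 0.5])
entropies = {t: entropy_bits(softmax(logits, t)) for t in (0.5, 1.0, 2.0)}
for t, h in entropies.items():
    print(f"T={t}: {h:.3f} bits")
```

For three tokens the maximum possible entropy is log2(3), about 1.585 bits, which corresponds to a uniform distribution.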

Seed: Locking in Reproducibility

A seed is a non-negative integer (typically 0 to 2^32 - 1) that initializes the pseudo-random number generator used for sampling. The same seed produces the same sequence of random numbers.

Critical rule: Seed alone is not sufficient for determinism. You must also lock temperature, prompt, and model version. Seed + temperature + prompt + model version = reproducible output, and even then hosted APIs often promise only best-effort determinism.
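The effect of a seed can be seen locally with a toy sampler. This is a minimal sketch assuming NumPy's `default_rng`; `sample_sequence` is a hypothetical helper, and a hosted LLM adds sources of variation that this local example does not have.

```python
import numpy as np

def sample_sequence(logits, temperature, seed, length=8):
    """Draw `length` token indices using an RNG initialized from `seed`."""
    rng = np.random.default_rng(seed)
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    return rng.choice(len(probs), size=length, p=probs).tolist()

logits = [3.0, 1.5, 0.5]

run_a = sample_sequence(logits, temperature=0.7, seed=42)
run_b = sample_sequence(logits, temperature=0.7, seed=42)
print(run_a == run_b)  # True: same seed, same temperature, same logits

# Changing either the seed or the temperature can change the draw.
other_seed = sample_sequence(logits, temperature=0.7, seed=43)
other_temp = sample_sequence(logits, temperature=1.5, seed=42)
print(run_a, other_seed, other_temp)
```

The seed fixes the stream of random numbers, while the temperature fixes the probabilities those numbers are mapped onto, which is why both have to be held constant.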

Several LLM APIs expose a seed parameter. Here's an example with the OpenAI Python SDK, using the Chat Completions endpoint:

from openai import OpenAI

client = OpenAI(api_key="your-key")

def deterministic_completion(prompt, seed=42, temperature=0.7):
    # Chat Completions request with a fixed seed and temperature.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        seed=seed,
        max_tokens=100,
    )
    return response.choices[0].message.content

prompt = "Explain photosynthesis in one sentence."
result1 = deterministic_completion(prompt, seed=42, temperature=0.7)
result2 = deterministic_completion(prompt, seed=42, temperature=0.7)
# result1 and result2 should usually match. OpenAI documents seeded sampling as
# best-effort, so compare response.system_fingerprint when outputs differ.

For Anthropic's Claude API, seed is not currently exposed, so sampling cannot be made strictly reproducible. You can still reduce variation with strict prompt formatting and low temperature:

import anthropic

def deterministic_claude(prompt, temperature=0.2):
    client = anthropic.Anthropic(api_key="your-key")
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=100,
        temperature=temperature,  # Lower = less variation between runs
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

result = deterministic_claude("What is 2+2?", temperature=0.2)
# Near-deterministic due to low temp; usually consistent, but not guaranteed identical

Choosing Temperature and Seed: Decision Tree

When should you use each parameter?

| Scenario | Temperature | Seed | Why |
| --- | --- | --- | --- |
| Factual Q&A (capitals, math) | 0.0–0.3 | Helpful but not required | Low temperature is near-deterministic; a seed adds insurance |
| Instruction following | 0.5–0.8 | Required | Need diversity to follow varied instructions; seed locks behavior |
| Creative writing | 1.0–1.5 | Optional | Diversity is desired; a seed locks the output if consistency is required later |
| Content generation at scale | 0.6–0.9 | Use a different seed per request | Avoids repetition across documents; each seed produces a unique variant |
| Conversation context synthesis | 0.7–1.0 | Fixed per conversation | Same seed within one conversation, new seed for the next |
| Debugging/reproduction | 0.2–0.5 | Required | Lock down behavior to isolate the issue |
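The table can be encoded as a small lookup so that each call site names a scenario instead of hard-coding numbers. This is only a sketch: the scenario keys, the `SamplingPolicy` class, and the `policy_for` function are hypothetical names, and each temperature is simply the midpoint of the range listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SamplingPolicy:
    temperature: float
    seed_policy: str  # "optional", "required", "per_request", or "per_conversation"

# Hypothetical scenario keys; temperatures are midpoints of the table's ranges.
POLICIES = {
    "factual_qa": SamplingPolicy(0.15, "optional"),
    "instruction_following": SamplingPolicy(0.65, "required"),
    "creative_writing": SamplingPolicy(1.25, "optional"),
    "content_at_scale": SamplingPolicy(0.75, "per_request"),
    "conversation_synthesis": SamplingPolicy(0.85, "per_conversation"),
    "debugging": SamplingPolicy(0.35, "required"),
}

def policy_for(scenario):
    """Return the sampling policy for a scenario, failing loudly on unknown keys."""
    try:
        return POLICIES[scenario]
    except KeyError:
        raise ValueError(f"Unknown scenario: {scenario!r}") from None

policy = policy_for("debugging")
print(policy.temperature, policy.seed_policy)  # 0.35 required
```

Raising on an unknown scenario, rather than falling back to a default, keeps an untuned temperature from slipping into production unnoticed.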

Managing Multiple Seeds

For applications that generate many outputs, derive seeds deterministically. For example, hash the user ID together with a request ID to get a reproducible seed for each user and request:

import hashlib

def derive_seed(user_id, request_id):
    combined = f"{user_id}:{request_id}"
    hash_obj = hashlib.sha256(combined.encode())
    seed = int(hash_obj.hexdigest(), 16) % (2**31 - 1)  # Map to valid range
    return seed

user_id = "user_123"
request_id = "req_abc"
seed = derive_seed(user_id, request_id)  # Always same seed for same user+request
print(f"Seed: {seed}")

This ensures:

  • If the same user makes the same request twice, they get the same output (deterministic).
  • Different users or different requests get different outputs (diversity).
  • You can audit exactly which seed was used for any past response.

Best Practices

1. Document your temperature and seed choices. In code, config files, or logs, record which temperature and seed you used for each request. This is critical for debugging.

2. Test determinism before production. Make a test function that calls your LLM API twice with the same seed and temperature, then compares outputs. Fail fast if outputs differ.

def test_determinism(prompt, model, seed=42, temperature=0.7):
    # generate_with_api is a placeholder for your own wrapper around the LLM call.
    result1 = generate_with_api(prompt, model, temperature, seed)
    result2 = generate_with_api(prompt, model, temperature, seed)
    assert result1 == result2, f"Outputs differed:\n{result1}\n{result2}"

3. Version your seed strategy. If you change how seeds are derived (e.g., from a hash to a counter), document the change and migrate gradually. Old seeds may no longer reproduce old outputs after a model update.

4. Adjust temperature based on task, not gut feeling. Use A/B testing: measure factuality, diversity, and user satisfaction for different temperatures, then pick the winner.
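For the first practice, recording the sampling parameters with every request can be as simple as one structured log line. The sketch below uses only the standard library; `SamplingRecord`, `make_log_line`, and the field names are hypothetical and should be adapted to whatever logging setup you already have.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class SamplingRecord:
    request_id: str
    model: str
    temperature: float
    seed: int
    timestamp: float

def make_log_line(request_id, model, temperature, seed, now=None):
    """Serialize the sampling parameters of one request as a JSON log line."""
    record = SamplingRecord(
        request_id=request_id,
        model=model,
        temperature=temperature,
        seed=seed,
        timestamp=time.time() if now is None else now,
    )
    # sort_keys keeps the field order stable, which makes log lines easy to diff.
    return json.dumps(asdict(record), sort_keys=True)

line = make_log_line("req_abc", "gpt-4o-mini", 0.7, 42, now=0.0)
print(line)
```

Logging the model name next to the seed matters because the same seed is only expected to reproduce an output on the same model version.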

Key Takeaways

  • Temperature scales the probability distribution before sampling: 0.0 is greedy (deterministic), 0.5–1.0 is balanced, 1.5–2.0 is creative.
  • Seed locks the RNG state: same seed + same temperature + same prompt + same model version = identical output, to the extent your provider supports it.
  • Temperature alone does not guarantee determinism; seed does not guarantee quality. Use both, with temperature tuned to your task.
  • Derive seeds deterministically from user/request context to enable reproducibility without manual seed management.
  • Document and test your temperature and seed choices in CI/CD to catch non-determinism bugs early.

Frequently Asked Questions

What happens if I set temperature = 0.0 and change the seed?

The seed has no effect. Temperature = 0.0 is greedy decoding: always pick the top token, so the RNG is never invoked. Changing the seed changes nothing.

Can I use a seed without knowing the model's exact RNG algorithm?

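Yes, as long as your API provider implements seeded sampling. The RNG implementation is opaque to you; the provider is responsible for reproducibility when you reuse the same seed, temperature, prompt, and model version, and some, such as OpenAI, describe it as best-effort rather than guaranteed. Model or backend updates can also change the output for a given seed, breaking old seeds.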

If I set temperature = 0.0, is the output better quality?

Not necessarily. Greedy decoding never picks a lower-probability token, so it can miss better phrasings. A moderate temperature (0.7–0.9) often produces higher-quality text. Test on your specific task.

What if my API doesn't support seed?

Use a very low temperature (0.0–0.3) for near-determinism, and constrain the output through prompt formatting (e.g., "Return exactly one sentence" rather than letting the model vary output length). Neither makes sampling reproducible; they only narrow the range of possible outputs.
