Explainability

Prysm's explainability suite helps you understand why your models produce specific outputs.

Token Confidence Heatmap

Per-token confidence visualization derived from logprobs. Each token is colored on an OKLCH gradient from red (low confidence) to green (high confidence); a sketch of this mapping follows the provider list below.

  • OpenAI & Gemini: Native logprobs from the API
  • Anthropic: Estimated confidence based on token patterns
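
The core of such a heatmap is a mapping from each token's logprob to a confidence score, and from that score to a color. A minimal sketch in TypeScript; the hue endpoints and lightness/chroma values are illustrative, not the exact gradient Prysm uses:

```typescript
// Sketch: map a token's logprob to a confidence in (0, 1] and an OKLCH color.
// exp(logprob) recovers the token's probability, used here as the confidence.

function tokenConfidence(logprob: number): number {
  return Math.exp(logprob); // logprob <= 0, so this lands in (0, 1]
}

function confidenceColor(confidence: number): string {
  // Interpolate hue from roughly red (29deg) to roughly green (142deg).
  const hue = 29 + (142 - 29) * confidence;
  return `oklch(0.72 0.19 ${hue.toFixed(1)})`;
}

// Example: a fairly confident token (p ~ 0.86) renders near the green end.
const c = tokenConfidence(-0.15);
console.log(c.toFixed(2), confidenceColor(c));
```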

Hallucination Detection

Automatic identification of low-confidence segments within completions. The detector flags sequences where multiple consecutive tokens have low confidence scores, indicating the model may be generating unreliable content.

Risk levels:

  • low: isolated low-confidence tokens
  • medium: short low-confidence sequences
  • high: extended low-confidence passages
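
A minimal sketch of this kind of run detector, grouping consecutive tokens below a confidence threshold and grading each run by length. The threshold and run-length cutoffs here are illustrative, not Prysm's actual defaults:

```typescript
// Sketch: flag runs of consecutive low-confidence tokens and grade them.

type Risk = "low" | "medium" | "high";

interface Segment {
  start: number;   // index of first token in the run
  length: number;  // number of consecutive low-confidence tokens
  risk: Risk;
}

function detectLowConfidenceRuns(
  confidences: number[],
  threshold = 0.3,
): Segment[] {
  const segments: Segment[] = [];
  let start = -1;
  for (let i = 0; i <= confidences.length; i++) {
    const low = i < confidences.length && confidences[i] < threshold;
    if (low && start === -1) start = i;
    if (!low && start !== -1) {
      const length = i - start;
      // Isolated token -> low, short run -> medium, extended run -> high.
      const risk: Risk = length === 1 ? "low" : length <= 3 ? "medium" : "high";
      segments.push({ start, length, risk });
      start = -1;
    }
  }
  return segments;
}

// Example: one isolated dip (low) and one extended passage (high).
console.log(detectLowConfidenceRuns([0.9, 0.2, 0.95, 0.1, 0.15, 0.2, 0.1, 0.8]));
```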

"Why Did It Say That?"

LLM-powered explanations for any completion. Select a trace in the Request Explorer and click "Explain" to get a decision analysis covering:

  • What the model likely prioritized in the prompt
  • Why it chose specific phrasing or content
  • Alternative responses it may have considered
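
Under the hood, a feature like this amounts to prompting an explainer model with the original trace. A minimal sketch of how such a request might be assembled; the prompt wording and the `buildExplainPrompt` helper are hypothetical, not Prysm's actual implementation:

```typescript
// Sketch: assemble an explanation prompt from a stored trace.

interface Trace {
  prompt: string;
  completion: string;
  model: string;
}

function buildExplainPrompt(trace: Trace): string {
  return [
    `The model ${trace.model} was given this prompt:`,
    trace.prompt,
    `It responded:`,
    trace.completion,
    `Explain the response as a decision analysis:`,
    `1. What the model likely prioritized in the prompt.`,
    `2. Why it chose this specific phrasing or content.`,
    `3. Alternative responses it may have considered.`,
  ].join("\n\n");
}
```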

Decision Points Timeline

Visual timeline of high-entropy tokens where the model considered multiple alternatives. Each decision point shows the top candidate tokens and their probabilities, helping you understand where the model was most uncertain.
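
One way to surface such decision points is to compute the entropy over the top candidate logprobs at each position and flag positions above a threshold. A sketch, assuming top-k logprobs per token are available from the API; the entropy cutoff is illustrative:

```typescript
// Sketch: flag "decision points" as positions where entropy over the
// observed top candidate tokens is high.

interface Candidate {
  token: string;
  logprob: number;
}

interface DecisionPoint {
  position: number;
  entropy: number;         // in bits, over the observed candidates
  candidates: Candidate[]; // top alternatives at this position
}

function findDecisionPoints(
  topLogprobs: Candidate[][],
  minEntropy = 1.0,
): DecisionPoint[] {
  const points: DecisionPoint[] = [];
  topLogprobs.forEach((candidates, position) => {
    // Renormalize over the observed top-k so probabilities sum to 1.
    const probs = candidates.map((c) => Math.exp(c.logprob));
    const total = probs.reduce((a, b) => a + b, 0);
    const entropy = probs.reduce((h, p) => {
      const q = p / total;
      return q > 0 ? h - q * Math.log2(q) : h;
    }, 0);
    if (entropy >= minEntropy) points.push({ position, entropy, candidates });
  });
  return points;
}
```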

Model Comparison

Side-by-side confidence and hallucination risk comparison across traces. Compare how different models handle the same prompt by viewing their confidence distributions together.
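
For a side-by-side view, each trace can be reduced to a few summary statistics over its confidence scores. A sketch with assumed metrics (mean and minimum confidence, plus a count of flagged low-confidence runs); the statistics Prysm actually displays may differ:

```typescript
// Sketch: summarize per-trace confidence for side-by-side comparison.

interface TraceSummary {
  model: string;
  meanConfidence: number;
  minConfidence: number;
  lowConfidenceRuns: number; // e.g. from a run detector like the one above
}

function summarize(model: string, confidences: number[], runs: number): TraceSummary {
  const mean = confidences.reduce((a, b) => a + b, 0) / confidences.length;
  return {
    model,
    meanConfidence: mean,
    minConfidence: Math.min(...confidences),
    lowConfidenceRuns: runs,
  };
}

// Compare how two models handled the same prompt.
const a = summarize("gpt-4o", [0.9, 0.8, 0.95], 0);
const b = summarize("gemini-1.5-pro", [0.7, 0.3, 0.85], 1);
console.table([a, b]);
```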