Session coherence

Definition

The session coherence test evaluates the logical flow and consistency of a multi-turn conversation. An LLM-as-a-judge reads the full conversation and scores it against four criteria:

Responses logically follow from the user’s messages
The overall trajectory of the conversation is easy to follow
Individual responses are well-structured and internally consistent
Transitions between topics feel smooth
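
As an illustration of how a judge might be given these four criteria, the following is a minimal sketch and not Openlayer's actual implementation. The function names `build_judge_prompt` and `parse_score`, the prompt wording, and the assumption that the judge replies with a bare number are all hypothetical.

```python
# Hypothetical sketch: the prompt wording and helper names are assumptions,
# not the prompt Openlayer uses internally.
CRITERIA = [
    "Responses logically follow from the user's messages",
    "The overall trajectory of the conversation is easy to follow",
    "Individual responses are well-structured and internally consistent",
    "Transitions between topics feel smooth",
]


def build_judge_prompt(turns):
    """Render a whole conversation plus the four criteria into one prompt.

    `turns` is an ordered list of (user_message, assistant_response) pairs.
    """
    criteria = "\n".join(f"{i}. {c}" for i, c in enumerate(CRITERIA, start=1))
    transcript = "\n".join(
        f"Turn {n}\nUser: {user}\nAssistant: {assistant}"
        for n, (user, assistant) in enumerate(turns, start=1)
    )
    return (
        "Rate the coherence of the conversation below against these criteria:\n"
        f"{criteria}\n\n"
        f"{transcript}\n\n"
        "Reply with a single number between 0 (completely incoherent) "
        "and 1 (perfectly coherent)."
    )


def parse_score(reply):
    """Convert the judge's reply to a float and clamp it to the 0-1 range."""
    return min(1.0, max(0.0, float(reply.strip())))
```

The judge sees the full transcript in one prompt because the criteria describe the conversation as a whole, not any single turn.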

Taxonomy

Task types: LLM.
Availability: and .
Evaluation level: session.
Polarity: higher score = better. 0 = completely incoherent, 1 = perfectly coherent.

Why it matters

Coherence is a distinct quality from correctness: an assistant can give factually correct answers that still feel disjointed or contradictory across turns.
Low coherence is a strong leading indicator of user dissatisfaction even when task outcomes look fine.

Required columns

Input: The user’s message in each turn.
Output: The assistant’s response in each turn.
Session ID: Groups turns belonging to the same conversation.
Timestamp: Used to reconstruct turn order within a session.
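
To show how these four columns fit together, here is a rough sketch of rebuilding ordered conversations from flat rows. The dictionary keys (`session_id`, `timestamp`, `input`, `output`) and the function name `rebuild_sessions` are placeholders chosen for the example; your dataset's column names may differ.

```python
from collections import defaultdict


def rebuild_sessions(rows):
    """Group rows by session ID, then order each session's turns by timestamp.

    Each row is a dict with the hypothetical keys
    "session_id", "timestamp", "input", and "output".
    Returns {session_id: [(input, output), ...]} with turns in time order.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["session_id"]].append(row)

    return {
        session_id: [
            (r["input"], r["output"])
            for r in sorted(session_rows, key=lambda r: r["timestamp"])
        ]
        for session_id, session_rows in grouped.items()
    }


rows = [
    {"session_id": "a", "timestamp": 2, "input": "And tomorrow?", "output": "Rain."},
    {"session_id": "b", "timestamp": 1, "input": "Hi", "output": "Hello!"},
    {"session_id": "a", "timestamp": 1, "input": "Weather today?", "output": "Sunny."},
]
sessions = rebuild_sessions(rows)
```

Without the session ID the turns cannot be grouped, and without the timestamp their order within a session is ambiguous, which is why both columns are required alongside the input and output.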

This metric relies on an LLM evaluator. On Openlayer you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.

Test configuration examples

[
  {
    "name": "Session coherence above 0.7",
    "description": "Ensure conversations maintain logical flow across turns",
    "type": "performance",
    "subtype": "sessionCoherence",
    "thresholds": [
      {
        "insightName": "sessionCoherence",
        "measurement": "meanScore",
        "operator": ">=",
        "value": 0.7
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]
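
For intuition about the threshold in this configuration, the sketch below checks a set of per-session scores against it. It assumes that `meanScore` is the arithmetic mean of the session scores in the evaluation window; the helper `threshold_passes` and the sample scores are hypothetical, and this is not how Openlayer evaluates the test internally.

```python
import operator

# Map the comparison strings used in the config to Python comparisons.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def threshold_passes(session_scores, threshold):
    """Return True if the mean session score satisfies the threshold.

    Assumption: "meanScore" is the plain arithmetic mean of the scores.
    """
    if not session_scores:
        raise ValueError("no session scores to evaluate")
    mean_score = sum(session_scores) / len(session_scores)
    compare = OPERATORS[threshold["operator"]]
    return compare(mean_score, threshold["value"])


threshold = {
    "insightName": "sessionCoherence",
    "measurement": "meanScore",
    "operator": ">=",
    "value": 0.7,
}
passing = threshold_passes([0.9, 0.6, 0.75], threshold)  # mean is 0.75
failing = threshold_passes([0.5, 0.6], threshold)        # mean is 0.55
```

Under that assumption, a single low-scoring session does not fail the test on its own; only the average across sessions is compared with 0.7.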

Related

Coherence: trace-level counterpart.
Session context retention: related but stricter signal about tracking prior facts.
