Definition
The session role adherence test evaluates whether the assistant stays in its defined role across a multi-turn conversation. You supply a role description in plain English (for example, “a customer-support agent for an e-commerce platform who handles order tracking, returns, and product questions”), and an LLM-as-a-judge scores how well the assistant kept to that role throughout the session.Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Polarity: higher score = better (role adhered to).
Why it matters
- Role drift is a leading indicator of prompt leakage, jailbreak success, or retrieval-augmented context bleeding into the agent’s persona.
- Role adherence at the session level catches drift that builds across turns and is invisible to per-turn checks.
Required columns
- Input: The user’s message in each turn.
- Output: The assistant’s response in each turn.
- Session ID: Groups turns belonging to the same conversation.
- Timestamp: Used to reconstruct turn order within a session.
Insight parameters
role_definition(string, optional): A plain-English description of the assistant’s expected role. If omitted, the judge falls back to a generic “appropriate-for-context” role check (useful when you haven’t yet formalized the role, but lower-signal than the explicit version).
- Persona / expertise — does the assistant present itself as the specified role?
- Scope — does it stay within the role’s topical boundaries?
- Tone & style — does it match the communication style expected of the role?
- Handling out-of-role requests — does it decline or redirect cleanly?
Test configuration examples
Related
- Session guideline adherence — broader custom-behaviour check.

