Definition

The session record count test monitors the number of records (typically turns) per session. It is a numeric test (no LLM evaluator involved) that counts rows per session ID and lets you alert on pathological cases: runaway sessions with hundreds of turns, or sessions that never went past a single turn.

Taxonomy

  • Task types: LLM.
  • Availability: and .
  • Evaluation level: session.
  • Computation: deterministic aggregation.

Why it matters

  • Runaway sessions (too many turns) often signal tool-call loops, clarification loops, or frustrated users hammering on the same question.
  • Very short sessions (one turn then drop-off) may signal users bouncing off the product before the assistant could help.
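  • Track both tails: mean and percentile views of session length often tell very different stories. A mean that looks acceptable can hide a small share of very long sessions that only p95 or p99 expose.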
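
To make the last point concrete, here is a small sketch with made-up numbers (95 sessions of 3 turns and 5 runaway sessions of 200 turns). The data is purely illustrative, and the percentile is computed with Python's `statistics.quantiles`, which may not match the interpolation the product uses.

```python
from statistics import mean, median, quantiles

# Hypothetical window: 95 ordinary sessions and 5 runaway sessions.
records_per_session = [3] * 95 + [200] * 5

avg = mean(records_per_session)
mid = median(records_per_session)
# quantiles(..., n=100) returns 99 cut points; index 98 is the 99th percentile.
p99 = quantiles(records_per_session, n=100)[98]

print(f"mean={avg:.2f} median={mid} p99={p99}")
```

In this made-up window the mean stays under a threshold of 20 even though 5% of sessions ran to 200 turns, which is why a percentile or max threshold is worth adding alongside a mean threshold.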

Available measurements

Measurement                What it means
totalSessions              Number of sessions in the window
meanRecordsPerSession      Average number of records per session
medianRecordsPerSession    Median number of records per session
stdRecordsPerSession       Standard deviation of records per session
minRecordsPerSession       Shortest session in the window
maxRecordsPerSession       Longest session in the window
p90RecordsPerSession       90th-percentile session length
p95RecordsPerSession       95th-percentile session length
p99RecordsPerSession       99th-percentile session length
shortSessionCount          Count of sessions below a short-session threshold
mediumSessionCount         Count of sessions in the medium-length band
longSessionCount           Count of sessions above a long-session threshold
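
As a rough illustration of how these measurements relate to the raw data, the sketch below derives them from a list of session IDs (one entry per record). It is not the product's implementation. The function name, the nearest-rank percentile method, the use of population standard deviation, and the `short_threshold` and `long_threshold` defaults are all assumptions, since this page does not state them.

```python
import math
from collections import Counter
from statistics import mean, median, pstdev


def nearest_rank(sorted_values, q):
    """Nearest-rank percentile (assumed method; the product may interpolate)."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def session_record_measurements(session_ids, short_threshold=2, long_threshold=20):
    """Aggregate records per session ID. Threshold defaults are hypothetical."""
    counts = sorted(Counter(session_ids).values())
    if not counts:
        raise ValueError("no records in the evaluation window")
    short = sum(c < short_threshold for c in counts)   # below the short threshold
    long_ = sum(c > long_threshold for c in counts)    # above the long threshold
    return {
        "totalSessions": len(counts),
        "meanRecordsPerSession": mean(counts),
        "medianRecordsPerSession": median(counts),
        "stdRecordsPerSession": pstdev(counts),  # population std (assumption)
        "minRecordsPerSession": counts[0],
        "maxRecordsPerSession": counts[-1],
        "p90RecordsPerSession": nearest_rank(counts, 90),
        "p95RecordsPerSession": nearest_rank(counts, 95),
        "p99RecordsPerSession": nearest_rank(counts, 99),
        "shortSessionCount": short,
        "mediumSessionCount": len(counts) - short - long_,
        "longSessionCount": long_,
    }


# Three sessions with 3, 1 and 2 records respectively.
result = session_record_measurements(["a", "a", "a", "b", "c", "c"])
print(result["totalSessions"], result["meanRecordsPerSession"], result["shortSessionCount"])
```

The only input needed is the session ID of each record, which matches the single required column for this test.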

Required columns

  • Session ID: Groups turns belonging to the same conversation.

Test configuration examples

[
  {
    "name": "Mean records per session at most 20",
    "description": "Alert when average session length exceeds 20 turns",
    "type": "performance",
    "subtype": "sessionRecordCount",
    "thresholds": [
      {
        "insightName": "sessionRecordCount",
        "measurement": "meanRecordsPerSession",
        "operator": "<=",
        "value": 20
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]
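
To show how a threshold entry like the one above can be read, the sketch below checks a measurement value against a threshold's `operator` and `value`. It only illustrates the comparison logic and is not the platform's evaluator. The `threshold_passes` helper and the set of supported operators are assumptions.

```python
import operator

# Comparison operators a threshold might use (assumed set).
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def threshold_passes(measurements, threshold):
    """Return True when the named measurement satisfies the threshold."""
    compare = OPERATORS[threshold["operator"]]
    return compare(measurements[threshold["measurement"]], threshold["value"])


threshold = {
    "insightName": "sessionRecordCount",
    "measurement": "meanRecordsPerSession",
    "operator": "<=",
    "value": 20,
}

healthy = threshold_passes({"meanRecordsPerSession": 12.85}, threshold)
runaway = threshold_passes({"meanRecordsPerSession": 24.0}, threshold)
print(healthy, runaway)  # True False
```

Read this way, the threshold describes the passing condition: the test passes while the mean is at most 20 and fails once it exceeds 20.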