Tests in Openlayer let you codify expectations for your AI system. They help you ensure that your data, models, and outputs remain reliable, safe, and compliant over time. You can apply tests in two main contexts:
  • Development: run tests on datasets and artifacts during development to catch regressions before deployment.
  • Monitoring: run tests on live requests after deployment to detect failures, unexpected behavior, and drift in production.
Together, these two modes allow you to test your system both pre- and post-production.
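In monitoring mode, the usual pattern is to instrument the code that serves live requests so that each call is published to Openlayer, where your tests evaluate it. Below is a minimal sketch that assumes the openlayer Python SDK's tracing decorator and that the OPENLAYER_API_KEY and OPENLAYER_INFERENCE_PIPELINE_ID environment variables are set; the function name and body are placeholders, so check the SDK docs for the exact interface in your version.

```python
# Minimal monitoring-mode sketch. Assumes the openlayer Python SDK's tracing
# decorator and that OPENLAYER_API_KEY and OPENLAYER_INFERENCE_PIPELINE_ID are
# set in the environment -- verify the exact interface against the SDK docs.
from openlayer.lib import trace  # assumed import path


@trace()  # publishes each call's inputs, output, and latency to Openlayer
def answer_question(question: str) -> str:
    # Your model, agent, or RAG pipeline goes here; a stub keeps the sketch runnable.
    return f"Echo: {question}"


if __name__ == "__main__":
    # Once live requests flow through, monitoring tests run against them on the platform.
    print(answer_question("What do the monitoring tests evaluate?"))
```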

Getting started with tests

There are three main ways to begin:

1. Use a test bundle

Bundles are pre-packaged sets of tests for common use cases. They give you broad coverage in a single step, without configuring each test individually. Examples include:
  • Agentic bundle: evaluate the performance of agentic and RAG systems with metrics like faithfulness, relevance, and more.
  • Usage bundle: track system usage via cost, tokens, latency, and more.
  • OWASP bundle: check for common security issues such as prompt injection, hallucinations, and more.
  • EU AI Act bundle: align with regulatory requirements, including fairness, transparency, and more.
  • Data quality bundle: catch data quality issues such as missing values, duplicates, anomalies, and more.
We are continuously adding and improving bundles, so stay tuned for updates.

2. Browse the full test catalog

Openlayer provides 100+ individual tests. You can browse all tests and assemble your own test suite for fine-grained control.

3. Create your own tests

If the built-in catalog does not cover your use case, you can create custom tests with custom metrics. This allows you to encode domain-specific checks alongside the standard ones.
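For instance, suppose your application must append a compliance disclaimer to every response. The snippet below sketches such a domain-specific check as a plain Python function; the metric name, data shape, and threshold are illustrative assumptions rather than Openlayer's custom-metric interface, so see the custom metrics documentation for how to package and register one.

```python
# Purely illustrative custom metric computed over a batch of model outputs.
# The disclaimer rule, function name, and data shape are assumptions for this
# example, not Openlayer's custom-metric interface.
from typing import Iterable

REQUIRED_DISCLAIMER = "This is not financial advice."  # hypothetical domain rule


def disclaimer_coverage(outputs: Iterable[str]) -> float:
    """Return the fraction of outputs that contain the required disclaimer."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    hits = sum(1 for text in outputs if REQUIRED_DISCLAIMER.lower() in text.lower())
    return hits / len(outputs)


# A custom test could then assert, e.g., disclaimer_coverage(batch) >= 0.99
# on development datasets and on live production traffic alike.
print(disclaimer_coverage([
    "Consider diversifying. This is not financial advice.",
    "Buy the dip!",
]))  # -> 0.5
```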