- Development: run tests on datasets and artifacts during development to catch regressions before deployment.
- Monitoring: run tests on live requests after deployment to detect failures, unexpected behavior, and drift in production.
Getting started with tests
There are three main ways to begin:
1. Use a test bundle
Bundles are pre-packaged sets of tests for common use cases. They allow you to apply broad coverage with a single step, without having to configure each test individually. Examples include:
- Agentic bundle: evaluate the performance of agentic and RAG systems with metrics like faithfulness, relevance, and more.
- Usage bundle: track system usage via cost, tokens, latency, and more.
- OWASP bundle: check for common security issues such as prompt injection, hallucinations, and more.
- EU AI Act bundle: align with regulatory requirements, including fairness, transparency, and more.
- Data quality bundle: catch data quality issues such as missing values, duplicates, anomalies, and more.
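To make the idea of a bundle concrete, here is a minimal sketch in plain Python of a bundle as a named group of tests applied in one step. It is an illustration only, not the platform's API: `Test`, `DATA_QUALITY_BUNDLE`, `run_bundle`, and both check functions are hypothetical names, and the two checks cover only the missing-value and duplicate cases from the data quality bundle described above.

```python
from dataclasses import dataclass
from typing import Callable

# A dataset row; values are assumed to be hashable (strings, numbers, None).
Row = dict[str, object]


@dataclass
class Test:
    """Hypothetical test: a name plus a check that returns True on pass."""
    name: str
    check: Callable[[list[Row]], bool]


def no_missing_values(rows: list[Row]) -> bool:
    # Passes only if no field in any row is None.
    return all(value is not None for row in rows for value in row.values())


def no_duplicate_rows(rows: list[Row]) -> bool:
    # Passes only if every row is distinct (same keys and same values).
    seen = set()
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            return False
        seen.add(key)
    return True


# A bundle is just a pre-packaged list of tests (hypothetical structure).
DATA_QUALITY_BUNDLE = [
    Test("no_missing_values", no_missing_values),
    Test("no_duplicate_rows", no_duplicate_rows),
]


def run_bundle(bundle: list[Test], rows: list[Row]) -> dict[str, bool]:
    # Apply every test in the bundle in one step; map test name to pass/fail.
    return {test.name: test.check(rows) for test in bundle}


rows = [
    {"id": 1, "text": "hello"},
    {"id": 2, "text": None},      # missing value
    {"id": 1, "text": "hello"},   # duplicate of the first row
]
results = run_bundle(DATA_QUALITY_BUNDLE, rows)
print(results)
```

In this sketch the same `run_bundle` call could be pointed at a development dataset or at a batch of logged production requests, which mirrors the development and monitoring uses of tests described earlier.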