Product | Openlayer

Passing: No ill-formed inputs (Training)
Failing: No SSN (Training)
Passing: No credit card information (Validation, Training)

Passing: Is valid JSON (Validation)
Passing: Is valid Python code (Validation)
Failing: No column drift (Validation, Training)

Passing: High GPT-evaluation score (Model, Validation)
Passing: High answer relevancy (Model, Validation)
Failing: Response is concise (Model, Validation)

Passing: Response is factual (Model, Validation)
Failing: High BLEU-1 score on protected subpopulation (Model, Validation)
Passing: High METEOR score (Model, Validation)

Passing: No ill-formed sentences (Training)
Failing: No new tokens (Training)
Passing: No disparity between gender-related pronouns (Validation, Training)

Failing: No rows from the training set present in validation (Validation, Training)
Passing: No new labels in the validation set (Validation, Training)
Passing: No label drift (Training)

Passing: No significant label drift (Validation, Training)
Passing: No rows from the training set present in validation (Validation, Training)
Failing: Expect high performance on sentences containing "help" (Model, Validation)

Failing: High precision on sentences containing key tokens (Model, Validation)
Failing: High confidence and accuracy on must-pass cases (Model, Validation)
Passing: High precision on "urgent" predictions (Model, Validation)

Passing: No more than 10 rows with nulls (Training)
Failing: No more than 10 duplicate rows (Training)
Passing: No significant difference in accuracy between gender feature values (Validation, Training)

Failing: No rows from the training set present in validation (Validation, Training)
Passing: No new labels in the validation set (Validation, Training)
Passing: No null columns (Training)

Passing: No significant label drift (Validation, Training)
Passing: No rows from the training set present in validation (Validation, Training)
Failing: Expect high performance on young adult females in South Africa (Model, Validation)

Failing: High precision on "fraudulent" predictions (Model, Validation)
Failing: High confidence and accuracy on must-pass cases (Model, Validation)
Passing: High precision on "urgent" predictions (Model, Validation)

Openlayer is the most advanced platform for tracking, testing, and monitoring your AI

Task types

Tests

Monitoring

Diagnosis

Versioning

Collaboration

Support for your task type

We support a diverse range of task types so that all of your use cases are covered.

LLMs

Text classification

Tabular classification

Tabular regression

Tests

Make sure your AI performs at its peak when it reaches your users. Tests are a great way to track all of the constraints that matter for your model's performance.

Data integrity

Start by improving the foundation of your AI system: the data.
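To make this concrete, the sketch below shows in plain Python the kind of checks behind tests such as "No more than 10 rows with nulls" and "No more than 10 duplicate rows". It is not Openlayer's API. The function names, the list-of-dicts row format, and the pass/fail helper are assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumption: each row maps column name -> value


def count_rows_with_nulls(rows: List[Row]) -> int:
    """Count rows in which at least one value is None."""
    return sum(1 for row in rows if any(value is None for value in row.values()))


def count_duplicate_rows(rows: List[Row]) -> int:
    """Count rows that exactly repeat an earlier row (the first copy is not counted)."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable, column-order-independent fingerprint
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def status(count: int, max_allowed: int) -> str:
    """Hypothetical pass/fail rule: pass while the count stays within the threshold."""
    return "Passing" if count <= max_allowed else "Failing"


training_rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 34, "income": 52000},
]
print(status(count_rows_with_nulls(training_rows), max_allowed=10))  # Passing
print(status(count_duplicate_rows(training_rows), max_allowed=10))   # Passing
```

Each check reduces the dataset to a single count, which is then compared against a threshold. The threshold is the knob a team would tune per dataset.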

Data consistency

Make sure data stays consistent between different datasets.
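Consistency tests such as "No rows from the training set present in validation" and "No new labels in the validation set" compare two datasets rather than inspecting one. The following is a minimal sketch under stated assumptions: rows are dicts and the label lives under a column named `label`. The helper names are hypothetical, not part of any Openlayer SDK.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumption: rows are dicts; the label is stored under "label"


def fingerprint(row: Row) -> tuple:
    """Hashable, column-order-independent representation of a whole row (label included)."""
    return tuple(sorted(row.items()))


def leaked_rows(training: List[Row], validation: List[Row]) -> List[Row]:
    """Validation rows that also appear verbatim in the training set."""
    training_keys = {fingerprint(row) for row in training}
    return [row for row in validation if fingerprint(row) in training_keys]


def new_labels(training: List[Row], validation: List[Row], label: str = "label") -> List[Any]:
    """Labels present in validation but never seen during training."""
    return sorted({row[label] for row in validation} - {row[label] for row in training})


training = [
    {"text": "card stolen", "label": "fraud"},
    {"text": "thanks!", "label": "ok"},
]
validation = [
    {"text": "thanks!", "label": "ok"},            # also present in training
    {"text": "refund please", "label": "refund"},  # label unseen in training
]
print(len(leaked_rows(training, validation)))  # 1
print(new_labels(training, validation))        # ['refund']
```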

Performance

Identify underperforming subpopulations. Choose from the most advanced metrics.
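A subpopulation test such as "Expect high performance on sentences containing 'help'" amounts to computing a metric on a filtered slice. The sketch below does this with accuracy and a naive regex tokenizer. The tokenizer, the example data, and the 0.8 threshold are all assumptions for illustration.

```python
import re
from typing import List, Optional


def tokens(text: str) -> set:
    """Naive lowercase word tokenizer (an assumption; real pipelines vary)."""
    return set(re.findall(r"\w+", text.lower()))


def accuracy_on_token_slice(
    texts: List[str], y_true: List[str], y_pred: List[str], token: str
) -> Optional[float]:
    """Accuracy restricted to rows whose text contains `token`; None if the slice is empty."""
    idx = [i for i, text in enumerate(texts) if token in tokens(text)]
    if not idx:
        return None
    correct = sum(1 for i in idx if y_true[i] == y_pred[i])
    return correct / len(idx)


texts = ["I need help with my order", "Great service", "Please help, urgent", "help me reset my password"]
y_true = ["support", "praise", "urgent", "support"]
y_pred = ["support", "praise", "support", "support"]

acc = accuracy_on_token_slice(texts, y_true, y_pred, "help")
print(round(acc, 3))                           # 0.667
print("Passing" if acc >= 0.8 else "Failing")  # Failing (hypothetical threshold)
```

Returning `None` for an empty slice keeps "no data" distinct from "zero accuracy", which matters when a slice is rare.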

Fairness

Guard against biases and ensure equal treatment of sensitive groups.
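A fairness test like "No significant difference in accuracy between gender feature values" can be approximated by comparing per-group accuracy. This sketch uses a fixed tolerance (a hypothetical 0.05) in place of a formal statistical significance test. That is a deliberate simplification, not necessarily how Openlayer evaluates it.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def accuracy_by_group(groups: List[Hashable], y_true: List[int], y_pred: List[int]) -> Dict[Hashable, float]:
    """Accuracy computed separately for each value of the sensitive feature."""
    totals: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        totals[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / totals[group] for group in totals}


def max_accuracy_gap(per_group: Dict[Hashable, float]) -> float:
    """Difference between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())


gender = ["F", "M", "F", "M", "F", "M"]
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1]

per_group = accuracy_by_group(gender, y_true, y_pred)  # F ≈ 0.333, M ≈ 0.667
gap = max_accuracy_gap(per_group)
print("Passing" if gap <= 0.05 else "Failing")  # Failing
```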

Robustness

Probe for edge cases that your data doesn't capture but that you may encounter in the wild. See how your AI performs under adversarial attacks.
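One simple way to probe robustness is to perturb inputs slightly and check whether predictions stay the same. The sketch below uses random adjacent-character swaps as a stand-in for typos. The keyword classifier is a hypothetical toy model, and real adversarial testing uses far stronger perturbations than this.

```python
import random
from typing import Callable, List


def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Simulate a typo by swapping two neighbouring characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def invariance_rate(
    predict: Callable[[str], str], texts: List[str], n_perturbations: int = 5, seed: int = 0
) -> float:
    """Fraction of perturbed inputs whose prediction matches the unperturbed one."""
    rng = random.Random(seed)  # seeded so the test is reproducible
    total = unchanged = 0
    for text in texts:
        original = predict(text)
        for _ in range(n_perturbations):
            total += 1
            if predict(swap_adjacent_chars(text, rng)) == original:
                unchanged += 1
    return unchanged / total if total else 1.0


def keyword_model(text: str) -> str:
    """Hypothetical toy classifier that keys on one exact word, so typos can fool it."""
    return "urgent" if "urgent" in text.lower() else "normal"


texts = ["This is urgent", "Please reply when you can"]
rate = invariance_rate(keyword_model, texts)
print(f"{rate:.0%} of perturbed inputs kept their prediction")
```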

Monitoring

Add your monitoring pipeline to set production-specific tests and keep a close eye on your model behavior in the wild.

Real-time alerts

Something went wrong in production? Be the first to know with real-time pings on email, Slack, or in-app.

Openlayer

@sophia commented on No duplicate rows
– thoughts on changing the threshold?

Openlayer

Test status updated for No output drift
From 🟢 Passing to 🔴 Failing
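The status-change notification above boils down to diffing test results between two runs. This minimal sketch builds the alert messages only. Delivering them by email, Slack, or in-app is out of scope here, and the function name and message format are assumptions.

```python
from typing import Dict, List


def status_change_alerts(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Build one alert message per test whose status changed since the last run."""
    alerts = []
    for test_name, new_status in current.items():
        old_status = previous.get(test_name)
        # Only alert on real transitions; brand-new tests have no previous status.
        if old_status is not None and old_status != new_status:
            alerts.append(f"Test status updated for {test_name}: from {old_status} to {new_status}")
    return alerts


previous_run = {"No output drift": "Passing", "No duplicate rows": "Passing"}
latest_run = {"No output drift": "Failing", "No duplicate rows": "Passing"}
for message in status_change_alerts(previous_run, latest_run):
    print(message)  # Test status updated for No output drift: from Passing to Failing
```

Alerting on transitions rather than on every failing run avoids repeating the same ping while a test stays red.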

Evaluation windows

Different tests require different windows of data. Set custom evaluation windows to determine when to run each test.
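As an illustration of why windows matter, the sketch below evaluates the same null-rate check over a one-hour and a one-day window of production records. The record format, the window lengths, and the test names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Record = Dict[str, Any]  # assumption: each production record carries a "timestamp"


def records_in_window(records: List[Record], now: datetime, window: timedelta) -> List[Record]:
    """Records whose timestamp falls in the half-open interval (now - window, now]."""
    start = now - window
    return [r for r in records if start < r["timestamp"] <= now]


def null_rate(records: List[Record], column: str) -> float:
    """Share of records with a missing value in `column` (0.0 for an empty window)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(column) is None) / len(records)


now = datetime(2024, 1, 1, 12, 0)
records = [
    {"timestamp": now - timedelta(minutes=30), "age": None},
    {"timestamp": now - timedelta(hours=2), "age": 41},
    {"timestamp": now - timedelta(hours=20), "age": None},
    {"timestamp": now - timedelta(hours=30), "age": 29},
]

# Hypothetical per-test windows: a fast-reacting check and a slower, smoother one.
windows = {"hourly null check": timedelta(hours=1), "daily null check": timedelta(days=1)}
for test_name, window in windows.items():
    rate = null_rate(records_in_window(records, now, window), "age")
    print(test_name, round(rate, 2))
# hourly null check 1.0
# daily null check 0.67
```

A short window reacts quickly but is noisy on little data. A longer window is steadier but slower to flag a problem.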

Monitor dashboard

The monitoring dashboard offers a comprehensive view of current test results. Click on any individual test to dive deeper into its performance history.

Diagnosis

Breeze through the "why?"s behind failed tests. The information you need to find the root cause of issues is at your fingertips.

Root cause analysis

Dive deeper into every test to understand why it is failing. Stop questioning what to do next to improve your model.

Explainability

Understand which features drive model performance, whether on a particular data slice or across the whole dataset.
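Permutation importance is one widely used, model-agnostic way to score features. Shuffle one feature's values across rows and measure how much accuracy drops. The sketch below is a generic version of that technique, not a description of Openlayer's explainability method. The toy model and data are hypothetical.

```python
import random
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def accuracy(predict: Callable[[Row], int], rows: List[Row], labels: List[int]) -> float:
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(
    predict: Callable[[Row], int], rows: List[Row], labels: List[int],
    feature: str, n_repeats: int = 10, seed: int = 0,
) -> float:
    """Mean accuracy drop when `feature` is shuffled across rows (larger = more important)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        values = [row[feature] for row in rows]
        rng.shuffle(values)
        shuffled = [{**row, feature: value} for row, value in zip(rows, values)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / len(drops)


def toy_model(row: Row) -> int:
    """Hypothetical model: predicts default when the debt ratio exceeds 0.5."""
    return int(row["debt_ratio"] > 0.5)


rows = [
    {"debt_ratio": 0.8, "age": 25},
    {"debt_ratio": 0.2, "age": 52},
    {"debt_ratio": 0.7, "age": 41},
    {"debt_ratio": 0.1, "age": 33},
]
labels = [toy_model(row) for row in rows]  # labels the toy model gets exactly right

for feature in ("debt_ratio", "age"):
    print(feature, permutation_importance(toy_model, rows, labels, feature))
```

To explain a particular data slice instead of the whole dataset, pass only that slice's rows and labels.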

What-if-analysis

Perturb individual model inputs and see how the prediction changes. Compare the explainability scores and model predictions side-by-side to validate your hypotheses.
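A what-if query changes one or more inputs, reruns the model, and compares the two predictions. The sketch below uses a hypothetical logistic model with made-up weights purely to show the mechanics. The `what_if` helper is an illustrative name, not an Openlayer function.

```python
import math
from typing import Callable, Dict, Tuple

Features = Dict[str, float]

# Hypothetical logistic-model weights, chosen only for illustration.
WEIGHTS = {"income": -0.00003, "debt_ratio": 2.5}
BIAS = -0.5


def predict_default_probability(features: Features) -> float:
    z = BIAS + sum(weight * features[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid


def what_if(predict: Callable[[Features], float], features: Features, changes: Features) -> Tuple[float, float]:
    """Return (original prediction, prediction after applying `changes`); `features` is not mutated."""
    modified = {**features, **changes}
    return predict(features), predict(modified)


applicant = {"income": 40000.0, "debt_ratio": 0.6}
before, after = what_if(predict_default_probability, applicant, {"debt_ratio": 0.2})
print(f"default probability: {before:.3f} -> {after:.3f}")  # default probability: 0.450 -> 0.231
```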