Tests
Overview
Explore the tests available on the platform
Tests materialize expectations around your model and data. They are categorized into three types: integrity, consistency, and performance. On this page, you can find a list of tests available on the platform, grouped by type.
To learn more about tests and their role in AI/ML evaluation, check out Understanding tests.
Integrity tests
Test | Description | Task type |
---|---|---|
Character length | Define min/max bounds on the number of characters in a column across all rows. | LLM, text classification |
Class imbalance ratio | Measure the ratio between the most common class and the least common class. | Tabular classification, text classification |
Column average | Ensure the column average is within a specified range. | LLM, tabular classification, tabular regression, text classification |
Column contains string | Check that values in column A are contained in the lists in column B. | LLM, tabular classification, tabular regression, text classification |
Conflicting labels | Check for rows with identical feature values but differing labels. | Tabular classification, text classification |
Correlated features | Prevent features that are strongly correlated with one another. | Tabular classification, tabular regression |
Data type validation | Guard against features whose values violate the expected data types. | Tabular classification, tabular regression |
Duplicate rows | Guard against identical rows in the dataset. | LLM, tabular classification, tabular regression, text classification |
Empty features | Expect specified features to not have only null values. | Tabular classification, tabular regression |
Empty feature count | Set expectations on the number of features that have only null values. | Tabular classification, tabular regression |
Features missing values | Ensure specified features do not have missing values. | Tabular classification, tabular regression |
Feature values | Ensure feature values do not violate defined ranges or categories. | Tabular classification, tabular regression |
Great expectations | Validate your data using any expectation supported by GX, an open-source library. | LLM, tabular classification, tabular regression, text classification |
Ill-formed rows | Detect rows with more non-alphabetical than alphabetical characters. | LLM, text classification |
Is code | Check that the data contains compilable and executable code. | LLM |
Is JSON | Check that the data contains valid JSON. | LLM |
Null rows | Guard against rows containing missing values. | LLM, tabular classification, tabular regression, text classification |
Number of rows | Define min/max bounds on the number of dataset rows. | LLM, tabular classification, tabular regression, text classification |
Personal identifiable information (PII) | Detect rows containing personally identifiable information. | LLM |
PPS (predictive power score) | Ensure the PPS (predictive power score) for a feature is within a specified range. | Tabular classification, tabular regression |
Quasi-constant features | Expect specified features to be near-constant, with very low variance. | Tabular classification, tabular regression |
Quasi-constant feature count | Set expectations on the number of features that are near-constant, with very low variance. | Tabular classification, tabular regression |
Special characters ratio | Check the ratio of special characters to alphanumeric characters in the dataset. | LLM, text classification |
String validation | Guard against rows containing strings that violate defined patterns (RegEx). | LLM |
Valid URLs | Ensure the data contains valid URLs. | LLM |
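
To make a few of the integrity checks above concrete, here is a minimal sketch of the logic behind duplicate rows, conflicting labels, and class imbalance ratio. It is an illustration in plain Python, not the platform's implementation: the toy dataset, the function names, and the choice to divide the most common class count by the least common one are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: each row is (feature_a, feature_b, label).
rows = [
    ("red", 1, "cat"),
    ("red", 1, "cat"),    # exact duplicate of the first row
    ("red", 1, "dog"),    # same features as the first row, different label
    ("blue", 2, "cat"),
    ("green", 3, "cat"),
]


def duplicate_row_count(rows):
    """Count rows that repeat an earlier, identical row."""
    counts = Counter(rows)
    return sum(n - 1 for n in counts.values())


def conflicting_label_groups(rows):
    """Return feature tuples that occur with more than one distinct label."""
    labels_by_features = defaultdict(set)
    for *features, label in rows:
        labels_by_features[tuple(features)].add(label)
    return {f: labels for f, labels in labels_by_features.items() if len(labels) > 1}


def class_imbalance_ratio(rows):
    """Assumed definition: most common class count / least common class count."""
    counts = Counter(label for *_, label in rows)
    return max(counts.values()) / min(counts.values())


print(duplicate_row_count(rows))
print(conflicting_label_groups(rows))
print(class_imbalance_ratio(rows))
```

A test would then compare each of these values against a threshold you choose, for example requiring zero duplicate rows or an imbalance ratio below some bound.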
Consistency tests
Test | Description | Task type |
---|---|---|
Column drift | Measure drift in a specific column using one of the drift detection methods supported. | LLM, tabular classification, tabular regression, text classification |
Column values match | Make sure that rows in your two datasets have the same values for a specific column. | LLM, tabular classification, tabular regression, text classification |
Feature drift | Ensure similar feature distributions between current and reference datasets. | Tabular classification, tabular regression |
Label drift | Check if label distributions are significantly different between the training and validation sets. | Tabular classification, text classification |
New categories | Check for categories that appear in the validation set but not in the training set. | Tabular classification, text classification |
New labels | Check for labels that appear in the validation set but not in the training set. | Tabular classification, text classification |
Size ratio | Check the ratio between the number of rows in the validation and training datasets. | LLM, tabular classification, tabular regression, text classification |
Training-validation leakage | Detect training rows that are present in the validation dataset. | LLM, tabular classification, tabular regression, text classification |
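
Consistency tests compare two datasets. The sketch below shows one plausible reading of training-validation leakage, new categories, and size ratio in plain Python. The sample data and function names are hypothetical, and computing the size ratio as validation rows divided by training rows is an assumption; the platform may define these checks differently.

```python
# Hypothetical datasets: each row is (id, category).
training = [("a", "x"), ("b", "y"), ("c", "x"), ("d", "z")]
validation = [("c", "x"), ("e", "w")]


def leaked_rows(training, validation):
    """Return validation rows that also appear, identically, in training."""
    train_set = set(training)
    return [row for row in validation if row in train_set]


def new_categories(training, validation, column):
    """Return values in `column` seen in validation but never in training."""
    seen_in_training = {row[column] for row in training}
    return {row[column] for row in validation} - seen_in_training


def size_ratio(training, validation):
    """Assumed definition: validation row count / training row count."""
    return len(validation) / len(training)


print(leaked_rows(training, validation))
print(new_categories(training, validation, column=1))
print(size_ratio(training, validation))
```

Each result can then be checked against an expectation, such as no leaked rows, no unseen categories, or a size ratio within a chosen range.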
Performance tests
Test | Description | Task type |
---|---|---|
Aggregate metrics | Set aggregate metric thresholds for whole datasets or subpopulations within them. | LLM, tabular classification, tabular regression, text classification |
GPT evaluation | Evaluate the outputs using an LLM against custom criteria. | LLM |
Max cost | Ensures that the maximum request cost (in USD) for the data is within a given range. | LLM |
Max latency (ms) | Measures the maximum latency of a single request in the period of data. | LLM, tabular classification, tabular regression, text classification |
Max tokens | Measures the max number of tokens used in one request in the period of data. | LLM |
Mean cost | Ensures that the average request cost (in USD) for the data is within a given range. | LLM |
Mean latency (ms) | Measures the mean latency of requests in the period of data. | LLM, tabular classification, tabular regression, text classification |
Mean tokens | Measures the mean number of tokens used in the period of data. | LLM |
Total cost | Ensures that the total request cost (in USD) for the data is within a given range. | LLM |
Total tokens | Measures the total number of tokens used in the period of data. | LLM |
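
The latency, token, and cost tests above all reduce per-request measurements to a single number and compare it against bounds. As a rough sketch, assuming request logs with hypothetical `latency_ms`, `tokens`, and `cost_usd` fields (these names, and the helper functions, are illustrative and not part of the platform's API):

```python
from statistics import mean

# Hypothetical request log for one period of data.
requests = [
    {"latency_ms": 100.0, "tokens": 100, "cost_usd": 0.25},
    {"latency_ms": 200.0, "tokens": 200, "cost_usd": 0.5},
    {"latency_ms": 300.0, "tokens": 300, "cost_usd": 0.25},
    {"latency_ms": 400.0, "tokens": 400, "cost_usd": 1.0},
]


def summarize(requests):
    """Compute the max, mean, and total aggregates the tests refer to."""
    latencies = [r["latency_ms"] for r in requests]
    tokens = [r["tokens"] for r in requests]
    costs = [r["cost_usd"] for r in requests]
    return {
        "max_latency_ms": max(latencies),
        "mean_latency_ms": mean(latencies),
        "max_tokens": max(tokens),
        "mean_tokens": mean(tokens),
        "total_tokens": sum(tokens),
        "max_cost_usd": max(costs),
        "mean_cost_usd": mean(costs),
        "total_cost_usd": sum(costs),
    }


def within(value, lower=None, upper=None):
    """Return True if value respects whichever bounds are given."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


summary = summarize(requests)
# Example expectations: mean latency at most 300 ms, total cost at most 1.50 USD.
print(within(summary["mean_latency_ms"], upper=300.0))
print(within(summary["total_cost_usd"], upper=1.5))
```

With this sample data the mean latency expectation passes while the total cost expectation fails, which is the kind of pass/fail signal a performance test produces.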