
- If you are building an AI system with OpenAI LLMs and want to evaluate it, you can use the SDKs to make Openlayer part of your workflow.
- Some tests on Openlayer are based on a score produced by an LLM evaluator. You can set any of OpenAI’s LLMs as the LLM evaluator for these tests.
Evaluating OpenAI LLMs
You can set up Openlayer tests to evaluate your OpenAI LLMs in development and monitoring.
Development
In development mode, Openlayer becomes a step in your CI/CD pipeline, and your tests are evaluated automatically whenever certain events trigger them. Openlayer tests often rely on your AI system’s outputs on a validation dataset. As discussed in the Configuring output generation guide, you have two options:
- either provide a way for Openlayer to run your AI system on your datasets, or
- before pushing, generate the model outputs yourself and push them alongside your artifacts.
If Openlayer runs your AI system and that system calls OpenAI LLMs, add the OPENAI_API_KEY secret.

For Azure OpenAI, add the AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT secrets instead.
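
Before pushing, it can help to confirm that the secrets named above are actually set wherever your system runs. The following is a minimal sketch of such a check, assuming the secrets are exposed as environment variables; `missing_secrets` and the `provider` labels are hypothetical names for illustration and are not part of the Openlayer SDK.

```python
import os
from typing import Dict, List, Mapping, Optional

# Secret names come from the text above; grouping them by provider is illustrative.
REQUIRED_SECRETS: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
}


def missing_secrets(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required secret names that are unset or empty for `provider`."""
    if provider not in REQUIRED_SECRETS:
        raise ValueError(f"unknown provider: {provider!r}")
    # Default to the process environment; a mapping can be passed in for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS[provider] if not env.get(name)]
```

Calling `missing_secrets("openai")` returns an empty list when the key is set, so a CI step could fail early whenever the list is non-empty.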
You can use one of the OpenAI templates to see what a sample project fully
set up with Openlayer looks like. Templates are available in Python and
TypeScript.
Monitoring
To use the monitoring mode, you must set up a way to publish the requests your AI system receives to the Openlayer platform. This process is streamlined for OpenAI LLMs. To set it up, follow the steps in the code snippet below.

For Azure OpenAI, check out this code example instead.
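
For the standard (non-Azure) OpenAI client, a minimal Python sketch of this setup might look like the following. It rests on several assumptions that you should verify against the SDK reference: that the `openlayer` and `openai` packages are installed, that `openlayer.lib` exposes a `trace_openai` wrapper, and that the SDK reads `OPENLAYER_API_KEY` and `OPENLAYER_INFERENCE_PIPELINE_ID` from the environment. The helper names `unset_monitoring_vars` and `make_traced_openai_client` are hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names; confirm them against the Openlayer SDK documentation.
MONITORING_VARS = [
    "OPENAI_API_KEY",
    "OPENLAYER_API_KEY",
    "OPENLAYER_INFERENCE_PIPELINE_ID",
]


def unset_monitoring_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the monitoring variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in MONITORING_VARS if not env.get(name)]


def make_traced_openai_client():
    """Build an OpenAI client wrapped so that its calls are published to Openlayer."""
    missing = unset_monitoring_vars()
    if missing:
        raise RuntimeError("Set these variables first: " + ", ".join(missing))
    # Imported lazily so the check above works without either package installed.
    import openai
    from openlayer.lib import trace_openai  # assumed import path

    return trace_openai(openai.OpenAI())
```

With the variables set, the client returned by `make_traced_openai_client()` is used like a regular OpenAI client, for example through `client.chat.completions.create`.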

If the OpenAI LLM call is just one of the steps of your AI system, you can use
the code snippets above together with tracing. In this case, your OpenAI LLM
calls get added as a step of a larger trace. You can enhance traces with
metadata using update_current_trace(user_id="123", inferenceId="custom_id")
and update_current_step(model="gpt-4", tokens=150) for rich observability and
request correlation. Refer to the Tracing guide for details.
OpenAI LLM evaluator
Some tests on Openlayer rely on scores produced by an LLM evaluator. Examples include tests that use Ragas metrics and the custom LLM evaluator test. You can use any of OpenAI’s LLMs as the underlying LLM evaluator for these tests.

You can change the default LLM evaluator for a project on the project settings page. To do so, navigate to “Settings”, select your project in the left sidebar, and click “Metrics” to open the metric settings page. Under “LLM evaluator,” choose the OpenAI LLM you want to use. Also make sure to add your OPENAI_API_KEY as a workspace secret.
