Tracing
Traces help you understand your system, particularly when it contains multiple steps, such as in RAG, LLM chains, and agents.
In the monitoring mode of an Openlayer project, you can view the traces for the live requests your AI system receives. This allows you to log the inputs, outputs, latency, and other metadata such as cost and number of tokens associated with every step of your system.
This guide shows how you can set up tracing with Openlayer’s SDKs to achieve a result similar to the one below.
If you prefer, feel free to refer to a notebook example. Our templates gallery also has complete sample projects that show how tracing works for development and monitoring.
How to set up tracing
You must use one of Openlayer’s SDKs to trace your system. After installing the SDK in your language of choice, follow these steps:
Set environment variables
Openlayer needs to know where to upload the traces. Provide this information through the following environment variables:
OPENLAYER_API_KEY=YOUR_OPENLAYER_API_KEY
OPENLAYER_INFERENCE_PIPELINE_ID=YOUR_OPENLAYER_INFERENCE_PIPELINE_ID
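Because a missing variable can be easy to overlook, it may help to check both before running traced code. The sketch below is a minimal example that uses only the Python standard library; the helper name `missing_openlayer_vars` is hypothetical and is not part of Openlayer's SDK.

```python
import os

# The two variable names listed above.
REQUIRED_VARS = ("OPENLAYER_API_KEY", "OPENLAYER_INFERENCE_PIPELINE_ID")


def missing_openlayer_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of Openlayer's SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_openlayer_vars()` at startup and stopping when the returned list is non-empty surfaces a configuration problem before any request is handled.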
Annotate the code you want to trace
Annotate all the functions you want to trace with Openlayer’s SDK.
import openai
from openlayer.lib import trace, trace_openai

# Wrap the OpenAI client with Openlayer's `trace_openai`
openai_client = trace_openai(openai.OpenAI(api_key="sk-..."))

# Decorate all the functions you want to trace
@trace()
def main(user_query: str) -> str:
    context = retrieve_context(user_query)
    answer = generate_answer(user_query, context)
    return answer

@trace()
def retrieve_context(user_query: str) -> str:
    return "Some context"

@trace()
def generate_answer(user_query: str, context: str) -> str:
    result = openai_client.chat.completions.create(
        messages=[{"role": "user", "content": user_query + " " + context}],
        model="gpt-3.5-turbo"
    )
    return result.choices[0].message.content
The traced generate_answer function in the example above uses an OpenAI LLM. However, tracing also works for other LLM providers. If you set up any of the streamlined approaches described in the Publishing data guide, those calls are added to the trace as well.
Use the annotated code
All data that goes through the decorated code is automatically streamed to the Openlayer platform, where your tests and alerts are defined.
In the example above, if we call main:
main("What's the meaning of life?")
the resulting trace would be:
The main function has two nested steps: retrieve_context and generate_answer. The generate_answer step has a chat completion call within it. The cost, number of tokens, latency, and other metadata are all computed automatically behind the scenes.
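To build intuition for how decorated functions end up as nested steps, here is a toy decorator written with the standard library only. It is a conceptual sketch, not Openlayer's implementation: the names `toy_trace` and `root_steps` are hypothetical, nothing is uploaded anywhere, and the LLM call is replaced by a plain string so the example runs offline.

```python
import functools
import time
from contextvars import ContextVar

# Illustrative only: this is NOT how Openlayer's SDK is implemented.
_current_step = ContextVar("current_step", default=None)
root_steps = []  # top-level steps, one per outermost traced call


def toy_trace(func):
    """Record each call as a step nested under the step that called it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        step = {"name": func.__name__, "inputs": (args, kwargs), "steps": []}
        parent = _current_step.get()
        # Attach to the calling step if there is one, else start a new trace.
        (parent["steps"] if parent is not None else root_steps).append(step)
        token = _current_step.set(step)
        start = time.perf_counter()
        try:
            step["output"] = func(*args, **kwargs)
            return step["output"]
        finally:
            step["latency_ms"] = (time.perf_counter() - start) * 1000
            _current_step.reset(token)

    return wrapper


@toy_trace
def retrieve_context(user_query):
    return "Some context"


@toy_trace
def generate_answer(user_query, context):
    # Stand-in for the chat completion call in the real example.
    return f"Answer to {user_query!r} using {context!r}"


@toy_trace
def main(user_query):
    context = retrieve_context(user_query)
    return generate_answer(user_query, context)


main("What's the meaning of life?")
trace_tree = root_steps[0]
print(trace_tree["name"], [s["name"] for s in trace_tree["steps"]])
# prints: main ['retrieve_context', 'generate_answer']
```

The context variable tracks which step is currently running, so any traced function called inside it is attached as a child. This produces the same parent and child shape described above, with latency measured per step.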