Uploading a reference dataset
A reference dataset is usually a representative sample of the training data used by the model. It is required for data drift monitoring, because its distribution serves as the baseline against which the distribution of the data you publish to your inference pipeline is compared.
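To make the idea of comparing distributions concrete, here is a minimal sketch that scores how far a categorical feature has moved from its reference distribution. It uses total variation distance, which is only one simple choice of metric and is not a description of how Openlayer computes drift. The function names and the sample values are hypothetical.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in `values`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(reference, production):
    """Half the sum of absolute frequency differences.

    0.0 means the two samples have identical category frequencies,
    1.0 means they share no categories at all.
    """
    ref_freq = category_frequencies(reference)
    prod_freq = category_frequencies(production)
    categories = set(ref_freq) | set(prod_freq)
    return 0.5 * sum(
        abs(ref_freq.get(c, 0.0) - prod_freq.get(c, 0.0)) for c in categories
    )


# Hypothetical values for the "Geography" feature.
reference_geography = ["France", "France", "Spain", "Germany"]
production_geography = ["Spain", "Spain", "Spain", "Germany"]

drift_score = total_variation_distance(reference_geography, production_geography)
print(f"Geography drift score: {drift_score:.2f}")
```

A score near zero suggests the published data still resembles the reference sample for that feature. What counts as too much drift depends on the feature and the use case.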
How to upload a reference dataset
You can upload a reference dataset to your inference pipeline on Openlayer with the Python SDK.
Load your dataset as a pandas DataFrame
Let’s say that your reference dataset looks like the one below. For simplicity, we show a single row.
import pandas as pd

df = pd.DataFrame(
    {
        "CreditScore": [600],
        "Geography": ["France"],
        "Gender": ["Male"],
        "Age": [40],
        "Tenure": [5],
        "Balance": [100000],
        "NumOfProducts": [1],
        "HasCrCard": [1],
        "IsActiveMember": [1],
        "EstimatedSalary": [50000],
        "AggregateRate": [0.5],
        "Year": [2020],
        "Exited": [0],
    }
)
Prepare the dataset configuration
The dataset config is a dictionary containing information that helps Openlayer understand your data.
For example, the dataset above comes from a tabular classification task, so its config includes the feature names, the categorical feature names, the class names, and the label column name:
from openlayer.types.inference_pipelines import data_stream_params

# Depending on your task type, you can replace this with
# `ConfigTabularRegressionData`, `ConfigTextClassificationData`,
# or `ConfigTabularLlmData`
config = data_stream_params.ConfigTabularClassificationData(
    categorical_feature_names=["Gender", "Geography"],
    class_names=["Retained", "Exited"],
    feature_names=[
        "CreditScore",
        "Geography",
        "Gender",
        "Age",
        "Tenure",
        "Balance",
        "NumOfProducts",
        "HasCrCard",
        "IsActiveMember",
        "EstimatedSalary",
        "AggregateRate",
        "Year",
    ],
    label_column_name="Exited",
)
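Before uploading, it can help to confirm that the config agrees with the dataset, since a misspelled column name is easy to miss. The sketch below is one way to do that. `find_config_problems` is a hypothetical helper, not part of the Openlayer SDK, and it works on plain lists, so you would pass `list(df.columns)` together with the same names you put in the config.

```python
def find_config_problems(
    columns, feature_names, categorical_feature_names, label_column_name
):
    """Return a list of mismatches between a dataset config and the dataset columns."""
    problems = []
    column_set = set(columns)

    # Every feature named in the config should exist as a column.
    for name in feature_names:
        if name not in column_set:
            problems.append(f"feature '{name}' is not a column in the dataset")

    # Every categorical feature should also be listed as a feature.
    for name in categorical_feature_names:
        if name not in feature_names:
            problems.append(
                f"categorical feature '{name}' is not listed in feature_names"
            )

    # The label column should exist in the dataset.
    if label_column_name not in column_set:
        problems.append(
            f"label column '{label_column_name}' is not a column in the dataset"
        )

    return problems


# Hypothetical, shortened column list for illustration.
columns = ["CreditScore", "Geography", "Gender", "Exited"]

ok = find_config_problems(
    columns,
    feature_names=["CreditScore", "Geography", "Gender"],
    categorical_feature_names=["Gender", "Geography"],
    label_column_name="Exited",
)

# "Geografy" is a deliberate typo to show what a mismatch looks like.
typo = find_config_problems(
    columns,
    feature_names=["CreditScore", "Geografy", "Gender"],
    categorical_feature_names=["Gender", "Geography"],
    label_column_name="Exited",
)

for problem in typo:
    print(problem)
```

An empty list means the names line up. Any returned message points at the entry to fix before uploading.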
Upload to Openlayer
Now, you can upload your reference dataset alongside its config to Openlayer:
from openlayer import Openlayer
from openlayer.lib import data

data.upload_reference_dataframe(
    client=Openlayer(api_key="YOUR_OPENLAYER_API_KEY_HERE"),
    inference_pipeline_id="YOUR_INFERENCE_PIPELINE_ID_HERE",
    dataset_df=df,
    config=config,
)
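If you would rather not hardcode credentials in the script, one option is to read them from environment variables and fail early when they are missing. The sketch below assumes the variable names `OPENLAYER_API_KEY` and `OPENLAYER_INFERENCE_PIPELINE_ID`, which are chosen here for illustration. `require_env` is a hypothetical helper, not part of the Openlayer SDK.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before uploading.")
    return value


# Example usage (variable names are assumptions for this sketch):
#   api_key = require_env("OPENLAYER_API_KEY")
#   pipeline_id = require_env("OPENLAYER_INFERENCE_PIPELINE_ID")
```

The returned values can then be passed as `api_key` and `inference_pipeline_id` in the upload call above.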