Changelog

A log of all the changes and improvements made to our app

November 30th, 2023

GPT evaluation, Great Expectations, real-time streaming, TypeScript support, and new docs

10x the number of tests, available now in your workspace ⚡

Openlayer now offers built-in GPT evaluation for your model outputs. You can write descriptive evaluations like "Make sure the outputs do not contain profanity," and we will use an LLM to grade your agent or model against that criterion.
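To make the idea concrete, here is a minimal sketch of an LLM-graded test in Python. It is not Openlayer's implementation: the prompt wording, the PASS/FAIL convention, and the names `build_grading_prompt`, `grade_output`, `pass_rate`, and `keyword_judge` are all assumptions made for illustration. The `judge` argument stands in for whatever function sends a prompt to an LLM and returns its reply.

```python
from typing import Callable, Sequence


def build_grading_prompt(criterion: str, output: str) -> str:
    """Build the text sent to the grader (hypothetical prompt wording)."""
    return (
        "You are grading a model output against a criterion.\n"
        f"Criterion: {criterion}\n"
        f"Output: {output}\n"
        "Answer with exactly PASS or FAIL."
    )


def grade_output(criterion: str, output: str, judge: Callable[[str], str]) -> bool:
    """Return True when the judge's reply starts with PASS."""
    reply = judge(build_grading_prompt(criterion, output))
    return reply.strip().upper().startswith("PASS")


def pass_rate(criterion: str, outputs: Sequence[str], judge: Callable[[str], str]) -> float:
    """Fraction of outputs that the judge marks as passing."""
    if not outputs:
        raise ValueError("outputs must not be empty")
    passed = sum(grade_output(criterion, output, judge) for output in outputs)
    return passed / len(outputs)


def keyword_judge(prompt: str) -> str:
    """Stand-in for a real LLM call: fails any output containing 'darn'."""
    output_line = next(
        line for line in prompt.splitlines() if line.startswith("Output: ")
    )
    return "FAIL" if "darn" in output_line.lower() else "PASS"


if __name__ == "__main__":
    criterion = "Make sure the outputs do not contain profanity"
    print(pass_rate(criterion, ["Hello there", "Darn it"], keyword_judge))
```

Passing the judge in as a plain callable keeps the grading logic testable without network access; in practice it would wrap a call to an LLM provider.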

We also added support for creating and running tests from Great Expectations (GX). GX offers hundreds of unique tests on your data, which are now available in all your Openlayer projects. Besides these, there are many other new tests available across different project task types. View the full list below ⬇️

You can now stream data to Openlayer in real time rather than uploading it in batches. Alongside this, there is a new page for viewing all your model's requests in monitoring mode. It shows a table of your model's usage in real time, along with per-row metadata such as token count and latency.

We've shipped the V1 of our new TypeScript client! You can use this to log your requests to Openlayer if you are using OpenAI as a provider directly. Later, we will expand this library to support other providers and use cases. If you are interested, reach out and we can prioritize.

Finally, we're releasing a brand new docs site at http://docs.openlayer.com/ that offers more guidance on how to get the most out of Openlayer and features an updated, sleek UI.

As always, stay tuned for more updates and join our Discord community to be a part of our ongoing development journey 🤗

New features

  • GPT evaluation tests
    • You can now create tests that rely on an LLM to evaluate your outputs given any sort of descriptive criteria. Try it out by going to Create tests > Performance in either monitoring or development mode!
  • Great Expectations
    • We added support for Great Expectations tests, which lets you create hundreds of new kinds of tests. To try it out, navigate to Create tests > Integrity in either monitoring or development mode.
  • New and improved data integrity & consistency tests
    • Class imbalance ratio (integrity) (tabular classification & text classification) — The ratio between the most common class and the least common class
    • Predictive power score (integrity) (tabular classification & tabular regression) — PPS for a feature (or index) must be in a specific range
    • Special characters ratio (integrity) (LLM & text classification) — Check the ratio of special characters to alphanumeric characters in the dataset
    • Feature missing values (integrity) (tabular classification & tabular regression) — Similar to null rows but for a specific feature: ensure the specified features are not missing values
    • Quasi-constant features (integrity) (tabular classification & tabular regression) — Same as quasi-constant feature count but for a specific feature: expect the specified features to be near-constant, with very low variance
    • Empty feature (integrity) (tabular classification & tabular regression) — Same as empty feature count but for a specific feature: expect the specified features not to contain only null values
  • Updates to existing tests
    • Set percentages as the threshold for duplicate rows, null rows, conflicting labels, ill-formed rows, and train-val leakage tests
  • We've added a new endpoint for streaming your data to Openlayer rather than uploading in batch
  • The new requests page allows you to see a real-time stream of your model's requests, and per-row metadata such as token count and latency
  • The new Openlayer TypeScript library allows users who are directly leveraging OpenAI to monitor their requests
  • Our brand new docs are live, with more guided walkthroughs and in-depth information on the Openlayer platform and API
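Several of the integrity tests listed above reduce to simple ratios. The following Python sketch shows one plausible way to compute four of them. It is an illustration only, not Openlayer's implementation: the function names, the treatment of whitespace in the special characters ratio, and the definition of a duplicate as an exact repeat of an earlier row are all assumptions.

```python
from collections import Counter


def class_imbalance_ratio(labels):
    """Ratio between the most common and the least common class counts."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


def special_characters_ratio(texts):
    """Ratio of special characters to alphanumeric characters across texts.

    Assumption: whitespace is ignored, and every other non-alphanumeric
    character counts as "special".
    """
    special = 0
    alphanumeric = 0
    for text in texts:
        for ch in text:
            if ch.isalnum():
                alphanumeric += 1
            elif not ch.isspace():
                special += 1
    if alphanumeric == 0:
        raise ValueError("no alphanumeric characters found")
    return special / alphanumeric


def missing_value_fraction(rows, feature):
    """Fraction of rows (dicts) in which `feature` is None or absent."""
    if not rows:
        raise ValueError("rows must not be empty")
    missing = sum(1 for row in rows if row.get(feature) is None)
    return missing / len(rows)


def duplicate_row_percentage(rows):
    """Percentage of rows (dicts of hashable values) repeating an earlier row."""
    if not rows:
        raise ValueError("rows must not be empty")
    unique = {tuple(sorted(row.items())) for row in rows}
    return 100 * (len(rows) - len(unique)) / len(rows)


if __name__ == "__main__":
    print(class_imbalance_ratio(["a", "a", "a", "b"]))
    print(special_characters_ratio(["ab!!", "c d"]))
    print(missing_value_fraction([{"x": 1}, {"x": None}], "x"))
    print(duplicate_row_percentage([{"x": 1}, {"x": 1}, {"x": 2}]))
```

A test would then compare the computed value against a threshold, for example requiring the duplicate row percentage to stay below a chosen limit.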

Improvements

  • Renamed goals to tests
    • We have decided that the word "test" is a more accurate representation, and have updated all references in our product, docs, website, and sample notebooks
  • Polish and improvements to the new onboarding and navigation flows, including an updated "Getting started" page with more resources to help you get the most out of Openlayer
  • Creating a project in the UI now presents as a modal
  • Creating a project in the UI opens up subsequent onboarding modals for adding an initial commit (development) or setting up an inference pipeline (monitoring)
  • Added commit statuses and a button for adding new commits and inference pipelines to the navigation pane
  • Once a commit is added in development mode, new tests are suggested that are personalized to your model and data and identify critical failures and under-performing subpopulations
  • Added a clearer tooltip on how to enable subpopulation filtering for performance tests in monitoring mode
  • Improved wording of various suggested test titles
  • Test groupings now default appropriately by mode
  • Made floating point thresholds easier to input

Bug Fixes

  • Tests rendered without grouping should be sorted by date updated
  • Creating a project through the UI would not allow you to change the task type
  • Requests graph would not update with new data immediately and faithfully
  • Button for adding an OpenAI key was rendering for non-LLM projects
  • Feature value and data type validation tests were disabled
  • Rows and explainability were not rendering for certain tests
  • Token maps were not being rendered in the performance test creation page
  • Heatmap values would sometimes overflow
  • Column drift tests would not always be created successfully
  • In-app data tables for training datasets would not render
  • The final step of commit creation forms was hidden behind content
  • Updated the thresholds of suggested tests to be more reasonable for the metric
  • Fixes and improvements to test and requests line graphs
    • Graph data would overflow container
    • Hovering over points would not display data correctly
    • Threshold lines would not render
    • Improved design for when only a single data point is rendered