A log of all the changes and improvements made to our app

November 30th, 2023

GPT evaluation, Great Expectations, real-time streaming, TypeScript support, and new docs

10x the number of tests, available now in your workspace ⚡

Openlayer now offers built-in GPT evaluation for your model outputs. You can write descriptive evaluations like "Make sure the outputs do not contain profanity," and we will use an LLM to grade your agent or model against these criteria.

We also added support for creating and running tests from Great Expectations (GX). GX offers hundreds of unique tests on your data, which are now available in all your Openlayer projects. Besides these, there are many other new tests available across different project task types. View the full list below ⬇️

You can now stream data to Openlayer in real time rather than uploading it in batches. Alongside this, there is a new page for viewing all your model's requests in monitoring mode, where you can see a real-time table of your model's usage along with per-row metadata like token count and latency.

We've shipped V1 of our new TypeScript client! If you call OpenAI directly as your provider, you can use it to log your requests to Openlayer. Later, we will expand this library to support other providers and use cases. If you are interested, reach out and we can prioritize them.

Finally, we're releasing brand new docs at http://docs.openlayer.com/, which offer more guidance on how to get the most out of Openlayer and feature an updated, sleek UI.

As always, stay tuned for more updates and join our Discord community to be a part of our ongoing development journey 🤗

New features

  • GPT evaluation tests
    • You can now create tests that rely on an LLM to evaluate your outputs given any sort of descriptive criteria. Try it out by going to Create tests > Performance in either monitoring or development mode!
  • Great Expectations
    • We added support for Great Expectations tests, which allow you to create hundreds of new kinds of tests. To try it out, navigate to Create tests > Integrity in either monitoring or development mode.
  • New and improved data integrity & consistency tests
    • Class imbalance ratio (integrity) (tabular classification & text classification) — The ratio between the most common class and the least common class
    • Predictive power score (integrity) (tabular classification & tabular regression) — PPS for a feature (or index) must be within a specified range
    • Special characters ratio (integrity) (LLM & text classification) — Check the ratio of special characters to alphanumeric characters in the dataset
    • Feature missing values (integrity) (tabular classification & tabular regression) — Similar to null rows, but for specific features: ensure the specified features have no missing values
    • Quasi-constant features (integrity) (tabular classification & tabular regression) — Same as quasi-constant feature count, but for specific features: expect the specified features to be near-constant, with very low variance
    • Empty feature (integrity) (tabular classification & tabular regression) — Same as empty feature count, but for specific features: expect the specified features not to contain only null values
  • Updates to existing tests
    • Percentages can now be set as the threshold for the duplicate rows, null rows, conflicting labels, ill-formed rows, and train-val leakage tests
  • We've added a new endpoint for streaming your data to Openlayer rather than uploading in batch
  • The new requests page allows you to see a real-time stream of your model's requests, and per-row metadata such as token count and latency
  • The new Openlayer TypeScript library allows users who are directly leveraging OpenAI to monitor their requests
  • Our brand new docs are live, with more guided walkthroughs and in-depth information on the Openlayer platform and API
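Two of the new integrity tests above reduce to simple ratios. The following is a minimal sketch of how such ratios can be computed, assuming labels and texts arrive as plain Python lists and that a "special character" is any character that is neither alphanumeric nor whitespace. The function names and the `MAX_IMBALANCE` threshold are hypothetical and are not Openlayer's implementation or defaults.

```python
from collections import Counter


def class_imbalance_ratio(labels):
    """Count of the most common class divided by the count of the least common class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


def special_characters_ratio(texts):
    """Ratio of special characters to alphanumeric characters across all texts.

    Assumption: a "special character" is any character that is neither
    alphanumeric nor whitespace.
    """
    special = 0
    alphanumeric = 0
    for text in texts:
        for ch in text:
            if ch.isalnum():
                alphanumeric += 1
            elif not ch.isspace():
                special += 1
    if alphanumeric == 0:
        # Avoid division by zero: text made only of special characters is unbounded.
        return float("inf") if special else 0.0
    return special / alphanumeric


# Hypothetical pass/fail check in the spirit of an integrity test.
MAX_IMBALANCE = 5.0  # hypothetical threshold, not an Openlayer default
labels = ["spam", "ham", "ham", "ham"]
ratio = class_imbalance_ratio(labels)
print(ratio, ratio <= MAX_IMBALANCE)  # 3.0 True
print(special_characters_ratio(["Hi!!", "ok?"]))  # 0.75
```

A test built on either ratio then only needs a threshold to compare against.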

Improvements

  • Renamed goals to tests
    • We have decided that the word "test" is a more accurate representation, and have updated all references in our product, docs, website, and sample notebooks
  • Polish and improvements to the new onboarding and navigation flows, including an updated "Getting started" page with more resources to help you get the most out of Openlayer
  • Creating a project in the UI now opens a modal
  • Creating a project in the UI opens up subsequent onboarding modals for adding an initial commit (development) or setting up an inference pipeline (monitoring)
  • Added commit statuses and buttons for adding new commits and inference pipelines to the navigation pane
  • Once a commit is added in development mode, new tests are suggested that are personalized to your model and data and identify critical failures and under-performing subpopulations
  • Added a clearer tooltip on how to enable subpopulation filtering for performance tests in monitoring mode
  • Improved wording of various suggested test titles
  • Test groupings now default appropriately by mode
  • Floating-point thresholds are now easier to input

Bug Fixes

  • Tests rendered without grouping were not sorted by date updated
  • Creating a project through the UI would not allow you to change the task type
  • Requests graph would not update with new data immediately and faithfully
  • Button for adding an OpenAI key was rendering for non-LLM projects
  • Feature value and data type validation tests were disabled
  • Rows and explainability were not rendering for certain tests
  • Token maps were not being rendered in the performance test creation page
  • Heatmap values would sometimes overflow
  • Column drift goals would not always successfully be created
  • In-app data tables for training datasets would not render
  • The final step of commit creation forms was hidden behind content
  • Updated the thresholds of suggested tests to be more reasonable for the metric
  • Fixes and improvements to test and requests line graphs
    • Graph data would overflow its container
    • Hovering over points would not display data correctly
    • Threshold lines would not render
    • Improved design for when only a single data point is rendered

November 17th, 2023

Enhanced onboarding, redesigned navigation, and new goals

Navigate around Openlayer with ease 🌟

We're thrilled to announce a new and improved onboarding flow, designed to make your start with us even smoother. We've also completely redesigned the app navigation, making it more intuitive than ever.

You can now use several new consistency and integrity goals — fine-grained feature & label drift, dataset size-ratios, new category checks and more. These are described in more detail below.

You'll also notice a range of improvements — new Slack and email notifications for monitoring projects, enhanced dark mode colors and improved transactional email deliverability. We've reorganized several features for ease of use, including the subpopulation filter flow and the performance goal page layout.

If you're working in dev mode, check out the dedicated commit page where you can view all the commit's metadata and download your models and data to use locally.

Stay tuned for more updates and join our Discord community to be a part of our ongoing development journey. 🚀👥

New features

  • New and improved onboarding for monitoring mode
  • Redesigned app navigation
  • New goals
    • Column drift (consistency) — choose specific columns and specific test types to measure drift in production
    • Column values match (consistency) — specify a cohort that must have matching values for a set of features in both production and reference data
    • New categories (consistency) — check for new categories present for features in your production data
    • Size-ratio (consistency) — specify a required size ratio between your datasets
    • Character length (integrity) — enforce character limits on your text-based columns
    • Ill-formed rows for LLMs (integrity) — check that your input and output columns don't contain ill-formed text
  • Dedicated commit page to view all commit metadata and download artifacts
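The character length and size-ratio goals above are both bounds checks. The following is a minimal sketch, assuming character limits are an inclusive minimum and maximum per row, and that the size ratio is production rows divided by reference rows, checked against an inclusive range. The function names and parameters are hypothetical illustrations, not Openlayer's API.

```python
def rows_violating_length(texts, min_chars=0, max_chars=None):
    """Return the indices of rows whose character count falls outside the limits."""
    violations = []
    for i, text in enumerate(texts):
        length = len(text)
        if length < min_chars or (max_chars is not None and length > max_chars):
            violations.append(i)
    return violations


def size_ratio_within(production_size, reference_size, min_ratio, max_ratio):
    """Check that production_size / reference_size lies in [min_ratio, max_ratio]."""
    if reference_size <= 0:
        raise ValueError("reference dataset must contain at least one row")
    ratio = production_size / reference_size
    return min_ratio <= ratio <= max_ratio


outputs = ["Sure!", "", "A very long answer that rambles on"]
print(rows_violating_length(outputs, min_chars=1, max_chars=20))  # [1, 2]
print(size_ratio_within(500, 1000, min_ratio=0.25, max_ratio=2.0))  # True
```

Returning the failing row indices, rather than a single boolean, mirrors how a goal can surface the offending rows for diagnosis.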

Improvements

  • Updated Slack and email notifications in monitoring mode
  • Color improvements for dark mode
  • Text no longer resets when toggling between block types in prompt playground
  • Text highlight color is now the browser-standard blue
  • Better transactional email deliverability
  • Navigate to notification settings directly from the notifications modal
  • Improved readability of prompt block content
  • Volume graphs in monitoring mode are more real-time
  • You can now invite team members from the workspace dropdown
  • Reorganized subpopulation filter flow
  • Reorganized create performance goal page layout
  • Improved multi-select for subpopulation filters
  • Requesting an upgrade in-app now opens a new tab
  • You can now specify arbitrary column names in goal thresholds and subpopulation filters

Bug Fixes

  • Back navigation didn't maintain project mode
  • Residual plots were missing cohorts in performance diagnosis page
  • Null metric values would cause all metrics to appear empty
  • Sample projects were missing the "sample" tag on the projects page
  • Icon for comment in the activity log was incorrect
  • Metrics table broke when subpopulation information was missing
  • Performance and diagnostics page would freeze with thousands of classes
  • Aggregate metrics would sometimes get cut off
  • Filtering project page by LLMs or tabular-regression would not work
  • App links returned by client API now navigate to the correct project mode
  • Auto-conversion of input variables with spaces to underscores for inference

October 24th, 2023

Evals for LLMs, real-time monitoring, Slack notifications and so much more!

Introducing support for testing LLMs & monitoring production data 🔍📊

It’s been a couple of months since we posted our last update, but not without good reason! Our team has been cranking away at our two most requested features: support for LLMs and real-time monitoring / observability. We’re so excited to share that they are both finally here! 🚀

We’ve also added a Slack integration, so you can receive all your Openlayer notifications right where you work. Additionally, you’ll find tons of improvements and bug fixes that should make your experience using the app much smoother.

We’ve also upgraded all Sandbox accounts to a free Starter plan that allows you to create your own project in development and production mode. We hope you find this useful!

Join our Discord for more updates like this and to follow our development journey more closely!

New features

  • LLMs in development mode
    • Experiment with and version different prompts, model providers and chains
    • Create a new commit entirely in the UI with our prompt playground. Connects seamlessly with OpenAI, Anthropic and Cohere
    • Set up sophisticated tests around RAG (hallucination, harmfulness, etc.), regex validation, JSON schemas, and much more
  • LLMs in monitoring mode
    • Seamlessly evaluate responses in production with the same tests you used in development and measure token usage, latency, drift and data volume too
  • All existing tasks support monitoring mode as well
  • Toggle between development mode and monitoring mode for any project
  • Add a few lines of code to your model’s inference pipeline to start monitoring production data
  • Slack & email notifications
    • Set up personal and team notifications
    • Get alerted on goal status updates in development and production, team activity like comments, and other updates in your workspace
  • Several new tests across all AI task types
  • New sample project for tabular regression
  • Select and star the metrics you care about for each project
  • Add encrypted workspace secrets your models can rely on

Improvements

  • Revamped onboarding with more guidance on how to get started quickly with Openlayer in development and production
  • Better names for suggested tests
  • Added a search bar to filter integrity and consistency goals on the create page
  • Reduced feature profile size for better app performance
  • Added a test activity item for accepted suggestions
  • Improved commit history allows for better comparison of the changes in performance between versions of your model and data across chosen metrics and goals
  • Added indicators to the aggregate metrics in the project page that indicate how they have changed from the previous commit in development mode
  • Improved logic for skipping or failing tests that don’t apply
  • Updated design of the performance goal creation page for a more efficient and clear UX
  • Allowed specifying MAPE as the metric for the regression heatmap
  • Improvements to data tables throughout the app, including better performance and faster loading times
  • Improved UX for viewing performance insights across cohorts of your data in various distribution tables and graphs
  • Updated and added new tooltips throughout the app for better clarity of concepts

Bug Fixes

  • Downloading commit artifacts triggered duplicate downloads
  • Fixed lagginess when browsing large amounts of data in tables throughout the app
  • Valid subpopulation filters sometimes rendered empty data table
  • Fixed bugs affecting experience navigating through pages in the app
  • Fixed issues affecting the ability to download data and logs from the app
  • Filtering by tokens in token cloud insight would not always apply correctly
  • Fixed UI bugs affecting the layout of various pages throughout the app that caused content to be cut off
  • Fixed Python client commit upload issues

June 29th, 2023

Regression projects, toasts, and artifact retrieval

Introducing support for regression tasks 📈

This week we shipped a huge set of features and improvements, including our solution for regression projects!

Finally, you can use Openlayer to evaluate your tabular regression models. We’ve updated our suite of goals for these projects, added new metrics like mean squared error (MSE) and mean absolute error (MAE), and delivered a new set of tailored insights and visualizations such as residual plots.
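For reference, MSE and MAE are both averages over per-row residuals, the same quantities a residual plot displays. The following is a minimal sketch in plain Python using the common "actual minus predicted" sign convention for residuals; it illustrates the standard definitions and is not Openlayer's implementation.

```python
def residuals(y_true, y_pred):
    """Residual per row: actual value minus predicted value."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one row is required")
    return [t - p for t, p in zip(y_true, y_pred)]


def mean_squared_error(y_true, y_pred):
    """Average of squared residuals; penalizes large errors heavily."""
    res = residuals(y_true, y_pred)
    return sum(r * r for r in res) / len(res)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute residuals; in the same units as the target."""
    res = residuals(y_true, y_pred)
    return sum(abs(r) for r in res) / len(res)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(residuals(y_true, y_pred))            # [0.5, -0.5, 0.0, -1.0]
print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5
```

Because MSE squares each residual, the single error of 1.0 contributes two thirds of the total here, while MAE weights all errors linearly.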

This update also includes an improved notification system: toasts that appear in the bottom-right corner when you create or update goals, projects, and commits. Now, you can create all your goals at once with fewer clicks.

Last but not least, you can now download the models and datasets under a commit within the platform. Simply navigate to your commit history and click on the options icon to download artifacts. Never worry about losing track of your models or datasets again.

New features

  • Added support for tabular regression projects
  • Toast notifications now appear for various in-app user actions, e.g., when creating projects, commits, or goals
  • Enabled downloading commit artifacts (models and datasets)
  • Allowed deleting commits

Improvements

  • Improved graph colors for dark mode
  • Commits within the timeline now show the time uploaded when within the past day
  • Commit columns in the timeline are now highlighted when hovering

Bug fixes

  • Sentence length goals would not render failing rows in the goal diagnosis modal
  • Filtering by non-alphanumeric symbols when creating performance goals was not possible in text classification projects
  • Changing operators would break filters within the performance goal creation page
  • Heatmap labels would not always align or overflow properly
  • Buggy UI artifacts would unexpectedly appear when hovering over timeline cells
  • Sorting the timeline would not persist the user selection correctly
  • Quasi-constant feature goals would break when all features have low variance
  • Selection highlight was not visible within certain input boxes
  • NaN values inside categorical features would break performance goal subpopulations
  • Heatmaps that are too large across one or both dimensions no longer attempt to render
  • Confidence distributions now display an informative error message when failing to compute

May 30th, 2023

Sign in with Google, sample projects, mentions and more!

📢 Introducing our changelog! 🎉

We are thrilled to release the first edition of our company's changelog, marking an exciting new chapter in our journey. We strive for transparency and constant improvement, and this changelog will serve as a comprehensive record of all the noteworthy updates, enhancements, and fixes that we are constantly shipping. With these releases, we aim to foster a tighter collaboration with all our amazing users, ensuring you are up to date on the progress we make and exciting features we introduce. So without further ado, let's dive into the new stuff!

New features

  • Enabled SSO (single sign-on) with Google
  • Added sample projects to all workspaces
  • Added support for mentioning users, goals, and commits in goal comments and descriptions — type @ to mention another user in your workspace, or # to mention a goal or commit
  • Added the ability to upload “shell” models (just the predictions on a dataset) without the model binary (the binary is still required for explainability, robustness, and text classification fairness goals)
  • Added ROC AUC to available project metrics
  • Added an overview page to browse and navigate to projects
  • Added an in-app onboarding flow to help new users get set up with their workspace
  • Added announcement bars for onboarding and workspace plan information
  • Integrated with Stripe for billing management
  • Added marketing email notification settings
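ROC AUC, added to the project metrics above, summarizes how well a binary classifier's scores rank positive rows above negative ones. The following is a minimal sketch that computes it from its pairwise (Mann-Whitney) interpretation, assuming 0/1 labels and one positive-class score per row. It is an illustration of the standard metric, not how Openlayer computes it, and its quadratic pairwise loop is only suitable for small inputs.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive row scores
    higher than a randomly chosen negative row (ties count as half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # O(P * N) pairwise comparison; fine for a sketch, slow for large datasets.
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Production implementations typically sort the scores once instead, which gives the same value in O(n log n).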

Improvements

  • Optimized network requests to dramatically improve page time-to-load and runtime performance
  • Improved the experience scrolling through dataset rows, especially for very large datasets
  • Added more suggested subpopulations for performance goal creation
  • Added more warning and error messages to forms
  • Added loading indicators when submitting comments in goals
  • Allowed submitting comments via Cmd + Enter
  • Improved the color range for heatmap tiles and tokens in the performance goal creation page
  • Updated wording of various labels throughout the app for clarity
  • Allowed specifying a role when inviting users to workspaces
  • Updated the design of the password reset and confirmation pages
  • Updated the design of the in-app onboarding modal
  • Sorted confusion matrix labels and predictions dropdown items alphabetically and enabled searching them
  • Added the ability to expand and collapse the confusion matrix

Bug fixes

  • Adding filters with multiple tokens when creating performance goals for text classification projects would sometimes fail to show insights
  • Adding filters when creating performance goals in any project would sometimes fail to show insights
  • Updating passwords in-app would fail
  • Notifications mentioning users that were deleted from a workspace would show a malformed label rather than their name or username
  • The email address was sometimes missing from the page telling users that a confirmation email had been sent after signup
  • Explainability graph cells would sometimes overflow or become misaligned
  • Users were sometimes unexpectedly logged out
  • Feature drift insights were broken for tabular datasets containing completely empty features
  • Feature profile insights would fail to compute when encountering NaN values
  • Token cloud insights would fail to compute when encountering NaN values
  • Commits in the history view would sometimes have overflowing content
  • Replaying onboarding successively would start the flow at the last step
  • Switching between projects and workspaces would sometimes fail to redirect properly
  • Confusion matrix UI would break when missing column values
  • Sorting the confusion matrix by subpopulation values wouldn’t apply
  • Goals would show as loading infinitely when missing results for the current commit
  • Improved the loading states for goal diagnosis modals
  • Performing what-if on rows with null columns would break the table UI
  • Uploading new commits that do not contain features used previously in the project as a subpopulation filter would cause unexpected behavior
  • Fixed various UI bugs affecting graphs throughout the app