We’ve added more ways to test latency. Beyond mean, max, and total, you can now create latency tests using the minimum, median, 90th percentile, and 99th percentile metrics. Head over to the Performance page to find the new test types.
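To make the difference between these metrics concrete, here is a minimal sketch that computes them from a list of request latencies using only the Python standard library. It is an illustration, not Openlayer's implementation: the `latency_summary` helper, the sample values, and the 200 ms threshold are all hypothetical.

```python
import statistics


def latency_summary(latencies_ms):
    """Summarize request latencies given in milliseconds (hypothetical helper)."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p90": cuts[89],
        "p99": cuts[98],
        "mean": statistics.mean(latencies_ms),
        "max": max(latencies_ms),
        "total": sum(latencies_ms),
    }


# Made-up sample: nine fast requests and one slow outlier.
summary = latency_summary([120, 95, 310, 101, 98, 2050, 110, 105, 99, 130])

# Hypothetical test: "median latency must stay under 200 ms".
median_ok = summary["median"] < 200
mean_ok = summary["mean"] < 200
```

In this sample the single 2050 ms request pulls the mean above the threshold while the median stays well below it, which is why a median or percentile test can be a better fit than a mean test when occasional outliers are expected.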
You can also create more granular data tests by applying subpopulation filters, which run the tests on specific clusters of your data. Add filters on the Data Integrity or Data Consistency pages, and the subpopulation will be applied.
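As a rough illustration of why a subpopulation filter matters, the sketch below runs the same missing-value check on a whole dataset and on one slice of it. Everything here is assumed for the example: the rows, the `subpopulation` and `missing_rate` helpers, and the column names are hypothetical and do not reflect how Openlayer stores or filters data.

```python
# Made-up rows; None marks a missing value.
rows = [
    {"country": "US", "email": None},
    {"country": "US", "email": "a@example.com"},
    {"country": "US", "email": None},
    {"country": "DE", "email": "b@example.com"},
    {"country": "DE", "email": "c@example.com"},
    {"country": "BR", "email": "d@example.com"},
]


def subpopulation(rows, **equals):
    """Keep only rows whose columns equal the given values (hypothetical filter)."""
    return [r for r in rows if all(r.get(k) == v for k, v in equals.items())]


def missing_rate(rows, column):
    """Fraction of rows where `column` is None; 0.0 for an empty slice."""
    if not rows:
        return 0.0
    return sum(r[column] is None for r in rows) / len(rows)


overall_rate = missing_rate(rows, "email")
us_rate = missing_rate(subpopulation(rows, country="US"), "email")
```

Here a third of all rows are missing an email, but within the US slice the rate is two thirds, so a test scoped to that subpopulation can surface a problem that the dataset-wide number dilutes.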
Features
• Evals: New latency tests (Min Latency, Median Latency, 90th Percentile Latency, 95th Percentile Latency, 99th Percentile Latency)
• Evals: Ability to apply subpopulation filters to data tests
• SDKs: Support for logging and testing runs of the OpenAI Assistants API with our Python and TypeScript clients
Improvements
• API: Updated OpenAI model pricing
• Templates: Support for OpenAI assistants with example notebook
• Performance: Improved performance for monitoring projects
• UI/UX: Requests on the page now refresh live every 5 seconds
• UI/UX: Ability to search projects by name in the project overview
• UI/UX: You can now view rows per evaluation window in test modals
• UI/UX: Date picker for selecting data range in test modal
• UI/UX: Show only the failing rows for tests
• UI/UX: Allow opening rows to the side in test modal tables
• UI/UX: Enable collapsing the metadata pane in test modals
• UI/UX: Skipped test results now render the value from the last successful evaluation in monitoring
Fixes
• Integrations: Fixed a LangChain version bug
• UI/UX: Metric scores and explanations did not appear in data tables in development mode
• UI/UX: Request table layout was broken
• UI/UX: Now able to navigate to subsequent pages on the requests page
• UI/UX: Fixed a bug with opening request metadata
• Performance: Requests and inference pipeline occasionally did not load
• Performance: Some LLM metrics had null scores in development mode
• UI/UX: There was a redundant navigation tab bar in monitoring test modals
• Performance: Monitoring tests with no results loaded infinitely