v1.3.0

@kfswain released this on 21 Jan 14:17
616745e

Noteworthy

LoRA Syncer

This release and all future releases will not include a LoRA syncer image, because we are deprecating that feature. Similar functionality will continue to exist in the form of the file system resolver. For model servers that do not yet support this form of LoRA management but do support the discrete LoRA management endpoints that the lora-syncer uses, the old images will be kept indefinitely and can still be used.

In the next release, the LoRA syncer code will be removed from the codebase.

Flow Control

Flow Control continues to evolve with the addition of scale-from/to-zero support. Requests can now be sent to an Endpoint Picker (EPP) that has no model-serving endpoints behind it, and the EPP emits metrics that the autoscaler can use to scale the pool up.

In upcoming releases we will continue working toward enabling this feature by default.
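The release notes do not describe how Flow Control holds requests or which metrics it emits. As a conceptual illustration only, the following minimal Go sketch shows the scale-from-zero idea: requests are held while the pool has zero ready endpoints, the backlog size is exposed as a signal an autoscaler could act on, and held requests are drained once endpoints come up. The type `pendingQueue` and its methods `Admit`, `Backlog`, and `EndpointsReady` are hypothetical and are not the EPP's actual API or metric names.

```go
package main

import (
	"fmt"
	"sync"
)

// pendingQueue is a hypothetical, simplified stand-in for a flow-control
// queue. It is NOT the EPP's real implementation.
type pendingQueue struct {
	mu        sync.Mutex
	endpoints int      // number of ready model-server endpoints
	waiting   []string // request IDs held while no endpoint is available
}

// Admit returns true if the request can be dispatched immediately;
// otherwise it holds the request until endpoints become ready.
func (q *pendingQueue) Admit(reqID string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.endpoints == 0 {
		q.waiting = append(q.waiting, reqID)
		return false
	}
	return true
}

// Backlog is the kind of signal an autoscaler could scale on: a non-zero
// backlog with zero endpoints means the pool should scale up from zero.
func (q *pendingQueue) Backlog() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.waiting)
}

// EndpointsReady records that n endpoints are available and, if n > 0,
// drains and returns the held request IDs so they can be dispatched.
func (q *pendingQueue) EndpointsReady(n int) []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.endpoints = n
	if n == 0 {
		return nil
	}
	drained := q.waiting
	q.waiting = nil
	return drained
}

func main() {
	q := &pendingQueue{}
	q.Admit("req-1")
	q.Admit("req-2")
	fmt.Println("backlog:", q.Backlog())            // an autoscaler would see 2 pending requests
	fmt.Println("dispatched:", q.EndpointsReady(1)) // pool scaled up; held requests drain
	fmt.Println("backlog:", q.Backlog())
}
```

The sketch keeps the metric and the queue in one structure for brevity. A real deployment would more likely export the backlog through a metrics system and let an external autoscaler react to it.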

Standalone EPP

This functionality allows the EPP to be deployed as a self-contained proxy within a single pod, with the EPP running as a sidecar container alongside the Envoy proxy. It was developed for batch inference scenarios and is currently considered experimental.

Fix(es)

  • Improved the approximate prefix cache scorer when used with the llm-d prefill/decode (P/D) disaggregated setup.
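These notes do not describe the scorer's internals or the specifics of the P/D fix. As background only, the following minimal Go sketch shows the general idea behind an approximate prefix-cache scorer. The prompt is split into fixed-size blocks, and each block gets a chained hash that identifies the whole prefix up to it. The scorer records which server each prompt was routed to, then scores a server by the fraction of leading blocks it has seen before. It is "approximate" because it infers cache contents from routing history rather than querying the model server. The names `prefixHashes`, `indexer`, `Record`, `Score`, and the 4-character `blockSize` are hypothetical assumptions; the sketch does not model the llm-d P/D changes.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// blockSize is a hypothetical block size in characters; a real scorer would
// likely use a configurable size over tokens or bytes.
const blockSize = 4

// prefixHashes returns one chained hash per full block (a trailing partial
// block is ignored), so each hash stands for the entire prefix so far.
func prefixHashes(prompt string) []uint64 {
	var hashes []uint64
	var prev uint64
	for start := 0; start+blockSize <= len(prompt); start += blockSize {
		h := fnv.New64a()
		var seed [8]byte
		binary.LittleEndian.PutUint64(seed[:], prev) // chain the previous block's hash
		h.Write(seed[:])
		h.Write([]byte(prompt[start : start+blockSize]))
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

// indexer maps a block hash to the set of servers a matching prefix was routed to.
type indexer struct {
	blocks map[uint64]map[string]bool
}

func newIndexer() *indexer {
	return &indexer{blocks: make(map[uint64]map[string]bool)}
}

// Record notes that prompt was routed to server, so its blocks are assumed cached there.
func (ix *indexer) Record(prompt, server string) {
	for _, h := range prefixHashes(prompt) {
		if ix.blocks[h] == nil {
			ix.blocks[h] = make(map[string]bool)
		}
		ix.blocks[h][server] = true
	}
}

// Score returns, per server, the fraction of the prompt's leading blocks
// that server has seen, stopping at the first miss.
func (ix *indexer) Score(prompt string, servers []string) map[string]float64 {
	hashes := prefixHashes(prompt)
	scores := make(map[string]float64, len(servers))
	for _, s := range servers {
		matched := 0
		for _, h := range hashes {
			if !ix.blocks[h][s] { // reading a nil inner map yields false
				break
			}
			matched++
		}
		if len(hashes) > 0 {
			scores[s] = float64(matched) / float64(len(hashes))
		} else {
			scores[s] = 0
		}
	}
	return scores
}

func main() {
	ix := newIndexer()
	ix.Record("abcdefgh-xyz", "pod-a")
	servers := []string{"pod-a", "pod-b"}
	scores := ix.Score("abcdefgh-123", servers)
	for _, s := range servers {
		fmt.Printf("%s %.2f\n", s, scores[s]) // pod-a 0.67, pod-b 0.00
	}
}
```

Chaining each block's hash with the previous one means a block only matches when everything before it also matches, which is what a shared KV-cache prefix requires.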

Full Changelog: v1.2.1...v1.3.0