[ML] Trained models: Warn users of the implication of a min_allocations=0 configuration #218631

@arisonl

Description

Work to be done:

We need to guide/warn users by including descriptive messages in the trained models and inference endpoints UIs, informing them of the cost/availability tradeoff and the implications of configuration choices when deploying an NLP model. Users who optimize for cost may face temporary unavailability; users who optimize for availability incur a small cost when idling.

A separate GitHub issue will deal with API error handling/descriptiveness for the same problem.

Context:

With the development of adaptive allocations, models can scale down to zero allocations for cost optimization, depending on the selected configuration.

Specifically, when setting up a model deployment through the trained models UI, users are presented with a choice for level of usage. If the user chooses low usage, min_allocations is set to zero to optimize for cost in the cost/availability tradeoff.
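For reference, the setting behind the "low usage" choice corresponds to an adaptive allocations configuration along these lines (illustrative request body only; the field names follow the adaptive allocations settings, but the maximum value shown here is a placeholder, not what the UI actually sends):

```json
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 0,
    "max_number_of_allocations": 4
  }
}
```

With `min_number_of_allocations` set to 0, the deployment is allowed to scale all the way down when idle, which is exactly the cost-optimized behavior described above.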

In that case, when no inference calls take place, the model scales down to zero allocations for zero cost. After a while, if the deployment is still idle, the ML node is released, and subsequent inference calls receive unavailability errors until a new node is spun up and the model is allocated to it. Configurations with min_allocations=1 avoid this problem but incur a small idling cost.
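Until the UI messaging lands, callers hitting a scaled-to-zero deployment need to tolerate the warm-up window themselves. A minimal client-side sketch, assuming the client surfaces the "no allocation available" condition as an exception (`RuntimeError` here is a hypothetical stand-in for whatever error the real client raises):

```python
import time


def infer_with_retry(call, max_attempts=5, base_delay=2.0):
    """Retry an inference call while the model scales back up from zero.

    `call` is any zero-argument function that raises RuntimeError while
    no allocation is available and returns the inference result once a
    node has been provisioned and the model allocated to it.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff while the ML node is spun up.
            time.sleep(base_delay * 2 ** attempt)


# Example: a fake endpoint that is unavailable for the first two calls,
# mimicking a deployment waking up from zero allocations.
state = {"calls": 0}


def fake_infer():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("model not allocated yet")
    return {"result": "ok"}
```

Calling `infer_with_retry(fake_infer, base_delay=0.0)` succeeds on the third attempt, which is the pattern a caller would see while a node is being allocated.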

We know that some users are surprised by this behavior, for example when calls error in Search Playground. Before adaptive allocations, models never scaled down, so they never hit this side issue.

This problem will not be present on EIS.

cc @peteharverson
