[ML] Trained models: Warn users of the implication of a min_allocations=0 configuration #218631

@arisonl

Description

Work to be done:

We need to guide/warn users by including descriptive messages in the trained models and inference endpoints UIs, informing them of the cost/availability tradeoff and the implications of configuration choices when deploying an NLP model. Users who optimize for cost may face temporary unavailability; users who optimize for availability incur a small cost when idling.

A separate GitHub issue will deal with API error handling/descriptiveness for the same problem.

Context:

With the development of adaptive allocations, models can scale down to zero allocations for cost optimization, depending on the selected configuration.

Specifically, when setting up a model deployment through the trained models UI, users are presented with a choice for level of usage. If the user chooses low usage, min_allocations is set to zero to optimize for cost in the cost/availability tradeoff.
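For reference, the setting behind the "low usage" choice corresponds to an adaptive allocations configuration along these lines (illustrative request body only; the field names follow the adaptive allocations settings, but the maximum value shown here is a placeholder, not what the UI actually sends):

```json
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 0,
    "max_number_of_allocations": 4
  }
}
```

With `min_number_of_allocations` set to 0, the deployment is allowed to scale all the way down when idle, which is exactly the cost-optimized behavior described above.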

In that case, when no inference calls take place, the model scales down to zero allocations for zero cost. After a while, if the deployment is still idle, the ML node is released, and subsequent inference calls receive unavailability errors until a new node is spun up and the model is allocated to it. Configurations with min_allocations=1 avoid this problem but incur a small idling cost.
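Until the UI messaging lands, callers hitting a scaled-to-zero deployment need to tolerate the warm-up window themselves. A minimal client-side sketch, assuming the client surfaces the "no allocation available" condition as an exception (`RuntimeError` here is a hypothetical stand-in for whatever error the real client raises):

```python
import time


def infer_with_retry(call, max_attempts=5, base_delay=2.0):
    """Retry an inference call while the model scales back up from zero.

    `call` is any zero-argument function that raises RuntimeError while
    no allocation is available and returns the inference result once a
    node has been provisioned and the model allocated to it.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff while the ML node is spun up.
            time.sleep(base_delay * 2 ** attempt)


# Example: a fake endpoint that is unavailable for the first two calls,
# mimicking a deployment waking up from zero allocations.
state = {"calls": 0}


def fake_infer():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("model not allocated yet")
    return {"result": "ok"}
```

Calling `infer_with_retry(fake_infer, base_delay=0.0)` succeeds on the third attempt, which is the pattern a caller would see while a node is being allocated.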

We know that some users are surprised by this behavior, for example when calls error in Search Playground. Before adaptive allocations, models never scaled down, so they never hit this side issue.

This problem will not be present on EIS.

cc @peteharverson
