Feature Request: Improve model load time when using the RPC backend #12954

Open

Prerequisites

I am running the latest code. Mention the version if possible as well.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Load models faster when using one or more RPC servers.

Motivation

The rpc-server's local cache has improved model load time, but there is still room for improvement.

Possible Implementation

We could explore storing pre-computed hashes in the GGUF file, so that the main host does not have to load the entire model.
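A rough sketch of the idea, not the actual RPC protocol: if the hash of each blob of model data is already known (for example, read from GGUF metadata), the main host can offer the hash to the server first and only read the data when the server's cache misses. Everything here is hypothetical and for illustration only: `FakeRpcServer`, `set_tensor_by_hash`, `set_tensor`, and `load_tensor` are made-up names, and 64-bit FNV-1a is used just as an example hash.

```python
from typing import Callable, Dict

FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash (example hash; the real choice is an assumption)."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


class FakeRpcServer:
    """Hypothetical stand-in for an rpc-server with a hash-keyed local cache."""

    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}   # hash -> cached data
        self.loaded: Dict[str, bytes] = {}  # tensor name -> data in use

    def set_tensor_by_hash(self, name: str, h: int) -> bool:
        """Load a tensor from the cache; return False on a cache miss."""
        data = self.cache.get(h)
        if data is None:
            return False
        self.loaded[name] = data
        return True

    def set_tensor(self, name: str, data: bytes) -> None:
        """Receive the full data and remember it in the cache."""
        self.cache[fnv1a_64(data)] = data
        self.loaded[name] = data


def load_tensor(
    server: FakeRpcServer,
    name: str,
    precomputed_hash: int,
    read_data: Callable[[], bytes],
) -> bool:
    """Offer the pre-computed hash first; read the data only on a miss.

    Returns True if the main host had to read and send the data.
    """
    if server.set_tensor_by_hash(name, precomputed_hash):
        return False
    server.set_tensor(name, read_data())
    return True
```

With pre-computed hashes, `read_data` is never called for data the server already has cached, which is the part of the load the main host could skip.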

Metadata

Assignees

rgerganov
