Feature Request: add draft model in llama-bench and more. #13456

Open
@Djip007

Feature Description

I do not know whether this is possible, but it would be nice to support a draft model in more places.
If I am not mistaken, only the server can use one at the moment.

Adding it to the CLI may be "simple", but it would be especially nice to have it in the benchmark tool (llama-bench).

Motivation

  • See what speed we can get with a draft model.
  • Find which config/model combination works best. (This requires reporting statistics about speculative decoding usage.)
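
As a rough illustration of the kind of statistics the second point refers to, here is a minimal Python sketch. It is not llama-bench code: the `SpecStats` class, its field names, and the `speedup` helper are all hypothetical, and it assumes the benchmark could count drafted tokens, accepted tokens, generated tokens, and elapsed time for a run.

```python
from dataclasses import dataclass


@dataclass
class SpecStats:
    """Hypothetical per-run counters for a benchmark with a draft model."""
    n_drafted: int     # tokens proposed by the draft model
    n_accepted: int    # drafted tokens the main model accepted
    n_generated: int   # total tokens emitted during the run
    elapsed_s: float   # wall-clock generation time in seconds

    @property
    def acceptance_rate(self) -> float:
        # Fraction of drafted tokens that were accepted (0.0 if none drafted).
        return self.n_accepted / self.n_drafted if self.n_drafted else 0.0

    @property
    def tokens_per_second(self) -> float:
        # Generation throughput for the whole run.
        return self.n_generated / self.elapsed_s if self.elapsed_s > 0 else 0.0


def speedup(with_draft: SpecStats, baseline_tps: float) -> float:
    """Throughput with the draft model relative to a run without one."""
    return with_draft.tokens_per_second / baseline_tps


if __name__ == "__main__":
    # Made-up numbers, only to show how the values relate to each other.
    stats = SpecStats(n_drafted=100, n_accepted=75, n_generated=200, elapsed_s=4.0)
    print(f"acceptance rate: {stats.acceptance_rate:.2f}")
    print(f"tokens/s:        {stats.tokens_per_second:.1f}")
    print(f"speedup vs 25 t/s baseline: {speedup(stats, 25.0):.2f}x")
```

Reporting the acceptance rate next to tokens per second would make it easier to tell whether a given draft model helps because it is fast or because its guesses are usually accepted.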
