Feature Description
I don't know if it is possible, but it would be nice to support a draft model in more places.
If I'm not mistaken, only the server can use one at the moment.
Adding it to the CLI may be "simple", but it would also be nice to have it in "benchmark".
Motivation
- See what speed we can get with a draft model.
- Find out which config/model combination works best (this requires reporting speculative decoding usage statistics).
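
To make the second point more concrete, here is a rough sketch of the kind of statistics a benchmark run could report. Everything in it is hypothetical: the `SpecDecodeStats` class, its field names, and the `speedup` helper are made up for illustration and do not correspond to any existing API or output format.

```python
from dataclasses import dataclass


@dataclass
class SpecDecodeStats:
    """Hypothetical counters a benchmark run with a draft model could collect."""
    drafted_tokens: int      # tokens proposed by the draft model
    accepted_tokens: int     # drafted tokens accepted by the target model
    generated_tokens: int    # total tokens emitted during the run
    elapsed_seconds: float   # wall-clock generation time

    @property
    def acceptance_rate(self) -> float:
        # Fraction of drafted tokens that the target model accepted.
        if self.drafted_tokens == 0:
            return 0.0
        return self.accepted_tokens / self.drafted_tokens

    @property
    def tokens_per_second(self) -> float:
        # Overall generation throughput for the run.
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.generated_tokens / self.elapsed_seconds


def speedup(with_draft: SpecDecodeStats, baseline_tokens_per_second: float) -> float:
    """Throughput with the draft model relative to a run without one."""
    if baseline_tokens_per_second <= 0:
        raise ValueError("baseline throughput must be positive")
    return with_draft.tokens_per_second / baseline_tokens_per_second


if __name__ == "__main__":
    # Made-up numbers, only to show how the values relate to each other.
    stats = SpecDecodeStats(drafted_tokens=100, accepted_tokens=75,
                            generated_tokens=200, elapsed_seconds=4.0)
    print(f"acceptance rate: {stats.acceptance_rate:.2f}")
    print(f"tokens/s:        {stats.tokens_per_second:.1f}")
    print(f"speedup:         {speedup(stats, baseline_tokens_per_second=25.0):.2f}x")
```

Reporting the acceptance rate next to tokens per second would make it easier to compare draft/target model pairs, since a low acceptance rate usually explains why a given draft model does not help.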