Description
Name and Version
Last commit
Operating systems
Linux
GGML backends
CUDA
Hardware
AMD Ryzen Threadripper 7980X, NVIDIA GeForce RTX 5090 / AMD Radeon PRO W7900 Dual Slot
Models
Problem description & steps to reproduce
Hybrid models are not supported.
Please add support for Hymba, which uses:
- A hybrid attention mechanism combining local sliding-window attention and global attention.
- Grouped-query attention (GQA).
- A mix of global and local rotary embeddings.
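To make the first two features concrete for whoever implements this, here is a minimal Python sketch of per-layer attention masking (sliding-window layers vs. global layers) and the usual contiguous GQA mapping from query heads to KV heads. It does not use llama.cpp or ggml APIs and does not model rotary embeddings. All names (`can_attend`, `build_mask`, `layer_window`, `kv_head_for`) are hypothetical, and the window size, head counts, and set of global layers are illustrative assumptions, not Hymba's real hyperparameters.

```python
from typing import List, Optional, Set

# Hypothetical sketch, not llama.cpp code. Window size, head counts and the
# set of global layers are illustrative assumptions, not Hymba's real config.


def can_attend(q_pos: int, k_pos: int, window: Optional[int]) -> bool:
    """Causal mask; with a window, only the last `window` positions are visible."""
    if k_pos > q_pos:  # causal: never attend to future tokens
        return False
    if window is None:  # global layer: full causal attention
        return True
    return q_pos - k_pos < window  # local layer: sliding window


def build_mask(n_tokens: int, window: Optional[int]) -> List[List[bool]]:
    """Boolean mask where mask[q][k] is True if query q may attend to key k."""
    return [[can_attend(q, k, window) for k in range(n_tokens)] for q in range(n_tokens)]


def layer_window(layer: int, global_layers: Set[int], window: int) -> Optional[int]:
    """Global layers get no window (None); all other layers use the sliding window."""
    return None if layer in global_layers else window


def kv_head_for(q_head: int, n_head: int, n_head_kv: int) -> int:
    """GQA: each group of n_head // n_head_kv consecutive query heads shares one KV head."""
    if n_head % n_head_kv != 0:
        raise ValueError("n_head must be a multiple of n_head_kv")
    group_size = n_head // n_head_kv
    return q_head // group_size


if __name__ == "__main__":
    # Last row of a 5-token mask with a window of 3: only keys 2, 3 and 4 are visible.
    print(build_mask(5, 3)[4])
    # 8 query heads sharing 2 KV heads.
    print([kv_head_for(h, 8, 2) for h in range(8)])
```

Keeping the window as a per-layer value (with `None` meaning global) is one simple way to let a single attention code path serve both layer types.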
First Bad Commit
No response
Relevant log output
Tensors do not load correctly.