Support Hybrid Models

Name and Version

Latest commit

Operating systems

Linux

GGML backends

CUDA

Hardware

AMD Ryzen Threadripper 7980X, NVIDIA RTX 5090 / AMD Radeon PRO W7900 Dual Slot

Models

https://developer.nvidia.com/blog/hymba-hybrid-head-architecture-boosts-small-language-model-performance/

Problem description & steps to reproduce

Hybrid models are not supported yet. Please add support for Hymba, whose architecture includes:
A hybrid attention mechanism that combines local sliding-window attention with global attention.
Grouped-query attention (GQA).
A mix of global and local rotary embeddings.
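To make the requested attention features concrete, here is a minimal Python sketch of two pieces a loader and graph builder would need: causal masks for global vs. sliding-window attention, and the GQA mapping from query heads to shared KV heads. The function names, the window size, and which layers are treated as global are hypothetical illustrations, not Hymba's actual configuration or any llama.cpp API; the global/local rotary-embedding mix is not covered.

```python
def causal_mask(seq_len, window=None):
    """Return a seq_len x seq_len boolean mask; True means query i may attend to key j.

    window=None -> global causal attention: every earlier token plus itself.
    window=w    -> local sliding-window attention: only the last w tokens (including itself).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = j <= i  # causal: never look at future tokens
            if window is not None:
                allowed = allowed and (i - j) < window  # restrict to the local window
            row.append(allowed)
        mask.append(row)
    return mask


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: consecutive groups of query heads share one KV head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def layer_masks(n_layers, seq_len, window, global_layers):
    """Hypothetical per-layer schedule: layers in global_layers use full causal
    attention; all other layers use sliding-window attention."""
    return [
        causal_mask(seq_len, None if layer in global_layers else window)
        for layer in range(n_layers)
    ]
```

For example, `causal_mask(5, window=2)` lets token 4 attend only to tokens 3 and 4, while `causal_mask(5)` lets it attend to tokens 0 through 4. With 8 query heads and 2 KV heads, query heads 0-3 share KV head 0 and heads 4-7 share KV head 1, which is what shrinks the KV cache relative to standard multi-head attention.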

First Bad Commit

No response

Relevant log output

The model's tensors do not load correctly.

Support Hybrid Models #12331