Support Hybrid Models #12331

Open
@johnnynunez

Name and Version

Last commit

Operating systems

Linux

GGML backends

CUDA

Hardware

Threadripper 7980X, RTX 5090 / W7900 Dual Slot

Models

https://developer.nvidia.com/blog/hymba-hybrid-head-architecture-boosts-small-language-model-performance/

Problem description & steps to reproduce

Hybrid models are not currently supported.
Requested support:
Hymba, which uses:
A hybrid attention mechanism combining local sliding-window attention and global attention.
Grouped-query attention (GQA).
A mix of global and local rotary embeddings.
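
To make the requested attention features concrete, here is a minimal pure-Python sketch of the two ideas listed above: a causal mask that is either global or restricted to a sliding window, and the mapping from query heads to shared key/value heads in GQA. This is not Hymba's or llama.cpp's actual code; the function names, the window size, and the per-layer layout are all hypothetical and chosen only for illustration.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True when query position i may attend to key j.

    window=None  -> global (full causal) attention.
    window=int   -> local sliding-window attention: only the most recent
                    `window` positions (including i itself) are visible.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future tokens
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one KV head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Hypothetical layout (an assumption, not taken from the model):
# None marks a global-attention layer, an integer marks a local window.
layer_windows = [None, 4, 4, None]
layer_masks = [causal_mask(6, w) for w in layer_windows]
```

A loader that supports such a model would need to know, per layer, which kind of mask applies and how many KV heads each layer has, so that information has to be available in the model metadata.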

First Bad Commit

No response

Relevant log output

Tensors are not loaded correctly.
