Description
Research Stage
- Background Research (Let's try to avoid reinventing the wheel)
- Hypothesis Formed (How do you think this will work and what will its effect be?)
- Strategy / Implementation Forming
- Analysis of results
- Debrief / Documentation (So people in the future can learn from us)
Previous existing literature and research
I'm trying to deploy a multi-modal model based on VITA-1.5, where:
- The text backbone is the same as Qwen2.
- The vision tower is InternViT-300M-448px from OpenGVLab.
(A quick config sanity check of this composition is sketched just below.)
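For that sanity check, the sub-configs of a local checkpoint can be inspected; this is a minimal sketch, assuming the Hugging Face repo id and the sub-config attribute names below, none of which I verified against the actual VITA-1.5 files:

```python
# Hypothetical sanity check: print the text/vision sub-configs of a VITA-1.5
# checkpoint. The repo id and the attribute names probed here are assumptions.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("VITA-MLLM/VITA-1.5", trust_remote_code=True)
print("top-level model_type:", getattr(cfg, "model_type", None))

# Multimodal configs usually expose the LLM and the vision tower as sub-configs;
# the exact attribute names may differ in the real repo.
for name in ("text_config", "llm_config", "vision_config"):
    sub = getattr(cfg, name, None)
    if sub is not None:
        print(f"{name}: {getattr(sub, 'model_type', type(sub).__name__)}")
```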
Yesterday I noticed that convert_hf_to_gguf.py added a new class:
class InternVisionModel(VisionModel)
which is the same architecture that VITA's vision part uses.
However:
- There's no corresponding tensor name mapping in constants.py under MODEL_TENSORS.
- There's no build function in llama_model.cpp (e.g., no build_internvit()).
I'm not sure how to combine the vision and text parts into a single GGUF model so that llama.cpp can run inference with both modalities.
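For context on the missing mapping, here is a rough, standalone sketch of the kind of HF-to-GGUF tensor-name translation I mean. Both the InternViT-side prefixes and the "v.blk.*" GGUF-side names are placeholders chosen for illustration, not the identifiers that gguf-py or the conversion script actually define:

```python
import re

# Illustrative only: the rough shape of an HF -> GGUF tensor-name mapping for an
# InternViT-style vision tower. All names on both sides are placeholders, not the
# real llama.cpp / gguf-py definitions.
PER_BLOCK_MAP = {
    "attn.qkv":  "attn_qkv",
    "attn.proj": "attn_out",
    "mlp.fc1":   "ffn_up",
    "mlp.fc2":   "ffn_down",
    "norm1":     "ln1",
    "norm2":     "ln2",
}

def map_internvit_name(hf_name: str) -> str | None:
    """Map one HF tensor name to a GGUF-style name; return None if unrecognized."""
    # Embedding tensors (no block index).
    if hf_name.startswith("vision_model.embeddings."):
        return hf_name.replace("vision_model.embeddings.", "v.")
    # Per-block encoder tensors: vision_model.encoder.layers.<bid>.<sub>.<weight|bias>
    m = re.match(r"vision_model\.encoder\.layers\.(\d+)\.(.+)\.(weight|bias)$", hf_name)
    if m:
        bid, sub, kind = m.groups()
        if sub in PER_BLOCK_MAP:
            return f"v.blk.{bid}.{PER_BLOCK_MAP[sub]}.{kind}"
    return None

# Example:
# map_internvit_name("vision_model.encoder.layers.0.attn.qkv.weight")
#   -> "v.blk.0.attn_qkv.weight"
```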
My goal:
To deploy VITA-1.5 via llama.cpp and run image+text inference (similar to LLaVA / MobileVLM).
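For comparison, the LLaVA / MobileVLM deployments I have in mind split the weights into a text-model GGUF plus a separate vision/projector GGUF rather than one combined file. Below is a minimal sketch of that two-file pattern through the llama-cpp-python bindings; the file paths are placeholders, and the LLaVA-1.5 chat handler is only an analogy, since VITA-1.5 is not actually wired up to it:

```python
# Sketch of the LLaVA-style two-file pattern via the llama-cpp-python bindings.
# File paths are placeholders; VITA-1.5 itself is NOT supported by this handler.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="text-model-q4_k_m.gguf",  # Qwen2-style text backbone
    chat_handler=chat_handler,
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "file:///tmp/example.png"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }]
)
print(out["choices"][0]["message"]["content"])
```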
Questions:
- What is the recommended way to combine Qwen2 text + InternViT vision into one GGUF model?
- Will InternVisionModel support GGUF inference soon, or should I write the corresponding GGML graph manually?
Hypothesis
No response
Implementation
No response
Analysis
No response