
memory : migrate from llama_kv_cache to more generic llama_memory #14006


Merged
merged 2 commits into master from gg/llama-memory-public on Jun 5, 2025

Conversation

@ggerganov (Member) commented Jun 4, 2025

cont #13988

  • Merge llama_kv_cache into llama_memory_i
  • llama_kv_cache_unified now implements llama_memory_i (see the hierarchy sketch below)
  • llama_kv_cache_recurrent now implements llama_memory_i
  • Add a new llama_memory_ public API to libllama
  • The old llama_kv_self_* public API now simply routes to the new llama_memory_ API and will be deprecated in the next PR (see the shim sketch after the Next PRs list)
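
A minimal sketch of the resulting hierarchy as described in the list above (the destructor is the only member shown; the actual interface lives in libllama and is richer):

// base interface: the general concept of LLM memory
struct llama_memory_i {
    virtual ~llama_memory_i() = default;
};

// the classic attention KV cache is one kind of memory ...
struct llama_kv_cache_unified : public llama_memory_i {
    // KV cells, per-sequence bookkeeping, etc.
};

// ... and the recurrent-state cache is another
struct llama_kv_cache_recurrent : public llama_memory_i {
    // rolling states for recurrent / SSM-style models
};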

TODO

  • Implement the new llama_memory_ public API

Next PRs

  • Deprecate the llama_kv_self_* public API in favor of the new llama_memory_ API
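
To illustrate the routing of the old API mentioned in the change list above, a hypothetical shim (the llama_memory_ entry-point names and signatures are assumptions, since the public API is still listed as TODO in this PR):

// hypothetical declarations mirroring the intended public API
struct llama_context;
typedef struct llama_memory_i * llama_memory_t;

llama_memory_t llama_get_memory(const struct llama_context * ctx);
void           llama_memory_clear(llama_memory_t mem, bool data);

// the old call becomes a thin shim over the new memory API
void llama_kv_self_clear(struct llama_context * ctx) {
    llama_memory_clear(llama_get_memory(ctx), /*data=*/true);
}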

Base automatically changed from gg/kv-cache-refactor-update to master June 4, 2025 15:58
@ggerganov force-pushed the gg/llama-memory-public branch from fe4b1b3 to bca2671 on June 5, 2025 06:16
@ggerganov force-pushed the gg/llama-memory-public branch from bca2671 to f149a8e on June 5, 2025 06:36
@ggerganov marked this pull request as ready for review June 5, 2025 06:36

// general concept of LLM memory
// the KV cache is a type of LLM memory, but there can be other types
struct llama_memory_i {
ggerganov (Member, Author) commented:

Changed this from class to struct to be compatible with the C-header declaration.
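
For context: C has no class keyword, so the public header can only forward-declare the type as a struct, and some compilers warn when the C++ definition uses a mismatched class-key (e.g. MSVC warning C4099). A sketch of the assumed header declaration:

// llama.h (C-compatible public header)
struct llama_memory_i;
typedef struct llama_memory_i * llama_memory_t;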

@ggerganov ggerganov requested a review from slaren June 5, 2025 06:38
// before: llama_context::get_kv_self()
llama_kv_cache * kv_self = static_cast<llama_kv_cache *>(memory.get());
return kv_self;

// after: llama_context::get_memory()
llama_memory_t llama_context::get_memory() const {
    return static_cast<llama_memory_t>(memory.get());
}
slaren (Member) commented:

This cast shouldn't be necessary.
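
Presumably because the conversion is already implicit there. A minimal sketch, assuming memory is a std::unique_ptr<llama_memory_i> and llama_memory_t is llama_memory_i *:

#include <memory>

struct llama_memory_i { virtual ~llama_memory_i() = default; };
typedef llama_memory_i * llama_memory_t;

struct llama_context_sketch { // hypothetical stand-in for llama_context
    std::unique_ptr<llama_memory_i> memory;

    llama_memory_t get_memory() const {
        return memory.get(); // already a llama_memory_i *, no cast needed
    }
};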

// before:
llama_kv_cache * llama_get_kv_self(llama_context * ctx) {
    return ctx->get_kv_self();
}

// after:
llama_kv_cache * llama_get_kv_self(llama_context * ctx) {
    return static_cast<llama_kv_cache *>(ctx->get_memory());
}
slaren (Member) commented:

I think this is not a safe cast, so it should be checked with dynamic_cast
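
A self-contained sketch of the checked downcast (the types are hypothetical stand-ins): dynamic_cast yields nullptr when the active memory is not actually a KV cache, so the mismatch can be detected instead of invoking undefined behavior:

struct llama_memory_i { virtual ~llama_memory_i() = default; };
struct llama_kv_cache           : llama_memory_i { /* attention KV cells */ };
struct llama_kv_cache_recurrent : llama_memory_i { /* recurrent states */ };

llama_kv_cache * as_kv_cache(llama_memory_i * mem) {
    // nullptr for non-KV-cache memory types (e.g. the recurrent cache)
    return dynamic_cast<llama_kv_cache *>(mem);
}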

@ggerganov ggerganov merged commit 7f37b6c into master Jun 5, 2025
49 of 52 checks passed
@ggerganov ggerganov deleted the gg/llama-memory-public branch June 5, 2025 12:29
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request on Jun 6, 2025:
memory : migrate from llama_kv_cache to more generic llama_memory (ggml-org#14006)

* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci
shefben added a commit to shefben/llama.cpp that referenced this pull request Jun 6, 2025