
memory : migrate from llama_kv_cache to more generic llama_memory #14006


Merged
merged 2 commits into master from gg/llama-memory-public on Jun 5, 2025

Conversation

@ggerganov (Member) commented Jun 4, 2025

cont #13988

  • Merge llama_kv_cache into llama_memory_i
  • llama_kv_cache_unified now implements llama_memory_i (see the hierarchy sketch below)
  • llama_kv_cache_recurrent now implements llama_memory_i
  • Add a new llama_memory_ public API to libllama
  • The old llama_kv_self_* public API now simply routes to the new llama_memory_ API and will be deprecated in the next PR (see the shim sketch after the Next PRs list)
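
A minimal sketch of the resulting hierarchy as described in the list above (the destructor is the only member shown; the actual interface lives in libllama and is richer):

// base interface: the general concept of LLM memory
struct llama_memory_i {
    virtual ~llama_memory_i() = default;
};

// the classic attention KV cache is one kind of memory ...
struct llama_kv_cache_unified : public llama_memory_i {
    // KV cells, per-sequence bookkeeping, etc.
};

// ... and the recurrent-state cache is another
struct llama_kv_cache_recurrent : public llama_memory_i {
    // rolling states for recurrent / SSM-style models
};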

TODO

  • Implement the new llama_memory_ public API

Next PRs

  • Deprecate the llama_kv_self_* public API in favor of the new llama_memory_ API
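
To illustrate the routing of the old API mentioned in the change list above, a hypothetical shim (the llama_memory_ entry-point names and signatures are assumptions, since the public API is still listed as TODO in this PR):

// hypothetical declarations mirroring the intended public API
struct llama_context;
typedef struct llama_memory_i * llama_memory_t;

llama_memory_t llama_get_memory(const struct llama_context * ctx);
void           llama_memory_clear(llama_memory_t mem, bool data);

// the old call becomes a thin shim over the new memory API
void llama_kv_self_clear(struct llama_context * ctx) {
    llama_memory_clear(llama_get_memory(ctx), /*data=*/true);
}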

Base automatically changed from gg/kv-cache-refactor-update to master June 4, 2025 15:58
@ggerganov force-pushed the gg/llama-memory-public branch from fe4b1b3 to bca2671 on June 5, 2025 06:16
@ggerganov force-pushed the gg/llama-memory-public branch from bca2671 to f149a8e on June 5, 2025 06:36
@ggerganov marked this pull request as ready for review June 5, 2025 06:36

// general concept of LLM memory
// the KV cache is a type of LLM memory, but there can be other types
struct llama_memory_i {
ggerganov (Member, Author) commented:

Changed this from class to struct to be compatible with the C-header declaration.
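
For context: C has no class keyword, so the public header can only forward-declare the type as a struct, and some compilers warn when the C++ definition uses a mismatched class-key (e.g. MSVC warning C4099). A sketch of the assumed header declaration:

// llama.h (C-compatible public header)
struct llama_memory_i;
typedef struct llama_memory_i * llama_memory_t;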

@ggerganov ggerganov requested a review from slaren June 5, 2025 06:38
// before: llama_context::get_kv_self()
llama_kv_cache * kv_self = static_cast<llama_kv_cache *>(memory.get());
return kv_self;

// after: llama_context::get_memory()
llama_memory_t llama_context::get_memory() const {
    return static_cast<llama_memory_t>(memory.get());
}
slaren (Member) commented:

This cast shouldn't be necessary.
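
Presumably because the conversion is already implicit there. A minimal sketch, assuming memory is a std::unique_ptr<llama_memory_i> and llama_memory_t is llama_memory_i *:

#include <memory>

struct llama_memory_i { virtual ~llama_memory_i() = default; };
typedef llama_memory_i * llama_memory_t;

struct llama_context_sketch { // hypothetical stand-in for llama_context
    std::unique_ptr<llama_memory_i> memory;

    llama_memory_t get_memory() const {
        return memory.get(); // already a llama_memory_i *, no cast needed
    }
};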

// before:
llama_kv_cache * llama_get_kv_self(llama_context * ctx) {
    return ctx->get_kv_self();
}

// after:
llama_kv_cache * llama_get_kv_self(llama_context * ctx) {
    return static_cast<llama_kv_cache *>(ctx->get_memory());
}
slaren (Member) commented:

I think this is not a safe cast, so it should be checked with dynamic_cast
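
A self-contained sketch of the checked downcast (the types are hypothetical stand-ins): dynamic_cast yields nullptr when the active memory is not actually a KV cache, so the mismatch can be detected instead of invoking undefined behavior:

struct llama_memory_i { virtual ~llama_memory_i() = default; };
struct llama_kv_cache           : llama_memory_i { /* attention KV cells */ };
struct llama_kv_cache_recurrent : llama_memory_i { /* recurrent states */ };

llama_kv_cache * as_kv_cache(llama_memory_i * mem) {
    // nullptr for non-KV-cache memory types (e.g. the recurrent cache)
    return dynamic_cast<llama_kv_cache *>(mem);
}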

@ggerganov ggerganov merged commit 7f37b6c into master Jun 5, 2025
49 of 52 checks passed
@ggerganov ggerganov deleted the gg/llama-memory-public branch June 5, 2025 12:29
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request on Jun 6, 2025:
memory : migrate from llama_kv_cache to more generic llama_memory (ggml-org#14006)

* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci
shefben added a commit to shefben/llama.cpp that referenced this pull request Jun 6, 2025