Releases: ggml-org/llama.cpp
b6792
b6791
llama-model: fix inconsistent ctxs <-> bufs order (#16581)
b6790
rpc : report actual free memory (#16616)
* rpc : report actual free memory
  Start reporting the free memory on every device instead of using fixed values. Now llama-cli users can get a nice memory breakdown when using RPC devices.
* drop --mem in rpc-server
b6789
vulkan: Add State Space Model (SSM) Operations Support (#16463)
* vulkan: implement SSM scan operation
  Add State Space Model scan operation to the Vulkan backend.
* vulkan: implement SSM conv operation
  Add State Space Model conv operation to the Vulkan backend.
Signed-off-by: Giuseppe Scrivano
b6788
ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)
Fix incorrect task-to-batch index calculation in the quantization phase. The bug caused out-of-bounds access to the qnbitgemm_args array when compute_idx exceeded per_gemm_block_count_m, leading to invalid pointer dereferences and SIGBUS errors.
Correctly map tasks to batches by dividing compute_idx by per_gemm_block_count_m instead of block_size_m.
Example: batch_feature=1, gemm_m=30, block_size_m=4
per_gemm_block_count_m = 8, task_count = 8
Old: gemm_idx = 4/4 = 1 (out of bounds)
New: gemm_idx = 4/8 = 0 (correct)
Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model.
Co-authored-by: muggle
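The index arithmetic in the example above can be replayed with a short Python sketch. It is an illustration only, not the ggml code: the variable names come from the commit message, while the ceiling division used for per_gemm_block_count_m and the product used for task_count are assumptions chosen because they reproduce the quoted values (8 and 8).

```python
# Values quoted in the commit message.
batch_feature = 1   # number of GEMM batches, so the only valid gemm_idx is 0
gemm_m = 30
block_size_m = 4

# Assumption: blocks per GEMM is a ceiling division, which yields the quoted 8.
per_gemm_block_count_m = (gemm_m + block_size_m - 1) // block_size_m

# Assumption: one task per block per batch, which yields the quoted task_count of 8.
task_count = batch_feature * per_gemm_block_count_m

# Old mapping: divide the task index by the block size.
old_indices = [compute_idx // block_size_m for compute_idx in range(task_count)]

# New mapping: divide the task index by the number of blocks per GEMM.
new_indices = [compute_idx // per_gemm_block_count_m for compute_idx in range(task_count)]

# Tasks whose computed batch index falls outside [0, batch_feature).
old_out_of_bounds = [i for i, g in enumerate(old_indices) if g >= batch_feature]
new_out_of_bounds = [i for i, g in enumerate(new_indices) if g >= batch_feature]

print("old:", old_indices, "out of bounds for tasks", old_out_of_bounds)
print("new:", new_indices, "out of bounds for tasks", new_out_of_bounds)
```

Under these assumptions the old mapping sends tasks 4 through 7 to batch index 1, which does not exist when batch_feature is 1, while the new mapping keeps every task at index 0.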
b6786
vulkan: fix debug build (add_rms_len/data not found) (#16624)
b6785
metal : add `CONV_TRANSPOSE_2D` (#16542)
* initial: headers and metal-device.cpp updates
* adding conv_transpose_2d
* fix type
* fix type: int32->int64
* Update ggml/src/ggml-metal/ggml-metal.metal
* Update ggml/src/ggml-metal/ggml-metal.metal
* Update ggml/src/ggml-metal/ggml-metal.metal
* add checks for src[0] and src[1]; add type checks
* Update ggml-metal.metal
* add more tests, add optimization to threading
* add dynamic memory allocation in metal
Co-authored-by: Georgi Gerganov
b6784
grammar : use int64_t to avoid int overflows in int schema to grammar…
b6783
SYCL SET operator optimized for F32 tensors (#16350)
* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes
* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups
* move SET op to standalone file, GPU-only implementation
* Update SYCL SET operator for F32
* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)
* fixed ggml-sycl.cpp
Co-authored-by: Gitty Burstein
b6782
mtmd : support home-cooked Mistral Small Omni (#14928)