Releases: ggml-org/llama.cpp

b6792

18 Oct 01:13
8138785

opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)

* opencl: transposed gemm/gemv moe kernel with mxfp4,f32

* add restore kernel for moe transpose

* fix trailing whitespaces

* resolve compilation warnings

b6791

17 Oct 16:32
66b0dbc

llama-model: fix inconsistent ctxs <-> bufs order (#16581)

b6790

17 Oct 16:21
41386cf

rpc : report actual free memory (#16616)

* rpc : report actual free memory

Start reporting the free memory on every device instead of using
fixed values. Now llama-cli users can get a nice memory breakdown
when using RPC devices.

* drop --mem in rpc-server

b6789

17 Oct 12:54
3d4e86b

vulkan: Add State Space Model (SSM) Operations Support (#16463)

* vulkan: implement SSM scan operation

Add State Space Model scan operation to the Vulkan backend.

Signed-off-by: Giuseppe Scrivano <[email protected]>

* vulkan: implement SSM conv operation

Add State Space Model conv operation to the Vulkan backend.

Signed-off-by: Giuseppe Scrivano <[email protected]>
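In the usual Mamba formulation, the SSM conv op is a causal depthwise 1D convolution applied to each channel independently. A minimal sketch, assuming the caller prepends the `d_conv - 1` previous inputs (the conv state) to the new tokens; `ssm_conv_channel` is a hypothetical name, not a ggml or Vulkan backend function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical causal depthwise 1D convolution for a single channel.
 * padded_x holds (d_conv - 1) previous inputs followed by n_tokens new ones,
 * so it has n_tokens + d_conv - 1 elements; w holds d_conv taps. */
static void ssm_conv_channel(size_t n_tokens, size_t d_conv,
                             const float *padded_x, const float *w, float *y) {
    assert(d_conv > 0 && padded_x && w && y);
    for (size_t t = 0; t < n_tokens; ++t) {
        float acc = 0.0f;
        /* window padded_x[t .. t + d_conv - 1] ends at the current token */
        for (size_t k = 0; k < d_conv; ++k) {
            acc += padded_x[t + k] * w[k];
        }
        y[t] = acc;
    }
}
```

Each output reads only the window ending at its own token, so token `t` never sees later inputs, which is what makes the convolution causal.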

---------

Signed-off-by: Giuseppe Scrivano <[email protected]>

b6788

17 Oct 10:33
342c728

ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)

Fix incorrect task-to-batch index calculation in the quantization phase.

The bug caused out-of-bounds access to the qnbitgemm_args array when
compute_idx exceeded per_gemm_block_count_m, leading to invalid
pointer dereferences and SIGBUS errors.

Correctly map tasks to batches by dividing compute_idx by
per_gemm_block_count_m instead of block_size_m.

Example:
  batch_feature=1, gemm_m=30, block_size_m=4
  per_gemm_block_count_m = 8, task_count = 8

  Old: gemm_idx = 4/4 = 1 (out of bounds)
  New: gemm_idx = 4/8 = 0 (correct)

Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model.

Co-authored-by: muggle <[email protected]>
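The task-to-GEMM mapping described in this fix can be sketched in C. The helper names are hypothetical, and computing `per_gemm_block_count_m` as a ceiling division is an assumption that reproduces the example numbers (30 rows in blocks of 4 gives 8 blocks). The actual SpaceMit IME kernel code may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of M-blocks each GEMM is split into (ceiling division). */
static size_t blocks_per_gemm(size_t gemm_m, size_t block_size_m) {
    assert(block_size_m > 0);
    return (gemm_m + block_size_m - 1) / block_size_m;
}

/* Old (buggy) mapping: divides by the block height instead of the block count. */
static size_t gemm_idx_old(size_t compute_idx, size_t block_size_m) {
    return compute_idx / block_size_m;
}

/* Fixed mapping: each GEMM owns per_gemm_block_count_m consecutive tasks. */
static size_t gemm_idx_new(size_t compute_idx, size_t per_gemm_block_count_m) {
    return compute_idx / per_gemm_block_count_m;
}

/* Hypothetical helper: which M-block inside that GEMM the task handles. */
static size_t block_idx_in_gemm(size_t compute_idx, size_t per_gemm_block_count_m) {
    return compute_idx % per_gemm_block_count_m;
}
```

With `batch_feature = 1` there is a single GEMM, so every valid `gemm_idx` must be 0. The old mapping returns 1 for task 4, while the new one keeps all 8 tasks at index 0.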

b6786

17 Oct 08:29
b194915

vulkan: fix debug build (add_rms_len/data not found) (#16624)

b6785

17 Oct 07:28
9ad4f19

metal : add `CONV_TRANSPOSE_2D` (#16542)

* initial: headers and metal-device.cpp updates

* adding conv_transpose_2d

* fix type

* fix type: int32->int64

* Update ggml/src/ggml-metal/ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml-metal/ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml-metal/ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* add checks for src[0] and src[1]; add type checks

* Update ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* add more tests, add optimization to threading

* add dynamic memory allocation in metal

---------

Co-authored-by: Georgi Gerganov <[email protected]>
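As a reference for what `CONV_TRANSPOSE_2D` computes, here is a naive scatter-style sketch. It assumes zero padding, one stride shared by both axes, and row-major layouts `src[c_in][h][w]`, `kernel[c_in][c_out][kh][kw]`, `dst[c_out][oh][ow]`. These layouts and the function names are illustrative assumptions, not the Metal kernel or ggml's tensor ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Output extent per axis of a transposed convolution with zero padding. */
static size_t conv_t_out_dim(size_t in, size_t k, size_t stride) {
    return (in - 1) * stride + k;
}

/* Hypothetical naive reference: scatter each input pixel through the kernel. */
static void conv_transpose_2d_ref(const float *src, size_t c_in, size_t h, size_t w,
                                  const float *kernel, size_t c_out, size_t kh, size_t kw,
                                  size_t stride, float *dst) {
    assert(h > 0 && w > 0 && stride > 0);
    const size_t oh = conv_t_out_dim(h, kh, stride);
    const size_t ow = conv_t_out_dim(w, kw, stride);
    memset(dst, 0, c_out * oh * ow * sizeof(float));
    for (size_t ci = 0; ci < c_in; ++ci)
        for (size_t y = 0; y < h; ++y)
            for (size_t x = 0; x < w; ++x) {
                const float v = src[(ci * h + y) * w + x];
                for (size_t co = 0; co < c_out; ++co)
                    for (size_t ky = 0; ky < kh; ++ky)
                        for (size_t kx = 0; kx < kw; ++kx) {
                            /* overlapping windows accumulate into dst */
                            const size_t oy = y * stride + ky;
                            const size_t ox = x * stride + kx;
                            dst[(co * oh + oy) * ow + ox] +=
                                v * kernel[((ci * c_out + co) * kh + ky) * kw + kx];
                        }
            }
}
```

Each input pixel lands in a `kh` by `kw` window of the output at `stride` spacing, giving `(in - 1) * stride + k` outputs per axis; when the stride is smaller than the kernel, the windows overlap and their contributions add up.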

b6784

17 Oct 06:25
79967ec

grammar : use int64_t to avoid int overflows in int schema to grammar…
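The title above is truncated, so the exact change is not shown here. As a generic illustration of the overflow it names: a JSON schema integer bound such as `3000000000` exceeds the range of a 32-bit `int` but fits in `int64_t`. The parser below is a hypothetical sketch using `strtoll`, not the code from the schema-to-grammar converter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: parse an integer bound into int64_t.
 * Returns 0 on success, -1 if the text is not a complete in-range integer. */
static int parse_bound_i64(const char *text, int64_t *out) {
    assert(text && out);
    char *end = NULL;
    errno = 0;
    long long v = strtoll(text, &end, 10); /* long long is at least 64 bits */
    if (errno == ERANGE || end == text || *end != '\0') {
        return -1; /* overflow, no digits, or trailing garbage */
    }
    *out = (int64_t) v;
    return 0;
}
```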

b6783

17 Oct 02:58
ceff6bb

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein <[email protected]>
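For context, a SET op writes one tensor into a strided view of another. The sketch below is a simplified 2D, F32-only version with hypothetical names: `dst` starts as a copy of `src0`, then the rows of `src1` are written starting at a byte offset with a byte row stride. The byte-offset/byte-stride convention is assumed here to resemble ggml's set op; this is not the SYCL kernel from #16350.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 2D F32 "set": copy src0 into dst, then overwrite a strided
 * window of dst with the rows x cols contents of src1. */
static void set_f32_2d(float *dst, const float *src0, size_t n_dst,
                       const float *src1, size_t rows, size_t cols,
                       size_t dst_row_stride_bytes, size_t offset_bytes) {
    assert(offset_bytes % sizeof(float) == 0);
    assert(dst_row_stride_bytes % sizeof(float) == 0);
    /* the last written element must stay inside dst */
    assert(rows == 0 || offset_bytes + (rows - 1) * dst_row_stride_bytes
                        + cols * sizeof(float) <= n_dst * sizeof(float));
    memcpy(dst, src0, n_dst * sizeof(float));
    char *base = (char *) dst + offset_bytes;
    for (size_t r = 0; r < rows; ++r) {
        float *row = (float *) (base + r * dst_row_stride_bytes);
        for (size_t c = 0; c < cols; ++c) {
            row[c] = src1[r * cols + c];
        }
    }
}
```

For example, writing a 2 by 2 block into a 4 by 4 matrix at row 1, column 1 uses a row stride of `4 * sizeof(float)` and an offset of `5 * sizeof(float)`.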

b6782

16 Oct 17:40
1bb4f43

mtmd : support home-cooked Mistral Small Omni (#14928)