Releases · ggml-org/llama.cpp

18 Oct 01:13

8138785

b6792 Latest

opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)

* opencl: transposed gemm/gemv moe kernel with mxfp4,f32

* add restore kernel for moe transpose

* fix trailing whitespaces

* resolve compilation warnings

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-10-18T01:13:43Z

llama-b6792-bin-macos-arm64.zip
sha256:8407cc9853376873e6dceb4e0fb1540ec76e860edf39bef54f0685eb607bc134
10.4 MB 2025-10-18T01:13:51Z

llama-b6792-bin-macos-x64.zip
sha256:a7311ccd08963457f7f1cf1c2809ab903cab17df183bc31340cf7d611b56593d
27 MB 2025-10-18T01:13:52Z

llama-b6792-bin-ubuntu-vulkan-x64.zip
sha256:1b41651157dadb2ec252b787a972ed1089ed70b56f583dc8c77a5405ccc39ae6
25.9 MB 2025-10-18T01:13:53Z

llama-b6792-bin-ubuntu-x64.zip
sha256:cfa859346fc482f5022fc1f94de44da94b2f8e7bb7c5e2a6c12fda1c69958e7b
12.5 MB 2025-10-18T01:13:55Z

llama-b6792-bin-win-cpu-arm64.zip
sha256:87ad01a23efd9e9e4888a4fecf22b330259d28131c38b7c6145e5e0e2d6097db
10.6 MB 2025-10-18T01:13:55Z

llama-b6792-bin-win-cpu-x64.zip
sha256:b2bf91a4accca83602b4f2c8586450aee56614a99ecb4705f816cc5d48f4fd18
13.7 MB 2025-10-18T01:13:56Z

llama-b6792-bin-win-cuda-12.4-x64.zip
sha256:eecbbe6579096d980d36fdf30e7e78d064216849e13dd2f8154fd2c4a5e387c6
169 MB 2025-10-18T01:13:57Z

llama-b6792-bin-win-hip-radeon-x64.zip
sha256:76fd941f5ab38ddf1c3c3c4b6079afbd508a00bf25daf81874ef03b2dbf5ae4e
321 MB 2025-10-18T01:14:02Z

llama-b6792-bin-win-opencl-adreno-arm64.zip
sha256:526e692a7172a84f068bf4405c6efc1fb8e935792030b50182aa97e142a16e6e
11 MB 2025-10-18T01:14:11Z

Source code (zip)
2025-10-18T00:55:32Z

Source code (tar.gz)
2025-10-18T00:55:32Z
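
The sha256 values listed with each asset can be used to check a downloaded archive. A minimal sketch in Python, using only the standard library: the helper names `sha256_of` and `matches_listing` are hypothetical, and the local file path is an assumption (it supposes the macOS arm64 archive was saved to the current directory under its listed name).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: Path, listed: str) -> bool:
    """Compare a file against a 'sha256:<hex>' value as shown in the asset list."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Assumed local download location; adjust to wherever the archive was saved.
    archive = Path("llama-b6792-bin-macos-arm64.zip")
    listed = "sha256:8407cc9853376873e6dceb4e0fb1540ec76e860edf39bef54f0685eb607bc134"
    if archive.exists():
        print("OK" if matches_listing(archive, listed) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat for the larger archives, such as the 373 MB CUDA runtime package.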

17 Oct 16:32

github-actions

b6791

66b0dbc

llama-model: fix inconsistent ctxs <-> bufs order (#16581)

Assets 15

17 Oct 16:21

github-actions

b6790

41386cf

rpc : report actual free memory (#16616)

* rpc : report actual free memory

Start reporting the free memory on every device instead of using
fixed values. Now llama-cli users can get a nice memory breakdown
when using RPC devices.

* drop --mem in rpc-server

Assets 15

17 Oct 12:54

github-actions

b6789

3d4e86b

vulkan: Add State Space Model (SSM) Operations Support (#16463)

* vulkan: implement SSM scan operation

Add State Space Model scan operation to the Vulkan backend.

Signed-off-by: Giuseppe Scrivano <[email protected]>

* vulkan: implement SSM conv operation

Add State Space Model conv operation to the Vulkan backend.

Signed-off-by: Giuseppe Scrivano <[email protected]>

---------

Signed-off-by: Giuseppe Scrivano <[email protected]>

Assets 15

17 Oct 10:33

github-actions

b6788

342c728

ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)

Fix incorrect task-to-batch index calculation in the quantization phase.

The bug caused out-of-bounds access to qnbitgemm_args array when
compute_idx exceeded per_gemm_block_count_m, leading to invalid
pointer dereferences and SIGBUS errors.

Correctly map tasks to batches by dividing compute_idx by
per_gemm_block_count_m instead of block_size_m.

Example:
  batch_feature=1, gemm_m=30, block_size_m=4
  per_gemm_block_count_m = 8, task_count = 8

  Old: gemm_idx = 4/4 = 1 (out of bounds)
  New: gemm_idx = 4/8 = 0 (correct)

Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model.
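
The index arithmetic in the example above can be checked with a short sketch. This is not the ggml source: the function names `old_gemm_idx` and `new_gemm_idx` are hypothetical, and two relations are assumptions chosen because they reproduce the numbers in the commit message, namely that `per_gemm_block_count_m` is the ceiling of `gemm_m / block_size_m` and that `task_count` is `batch_feature * per_gemm_block_count_m`.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive operands."""
    return -(-a // b)


def old_gemm_idx(compute_idx: int, block_size_m: int) -> int:
    # Pre-fix mapping as described in the commit message: divide by block_size_m.
    return compute_idx // block_size_m


def new_gemm_idx(compute_idx: int, per_gemm_block_count_m: int) -> int:
    # Post-fix mapping: divide by the number of M-blocks per GEMM.
    return compute_idx // per_gemm_block_count_m


# Values from the commit message example.
batch_feature = 1
gemm_m = 30
block_size_m = 4

# Assumed relations (they match the example: 8 blocks, 8 tasks).
per_gemm_block_count_m = ceil_div(gemm_m, block_size_m)
task_count = batch_feature * per_gemm_block_count_m

for compute_idx in range(task_count):
    old = old_gemm_idx(compute_idx, block_size_m)
    new = new_gemm_idx(compute_idx, per_gemm_block_count_m)
    # Only indices below batch_feature are valid entries of the args array.
    status = "out of bounds" if old >= batch_feature else "ok"
    print(f"compute_idx={compute_idx}: old={old} ({status}), new={new}")
```

With a single batch, only index 0 is valid, so under the old mapping every task with `compute_idx >= 4` lands outside the array, while the new mapping keeps all eight tasks on index 0.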

Co-authored-by: muggle <[email protected]>

Assets 15

17 Oct 08:29

github-actions

b6786

b194915

vulkan: fix debug build (add_rms_len/data not found) (#16624)

Assets 15

17 Oct 07:28

github-actions

b6785

9ad4f19

metal : add `CONV_TRANSPOSE_2D` (#16542)

* initial: headers and metal-device.cpp updates

* adding conv_transpose_2d

* fix type

* fix type: int32->int64

* Update ggml/src/ggml-metal/ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml-metal/ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml-metal/ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* add checks for src[0] and src[1]; add type checks

* Update ggml-metal.metal

Co-authored-by: Georgi Gerganov <[email protected]>

* add more tests, add optimization to threading

* add dynamic memory allocation in metal

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Assets 15

17 Oct 06:25

github-actions

b6784

79967ec

grammar : use int64_t to avoid int overflows in int schema to grammar…

Assets 15

17 Oct 02:58

github-actions

b6783

ceff6bb

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein <[email protected]>

Assets 15

16 Oct 17:40

github-actions

b6782

1bb4f43

mtmd : support home-cooked Mistral Small Omni (#14928)

Assets 15
