Releases · ggml-org/llama.cpp

16 Oct 17:40

1bb4f43

b6782 Latest

mtmd : support home-cooked Mistral Small Omni (#14928)

Assets 15

| Asset | Checksum | Size | Uploaded (UTC) |
| --- | --- | --- | --- |
| cudart-llama-bin-win-cuda-12.4-x64.zip | sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6 | 373 MB | 2025-10-16T17:40:47Z |
| llama-b6782-bin-macos-arm64.zip | sha256:29ee095bff9c4e798d2c2a0fb87592eda86a73ef8c70cbd4f585daab9d3ca225 | 10.4 MB | 2025-10-16T17:41:02Z |
| llama-b6782-bin-macos-x64.zip | sha256:5b7f4eea69b0cb37db3d2c12ec0668307bd6247ee71088b99bbcaa839ac0aa35 | 27 MB | 2025-10-16T17:41:04Z |
| llama-b6782-bin-ubuntu-vulkan-x64.zip | sha256:160d50f8885c71dcf6ea251175e1b435b0c5e359fe027a6ac251051dab518db1 | 25.8 MB | 2025-10-16T17:41:05Z |
| llama-b6782-bin-ubuntu-x64.zip | sha256:bbffc50c9d7b3706754a0ff93f6f39dca4e3b84d7ec7d724af19eccdb6f2c9c5 | 12.5 MB | 2025-10-16T17:41:07Z |
| llama-b6782-bin-win-cpu-arm64.zip | sha256:08bc838e1ffe3d0404dff7e00ec5b32f07d2360c3b8439d2c27ab607dd348396 | 10.6 MB | 2025-10-16T17:41:09Z |
| llama-b6782-bin-win-cpu-x64.zip | sha256:c29be4291618194137c31ae21cce3059f88cc0de6863a387ce31ed767b388e04 | 13.7 MB | 2025-10-16T17:41:10Z |
| llama-b6782-bin-win-cuda-12.4-x64.zip | sha256:355820452c808e418295256ddc387e7a83a8fe11f02640ac55c5dc378e9a7260 | 169 MB | 2025-10-16T17:41:11Z |
| llama-b6782-bin-win-hip-radeon-x64.zip | sha256:159c27019e0f0a3de724701160f2792c34d08c89803f5af03dbbef77a9fd5bed | 321 MB | 2025-10-16T17:41:19Z |
| llama-b6782-bin-win-opencl-adreno-arm64.zip | sha256:3028adffecf5348c2311a08e43457b0b36609ba9222da29d5989d8f1bef38476 | 11 MB | 2025-10-16T17:41:32Z |
| Source code (zip) | | | 2025-10-16T17:00:31Z |
| Source code (tar.gz) | | | 2025-10-16T17:00:31Z |

16 Oct 13:44

github-actions

b6780

b22572e

b6780

sycl : add ARANGE operator (#16362)

* SYCL: update element-wise ops and presets

* clean arange

* Re-trigger CI

---------

Co-authored-by: Gitty Burstein
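For readers unfamiliar with the operator, an ARANGE-style op fills a 1-D tensor with evenly spaced values. The sketch below is plain C, not the SYCL kernel from this PR; `arange_f32` is a hypothetical name, and it assumes the common half-open convention where values run from `start` up to but not including `stop`.

```c
#include <stddef.h>

/* Hypothetical reference helper (not part of ggml or the SYCL backend).
 * Writes start, start + step, start + 2*step, ... into out while the value
 * stays below stop, and returns how many elements were written.
 * Assumes a positive step; returns 0 otherwise. */
static size_t arange_f32(float start, float stop, float step,
                         float *out, size_t capacity) {
    if (step <= 0.0f) {
        return 0;
    }
    size_t i = 0;
    for (; i < capacity; i++) {
        /* Recompute from the index instead of accumulating, so rounding
         * error does not build up across elements. */
        float v = start + (float) i * step;
        if (v >= stop) {
            break;
        }
        out[i] = v;
    }
    return i;
}
```

Under these assumptions, `arange_f32(0.0f, 5.0f, 1.0f, buf, 8)` writes 0, 1, 2, 3, 4 and returns 5.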

Assets 15

16 Oct 09:03

github-actions

b6779

7a50cf3

b6779

CANN: format code using .clang-format (#15863)

This commit applies .clang-format rules to all source files under the
ggml-cann directory to ensure consistent coding style and readability.
The .clang-format option `SortIncludes: false` has been set to disable
automatic reordering of include directives.
No functional changes are introduced.

Co-authored-by: hipudding

Assets 15

16 Oct 05:38

github-actions

b6778

6f5d924

b6778

common : Update the docs on -t --threads (#16236)

* Update the docs on -t --threads

* Revert "Update the docs on -t --threads"

This reverts commit eba97345e2c88d8ca510abec87d00bf6b9b0e0c2.

* docs: clarify -t/--threads parameter uses CPU threads and defaults to all available cores

* Update arg.cpp

Assets 15

16 Oct 05:36

github-actions

b6777

adc9b60

b6777

ggml-cpu: replace putenv with setenv for const-correctness (#16573)

## Why it failed

When compiling with strict compiler flags (-Wwrite-strings -Werror=discarded-qualifiers),
the build fails with the following error:

```
cmake \
  -S . \
  -B ../llama.cpp.build \
  --preset=x64-linux-gcc-debug \
  -DCMAKE_INSTALL_PREFIX=/tmp/local \
  -DCMAKE_C_FLAGS="-Wwrite-strings -Werror=discarded-qualifiers" && \
cmake --build ../llama.cpp.build/
...
/home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: In function ‘ggml_cpu_init’:
/home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3572:24: error: passing argument 1 of ‘putenv’ discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
 3572 |                 putenv("KMP_BLOCKTIME=200"); // 200ms
      |                        ^~~~~~~~~~~~~~~~~~~
In file included from /home/otegami/work/cpp/llama.cpp/ggml/src/./ggml-impl.h:10,
                 from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-impl.h:6,
                 from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/traits.h:3,
                 from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:6:
/usr/include/stdlib.h:786:26: note: expected ‘char *’ but argument is of type ‘const char *’
  786 | extern int putenv (char *__string) __THROW __nonnull ((1));
      |                    ~~~~~~^~~~~~~~
cc1: some warnings being treated as errors
ninja: build stopped: subcommand failed.
```

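The issue is that putenv() takes a non-const char *, while -Wwrite-strings gives string literals the type const char[]. Passing the literal "KMP_BLOCKTIME=200" therefore discards the const qualifier, and -Werror=discarded-qualifiers turns that warning into an error.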

## How to fix

This PR replaces putenv("KMP_BLOCKTIME=200") with setenv("KMP_BLOCKTIME", "200", 0).

Benefits of setenv():
- Accepts const char * parameters (no qualifier warnings)
- Makes copies of the strings (safer memory handling)
- The third parameter (overwrite = 0) leaves KMP_BLOCKTIME untouched if it is already set, whereas putenv() always replaces the existing value
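The no-overwrite behaviour can be shown in isolation. This is a minimal sketch rather than the code from the PR: `set_env_default` is a hypothetical helper name, and it assumes a POSIX system, since `setenv` is not part of ISO C and is not available on Windows.

```c
/* Expose setenv() when compiling in a strict ISO C mode such as -std=c11. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>

/* Hypothetical helper: set NAME=VALUE only when NAME is not already present
 * in the environment. Both parameters are const char *, so string literals
 * can be passed without discarding qualifiers. setenv() copies the strings,
 * so the caller's buffers do not need to outlive the call.
 * Returns 0 on success and -1 on failure, as setenv() does. */
static int set_env_default(const char *name, const char *value) {
    return setenv(name, value, /* overwrite = */ 0);
}
```

With this helper, `set_env_default("KMP_BLOCKTIME", "200")` applies the default only when the user has not exported their own value.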

Assets 15

16 Oct 05:03

github-actions

b6776

ee50ee1

b6776

SYCL: Add GGML_OP_MEAN operator support (#16009)

* SYCL: Add GGML_OP_MEAN operator support

* SYCL: Fix formatting for GGML_OP_MEAN case

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Sigbjørn Skjæret

---------

Co-authored-by: Sigbjørn Skjæret

Assets 15

15 Oct 21:13

github-actions

b6774

466c191

b6774

cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083)

* CPU: Add support for FLOOR, CEIL, ROUND and TRUNC unary operators

- Added the operators to unary op enum
- Implemented API functions
- Implemented forward and unary-op logic in CPU backend
- Updated ggml_get_n_tasks
- Updated operator names array and static_assert
- Updated docs and enabled automatic tests

* docs: add documentation for ggml_trunc and ggml_trunc_inplace in ggml.h

* chore: remove trailing whitespace from ggml.h

* Remove unresolved merge markers

* Apply review suggestions: cleanup formatting, enum order and leftover artifacts

* Regenerate ops.md using create_ops_docs.py

Assets 15

15 Oct 18:07

github-actions

b6773

0cb7a06

b6773

opencl: add q8_0 mm support (#16469)

* opencl: add mm_q8_0_f32

* opencl: fix data loading for incomplete tile

* opencl: use q8_0 mm for larger matrix

* opencl: add some tests to cover the path

Assets 15

15 Oct 15:19

github-actions

b6770

f4ce81c

b6770

metal: optimise `GGML_OP_SUM` (#16559)

* optimise GGML_OP_SUM

* add non-contiguous tests by permuting the input

* change tests to require full contiguity of OP_SUM

* cuda : add check GGML_OP_SUM

---------

Co-authored-by: Georgi Gerganov

Assets 15

15 Oct 15:07

github-actions

b6769

17304cb

b6769

server : fix img token logs (#16595)

Assets 15
