Releases · ggml-org/llama.cpp

15 Oct 13:20

3e3cb19

b6768 Latest

llama-quant: add support for mmproj (#16592)

* llama-quant: add support for mmproj

* Update src/llama.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* check prefix instead

* small fix

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-10-15T13:20:56Z

llama-b6768-bin-macos-arm64.zip
sha256:8f99a356848c5ecbdb01fd531eec8c777997132f742c79849852bd8f1e569c78
10.4 MB 2025-10-15T13:21:12Z

llama-b6768-bin-macos-x64.zip
sha256:d0efe623bb3ea82519e476ea1443b7b2868d449a910fddb53164f2420b5fe7cc
27 MB 2025-10-15T13:21:14Z

llama-b6768-bin-ubuntu-vulkan-x64.zip
sha256:5a322d569db2957318b0f8d75f75e2c874e1d619f05947e2f4d6018ac638c228
25.8 MB 2025-10-15T13:21:16Z

llama-b6768-bin-ubuntu-x64.zip
sha256:ae059b723fed5898098ec3f7229269e7f348ca3c9df4276f754d518667303566
12.5 MB 2025-10-15T13:21:18Z

llama-b6768-bin-win-cpu-arm64.zip
sha256:de23e6b58e35d04bb8422fff69551610e59aa8a36d862f62230074b846648ef9
10.6 MB 2025-10-15T13:21:20Z

llama-b6768-bin-win-cpu-x64.zip
sha256:c33d0bb56042d005bdda26584abb7e1ccac46c9849a3f057bddba98f41c7d60c
13.6 MB 2025-10-15T13:21:21Z

llama-b6768-bin-win-cuda-12.4-x64.zip
sha256:87019bbbdabc896a36d9cf25195bf6d5fcab64cc822c2cb3c4bbb6289a556fd7
169 MB 2025-10-15T13:21:23Z

llama-b6768-bin-win-hip-radeon-x64.zip
sha256:fc27071c596525bea83784e6ac7e5b9ff97b44bb728be5145885b62159e12d5d
321 MB 2025-10-15T13:21:32Z

llama-b6768-bin-win-opencl-adreno-arm64.zip
sha256:941580c9fa2ad9c8d32fd4317bf739cce9e40235e8b5fdf4391b8b93ce082291
11 MB 2025-10-15T13:21:44Z

Source code (zip)
2025-10-15T12:48:08Z

Source code (tar.gz)
2025-10-15T12:48:08Z
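
Each binary asset above is listed with a sha256 digest, so a downloaded archive can be checked before it is unpacked. The following is a minimal sketch in Python, not part of the release itself; the helper names `sha256_of_file` and `matches_listed_digest` are hypothetical, and the example assumes the archive was saved in the current directory under the file name shown in the listing.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex sha256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a file against a digest written as 'sha256:<hex>' (prefix optional)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return sha256_of_file(path) == expected


if __name__ == "__main__":
    # File name and digest copied from the asset list above; the local
    # download location is an assumption.
    archive = "llama-b6768-bin-ubuntu-x64.zip"
    listed = "sha256:ae059b723fed5898098ec3f7229269e7f348ca3c9df4276f754d518667303566"
    if os.path.exists(archive):
        print("digest matches" if matches_listed_digest(archive, listed) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in fixed-size chunks keeps memory use flat even for the larger archives in the list, such as the 373 MB CUDA runtime bundle.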

15 Oct 12:22

github-actions

b6767

5acd455

CUDA: Changing the CUDA scheduling strategy to spin (#16585)

* CUDA set scheduling strategy to spinning for cc121

* Using prop.major and prop.minor, include HIP and MUSA

* Exclude HIP and MUSA

* Remove trailing whitespace

Co-authored-by: Johannes Gäßler <[email protected]>

* Remove empty line

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

Assets 15

15 Oct 10:52

github-actions

b6766

554fd57

server : fix mtmd checkpoints (#16591)

Assets 15

14 Oct 18:26

github-actions

b6765

fa882fd

metal : avoid using Metal's gpuAddress property (#16576)

* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check

Assets 15

14 Oct 17:48

github-actions

b6764

ffa0590

vulkan: Add ACC_TYPE_VEC2 implementation (#16203)

Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>

Assets 15

14 Oct 15:19

github-actions

b6763

120bf70

CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …

Assets 15

14 Oct 14:31

github-actions

b6762

4258e0c

vulkan: Support FA with K/V in F32 (#16543)

Assets 15

14 Oct 13:32

github-actions

b6761

7ea15bb

vulkan: Improve build time for MSVC (#16545)

Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel.

Enable /MP so source files are compiled in parallel.

Assets 15

14 Oct 12:47

github-actions

b6760

9c7185d

CUDA: enable FA for FP32 KV cache (#16546)

Assets 15

14 Oct 11:48

github-actions

b6759

1ee9d0b

CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)

* CUDA: use fastdiv + ggml_cuda_mad for mmvf

* use bf16 directly + fix formatting

* Add exception for HIP code

Assets 15
