
Releases: ggml-org/llama.cpp

b6768

15 Oct 13:20
3e3cb19
llama-quant: add support for mmproj (#16592)

* llama-quant: add support for mmproj

* Update src/llama.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* check prefix instead

* small fix

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b6767

15 Oct 12:22
5acd455
CUDA: Changing the CUDA scheduling strategy to spin (#16585)

* CUDA set scheduling strategy to spinning for cc121

* Using prop.major and prop.minor, include HIP and MUSA

* Exclude HIP and MUSA

* Remove trailing whitespace

Co-authored-by: Johannes Gäßler <[email protected]>

* Remove empty line

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b6766

15 Oct 10:52
554fd57
server : fix mtmd checkpoints (#16591)

b6765

14 Oct 18:26
fa882fd
metal : avoid using Metal's gpuAddress property (#16576)

* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check

b6764

14 Oct 17:48
ffa0590
vulkan: Add ACC_TYPE_VEC2 implementation (#16203)

Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>

b6763

14 Oct 15:19
120bf70
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …

b6762

14 Oct 14:31
4258e0c
vulkan: Support FA with K/V in F32 (#16543)

b6761

14 Oct 13:32
7ea15bb
vulkan: Improve build time for MSVC (#16545)

Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel.

Enable /MP so source files are compiled in parallel.

b6760

14 Oct 12:47
9c7185d
CUDA: enable FA for FP32 KV cache (#16546)

b6759

14 Oct 11:48
1ee9d0b
CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)

* CUDA: use fastdiv + ggml_cuda_mad for mmvf

* use bf16 directly + fix formatting

* Add exception for HIP code
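The b6759 entry names fastdiv, a general technique that replaces integer division by a runtime-invariant divisor with a multiply-high, an add, and a shift, with constants precomputed once. The sketch below is a host-side C++ illustration of the standard Granlund-Montgomery round-up method for 32-bit unsigned operands. It is not the ggml-cuda code, whose exact formulation is not reproduced here. The names `FastDiv`, `make_fastdiv`, `fast_div` and `fast_mod` are hypothetical. On a GPU, the 64-bit multiply-and-shift would typically map to the `__umulhi` intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: precomputed constants for dividing by a fixed d (d >= 1).
struct FastDiv {
    uint32_t mp;  // magic multiplier m' = floor(2^32 * (2^L - d) / d) + 1
    uint32_t L;   // shift amount L = ceil(log2(d))
    uint32_t d;   // original divisor, kept for computing remainders
};

// Precompute once, e.g. when a tensor dimension becomes known.
FastDiv make_fastdiv(uint32_t d) {
    assert(d >= 1 && "divisor must be nonzero");
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;  // smallest L with 2^L >= d
    }
    // 2^L - d < d, so this product stays below 2^64 and m' fits in 32 bits.
    const uint64_t num = (uint64_t{1} << 32) * ((uint64_t{1} << L) - d);
    const uint32_t mp = static_cast<uint32_t>(num / d + 1);
    return {mp, L, d};
}

// n / d via multiply-high, add and shift instead of a hardware divide.
uint32_t fast_div(uint32_t n, const FastDiv& f) {
    const uint64_t hi = (uint64_t{f.mp} * n) >> 32;  // high 32 bits of mp * n
    return static_cast<uint32_t>((hi + n) >> f.L);   // 64-bit sum cannot overflow
}

// n % d derived from the quotient.
uint32_t fast_mod(uint32_t n, const FastDiv& f) {
    return n - fast_div(n, f) * f.d;
}
```

For example, `fast_div(7, make_fastdiv(3))` yields 2 and `fast_mod(7, make_fastdiv(3))` yields 1. Doing the final add in 64 bits keeps the sketch correct for every 32-bit `n`. A device kernel that stays in 32-bit arithmetic would need to bound `n` or use a different final step.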