Releases: ggml-org/llama.cpp
b6768
llama-quant: add support for mmproj (#16592)
* llama-quant: add support for mmproj
* Update src/llama.cpp
* check prefix instead
* small fix
Co-authored-by: Georgi Gerganov <[email protected]>
b6767
CUDA: Changing the CUDA scheduling strategy to spin (#16585)
* CUDA set scheduling strategy to spinning for cc121
* Using prop.major and prop.minor, include HIP and MUSA
* Exclude HIP and MUSA
* Remove trailing whitespace
* Remove empty line
Co-authored-by: Johannes Gäßler <[email protected]>
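Per the commit bullets, spin scheduling is gated on compute capability 12.1 via `prop.major`/`prop.minor` and excluded for HIP and MUSA builds. A minimal sketch of that gate, with illustrative names (not the actual llama.cpp symbols); the real change would presumably then call `cudaSetDeviceFlags(cudaDeviceScheduleSpin)` on the matching device:

```c
#include <stdbool.h>
#include <assert.h>

// Illustrative sketch of the gate described in the release note:
// spin-wait scheduling only for compute capability 12.1
// (prop.major == 12 && prop.minor == 1), never for HIP/MUSA builds.
// The function name and parameters are hypothetical.
static bool should_use_spin_scheduling(int cc_major, int cc_minor, bool is_hip_or_musa) {
    if (is_hip_or_musa) {
        return false; // HIP and MUSA are excluded per the commit
    }
    return cc_major == 12 && cc_minor == 1; // "cc121"
}
```

In real CUDA code the major/minor values would come from `cudaGetDeviceProperties`, and the spin strategy trades idle CPU time for lower kernel-completion latency.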
b6766
server : fix mtmd checkpoints (#16591)
b6765
metal : avoid using Metal's gpuAddress property (#16576)
* metal : avoid using Metal's gpuAddress property
* metal : fix rope kernels buffer check
b6764
vulkan: Add ACC_TYPE_VEC2 implementation (#16203)
Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>
b6763
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …
b6762
vulkan: Support FA with K/V in F32 (#16543)
b6761
vulkan: Improve build time for MSVC (#16545)
Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel. Enable /MP so source files are compiled in parallel.
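The two settings named in this entry can be sketched as a CMake fragment. This is illustrative only; the exact placement and guards in llama.cpp's `CMakeLists.txt` differ:

```cmake
# CMP0147 (CMake >= 3.27): with Visual Studio generators, run
# add_custom_command build steps (e.g. vulkan-shader-gen) in parallel.
if (POLICY CMP0147)
    cmake_policy(SET CMP0147 NEW)
endif()

# /MP: let MSVC compile the source files of a target in parallel.
if (MSVC)
    add_compile_options(/MP)
endif()
```

Both changes only affect build throughput on MSVC; generated binaries are unchanged.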
b6760
CUDA: enable FA for FP32 KV cache (#16546)
b6759
CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)
* CUDA: use fastdiv + ggml_cuda_mad for mmvf
* use bf16 directly + fix formatting
* Add exception for HIP code
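"fastdiv" refers to replacing integer division by a runtime-constant divisor with a precomputed multiply-and-shift, which is much cheaper than a hardware divide in GPU inner loops. A host-side C sketch of the general technique (not llama.cpp's actual `fastdiv` helper) using the reciprocal method, for divisors d > 1:

```c
#include <stdint.h>
#include <assert.h>

// Precomputed state for dividing 32-bit values by a fixed d (d > 1).
// Requires __uint128_t (GCC/Clang extension) for the 64x64->128 multiply.
typedef struct {
    uint64_t magic; // ceil(2^64 / d)
    uint32_t d;
} fastdiv32;

static fastdiv32 fastdiv_init(uint32_t d) {
    // UINT64_MAX / d + 1 == ceil(2^64 / d) for d > 1
    fastdiv32 f = { UINT64_MAX / d + 1, d };
    return f;
}

// n / d, computed as the high 64 bits of magic * n.
static uint32_t fastdiv_div(fastdiv32 f, uint32_t n) {
    return (uint32_t)(((__uint128_t)f.magic * n) >> 64);
}

// n % d, from the low 64 bits of magic * n scaled back by d.
static uint32_t fastdiv_mod(fastdiv32 f, uint32_t n) {
    uint64_t lowbits = f.magic * n; // low 64 bits only (wraps mod 2^64)
    return (uint32_t)(((__uint128_t)lowbits * f.d) >> 64);
}
```

In a mat-vec kernel this pays off because the same divisor (a stride or row count) is reused across the whole launch, so `fastdiv_init` runs once and only multiplies remain in the hot loop. `ggml_cuda_mad` in the commit title is a separate fused multiply-add helper used alongside it.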