Insights: ggml-org/llama.cpp
Overview
47 Releases published by 1 person
- b5632, published Jun 11, 2025
- b5633, published Jun 11, 2025
- b5634, published Jun 11, 2025
- b5636, published Jun 11, 2025
- b5637, published Jun 11, 2025
- b5638, published Jun 11, 2025
- b5639, published Jun 11, 2025
- b5640, published Jun 11, 2025
- b5641, published Jun 12, 2025
- b5642, published Jun 12, 2025
- b5644, published Jun 12, 2025
- b5645, published Jun 12, 2025
- b5646, published Jun 12, 2025
- b5648, published Jun 12, 2025
- b5649, published Jun 13, 2025
- b5650, published Jun 13, 2025
- b5651, published Jun 13, 2025
- b5652, published Jun 13, 2025
- b5653, published Jun 13, 2025
- b5654, published Jun 13, 2025
- b5655, published Jun 13, 2025
- b5657, published Jun 13, 2025
- b5659, published Jun 13, 2025
- b5662, published Jun 13, 2025
- b5664, published Jun 14, 2025
- b5666, published Jun 15, 2025
- b5667, published Jun 15, 2025
- b5668, published Jun 15, 2025
- b5669, published Jun 15, 2025
- b5670, published Jun 15, 2025
- b5671, published Jun 15, 2025
- b5672, published Jun 15, 2025
- b5673, published Jun 15, 2025
- b5674, published Jun 15, 2025
- b5675, published Jun 16, 2025
- b5676, published Jun 16, 2025
- b5679, published Jun 16, 2025
- b5681, published Jun 16, 2025
- b5682, published Jun 16, 2025
- b5683, published Jun 16, 2025
- b5684, published Jun 16, 2025
- b5685, published Jun 16, 2025
- b5686, published Jun 16, 2025
- b5687, published Jun 17, 2025
- b5688, published Jun 17, 2025
- b5689, published Jun 17, 2025
- b5693, published Jun 18, 2025
61 Pull requests merged by 25 people
- mtmd : refactor llava-uhd preprocessing logic (#14247, merged Jun 18, 2025)
- llama-chat : fix multiple system messages for gemma, orion (#14246, merged Jun 18, 2025)
- convert : fix null head_dim AutoConfig regression (#14248, merged Jun 18, 2025)
- sync : ggml (#14255, merged Jun 18, 2025)
- cmake: remove shader-gen step-targets from ggml-vulkan (#14226, merged Jun 17, 2025)
- ggml-cpu : remove the weak alias trick (#14221, merged Jun 17, 2025)
- musa: fix build warning (unused variable) (#14231, merged Jun 17, 2025)
- common : suggest --jinja when autodetection fails (#14222, merged Jun 16, 2025)
- server : fix incorrect usage of llama_get_embeddings() (#14225, merged Jun 16, 2025)
- llama : add thread safety test (#14035, merged Jun 16, 2025)
- cmake: clean up external project logic for vulkan-shaders-gen (#14179, merged Jun 16, 2025)
- Add NeoBERT (#14164, merged Jun 16, 2025)
- HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202, merged Jun 16, 2025)
- llama : rework embeddings logic (#14208, merged Jun 16, 2025)
- ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206, merged Jun 16, 2025)
- Remove arcee AFM change in convert_hf_to_gguf_update.py (#14207, merged Jun 16, 2025)
- Allow override when adding value to GGUFWriter (#14194, merged Jun 16, 2025)
- vulkan: mutex around vkQueueSubmit (#14127, merged Jun 16, 2025; see the queue-lock sketch after this list)
- ggml-cpu : rework weak alias on apple targets (#14146, merged Jun 16, 2025)
- Add support for Arcee AI's upcoming AFM model (#14185, merged Jun 15, 2025)
- When listening on a unix domain socket, don't print http:// and port (#14180, merged Jun 15, 2025)
- quantize: Use UINT32 if there's an INT KV override (#14197, merged Jun 15, 2025)
- CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196, merged Jun 15, 2025)
- HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183, merged Jun 15, 2025)
- kv-cache : fix use-after-move of defrag info (#14189, merged Jun 15, 2025)
- llama-model : add dots.llm1 architecture support (#14044) (#14118, merged Jun 15, 2025)
- cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188, merged Jun 15, 2025)
- batch : auto-gen positions + verify multi-sequence input (#14177, merged Jun 15, 2025)
- remove WIP since PR has been merged (#13912, merged Jun 15, 2025)
- llama-chat : Do not throw when tool parsing fails (#14012, merged Jun 14, 2025)
- compare llama-bench: add option to plot (#14169, merged Jun 14, 2025)
- vocab : fix build (#14175, merged Jun 13, 2025)
- sycl: fix docker image (#14144, merged Jun 13, 2025)
- batch : add LLAMA_BATCH_DEBUG environment variable (#14172, merged Jun 13, 2025; see the env-var sketch after this list)
- Update multimodal.md (#14122, merged Jun 13, 2025)
- batch : rework llama_batch_allocr (#14153, merged Jun 13, 2025)
- readme : remove survey link (#14168, merged Jun 13, 2025)
- cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167, merged Jun 13, 2025)
- Make cls_b and cls_out_b optional in ranking (#14165, merged Jun 13, 2025)
- server : fix SWA condition for full context reprocess (#14163, merged Jun 13, 2025)
- sycl: Adding additional cpy dbg print output (#14034, merged Jun 13, 2025)
- sycl: Bump oneMath commit (#14152, merged Jun 13, 2025)
- Improve build-info.cpp generation (#14156, merged Jun 13, 2025)
- vocab : prevent heap overflow when vocab is too small (#14145, merged Jun 13, 2025)
- sycl: Remove unneeded f16->f32 copy for dnnl mul mat (#14125, merged Jun 12, 2025)
- readme : remove project status link (#14149, merged Jun 12, 2025)
- server : re-enable SWA speculative decoding (#14131, merged Jun 12, 2025)
- context : simplify output counting logic during decode (#14142, merged Jun 12, 2025)
- batch : remove logits_all flag (#14141, merged Jun 12, 2025)
- cmake : handle whitespaces in path during metal build (#14126, merged Jun 12, 2025)
- kv-cache : fix split_equal handling in unified implementation (#14130, merged Jun 12, 2025)
- context : round n_tokens to next multiple of n_seqs when reserving (#14140, merged Jun 12, 2025)
- common: fix issue with regex_escape routine on Windows (#14133, merged Jun 11, 2025)
- Implement GGML_CPU_ALL_VARIANTS for ARM (#14080, merged Jun 11, 2025)
- chore : clean up relative source dir paths (#14128, merged Jun 11, 2025)
- tests : add test-tokenizers-repo (#14017, merged Jun 11, 2025)
- vulkan: Better thread-safety for command pools/buffers (#14116, merged Jun 11, 2025)
- webui: Wrap long numbers instead of infinite horizontal scroll (#14062, merged Jun 11, 2025)
- kv-cache : relax SWA masking condition (#14119, merged Jun 11, 2025)
- Pass --keep to llama-server (#14120, merged Jun 11, 2025)
- kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121, merged Jun 11, 2025; see the env-var sketch after this list)
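The vkQueueSubmit mutex (#14127) follows a standard Vulkan pattern: VkQueue is an externally synchronized object, so the application, not the driver, must serialize concurrent submissions to the same queue. A minimal sketch of that idea, not the actual code from #14127; the `locked_queue` wrapper is hypothetical:

```cpp
#include <mutex>
#include <vulkan/vulkan.h>

// The Vulkan spec forbids two threads from calling vkQueueSubmit on the same
// VkQueue concurrently, so the caller must serialize submissions. A per-queue
// mutex is the simplest way to do that. Hypothetical wrapper, not #14127's code.
struct locked_queue {
    VkQueue    queue = VK_NULL_HANDLE;
    std::mutex mtx;

    VkResult submit(uint32_t submit_count, const VkSubmitInfo * submits, VkFence fence) {
        std::lock_guard<std::mutex> lock(mtx); // one submitter at a time
        return vkQueueSubmit(queue, submit_count, submits, fence);
    }
};
```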
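Two of the merged changes add debug environment variables, LLAMA_BATCH_DEBUG (#14172) and LLAMA_KV_CACHE_DEBUG (#14121). The accepted values and output are defined by those PRs, not restated here; as a hedged sketch, such gates are conventionally read once at startup, with a hypothetical helper `debug_level_from_env`:

```cpp
#include <cstdio>
#include <cstdlib>

// Conventional read-once debug gate. Illustrative only: the real semantics of
// LLAMA_BATCH_DEBUG and LLAMA_KV_CACHE_DEBUG live in PRs #14172 and #14121.
static int debug_level_from_env(const char * name) {
    const char * val = std::getenv(name);
    return val != nullptr ? std::atoi(val) : 0;
}

int main() {
    // Read once, then branch on the cached value in hot paths.
    static const int batch_debug = debug_level_from_env("LLAMA_BATCH_DEBUG");
    if (batch_debug > 0) {
        std::printf("batch debug output enabled (level %d)\n", batch_debug);
    }
    return 0;
}
```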
21 Pull requests opened by 19 people
- tests : add test-model-random (#14139, opened Jun 12, 2025)
- models/templates: add mistralai/Mistral-Small-3.1-24B-Instruct-2503 template with tool calling support (#14148, opened Jun 12, 2025)
- ggml : implement REGLU/GEGLU/SWIGLU ops (#14158, opened Jun 12, 2025; definitions sketched after this list)
- ggml : implement GLU for split up/gate (#14181, opened Jun 14, 2025)
- ci: re-enable rocm linux build, reduce the built targets to the ones currently available in rocblas (#14184, opened Jun 14, 2025)
- webui: save model name with conversation history (#13570) (#14192, opened Jun 15, 2025)
- gguf-py: Make sentencepiece optional (#14200, opened Jun 15, 2025)
- llama: fix compilation warning (#464) (#14209, opened Jun 16, 2025)
- sycl: Cleanup codepaths in Get Rows in sycl backend (#14215, opened Jun 16, 2025)
- ubatch : new splitting logic (#14217, opened Jun 16, 2025)
- tests : enhance llama-bench with separate timings (pp/gen t/s), added n_threads_batch (#14219, opened Jun 16, 2025)
- logit_bias: apply configurable escalating EOG bias at low n_remain (#14229, opened Jun 16, 2025)
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232, opened Jun 17, 2025)
- mtmd : add a way to select device for vision encoder (#14236, opened Jun 17, 2025)
- MODEL: Falcon-H1 support (#14238, opened Jun 17, 2025)
- Add SmolLM3 (#14240, opened Jun 17, 2025)
- server : add pidfile option (#14242, opened Jun 17, 2025)
- sycl: add usage of enqueue_functions extension (#14244, opened Jun 17, 2025)
- Vulkan: Fix host-pinned memory for large allocations (#14249, opened Jun 17, 2025)
- opencl: ref count `ggml_backend_opencl_context` and refactor profiling (#14254, opened Jun 18, 2025)
- ci : run slow tests first to precache all models (#14256, opened Jun 18, 2025)
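For reference, the GLU variants named in #14158 are usually defined as in Shazeer, "GLU Variants Improve Transformer" (2020): two linear projections, one passed through a nonlinearity and used to gate the other elementwise. How #14158 lays out the tensors is not stated here; the standard (bias-free) definitions are:

```latex
\begin{aligned}
\mathrm{ReGLU}(x)  &= \max(0,\; xW) \otimes xV \\
\mathrm{GEGLU}(x)  &= \operatorname{GELU}(xW) \otimes xV \\
\mathrm{SwiGLU}(x) &= \operatorname{Swish}(xW) \otimes xV
\end{aligned}
```

Here $W$ and $V$ are the gate and up projections and $\otimes$ is the elementwise product, which is also why the companion PR #14181 distinguishes the split up/gate case.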
52 Issues closed by 20 people
- Eval bug: Llama 4 Scout/Maverick crash when processing images with certain aspect ratio (#13827, closed Jun 18, 2025)
- Misc. bug: llama-server builds possibly erroneous prompt for gemma 3 (#14151, closed Jun 18, 2025)
- Eval bug: Unexpected failure converting Mistral 7B v0.2 to f32 GGUF (#13976, closed Jun 18, 2025)
- Misc. bug: Compilation with OpenCL on latest build (#13300, closed Jun 18, 2025)
- Eval bug: Bad output from Qwen3-Embedding-0.6B (#14234, closed Jun 17, 2025)
- Thad (#14241, closed Jun 17, 2025)
- W (#14239, closed Jun 17, 2025)
- Misc. bug: struct.error during GGUF conversion of Mistral-Instruct with convert_hf_to_gguf.py (#14243, closed Jun 17, 2025)
- Misc. bug: Performance regression on aarch64 q4_0 (#14134, closed Jun 17, 2025)
- Generated thought process not shown on web UI for Qwen 3 (#14199, closed Jun 17, 2025)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, closed Jun 17, 2025)
- Slow token generation speed of Gemma 3 QAT models (#13048, closed Jun 17, 2025)
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models (#13247, closed Jun 17, 2025)
- Compile bug: paths with spaces fail on Unix with Vulkan backend (#13288, closed Jun 17, 2025)
- Misc. bug: (#14223, closed Jun 16, 2025)
- Misc. feature: llama-cli support for solar-10.7b-instruct (#14173, closed Jun 16, 2025)
- Eval bug: Error in trying to use llama-server with Qwen3-Embedding-0.6B-GGUF (#14204, closed Jun 16, 2025)
- Feature Request: (webui) Implement experimental features on webui (#11662, closed Jun 16, 2025)
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB' (#12923, closed Jun 16, 2025)
- Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3 (#13025, closed Jun 16, 2025)
- Eval bug: llama-server stays in unresponsive state after CUDA error: out of memory (#13085, closed Jun 16, 2025)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115, closed Jun 16, 2025)
- Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout (#13240, closed Jun 16, 2025)
- Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError' (#13248, closed Jun 16, 2025)
- Eval bug: sentencepiece tokenizer generates incorrect tokens (#13256, closed Jun 16, 2025)
- Misc. bug: the output file of llama-quantize is not gguf format (#13258, closed Jun 16, 2025)
- Misc. bug: Server does not always cancel requests for disconnected connections (#13262, closed Jun 16, 2025)
- Compile bug: RocmWMMA doesn't work (#14193, closed Jun 15, 2025)
- Feature Request: dots.llm1 model support (#14044, closed Jun 15, 2025)
- Misc. bug: xcframework does not contain support for Catalyst (#12751, closed Jun 15, 2025)
- Compile bug: llama-vocab.cpp error (#14176, closed Jun 13, 2025)
- Eval bug: Command-A forces full-prompt re-processing due to lack of cache data (#14157, closed Jun 13, 2025)
- Compile bug: Vulkan cross compile for arm64 (#13068, closed Jun 13, 2025)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156, closed Jun 13, 2025)
- Eval bug: Unreadable output when using qwen2-vl model (#13165, closed Jun 13, 2025)
- Misc. bug: llama-parallel segmentation fault (#13172, closed Jun 13, 2025)
- Eval bug: Qwen models lost ability to think (#14147, closed Jun 12, 2025)
- SYCL fails to initialize unless iGPU is disabled (Intel Arc A770 + i5-9500) (#13775, closed Jun 12, 2025)
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" (#13963, closed Jun 12, 2025)
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 (#13145, closed Jun 12, 2025)
- Compile bug: hipcc is run with host C++ compiler flags (#14136, closed Jun 11, 2025)
- Misc. bug: `test-chat` fails on x86 windows builds but works everywhere else (#14112, closed Jun 11, 2025)
- Can't llama-quantize Command A unless I roll back (#14054, closed Jun 11, 2025)
- Misc. bug: 10 image maximum? (#14111, closed Jun 11, 2025)
26 Issues opened by 26 people
- Misc. bug: [CANN] memory leak when using CANN as backend (#14257, opened Jun 18, 2025)
- Eval bug: example llama-simple-chat fails to run on Android (#14253, opened Jun 18, 2025)
- Feature Request: fix handling of Qwen3-Embedding-0.6B input to add EOS token (#14252, opened Jun 18, 2025)
- Misc. bug: prompt as pasted content in the server (#14251, opened Jun 17, 2025)
- Llama 4 mmproj fails: `unable to find tensor mm.model.fc.weight` (#14237, opened Jun 17, 2025)
- Misc. bug: llama-server slower on 4-bit quantized model with f470bc36bed (#14235, opened Jun 17, 2025)
- Misc. bug: weird cursor placement in the web UI (#14233, opened Jun 17, 2025)
- Eval bug: Command-A generates a single repeating token when using split mode row on P40 (#14228, opened Jun 16, 2025)
- Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception (#14227, opened Jun 16, 2025)
- Feature Request: Add --no-warmup to llama-bench (#14224, opened Jun 16, 2025)
- Misc. bug: OAI response_format json_schema and json_object not applied with Llama 3.x models (#14218, opened Jun 16, 2025)
- Feature Request: llama-server: a flag for limiting input image size (#14216, opened Jun 16, 2025)
- Eval bug: RWKV inference with llama-parallel gets wrong output with lmhead offloaded to GPU (#14211, opened Jun 16, 2025)
- Misc. bug: full-cuda docker build needs ldconfig before launching llama-* (#14195, opened Jun 15, 2025)
- Misc. bug: [Windows] GPU layers/tensors still consume system memory after load when mmap = true (#14187, opened Jun 15, 2025)
- Misc. bug: evaluate_and_capture_cuda_graph NULL POINTER DEREFERENCE (#14186, opened Jun 15, 2025)
- Misc. bug: Failure to allocate buffer with ROCm 6.4 (#14178, opened Jun 13, 2025)
- prismatic-vlms to gguf? (#14159, opened Jun 13, 2025)
- Compile bug: HIP compile fails during linking stage, undefined reference error repeats (#14155, opened Jun 12, 2025)
- Research: mmap eviction (#14154, opened Jun 12, 2025)
- Metrics should not include : in Prometheus metric names (#14150, opened Jun 12, 2025)
- Misc. bug: llama-server drops multi-part content for final assistant message (#14137, opened Jun 12, 2025)
67 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Hybrid recurrent cache (#13979, commented on Jun 18, 2025; 54 new comments)
- llama: Attempt to add ModernBert (#14014, commented on Jun 16, 2025; 24 new comments)
- finetune.cpp command-line arg (#13873, commented on Jun 17, 2025; 23 new comments)
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117, commented on Jun 18, 2025; 3 new comments)
- webui: add server info to chat message (#14065, commented on Jun 13, 2025; 2 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, commented on Jun 14, 2025; 2 new comments)
- scripts: Fix remote option in Windows (#14102) (#14100, commented on Jun 13, 2025; 1 new comment)
- ggml-cpu-aarch64: Fix compilation issues (#11745, commented on Jun 17, 2025; 0 new comments)
- llama : initial Mamba-2 support (#9126, commented on Jun 18, 2025; 0 new comments)
- ggml : add WebGPU backend (#7773, commented on Jun 18, 2025; 0 new comments)
- Misc. bug: linux/arm64 does not exist for the server docker image (#13891, commented on Jun 18, 2025; 0 new comments)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, commented on Jun 18, 2025; 0 new comments)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, commented on Jun 18, 2025; 0 new comments)
- Compile bug: tools build failing (#13614, commented on Jun 18, 2025; 0 new comments)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, commented on Jun 18, 2025; 0 new comments)
- Misc. bug: batch in the mtmd-cli.cpp not freed (#13620, commented on Jun 18, 2025; 0 new comments)
- Feature Request: Falcon-H1 (#13681, commented on Jun 17, 2025; 0 new comments)
- Eval bug: Error running multiple contexts from multiple threads at the same time with Vulkan (#11371, commented on Jun 17, 2025; 0 new comments)
- Android build on GPU not comparable with CPU? (#13910, commented on Jun 17, 2025; 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jun 16, 2025; 0 new comments)
- Eval bug: finetuned gpt2 model with LoRA and saved it to GGUF, but it does not work properly (#13489, commented on Jun 13, 2025; 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jun 18, 2025; 0 new comments)
- Fix rocWMMA build documentation (#12243, commented on Jun 13, 2025; 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jun 14, 2025; 0 new comments)
- quantize: Handle user-defined pruning of whole layers (blocks) (#13037, commented on Jun 14, 2025; 0 new comments)
- feat(server): Add tool call support to WebUI (LLama Server) (#13501, commented on Jun 16, 2025; 0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529, commented on Jun 12, 2025; 0 new comments)
- Granite Four (#13550, commented on Jun 16, 2025; 0 new comments)
- Move page cache via mbind to prevent cross-NUMA access (#13731, commented on Jun 18, 2025; 0 new comments)
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects (#13792, commented on Jun 18, 2025; 0 new comments)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, commented on Jun 17, 2025; 0 new comments)
- llama : support qwen3 rerank and embeddings (#14029, commented on Jun 14, 2025; 0 new comments)
- llama: automatically set runtime parameters such as --n-gpu-layers to fit VRAM (#14067, commented on Jun 13, 2025; 0 new comments)
- server: add model alias presets (#14083, commented on Jun 17, 2025; 0 new comments)
- Eval bug: BGE-M3 Embedding model is not accessible (#13494, commented on Jun 13, 2025; 0 new comments)
- Misc. bug: llama-cli stopped starting in release b4191 (c9b00a7) (#13498, commented on Jun 13, 2025; 0 new comments)
- Feature Request: Apple just released Fast-VLM, a very promising set of multimodal language models (#13512, commented on Jun 13, 2025; 0 new comments)
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007, commented on Jun 12, 2025; 0 new comments)
- Eval bug: (MAC) fails in `GGML_METAL_ADD_KERNEL(GGML_METAL_KERNEL_TYPE_FLASH_ATTN_EXT_Q8_0_H96, flash_attn_ext_q8_0_h96, has_simdgroup_mm);` (#14110, commented on Jun 12, 2025; 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Jun 12, 2025; 0 new comments)
- Misc. bug: Model not loaded on Android with NDK (#13399, commented on Jun 12, 2025; 0 new comments)
- Eval bug: I cannot run llama 405b on CPU (#13475, commented on Jun 12, 2025; 0 new comments)
- web UI either doesn't scroll or jumps to the wrong element (#13479, commented on Jun 12, 2025; 0 new comments)
- Partial offload support for training (#13486, commented on Jun 12, 2025; 0 new comments)
- Vulkan runner frequent crashing under workload (#14105, commented on Jun 11, 2025; 0 new comments)
- Misc. bug: Server/Chat parallel tool calling not working (#14101, commented on Jun 11, 2025; 0 new comments)
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990, commented on Jun 11, 2025; 0 new comments)
- Misc. bug: --cache-reuse no longer seems to be caching prompt prefixes (#14113, commented on Jun 11, 2025; 0 new comments)
- tutorials : list for llama.cpp (#13523, commented on Jun 11, 2025; 0 new comments)
- Misc. bug: "llama_context_params::swa_full = true" causes very large RAM/VRAM usage (#14123, commented on Jun 11, 2025; 0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: Stuck while loading the model (#14114, commented on Jun 16, 2025; 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Jun 16, 2025; 0 new comments)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: Potential out of bound in rerank (#13549, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: GGML_ASSERT(view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src)) failed (#13581, commented on Jun 16, 2025; 0 new comments)
- Eval bug: GGUF conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work (#13593, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jun 15, 2025; 0 new comments)
- Eval bug: Weight repacking for AVX2 block interleaving is very slow and NUMA unfriendly (#12759, commented on Jun 15, 2025; 0 new comments)
- Feature Proposal: Server Model Switching at Runtime (#13027, commented on Jun 15, 2025; 0 new comments)
- LoRA training example (#13485, commented on Jun 15, 2025; 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Jun 14, 2025; 0 new comments)
- (Discussion) Improve usability of llama-server (#13367, commented on Jun 14, 2025; 0 new comments)
- Research: How to integrate VITA 1.5 for multi-modal GGUF deployment? (#13520, commented on Jun 14, 2025; 0 new comments)
- Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3) (#13545, commented on Jun 14, 2025; 0 new comments)
- Feature Request: XiaomiMiMo/MiMo-7B-RL (#13218, commented on Jun 13, 2025; 0 new comments)
- Why is mul_mat in ggml slower than llama.cpp? (#13473, commented on Jun 13, 2025; 0 new comments)