Insights: ggml-org/llama.cpp
Overview
47 Releases published by 1 person
- b5632, published Jun 11, 2025
- b5633, published Jun 11, 2025
- b5634, published Jun 11, 2025
- b5636, published Jun 11, 2025
- b5637, published Jun 11, 2025
- b5638, published Jun 11, 2025
- b5639, published Jun 11, 2025
- b5640, published Jun 11, 2025
- b5641, published Jun 12, 2025
- b5642, published Jun 12, 2025
- b5644, published Jun 12, 2025
- b5645, published Jun 12, 2025
- b5646, published Jun 12, 2025
- b5648, published Jun 12, 2025
- b5649, published Jun 13, 2025
- b5650, published Jun 13, 2025
- b5651, published Jun 13, 2025
- b5652, published Jun 13, 2025
- b5653, published Jun 13, 2025
- b5654, published Jun 13, 2025
- b5655, published Jun 13, 2025
- b5657, published Jun 13, 2025
- b5659, published Jun 13, 2025
- b5662, published Jun 13, 2025
- b5664, published Jun 14, 2025
- b5666, published Jun 15, 2025
- b5667, published Jun 15, 2025
- b5668, published Jun 15, 2025
- b5669, published Jun 15, 2025
- b5670, published Jun 15, 2025
- b5671, published Jun 15, 2025
- b5672, published Jun 15, 2025
- b5673, published Jun 15, 2025
- b5674, published Jun 15, 2025
- b5675, published Jun 16, 2025
- b5676, published Jun 16, 2025
- b5679, published Jun 16, 2025
- b5681, published Jun 16, 2025
- b5682, published Jun 16, 2025
- b5683, published Jun 16, 2025
- b5684, published Jun 16, 2025
- b5685, published Jun 16, 2025
- b5686, published Jun 16, 2025
- b5687, published Jun 17, 2025
- b5688, published Jun 17, 2025
- b5689, published Jun 17, 2025
- b5693, published Jun 18, 2025
61 Pull requests merged by 25 people
- mtmd : refactor llava-uhd preprocessing logic (#14247, merged Jun 18, 2025)
- llama-chat : fix multiple system messages for gemma, orion (#14246, merged Jun 18, 2025)
- convert : fix null head_dim AutoConfig regression (#14248, merged Jun 18, 2025)
- sync : ggml (#14255, merged Jun 18, 2025)
- cmake: remove shader-gen step-targets from ggml-vulkan (#14226, merged Jun 17, 2025)
- ggml-cpu : remove the weak alias trick (#14221, merged Jun 17, 2025)
- musa: fix build warning (unused variable) (#14231, merged Jun 17, 2025)
- common : suggest --jinja when autodetection fails (#14222, merged Jun 16, 2025)
- server : fix incorrect usage of llama_get_embeddings() (#14225, merged Jun 16, 2025)
- llama : add thread safety test (#14035, merged Jun 16, 2025)
- cmake: clean up external project logic for vulkan-shaders-gen (#14179, merged Jun 16, 2025)
- Add NeoBERT (#14164, merged Jun 16, 2025)
- HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202, merged Jun 16, 2025)
- llama : rework embeddings logic (#14208, merged Jun 16, 2025)
- ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206, merged Jun 16, 2025)
- Remove arcee AFM change in convert_hf_to_gguf_update.py (#14207, merged Jun 16, 2025)
- Allow override when adding value to GGUFWriter (#14194, merged Jun 16, 2025)
- vulkan: mutex around vkQueueSubmit (#14127, merged Jun 16, 2025; see the queue-lock sketch after this list)
- ggml-cpu : rework weak alias on apple targets (#14146, merged Jun 16, 2025)
- Add support for Arcee AI's upcoming AFM model (#14185, merged Jun 15, 2025)
- When listening on a unix domain socket, don't print http:// and port (#14180, merged Jun 15, 2025)
- quantize: Use UINT32 if there's an INT KV override (#14197, merged Jun 15, 2025)
- CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196, merged Jun 15, 2025)
- HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183, merged Jun 15, 2025)
- kv-cache : fix use-after-move of defrag info (#14189, merged Jun 15, 2025)
- llama-model : add dots.llm1 architecture support (#14044) (#14118, merged Jun 15, 2025)
- cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188, merged Jun 15, 2025)
- batch : auto-gen positions + verify multi-sequence input (#14177, merged Jun 15, 2025)
- remove WIP since PR has been merged (#13912, merged Jun 15, 2025)
- llama-chat : Do not throw when tool parsing fails (#14012, merged Jun 14, 2025)
- compare llama-bench: add option to plot (#14169, merged Jun 14, 2025)
- vocab : fix build (#14175, merged Jun 13, 2025)
- sycl: fix docker image (#14144, merged Jun 13, 2025)
- batch : add LLAMA_BATCH_DEBUG environment variable (#14172, merged Jun 13, 2025; see the env-var sketch after this list)
- Update multimodal.md (#14122, merged Jun 13, 2025)
- batch : rework llama_batch_allocr (#14153, merged Jun 13, 2025)
- readme : remove survey link (#14168, merged Jun 13, 2025)
- cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167, merged Jun 13, 2025)
- Make cls_b and cls_out_b optional in ranking (#14165, merged Jun 13, 2025)
- server : fix SWA condition for full context reprocess (#14163, merged Jun 13, 2025)
- sycl: Adding additional cpy dbg print output (#14034, merged Jun 13, 2025)
- sycl: Bump oneMath commit (#14152, merged Jun 13, 2025)
- Improve build-info.cpp generation (#14156, merged Jun 13, 2025)
- vocab : prevent heap overflow when vocab is too small (#14145, merged Jun 13, 2025)
- sycl: Remove unneeded f16->f32 copy for dnnl mul mat (#14125, merged Jun 12, 2025)
- readme : remove project status link (#14149, merged Jun 12, 2025)
- server : re-enable SWA speculative decoding (#14131, merged Jun 12, 2025)
- context : simplify output counting logic during decode (#14142, merged Jun 12, 2025)
- batch : remove logits_all flag (#14141, merged Jun 12, 2025)
- cmake : handle whitespaces in path during metal build (#14126, merged Jun 12, 2025)
- kv-cache : fix split_equal handling in unified implementation (#14130, merged Jun 12, 2025)
- context : round n_tokens to next multiple of n_seqs when reserving (#14140, merged Jun 12, 2025)
- common: fix issue with regex_escape routine on Windows (#14133, merged Jun 11, 2025)
- Implement GGML_CPU_ALL_VARIANTS for ARM (#14080, merged Jun 11, 2025)
- chore : clean up relative source dir paths (#14128, merged Jun 11, 2025)
- tests : add test-tokenizers-repo (#14017, merged Jun 11, 2025)
- vulkan: Better thread-safety for command pools/buffers (#14116, merged Jun 11, 2025)
- webui: Wrap long numbers instead of infinite horizontal scroll (#14062, merged Jun 11, 2025)
- kv-cache : relax SWA masking condition (#14119, merged Jun 11, 2025)
- Pass --keep to llama-server (#14120, merged Jun 11, 2025)
- kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121, merged Jun 11, 2025; see the env-var sketch after this list)
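The vkQueueSubmit mutex (#14127) follows a standard Vulkan pattern: VkQueue is an externally synchronized object, so the application, not the driver, must serialize concurrent submissions to the same queue. A minimal sketch of that idea, not the actual code from #14127; the `locked_queue` wrapper is hypothetical:

```cpp
#include <mutex>
#include <vulkan/vulkan.h>

// The Vulkan spec forbids two threads from calling vkQueueSubmit on the same
// VkQueue concurrently, so the caller must serialize submissions. A per-queue
// mutex is the simplest way to do that. Hypothetical wrapper, not #14127's code.
struct locked_queue {
    VkQueue    queue = VK_NULL_HANDLE;
    std::mutex mtx;

    VkResult submit(uint32_t submit_count, const VkSubmitInfo * submits, VkFence fence) {
        std::lock_guard<std::mutex> lock(mtx); // one submitter at a time
        return vkQueueSubmit(queue, submit_count, submits, fence);
    }
};
```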
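Two of the merged changes add debug environment variables, LLAMA_BATCH_DEBUG (#14172) and LLAMA_KV_CACHE_DEBUG (#14121). The accepted values and output are defined by those PRs, not restated here; as a hedged sketch, such gates are conventionally read once at startup, with a hypothetical helper `debug_level_from_env`:

```cpp
#include <cstdio>
#include <cstdlib>

// Conventional read-once debug gate. Illustrative only: the real semantics of
// LLAMA_BATCH_DEBUG and LLAMA_KV_CACHE_DEBUG live in PRs #14172 and #14121.
static int debug_level_from_env(const char * name) {
    const char * val = std::getenv(name);
    return val != nullptr ? std::atoi(val) : 0;
}

int main() {
    // Read once, then branch on the cached value in hot paths.
    static const int batch_debug = debug_level_from_env("LLAMA_BATCH_DEBUG");
    if (batch_debug > 0) {
        std::printf("batch debug output enabled (level %d)\n", batch_debug);
    }
    return 0;
}
```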
21 Pull requests opened by 19 people
- tests : add test-model-random (#14139, opened Jun 12, 2025)
- models/templates: add mistralai/Mistral-Small-3.1-24B-Instruct-2503 template with tool calling support (#14148, opened Jun 12, 2025)
- ggml : implement REGLU/GEGLU/SWIGLU ops (#14158, opened Jun 12, 2025; definitions sketched after this list)
- ggml : implement GLU for split up/gate (#14181, opened Jun 14, 2025)
- ci: re-enable rocm linux build, reduce the built targets to the ones currently available in rocblas (#14184, opened Jun 14, 2025)
- webui: save model name with conversation history (#13570) (#14192, opened Jun 15, 2025)
- gguf-py: Make sentencepiece optional (#14200, opened Jun 15, 2025)
- llama: fix compilation warning (#464) (#14209, opened Jun 16, 2025)
- sycl: Cleanup codepaths in Get Rows in sycl backend (#14215, opened Jun 16, 2025)
- ubatch : new splitting logic (#14217, opened Jun 16, 2025)
- tests : enhance llama-bench with separate timings (pp/gen t/s), added n_threads_batch (#14219, opened Jun 16, 2025)
- logit_bias: apply configurable escalating EOG bias at low n_remain (#14229, opened Jun 16, 2025)
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232, opened Jun 17, 2025)
- mtmd : add a way to select device for vision encoder (#14236, opened Jun 17, 2025)
- MODEL: Falcon-H1 support (#14238, opened Jun 17, 2025)
- Add SmolLM3 (#14240, opened Jun 17, 2025)
- server : add pidfile option (#14242, opened Jun 17, 2025)
- sycl: add usage of enqueue_functions extension (#14244, opened Jun 17, 2025)
- Vulkan: Fix host-pinned memory for large allocations (#14249, opened Jun 17, 2025)
- opencl: ref count `ggml_backend_opencl_context` and refactor profiling (#14254, opened Jun 18, 2025)
- ci : run slow tests first to precache all models (#14256, opened Jun 18, 2025)
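For reference, the GLU variants named in #14158 are usually defined as in Shazeer, "GLU Variants Improve Transformer" (2020): two linear projections, one passed through a nonlinearity and used to gate the other elementwise. How #14158 lays out the tensors is not stated here; the standard (bias-free) definitions are:

```latex
\begin{aligned}
\mathrm{ReGLU}(x)  &= \max(0,\; xW) \otimes xV \\
\mathrm{GEGLU}(x)  &= \operatorname{GELU}(xW) \otimes xV \\
\mathrm{SwiGLU}(x) &= \operatorname{Swish}(xW) \otimes xV
\end{aligned}
```

Here $W$ and $V$ are the gate and up projections and $\otimes$ is the elementwise product, which is also why the companion PR #14181 distinguishes the split up/gate case.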
52 Issues closed by 20 people
- Eval bug: Llama 4 Scout/Maverick crash when processing images with certain aspect ratio (#13827, closed Jun 18, 2025)
- Misc. bug: llama-server builds possibly erroneous prompt for gemma 3 (#14151, closed Jun 18, 2025)
- Eval bug: Unexpected failure converting Mistral 7B v0.2 to f32 GGUF (#13976, closed Jun 18, 2025)
- Misc. bug: Compilation with OpenCL on latest build (#13300, closed Jun 18, 2025)
- Eval bug: Bad output from Qwen3-Embedding-0.6B (#14234, closed Jun 17, 2025)
- Thad (#14241, closed Jun 17, 2025)
- W (#14239, closed Jun 17, 2025)
- Misc. bug: struct.error during GGUF conversion of Mistral-Instruct with convert_hf_to_gguf.py (#14243, closed Jun 17, 2025)
- Misc. bug: Performance regression on aarch64 q4_0 (#14134, closed Jun 17, 2025)
- Generated thought process not shown on web UI for Qwen 3 (#14199, closed Jun 17, 2025)
- Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000) (#12717, closed Jun 17, 2025)
- Slow token generation speed of Gemma 3 QAT models (#13048, closed Jun 17, 2025)
- Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models (#13247, closed Jun 17, 2025)
- Compile bug: paths with spaces fail on Unix with Vulkan backend (#13288, closed Jun 17, 2025)
- Misc. bug: (#14223, closed Jun 16, 2025)
- Misc. feature: llama-cli support for solar-10.7b-instruct (#14173, closed Jun 16, 2025)
- Eval bug: Error in trying to use llama-server with Qwen3-Embedding-0.6B-GGUF (#14204, closed Jun 16, 2025)
- Feature Request: (webui) Implement experimental features on webui (#11662, closed Jun 16, 2025)
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB' (#12923, closed Jun 16, 2025)
- Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3 (#13025, closed Jun 16, 2025)
- Eval bug: llama-server stays in unresponsive state after CUDA error: out of memory (#13085, closed Jun 16, 2025)
- Misc. bug: OpenCL: Issue with Adreno 610 (#13115, closed Jun 16, 2025)
- Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout (#13240, closed Jun 16, 2025)
- Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError' (#13248, closed Jun 16, 2025)
- Eval bug: sentencepiece tokenizer generates incorrect tokens (#13256, closed Jun 16, 2025)
- Misc. bug: the output file of llama-quantize is not gguf format (#13258, closed Jun 16, 2025)
- Misc. bug: Server does not always cancel requests for disconnected connections (#13262, closed Jun 16, 2025)
- Compile bug: RocmWMMA doesn't work (#14193, closed Jun 15, 2025)
- Feature Request: dots.llm1 model support (#14044, closed Jun 15, 2025)
- Misc. bug: xcframework does not contain support for Catalyst (#12751, closed Jun 15, 2025)
- Compile bug: llama-vocab.cpp error (#14176, closed Jun 13, 2025)
- Eval bug: Command-A forces full-prompt re-processing due to lack of cache data (#14157, closed Jun 13, 2025)
- Compile bug: Vulkan cross compile for arm64 (#13068, closed Jun 13, 2025)
- Misc. bug: Shared libraries don't properly contain /common/ functions (#13156, closed Jun 13, 2025)
- Eval bug: Unreadable output when using qwen2-vl model (#13165, closed Jun 13, 2025)
- Misc. bug: llama-parallel segmentation fault (#13172, closed Jun 13, 2025)
- Eval bug: Qwen models lost ability to think (#14147, closed Jun 12, 2025)
- SYCL fails to initialize unless iGPU is disabled (Intel Arc A770 + i5-9500) (#13775, closed Jun 12, 2025)
- Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0" (#13963, closed Jun 12, 2025)
- Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300 (#13145, closed Jun 12, 2025)
- Compile bug: hipcc is run with host C++ compiler flags (#14136, closed Jun 11, 2025)
- Misc. bug: `test-chat` fails on x86 windows builds but works everywhere else (#14112, closed Jun 11, 2025)
- Can't llama-quantize Command A unless I roll back (#14054, closed Jun 11, 2025)
- Misc. bug: 10 image maximum? (#14111, closed Jun 11, 2025)
26 Issues opened by 26 people
- Misc. bug: [CANN] memory leak when using CANN as backend (#14257, opened Jun 18, 2025)
- Eval bug: example llama-simple-chat fails to run on Android (#14253, opened Jun 18, 2025)
- Feature Request: fix handling of Qwen3-Embedding-0.6B input to add EOS token (#14252, opened Jun 18, 2025)
- Misc. bug: prompt as pasted content in the server (#14251, opened Jun 17, 2025)
- Llama 4 mmproj fails: `unable to find tensor mm.model.fc.weight` (#14237, opened Jun 17, 2025)
- Misc. bug: llama-server slower on 4-bit quantized model with f470bc36bed (#14235, opened Jun 17, 2025)
- Misc. bug: weird cursor placement in the web UI (#14233, opened Jun 17, 2025)
- Eval bug: Command-A generates a single repeating token when using split mode row on P40 (#14228, opened Jun 16, 2025)
- Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception (#14227, opened Jun 16, 2025)
- Feature Request: Add --no-warmup to llama-bench (#14224, opened Jun 16, 2025)
- Misc. bug: OAI response_format json_schema and json_object not applied with Llama 3.x models (#14218, opened Jun 16, 2025)
- Feature Request: llama-server: a flag for limiting input image size (#14216, opened Jun 16, 2025)
- Eval bug: RWKV inference with llama-parallel gets wrong output with lmhead offloaded to GPU (#14211, opened Jun 16, 2025)
- Misc. bug: full-cuda docker build needs ldconfig before launching llama-* (#14195, opened Jun 15, 2025)
- Misc. bug: [Windows] GPU layers/tensors still consume system memory after load when mmap = true (#14187, opened Jun 15, 2025)
- Misc. bug: evaluate_and_capture_cuda_graph NULL POINTER DEREFERENCE (#14186, opened Jun 15, 2025)
- Misc. bug: Failure to allocate buffer with ROCm 6.4 (#14178, opened Jun 13, 2025)
- prismatic-vlms to gguf? (#14159, opened Jun 13, 2025)
- Compile bug: HIP compile fails during linking stage, undefined reference error repeats (#14155, opened Jun 12, 2025)
- Research: mmap eviction (#14154, opened Jun 12, 2025)
- Metrics should not include : in Prometheus metric names (#14150, opened Jun 12, 2025)
- Misc. bug: llama-server drops multi-part content for final assistant message (#14137, opened Jun 12, 2025)
67 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Hybrid recurrent cache (#13979, commented on Jun 18, 2025; 54 new comments)
- llama: Attempt to add ModernBert (#14014, commented on Jun 16, 2025; 24 new comments)
- finetune.cpp command-line arg (#13873, commented on Jun 17, 2025; 23 new comments)
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117, commented on Jun 18, 2025; 3 new comments)
- webui: add server info to chat message (#14065, commented on Jun 13, 2025; 2 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, commented on Jun 14, 2025; 2 new comments)
- scripts: Fix remote option in Windows (#14102) (#14100, commented on Jun 13, 2025; 1 new comment)
- ggml-cpu-aarch64: Fix compilation issues (#11745, commented on Jun 17, 2025; 0 new comments)
- llama : initial Mamba-2 support (#9126, commented on Jun 18, 2025; 0 new comments)
- ggml : add WebGPU backend (#7773, commented on Jun 18, 2025; 0 new comments)
- Misc. bug: linux/arm64 does not exist for the server docker image (#13891, commented on Jun 18, 2025; 0 new comments)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, commented on Jun 18, 2025; 0 new comments)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, commented on Jun 18, 2025; 0 new comments)
- Compile bug: tools build failing (#13614, commented on Jun 18, 2025; 0 new comments)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, commented on Jun 18, 2025; 0 new comments)
- Misc. bug: batch in the mtmd-cli.cpp not freed (#13620, commented on Jun 18, 2025; 0 new comments)
- Feature Request: Falcon-H1 (#13681, commented on Jun 17, 2025; 0 new comments)
- Eval bug: Error running multiple contexts from multiple threads at the same time with Vulkan (#11371, commented on Jun 17, 2025; 0 new comments)
- Android build on GPU not comparable with CPU? (#13910, commented on Jun 17, 2025; 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jun 16, 2025; 0 new comments)
- Eval bug: finetuned gpt2 model with LoRA and saved it to GGUF, but it does not work properly (#13489, commented on Jun 13, 2025; 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jun 18, 2025; 0 new comments)
- Fix rocWMMA build documentation (#12243, commented on Jun 13, 2025; 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jun 14, 2025; 0 new comments)
- quantize: Handle user-defined pruning of whole layers (blocks) (#13037, commented on Jun 14, 2025; 0 new comments)
- feat(server): Add tool call support to WebUI (LLama Server) (#13501, commented on Jun 16, 2025; 0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529, commented on Jun 12, 2025; 0 new comments)
- Granite Four (#13550, commented on Jun 16, 2025; 0 new comments)
- Move page cache via mbind to prevent cross-NUMA access (#13731, commented on Jun 18, 2025; 0 new comments)
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects (#13792, commented on Jun 18, 2025; 0 new comments)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, commented on Jun 17, 2025; 0 new comments)
- llama : support qwen3 rerank and embeddings (#14029, commented on Jun 14, 2025; 0 new comments)
- llama: automatically set runtime parameters such as --n-gpu-layers to fit VRAM (#14067, commented on Jun 13, 2025; 0 new comments)
- server: add model alias presets (#14083, commented on Jun 17, 2025; 0 new comments)
- Eval bug: BGE-M3 Embedding model is not accessible (#13494, commented on Jun 13, 2025; 0 new comments)
- Misc. bug: llama-cli stopped starting in release b4191 (c9b00a7) (#13498, commented on Jun 13, 2025; 0 new comments)
- Feature Request: Apple just released Fast-VLM, a very promising set of multimodal language models (#13512, commented on Jun 13, 2025; 0 new comments)
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007, commented on Jun 12, 2025; 0 new comments)
- Eval bug: (MAC) fails in `GGML_METAL_ADD_KERNEL(GGML_METAL_KERNEL_TYPE_FLASH_ATTN_EXT_Q8_0_H96, flash_attn_ext_q8_0_h96, has_simdgroup_mm);` (#14110, commented on Jun 12, 2025; 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Jun 12, 2025; 0 new comments)
- Misc. bug: Model not loaded on Android with NDK (#13399, commented on Jun 12, 2025; 0 new comments)
- Eval bug: I cannot run llama 405b on CPU (#13475, commented on Jun 12, 2025; 0 new comments)
- web UI either doesn't scroll or jumps to the wrong element (#13479, commented on Jun 12, 2025; 0 new comments)
- Partial offload support for training (#13486, commented on Jun 12, 2025; 0 new comments)
- Vulkan runner frequent crashing under workload (#14105, commented on Jun 11, 2025; 0 new comments)
- Misc. bug: Server/Chat parallel tool calling not working (#14101, commented on Jun 11, 2025; 0 new comments)
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990, commented on Jun 11, 2025; 0 new comments)
- Misc. bug: --cache-reuse no longer seems to be caching prompt prefixes (#14113, commented on Jun 11, 2025; 0 new comments)
- tutorials : list for llama.cpp (#13523, commented on Jun 11, 2025; 0 new comments)
- Misc. bug: "llama_context_params::swa_full = true" causes very large RAM/VRAM usage (#14123, commented on Jun 11, 2025; 0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: Stuck while loading the model (#14114, commented on Jun 16, 2025; 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Jun 16, 2025; 0 new comments)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: Potential out of bound in rerank (#13549, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: GGML_ASSERT(view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src)) failed (#13581, commented on Jun 16, 2025; 0 new comments)
- Eval bug: GGUF conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work (#13593, commented on Jun 16, 2025; 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jun 15, 2025; 0 new comments)
- Eval bug: Weight repacking for AVX2 block interleaving is very slow and NUMA unfriendly (#12759, commented on Jun 15, 2025; 0 new comments)
- Feature Proposal: Server Model Switching at Runtime (#13027, commented on Jun 15, 2025; 0 new comments)
- LoRA training example (#13485, commented on Jun 15, 2025; 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Jun 14, 2025; 0 new comments)
- (Discussion) Improve usability of llama-server (#13367, commented on Jun 14, 2025; 0 new comments)
- Research: How to integrate VITA 1.5 for multi-modal GGUF deployment? (#13520, commented on Jun 14, 2025; 0 new comments)
- Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3) (#13545, commented on Jun 14, 2025; 0 new comments)
- Feature Request: XiaomiMiMo/MiMo-7B-RL (#13218, commented on Jun 13, 2025; 0 new comments)
- Why is mul_mat in ggml slower than llama.cpp? (#13473, commented on Jun 13, 2025; 0 new comments)