Insights: ggml-org/llama.cpp
Overview
212 Releases published by 1 person
-
b5416
published
May 19, 2025 -
b5417
published
May 19, 2025 -
b5421
published
May 19, 2025 -
b5422
published
May 19, 2025 -
b5423
published
May 19, 2025 -
b5425
published
May 19, 2025 -
b5426
published
May 19, 2025 -
b5427
published
May 20, 2025 -
b5429
published
May 20, 2025 -
b5430
published
May 20, 2025 -
b5431
published
May 20, 2025 -
b5432
published
May 20, 2025 -
b5434
published
May 20, 2025 -
b5435
published
May 20, 2025 -
b5436
published
May 20, 2025 -
b5437
published
May 20, 2025 -
b5438
published
May 20, 2025 -
b5439
published
May 21, 2025 -
b5440
published
May 21, 2025 -
b5441
published
May 21, 2025 -
b5442
published
May 21, 2025 -
b5443
published
May 21, 2025 -
b5444
published
May 21, 2025 -
b5446
published
May 21, 2025 -
b5448
published
May 21, 2025 -
b5449
published
May 21, 2025 -
b5450
published
May 21, 2025 -
b5451
published
May 21, 2025 -
b5452
published
May 21, 2025 -
b5453
published
May 22, 2025 -
b5454
published
May 22, 2025 -
b5456
published
May 22, 2025 -
b5458
published
May 22, 2025 -
b5459
published
May 22, 2025 -
b5460
published
May 22, 2025 -
b5461
published
May 23, 2025 -
b5462
published
May 23, 2025 -
b5463
published
May 23, 2025 -
b5464
published
May 23, 2025 -
b5465
published
May 23, 2025 -
b5466
published
May 23, 2025 -
b5468
published
May 23, 2025 -
b5471
published
May 24, 2025 -
b5472
published
May 24, 2025 -
b5473
published
May 24, 2025 -
b5474
published
May 24, 2025 -
b5475
published
May 24, 2025 -
b5476
published
May 24, 2025 -
b5477
published
May 24, 2025 -
b5478
published
May 25, 2025 -
b5479
published
May 25, 2025 -
b5480
published
May 25, 2025 -
b5481
published
May 25, 2025 -
b5483
published
May 25, 2025 -
b5484
published
May 25, 2025 -
b5486
published
May 25, 2025 -
b5488
published
May 26, 2025 -
b5489
published
May 26, 2025 -
b5490
published
May 26, 2025 -
b5492
published
May 26, 2025 -
b5493
published
May 26, 2025 -
b5494
published
May 26, 2025 -
b5495
published
May 26, 2025 -
b5497
published
May 26, 2025 -
b5498
published
May 26, 2025 -
b5499
published
May 26, 2025 -
b5501
published
May 26, 2025 -
b5502
published
May 27, 2025 -
b5503
published
May 27, 2025 -
b5504
published
May 27, 2025 -
b5505
published
May 27, 2025 -
b5506
published
May 27, 2025 -
b5508
published
May 27, 2025 -
b5509
published
May 27, 2025 -
b5510
published
May 27, 2025 -
b5512
published
May 27, 2025 -
b5513
published
May 27, 2025 -
b5514
published
May 27, 2025 -
b5515
published
May 27, 2025 -
b5516
published
May 27, 2025 -
b5517
published
May 28, 2025 -
b5519
published
May 28, 2025 -
b5522
published
May 28, 2025 -
b5524
published
May 28, 2025 -
b5526
published
May 28, 2025 -
b5527
published
May 28, 2025 -
b5529
published
May 29, 2025 -
b5530
published
May 29, 2025 -
b5532
published
May 29, 2025 -
b5533
published
May 29, 2025 -
b5534
published
May 29, 2025 -
b5535
published
May 29, 2025 -
b5537
published
May 29, 2025 -
b5538
published
May 29, 2025 -
b5539
published
May 30, 2025 -
b5540
published
May 30, 2025 -
b5541
published
May 30, 2025 -
b5543
published
May 30, 2025 -
b5544
published
May 30, 2025 -
b5545
published
May 30, 2025 -
b5546
published
May 30, 2025 -
b5547
published
May 30, 2025 -
b5548
published
May 30, 2025 -
b5551
published
May 31, 2025 -
b5552
published
May 31, 2025 -
b5554
published
May 31, 2025 -
b5555
published
May 31, 2025 -
b5556
published
May 31, 2025 -
b5558
published
May 31, 2025 -
b5559
published
Jun 1, 2025 -
b5560
published
Jun 1, 2025 -
b5568
published
Jun 1, 2025 -
b5569
published
Jun 1, 2025 -
b5571
published
Jun 1, 2025 -
b5572
published
Jun 1, 2025 -
b5573
published
Jun 2, 2025 -
b5574
published
Jun 2, 2025 -
b5575
published
Jun 2, 2025 -
b5576
published
Jun 2, 2025 -
b5577
published
Jun 2, 2025 -
b5578
published
Jun 2, 2025 -
b5579
published
Jun 2, 2025 -
b5580
published
Jun 3, 2025 -
b5581
published
Jun 3, 2025 -
b5584
published
Jun 4, 2025 -
b5585
published
Jun 4, 2025 -
b5586
published
Jun 4, 2025 -
b5587
published
Jun 4, 2025 -
b5588
published
Jun 4, 2025 -
b5589
published
Jun 4, 2025 -
b5590
published
Jun 4, 2025 -
b5591
published
Jun 5, 2025 -
b5592
published
Jun 5, 2025 -
b5593
published
Jun 5, 2025 -
b5595
published
Jun 5, 2025 -
b5596
published
Jun 5, 2025 -
b5598
published
Jun 5, 2025 -
b5600
published
Jun 6, 2025 -
b5601
published
Jun 6, 2025 -
b5602
published
Jun 6, 2025 -
b5603
published
Jun 7, 2025 -
b5604
published
Jun 7, 2025 -
b5606
published
Jun 8, 2025 -
b5608
published
Jun 9, 2025 -
b5609
published
Jun 9, 2025 -
b5610
published
Jun 9, 2025 -
b5612
published
Jun 9, 2025 -
b5613
published
Jun 9, 2025 -
b5614
published
Jun 9, 2025 -
b5615
published
Jun 9, 2025 -
b5616
published
Jun 9, 2025 -
b5617
published
Jun 9, 2025 -
b5618
published
Jun 9, 2025 -
b5621
published
Jun 10, 2025 -
b5620
published
Jun 10, 2025 -
b5622
published
Jun 10, 2025 -
b5624
published
Jun 10, 2025 -
b5625
published
Jun 10, 2025 -
b5627
published
Jun 10, 2025 -
b5629
published
Jun 10, 2025 -
b5630
published
Jun 11, 2025 -
b5631
published
Jun 11, 2025 -
b5632
published
Jun 11, 2025 -
b5633
published
Jun 11, 2025 -
b5634
published
Jun 11, 2025 -
b5636
published
Jun 11, 2025 -
b5637
published
Jun 11, 2025 -
b5638
published
Jun 11, 2025 -
b5639
published
Jun 11, 2025 -
b5640
published
Jun 11, 2025 -
b5641
published
Jun 12, 2025 -
b5642
published
Jun 12, 2025 -
b5644
published
Jun 12, 2025 -
b5645
published
Jun 12, 2025 -
b5646
published
Jun 12, 2025 -
b5648
published
Jun 12, 2025 -
b5649
published
Jun 13, 2025 -
b5650
published
Jun 13, 2025 -
b5651
published
Jun 13, 2025 -
b5652
published
Jun 13, 2025 -
b5653
published
Jun 13, 2025 -
b5654
published
Jun 13, 2025 -
b5655
published
Jun 13, 2025 -
b5657
published
Jun 13, 2025 -
b5659
published
Jun 13, 2025 -
b5662
published
Jun 13, 2025 -
b5664
published
Jun 14, 2025 -
b5666
published
Jun 15, 2025 -
b5667
published
Jun 15, 2025 -
b5668
published
Jun 15, 2025 -
b5669
published
Jun 15, 2025 -
b5670
published
Jun 15, 2025 -
b5671
published
Jun 15, 2025 -
b5672
published
Jun 15, 2025 -
b5673
published
Jun 15, 2025 -
b5674
published
Jun 15, 2025 -
b5675
published
Jun 16, 2025 -
b5676
published
Jun 16, 2025 -
b5679
published
Jun 16, 2025 -
b5681
published
Jun 16, 2025 -
b5682
published
Jun 16, 2025 -
b5683
published
Jun 16, 2025 -
b5684
published
Jun 16, 2025 -
b5685
published
Jun 16, 2025 -
b5686
published
Jun 16, 2025 -
b5687
published
Jun 17, 2025 -
b5688
published
Jun 17, 2025 -
b5689
published
Jun 17, 2025 -
b5693
published
Jun 18, 2025 -
b5695
published
Jun 18, 2025 -
b5696
published
Jun 18, 2025 -
b5697
published
Jun 18, 2025
268 Pull requests merged by 73 people
-
ggml : implement GLU for split up/gate
#14181 merged
Jun 18, 2025
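As background for the entry above: GLU-family feed-forward blocks (REGLU/GEGLU/SWIGLU) project the input into a gate part and an up part, apply the activation only to the gate, and multiply elementwise. The scalar sketch below is illustrative only, with SiLU standing in as the activation (i.e. SwiGLU); it is not the ggml operator itself, which works on tensors and may fuse these steps.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative GLU with split gate/up inputs: out[i] = act(gate[i]) * up[i].
// SiLU is used as the activation here (SwiGLU); GEGLU would swap in GELU.
// Plain scalar sketch, not the fused ggml tensor operator.
static float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

std::vector<float> glu_split(const std::vector<float> & gate,
                             const std::vector<float> & up) {
    std::vector<float> out(gate.size());
    for (size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu(gate[i]) * up[i];
    }
    return out;
}
```
-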
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS
#14258 merged
Jun 18, 2025 -
mtmd : refactor llava-uhd preprocessing logic
#14247 merged
Jun 18, 2025 -
llama-chat : fix multiple system messages for gemma, orion
#14246 merged
Jun 18, 2025 -
convert : fix null head_dim AutoConfig regression
#14248 merged
Jun 18, 2025 -
sync : ggml
#14255 merged
Jun 18, 2025 -
cmake: remove shader-gen step-targets from ggml-vulkan
#14226 merged
Jun 17, 2025 -
ggml-cpu : remove the weak alias trick
#14221 merged
Jun 17, 2025 -
musa: fix build warning (unused variable)
#14231 merged
Jun 17, 2025 -
common : suggest --jinja when autodetection fails
#14222 merged
Jun 16, 2025 -
server : fix incorrect usage of llama_get_embeddings()
#14225 merged
Jun 16, 2025 -
llama : add thread safety test
#14035 merged
Jun 16, 2025 -
cmake: clean up external project logic for vulkan-shaders-gen
#14179 merged
Jun 16, 2025 -
Add NeoBERT
#14164 merged
Jun 16, 2025 -
HIP: disable rocwmma on gfx12 by default until rocm 7.0
#14202 merged
Jun 16, 2025 -
llama : rework embeddings logic
#14208 merged
Jun 16, 2025 -
ggml: Add Android support for GGML_CPU_ALL_VARIANTS
#14206 merged
Jun 16, 2025 -
Remove arcee AFM change in convert_hf_to_gguf_update.py
#14207 merged
Jun 16, 2025 -
Allow override when adding value to ggufwriter
#14194 merged
Jun 16, 2025 -
vulkan: mutex around vkQueueSubmit
#14127 merged
Jun 16, 2025
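Context for the entry above: the Vulkan spec makes VkQueue externally synchronized, so two threads must not call vkQueueSubmit on the same queue concurrently. Below is a generic sketch of the usual remedy, a mutex serializing submissions per queue; the struct and names are illustrative, not the actual ggml-vulkan code.

```cpp
#include <mutex>
#include <vulkan/vulkan.h>

// Generic illustration: serialize vkQueueSubmit calls on a shared VkQueue.
// Vulkan requires external synchronization for queues, so unsynchronized
// concurrent submits are undefined behavior.
struct guarded_queue {
    VkQueue    queue = VK_NULL_HANDLE;
    std::mutex mtx;

    VkResult submit(uint32_t submit_count, const VkSubmitInfo * submits, VkFence fence) {
        std::lock_guard<std::mutex> lock(mtx);
        return vkQueueSubmit(queue, submit_count, submits, fence);
    }
};
```
-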
ggml-cpu : rework weak alias on apple targets
#14146 merged
Jun 16, 2025 -
Add support for Arcee AI's upcoming AFM model
#14185 merged
Jun 15, 2025 -
When listening on a unix domain socket, don't print http:// and port
#14180 merged
Jun 15, 2025 -
quantize: Use UINT32 if there's an INT KV override
#14197 merged
Jun 15, 2025 -
CUDA/HIP: fix ssm_scan on devices where warp size is not 32
#14196 merged
Jun 15, 2025 -
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__
#14183 merged
Jun 15, 2025 -
kv-cache : fix use-after-move of defrag info
#14189 merged
Jun 15, 2025 -
llama-model : add dots.llm1 architecture support (#14044)
#14118 merged
Jun 15, 2025 -
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ
#14188 merged
Jun 15, 2025 -
batch : auto-gen positions + verify multi-sequence input
#14177 merged
Jun 15, 2025 -
remove WIP since PR has been merged
#13912 merged
Jun 15, 2025 -
llama-chat : Do not throw when tool parsing fails
#14012 merged
Jun 14, 2025 -
compare llama-bench: add option to plot
#14169 merged
Jun 14, 2025 -
vocab : fix build
#14175 merged
Jun 13, 2025 -
sycl: fix docker image
#14144 merged
Jun 13, 2025 -
batch : add LLAMA_BATCH_DEBUG environment variable
#14172 merged
Jun 13, 2025 -
Update multimodal.md
#14122 merged
Jun 13, 2025 -
batch : rework llama_batch_allocr
#14153 merged
Jun 13, 2025 -
readme : remove survey link
#14168 merged
Jun 13, 2025 -
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT
#14167 merged
Jun 13, 2025 -
Make cls_b and cls_out_b optional in ranking
#14165 merged
Jun 13, 2025 -
server : fix SWA condition for full context reprocess
#14163 merged
Jun 13, 2025 -
sycl: Adding additional cpy dbg print output
#14034 merged
Jun 13, 2025 -
sycl: Bump oneMath commit
#14152 merged
Jun 13, 2025 -
Improve build-info.cpp generation
#14156 merged
Jun 13, 2025 -
vocab : prevent heap overflow when vocab is too small
#14145 merged
Jun 13, 2025 -
sycl: Remove not needed copy f16->f32 for dnnl mul mat
#14125 merged
Jun 12, 2025 -
readme : remove project status link
#14149 merged
Jun 12, 2025 -
server : re-enable SWA speculative decoding
#14131 merged
Jun 12, 2025 -
context : simplify output counting logic during decode
#14142 merged
Jun 12, 2025 -
batch : remove logits_all flag
#14141 merged
Jun 12, 2025 -
cmake : handle whitespaces in path during metal build
#14126 merged
Jun 12, 2025 -
kv-cache : fix split_equal handling in unified implementation
#14130 merged
Jun 12, 2025 -
context : round n_tokens to next multiple of n_seqs when reserving
#14140 merged
Jun 12, 2025 -
common: fix issue with regex_escape routine on windows
#14133 merged
Jun 11, 2025 -
Implement GGML_CPU_ALL_VARIANTS for ARM
#14080 merged
Jun 11, 2025 -
chore : clean up relative source dir paths
#14128 merged
Jun 11, 2025 -
tests : add test-tokenizers-repo
#14017 merged
Jun 11, 2025 -
vulkan: Better thread-safety for command pools/buffers
#14116 merged
Jun 11, 2025 -
webui: Wrap long numbers instead of infinite horizontal scroll
#14062 merged
Jun 11, 2025 -
kv-cache : relax SWA masking condition
#14119 merged
Jun 11, 2025 -
Pass --keep to llama-server
#14120 merged
Jun 11, 2025 -
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable
#14121 merged
Jun 11, 2025
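The entry above adds an opt-in diagnostic switch. As a rough sketch of the usual pattern for such variables, the flag is read from the environment once and used to gate extra logging; only the variable name LLAMA_KV_CACHE_DEBUG comes from the PR, the surrounding code is an assumption for illustration, not the actual llama.cpp implementation.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical illustration of an env-var debug gate; read once, cached.
static bool kv_cache_debug_enabled() {
    static const bool enabled = std::getenv("LLAMA_KV_CACHE_DEBUG") != nullptr;
    return enabled;
}

void log_kv_cache_state(int n_cells_used) {
    if (kv_cache_debug_enabled()) {
        std::fprintf(stderr, "kv-cache: %d cells in use\n", n_cells_used);
    }
}
```
-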
vulkan: Track descriptor pools/sets per-context
#14109 merged
Jun 11, 2025 -
opencl: preliminary support for Q4_0 mul_mat_id using matvec
#14003 merged
Jun 10, 2025 -
kv-cache : avoid modifying recurrent cells when setting inputs
#13834 merged
Jun 10, 2025 -
convert : fix duplicate key DeepSeek-R1 conversion error
#14103 merged
Jun 10, 2025 -
llama : support GEGLU for jina-bert-v2
#14090 merged
Jun 10, 2025 -
vulkan: force device 0 in CI
#14106 merged
Jun 10, 2025 -
server: Fixed speculative decoding stats to use #accepted \ #tested rather than #accepted \ #drafted
#14104 merged
Jun 10, 2025 -
sync : ggml
#14107 merged
Jun 10, 2025 -
Vulkan: Don't default to CPU device (like llvmpipe)
#14099 merged
Jun 10, 2025 -
rpc: nicer error message for RPC server crash
#14076 merged
Jun 10, 2025 -
sync : ggml
#14096 merged
Jun 10, 2025 -
metal : use less stack memory in FA kernel
#14088 merged
Jun 9, 2025 -
kv-cache : fix shift and defrag logic
#14081 merged
Jun 9, 2025 -
llama : allow building all tests on windows when not using shared libs
#13980 merged
Jun 9, 2025 -
ggml-cpu : split arch-specific implementations
#13892 merged
Jun 9, 2025 -
cuda : fix device sync on buffer clear
#14033 merged
Jun 9, 2025 -
graph : fix geglu
#14077 merged
Jun 9, 2025 -
[CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE
#13104 merged
Jun 9, 2025 -
webui: fix sidebar being covered by main content
#14082 merged
Jun 9, 2025 -
server : fix LRU check
#14079 merged
Jun 9, 2025 -
sycl: Add reorder to Q6_K mmvq implementation
#13885 merged
Jun 9, 2025 -
Add geglu activation function
#14074 merged
Jun 9, 2025 -
[Ascend NPU] Enable labeler
#13914 merged
Jun 9, 2025 -
cuda : fix buffer type check with integrated GPUs
#14069 merged
Jun 8, 2025 -
ci: add LoongArch cross-compile build
#13944 merged
Jun 7, 2025 -
SYCL: Implement few same quantized type copy kernels
#13739 merged
Jun 7, 2025 -
llama : fix llama_model_chat_template with template name (LLM_KV with suffix)
#14050 merged
Jun 7, 2025 -
llama : deprecate llama_kv_self_ API
#14030 merged
Jun 6, 2025 -
context : fix SWA-related warning for multiple sequences
#14045 merged
Jun 6, 2025 -
llama : support multiple classifier outputs and labels
#13940 merged
Jun 6, 2025 -
gguf-py : add add_classifier_output_labels method to writer
#14031 merged
Jun 5, 2025 -
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs
#14001 merged
Jun 5, 2025 -
Fix CUDA build failure on AutoDL cloud platforms
#14005 merged
Jun 5, 2025 -
memory : migrate from llama_kv_cache to more generic llama_memory
#14006 merged
Jun 5, 2025 -
llama : allow using mmap without PrefetchVirtualMemory
#14013 merged
Jun 5, 2025 -
chore: added badge and link to release
#13938 merged
Jun 5, 2025 -
vocab : warn about missing mask token
#14022 merged
Jun 5, 2025 -
context : fix pos_min initialization upon decode error
#14008 merged
Jun 5, 2025 -
vulkan: automatically deduce size of push constants
#13936 merged
Jun 5, 2025 -
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D
#13813 merged
Jun 4, 2025 -
kv-cache : refactor the update/defrag mechanism
#13988 merged
Jun 4, 2025 -
ci : remove cuda 11.7 releases, switch runner to windows 2022
#13997 merged
Jun 4, 2025 -
releases : use dl backend for linux release, remove arm64 linux release
#13996 merged
Jun 4, 2025 -
llama-graph : use ggml_repeat_4d
#13998 merged
Jun 4, 2025 -
CUDA: fix FTZ in FA for Gemma 3
#13991 merged
Jun 4, 2025 -
kv-cache : fix unified::seq_rm to work with seq_id < 0
#13985 merged
Jun 4, 2025 -
vulkan: fix warnings in perf logger querypool code
#13937 merged
Jun 3, 2025 -
docs : add "Quick start" section for new users
#13862 merged
Jun 3, 2025 -
opencl: add backend_synchronize
#13939 merged
Jun 2, 2025 -
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat
#13840 merged
Jun 2, 2025 -
server : disable speculative decoding for SWA models
#13970 merged
Jun 2, 2025 -
metal : use F32 attention accumulators in FA kernels
#13975 merged
Jun 2, 2025 -
gemma : more consistent attention scaling for v2 and v3
#13951 merged
Jun 2, 2025 -
server : update deepseek reasoning format (pass reasoning_content as diffs)
#13933 merged
Jun 2, 2025 -
mtmd : fix memory leak in mtmd_helper_eval_chunk_single
#13961 merged
Jun 2, 2025 -
"Fix: Handle mixed-case 'Power' strings in POWER CPU detection"
#13966 merged
Jun 2, 2025 -
sycl: quantize and reorder the input to q8_1 when reorder is enabled
#13826 merged
Jun 2, 2025 -
gguf: fix failure on version == 0
#13956 merged
Jun 1, 2025 -
convert : fix nomic-bert-moe mask token
#13757 merged
Jun 1, 2025 -
convert : fix vocab padding code for bert models
#13954 merged
Jun 1, 2025 -
ggml: check if non-native endian model is being loaded
#13943 merged
Jun 1, 2025 -
sync : ggml
#13953 merged
Jun 1, 2025 -
add easy-llama Python bindings to README
#13950 merged
Jun 1, 2025 -
parallel : fix n_junk == 0
#13952 merged
Jun 1, 2025 -
kv-cache : split implementation in separate sources
#13920 merged
Jun 1, 2025 -
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
#12995 merged
May 31, 2025 -
Note about necessity of having libcurl installed for standard build
#13945 merged
May 31, 2025 -
chat : allow unclosed thinking tags
#13931 merged
May 31, 2025 -
llama : deprecate explicit kv_self defrag/update calls
#13921 merged
May 31, 2025 -
llama : use n_swa + n_ubatch cells for SWA cache
#13833 merged
May 31, 2025 -
Replace alert and confirm with custom modals.
#13711 merged
May 31, 2025 -
llama : auto-batch preparation
#13845 merged
May 31, 2025 -
mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change)
#13917 merged
May 31, 2025 -
kv-cache : refactor + add llama_memory_state_i
#13746 merged
May 31, 2025 -
CUDA: add a prop in ggml_cuda_device_info to distinguish iGPU or dGPU in cuda (#13856)
#13895 merged
May 31, 2025 -
CUDA: fix typo in FlashAttention code
#13926 merged
May 30, 2025 -
sched : avoid changing cur_copy when a graph is already allocated
#13922 merged
May 30, 2025 -
parallel : increase the variability of the prompt lengths
#13927 merged
May 30, 2025 -
cuda : prevent using split buffers with 3d/4d matrices
#13919 merged
May 30, 2025 -
SYCL: Add mrope kernel
#13755 merged
May 30, 2025 -
sync : vendor
#13901 merged
May 30, 2025 -
convert : fix rwkv bos/eos token
#13844 merged
May 30, 2025 -
convert : allow partial update to the chkhsh pre-tokenizer list
#13847 merged
May 30, 2025 -
Add support for DistilBert
#13907 merged
May 30, 2025 -
model: minicpm should use llm_build_granite
#13911 merged
May 30, 2025 -
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture
#13890 merged
May 29, 2025 -
llama : add support for jina-reranker-v2
#13900 merged
May 29, 2025 -
gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method
#13561 merged
May 29, 2025 -
arm64: optimize q4_k_q8_k kernel with i8mm
#13886 merged
May 29, 2025 -
cmake: Factor out CPU architecture detection
#13883 merged
May 29, 2025 -
ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm
#13882 merged
May 29, 2025 -
tests : remove json.hpp from a test
#13880 merged
May 29, 2025 -
convert : workaround for AutoConfig dummy labels
#13881 merged
May 29, 2025 -
llama : add RobertaForSequenceClassification reranker support
#13875 merged
May 29, 2025 -
ggml: aarch64: Implement SVE F32 kernels for vector functions
#13843 merged
May 29, 2025 -
gguf/utility: return full content on size < 0
#13841 merged
May 28, 2025 -
llama : fix KV shift for qwen2vl
#13870 merged
May 28, 2025 -
mtmd : move helpers to dedicated library (⚠️ breaking change)
#13866 merged
May 28, 2025 -
ci: disable LLAMA_CURL for Linux cross-builds
#13871 merged
May 28, 2025 -
Add support for BertForSequenceClassification reranking
#13858 merged
May 28, 2025 -
convert: small addition to support LlamaModel
#13838 merged
May 28, 2025 -
convert : fix qwen omni conversion
#13859 merged
May 28, 2025 -
Change umlaut test
#11600 merged
May 28, 2025 -
CUDA: fix FA tg at long context for CC >= 8.9
#13852 merged
May 28, 2025 -
convert : fix tensor naming conflict for llama 4 vision
#13836 merged
May 28, 2025 -
[CANN]: Add SOC TYPE printing in cmake configuration processing
#13837 merged
May 28, 2025 -
opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm
#13787 merged
May 27, 2025 -
opencl: mark MUL_MAT supports non-contiguous tensors for f32
#13790 merged
May 27, 2025 -
vulkan: use timestamp queries for GGML_VULKAN_PERF
#13817 merged
May 27, 2025 -
cmake : add llama-cparams.cpp to build
#13832 merged
May 27, 2025 -
SYCL: add gelu_erf kernel
#13749 merged
May 27, 2025 -
sync : ggml
#13829 merged
May 27, 2025 -
ggml : add ggml_repeat_4d
#13824 merged
May 27, 2025 -
ggml : riscv: add xtheadvector support
#13720 merged
May 27, 2025 -
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output)
#13784 merged
May 27, 2025 -
docs: remove link for llama-cli function calling
#13810 merged
May 27, 2025 -
ggml-cpu: x86 feature detection is specific to x86
#13811 merged
May 27, 2025 -
ggml : allow CUDA graphs when using pipeline parallelism
#13814 merged
May 27, 2025 -
kv-cells : track min/max used cells and per-sequence positions
#13808 merged
May 27, 2025 -
sampling : make sure samplers return at least 1 token
#13822 merged
May 27, 2025 -
llama : validate seq id batch input
#13809 merged
May 27, 2025 -
server: --offline mode
#13804 merged
May 26, 2025 -
scripts : add option to compare commits in Debug
#13806 merged
May 26, 2025 -
cuda : avoid cuGetErrorString when not needed
#13791 merged
May 26, 2025 -
SYCL: Add non contiguous support in RMS_NORM and NORM kernels
#13611 merged
May 26, 2025 -
server: fix streaming crashes
#13786 merged
May 26, 2025 -
examples/training: Fix file name in README
#13803 merged
May 26, 2025 -
server : fix format of streamed tool call deltas (diff name, fix id location)
#13800 merged
May 26, 2025 -
server: fix regression on streamed non-chat completion w/ stops
#13785 merged
May 26, 2025 -
examples : allow extracting embeddings from decoder contexts
#13797 merged
May 26, 2025 -
llama : clarify deprecation message
#13794 merged
May 26, 2025 -
sycl: Add more debug prints
#13640 merged
May 26, 2025 -
vulkan: mark IM2COL as supporting non-contig
#13783 merged
May 26, 2025 -
[CANN]: add the basic supports of Flash Attention kernel
#13627 merged
May 26, 2025 -
server : add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false)
#13771 merged
May 25, 2025 -
webui : bump max upload file size to 500MB
#13779 merged
May 25, 2025 -
tests : improve UGM tokenizer test coverage
#13773 merged
May 25, 2025 -
kv-cache : rework kv_cell
#13706 merged
May 25, 2025 -
Fix build on OpenBSD
#13541 merged
May 25, 2025 -
mtmd : add support for Qwen2-Audio and SeaLLM-Audio
#13760 merged
May 25, 2025 -
docs : add Moondream2 pre-quantized link
#13745 merged
May 25, 2025 -
server: fix/test add_generation_prompt param
#13770 merged
May 25, 2025 -
Qwen3 MoE should also work with tie_word_embeddings
#13768 merged
May 25, 2025 -
SYCL: Temporarily revert "sycl: simplify bin_bcast_kernel (#13383)"
#13752 merged
May 25, 2025 -
server : streaming of tool calls and thoughts when --jinja is on
#12379 merged
May 25, 2025 -
releases : bundle llvm omp library in windows release
#13763 merged
May 24, 2025 -
releases : enable openmp in windows cpu backend build
#13756 merged
May 24, 2025 -
ggml-cpu : set openmp wait time if not set
#13758 merged
May 24, 2025 -
Move GLM4 f32 attention fix to the correct function
#13750 merged
May 24, 2025 -
ggml : add ggml_gelu_erf() CUDA kernel
#13719 merged
May 24, 2025 -
vocab : fix ugm tokenizer precision
#13743 merged
May 24, 2025 -
CUDA: fix race condition in FA vector kernels
#13742 merged
May 24, 2025 -
ci : enable winget package updates
#13734 merged
May 23, 2025 -
ci : add winget package updater
#13732 merged
May 23, 2025 -
hparams : initialize arrays
#13728 merged
May 23, 2025 -
llama : allow custom list of swa_layers
#13726 merged
May 23, 2025 -
server : support audio input
#13714 merged
May 23, 2025 -
[CANN]Support OP MUL_MAT_ID Q8 && Q4
#13705 merged
May 23, 2025 -
ggml : fix the order of ggml_unary_op
#13718 merged
May 23, 2025 -
vulkan: support CPY from any type to itself
#13695 merged
May 23, 2025 -
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it
#13696 merged
May 23, 2025 -
use LOG_WARN to replace std::cerr
#13657 merged
May 23, 2025 -
release : fix windows hip release
#13707 merged
May 22, 2025 -
tts : fix n_ubatch + make WavTokenizer cache-less
#13713 merged
May 22, 2025 -
mtmd : add ultravox audio input
#13623 merged
May 22, 2025 -
common: Include torch package for s390x
#13699 merged
May 22, 2025 -
server : pad small embedding batches
#13692 merged
May 22, 2025 -
gguf-py : correct charsmap parameter typing
#13701 merged
May 22, 2025 -
sycl: Remove waits from async functions call
#13702 merged
May 22, 2025 -
SYCL: Avoid using SYCL-Graph for unsupported nodes
#13587 merged
May 22, 2025 -
opencl: Add support for multiple devices
#12622 merged
May 21, 2025 -
opencl: fix couple crashes
#12795 merged
May 21, 2025 -
releases : build CPU backend separately (windows)
#13642 merged
May 21, 2025 -
hparams : support models for which all layers use SWA
#13682 merged
May 21, 2025 -
server : improve error reporting
#13680 merged
May 21, 2025 -
convert : add qwen2vl support for unsloth merges
#13686 merged
May 21, 2025 -
examples : switch retrieval to llama_encode
#13685 merged
May 21, 2025 -
gguf-py : display the invalid gguf type
#13687 merged
May 21, 2025 -
ggml : add ggml_gelu_erf()
#13667 merged
May 21, 2025
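For reference on the entry above: the erf-based form is the exact GELU definition, GELU(x) = 0.5·x·(1 + erf(x/√2)), as opposed to the common tanh approximation. A scalar reference follows; the ggml operator applies this elementwise over a tensor, and its actual kernel is not reproduced here.

```cpp
#include <cmath>

// Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
// Scalar reference for illustration; not the ggml kernel.
static float gelu_erf_ref(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
```
-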
Add the endpoints /api/tags and /api/chat
#13659 merged
May 21, 2025 -
server : fix first message identification
#13634 merged
May 21, 2025 -
kv-cache : simplify the interface
#13660 merged
May 21, 2025 -
model : disable SWA for Phi models
#13676 merged
May 21, 2025 -
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
#13647 merged
May 21, 2025 -
vulkan: small fixes
#13626 merged
May 20, 2025 -
mtmd-helper : bug fix to token batching in mtmd
#13650 merged
May 20, 2025 -
model : fix llama4 graph
#13663 merged
May 20, 2025 -
llama : remove llama_kv_cache_view API + remove deprecated
#13653 merged
May 20, 2025 -
CUDA: skip fully masked-out KV in FA vec kernel
#13584 merged
May 20, 2025 -
tests : avoid github urls due to throttling
#13654 merged
May 20, 2025 -
sycl: disable reorder for sycl mulmat
#13536 merged
May 20, 2025 -
Fix GLM4 incoherence with fp16 accumulators
#13639 merged
May 20, 2025 -
metal : fix typo in FA kernel comments
#13651 merged
May 20, 2025 -
kv-cache : add SWA support
#13194 merged
May 20, 2025 -
[CANN] Update CANN model support status
#13162 merged
May 20, 2025 -
sycl : Overcoming workaround for mmap() allocation on Windows
#13482 merged
May 20, 2025 -
added load_progress_callback to common_params
#13617 merged
May 19, 2025 -
Vulkan: Support fp32 accumulator in quantized matmul to fix GLM4-32B incoherence
#13607 merged
May 19, 2025 -
sycl : reviewing the backend documentation
#13544 merged
May 19, 2025 -
mtmd : add vision support for llama 4
#13282 merged
May 19, 2025 -
ci : upgraded oneAPI version in SYCL workflows and dockerfile
#13532 merged
May 19, 2025 -
sync : ggml
#13630 merged
May 19, 2025 -
fix: check model pointer validity before use
#13631 merged
May 19, 2025 -
[CANN]Support OP MUL_MAT_ID
#13042 merged
May 19, 2025
58 Pull requests opened by 51 people
-
scripts: update pyproject.toml - deprecated poetry config + support uv
#13615 opened
May 18, 2025 -
cuda: fix CMAKE_CUDA_COMPILER not found error (#13528)
#13625 opened
May 19, 2025 -
webui: Allow editing file attachments when editing messages.
#13645 opened
May 20, 2025 -
model : jina-embeddings-v3 support
#13693 opened
May 21, 2025 -
common/llama: align structures for reduce cacheline size on 64bit platforms
#13710 opened
May 22, 2025 -
remove templates from soft_max_f32_submitter to allow SYCL graph updates
#13724 opened
May 23, 2025 -
Move page cache via mbind to prevent cross-NUMA access
#13731 opened
May 23, 2025 -
cmake : set `RPATH` to `$ORIGIN` on Linux (#13740)
#13741 opened
May 24, 2025 -
Add comprehensive test for llama_batch/sbatch/ubatch concepts
#13764 opened
May 24, 2025 -
ggml : add ggml_fill()
#13772 opened
May 25, 2025 -
server: args for draft model cache types (#11200)
#13782 opened
May 25, 2025 -
Add support for VK_EXT_debug_utils to add labels to Vulkan objects.
#13792 opened
May 26, 2025 -
Add OPT model support - Add OPT architecture support in C++ code - Im…
#13799 opened
May 26, 2025 -
Tokenize logging
#13821 opened
May 27, 2025 -
examples : support MiniCPM-V-2
#13828 opened
May 27, 2025 -
musa: enable fp16 mma (all) and cublas on qy2
#13842 opened
May 28, 2025 -
finetune.cpp command-line arg
#13873 opened
May 28, 2025 -
Need to undefine "hz" on AIX
#13894 opened
May 29, 2025 -
convert: add eagle2 draft arch
#13908 opened
May 30, 2025 -
[CANN]Support Acl Graph
#13915 opened
May 30, 2025 -
Add plamo2
#13930 opened
May 30, 2025 -
`chat`: improve llama 3.x handling of <|python_tag|> (+ allow --special combo)
#13932 opened
May 30, 2025 -
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices
#13973 opened
Jun 2, 2025 -
Hybrid recurrent cache
#13979 opened
Jun 2, 2025 -
chore(server): split context-server to its own file
#13987 opened
Jun 3, 2025 -
[CANN]:Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation
#14002 opened
Jun 4, 2025 -
llama: Attempt to add ModernBert
#14014 opened
Jun 4, 2025 -
server: Enable mtmd in llama-server `/completion` endpoint
#14016 opened
Jun 4, 2025 -
ggml-cpu: fix uncaught underscore terminators for s390x
#14023 opened
Jun 5, 2025 -
llama : support qwen3 rerank and embeddings
#14029 opened
Jun 5, 2025 -
cpu: Update RISC-V condition to require GCC version 14 or higher
#14032 opened
Jun 5, 2025 -
ggml-cpu: optimise assembly calls for hsum on s390x
#14037 opened
Jun 5, 2025 -
webui: add server info to chat message
#14065 opened
Jun 8, 2025 -
llama: automatically set runtime parameters such as --n-gpu-layers to fit VRAM
#14067 opened
Jun 8, 2025 -
server: add model alias presets
#14083 opened
Jun 9, 2025 -
scripts: Fix remote option in Windows (#14102)
#14100 opened
Jun 10, 2025 -
ggml: aarch64: Implement SVE Kernels for Int 8 Quantization
#14117 opened
Jun 11, 2025 -
tests : add test-model-random
#14139 opened
Jun 12, 2025 -
models/templates: add mistralai/Mistral-Small-3.1-24B-Instruct-2503 template with tool calling support
#14148 opened
Jun 12, 2025 -
ggml : implement op fusion, starting with REGLU/GEGLU/SWIGLU
#14158 opened
Jun 12, 2025 -
ci: re-enable rocm linux build, reduce the built targets to the ones currently available in rocblas
#14184 opened
Jun 14, 2025 -
webui: save model name with conversation history (#13570)
#14192 opened
Jun 15, 2025 -
gguf-py: Make sentencepiece optional
#14200 opened
Jun 15, 2025 -
llama: fix compilation warning (#464)
#14209 opened
Jun 16, 2025 -
sycl: Cleanup codepaths in Get Rows in sycl backend
#14215 opened
Jun 16, 2025 -
ubatch : new splitting logic
#14217 opened
Jun 16, 2025 -
tests : enhance llama-bench with separate timings (pp/gen t/s), added n_threads_batch
#14219 opened
Jun 16, 2025 -
logit_bias: apply configurable escalating EOG bias at low n_remain
#14229 opened
Jun 16, 2025 -
ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation
#14232 opened
Jun 17, 2025 -
Mtmd: add a way to select device for vision encoder
#14236 opened
Jun 17, 2025 -
MODEL: Falcon-H1 support
#14238 opened
Jun 17, 2025 -
Add SmolLM3
#14240 opened
Jun 17, 2025 -
server : add pidfile option
#14242 opened
Jun 17, 2025 -
sycl: add usage of enqueue_functions extension
#14244 opened
Jun 17, 2025 -
Vulkan: Fix host-pinned memory for large allocations
#14249 opened
Jun 17, 2025 -
opencl: ref count `ggml_backend_opencl_context` and refactor profiling
#14254 opened
Jun 18, 2025 -
fix: resolve gcc compile warnings
#14261 opened
Jun 18, 2025 -
CUDA: mul_mat_v support for batch sizes > 1
#14262 opened
Jun 18, 2025
214 Issues closed by 57 people
-
Eval bug: Llama 4 Scout/Maverick crash when processing images with certain aspect ratio
#13827 closed
Jun 18, 2025 -
Misc. bug: llama-server builds possibly erroneous prompt for gemma 3
#14151 closed
Jun 18, 2025 -
Eval bug: Unexpected failure converting Mistral 7B v0.2 to f32 GGUF
#13976 closed
Jun 18, 2025 -
Misc. bug: Compilation with openCL on latest build
#13300 closed
Jun 18, 2025 -
Eval bug: Bad output from Qwen3-Embedding-0.6B
#14234 closed
Jun 17, 2025 -
Thad
#14241 closed
Jun 17, 2025 -
W
#14239 closed
Jun 17, 2025 -
Misc. bug: struct.error during GGUF conversion of Mistral-Instruct with convert_hf_to_gguf.py
#14243 closed
Jun 17, 2025 -
Misc. bug: Performance regression on aarch64 q4_0
#14134 closed
Jun 17, 2025 -
Generated thought process not shown on web ui for Qwen 3
#14199 closed
Jun 17, 2025 -
Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000)
#12717 closed
Jun 17, 2025 -
Slow token generation speed of Gemma 3 QAT Models
#13048 closed
Jun 17, 2025 -
Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models
#13247 closed
Jun 17, 2025 -
Compile bug: paths with spaces fail on Unix with Vulkan backend
#13288 closed
Jun 17, 2025 -
Misc. bug:
#14223 closed
Jun 16, 2025 -
Misc. feature: llama-cli support for solar-10.7b-instruct
#14173 closed
Jun 16, 2025 -
Eval bug: Error in trying to use llama-server with Qwen3-Embedding-0.6B-GGUF
#14204 closed
Jun 16, 2025 -
Feature Request: (webui) Implement a experimental features on webui
#11662 closed
Jun 16, 2025 -
Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB'
#12923 closed
Jun 16, 2025 -
Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3
#13025 closed
Jun 16, 2025 -
Eval bug: llama-server stays in unresponsive state- CUDA error: out of memory -
#13085 closed
Jun 16, 2025 -
Misc. bug: OpenCL: Issue with Adreno 610
#13115 closed
Jun 16, 2025 -
Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout
#13240 closed
Jun 16, 2025 -
Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError'
#13248 closed
Jun 16, 2025 -
Eval bug: sentencepiece tokenizer generates incorrect tokens
#13256 closed
Jun 16, 2025 -
Misc. bug: the output file of llama-quantize is not gguf format
#13258 closed
Jun 16, 2025 -
Misc. bug: Server does not always cancel requests for disconnected connections
#13262 closed
Jun 16, 2025 -
Compile bug: RocmWMMA doesn't work
#14193 closed
Jun 15, 2025 -
Feature Request: dots.llm1 model support
#14044 closed
Jun 15, 2025 -
Misc. bug: xcframework does not contain support for Catalyst
#12751 closed
Jun 15, 2025 -
Compile bug: llama-vocab.cpp Error
#14176 closed
Jun 13, 2025 -
Eval bug: Command-A forces full-prompt re-processing due to lack of cache data
#14157 closed
Jun 13, 2025 -
Compile bug: Vulkan Cross compile for arm64
#13068 closed
Jun 13, 2025 -
Misc. bug: Shared libraries don't properly contain /common/ functions
#13156 closed
Jun 13, 2025 -
Eval bug: Unreadable output when using qwen2-vl model.
#13165 closed
Jun 13, 2025 -
Misc. bug: llama-parallel segmentation fault
#13172 closed
Jun 13, 2025 -
Eval bug: Qwen models lost ability to think
#14147 closed
Jun 12, 2025 -
SYCL fails to initialize unless iGPU is disabled (Intel Arc A770 + i5-9500)
#13775 closed
Jun 12, 2025 -
Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0"
#13963 closed
Jun 12, 2025 -
Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300
#13145 closed
Jun 12, 2025 -
Compile bug: hipcc is ran with host C++ compiler flags
#14136 closed
Jun 11, 2025 -
Misc. bug: `test-chat` fails on x86 windows builds but works everywhere else
#14112 closed
Jun 11, 2025 -
Can't llama-quantize Command A unless I rollback
#14054 closed
Jun 11, 2025 -
Misc. bug: 10 Image maximum?
#14111 closed
Jun 11, 2025 -
Eval bug: The precompiled CUDA 12.4 package for Windows cannot be used.
#14040 closed
Jun 11, 2025 -
Dequantize function: Row misalignment in dequantized tensors - only first column matches original
#13839 closed
Jun 10, 2025 -
Misc. bug: ValueError: Duplicated key name 'deepseek2.attention.key_length'
#14093 closed
Jun 10, 2025 -
Eval bug: Compute function exceeds available stack space
#14055 closed
Jun 10, 2025 -
Misc. bug: KV defrag bug: nf != nh
#14059 closed
Jun 9, 2025 -
llama.cpp error when using the snowflake-arctic-embed-v2 model
#14018 closed
Jun 9, 2025 -
Compile bug: Linux with CUDA 12.6
#11696 closed
Jun 9, 2025 -
Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache.
#12352 closed
Jun 9, 2025 -
Feature Request: Add kv-quant fa kernel variants for head sizes other than 128
#12989 closed
Jun 9, 2025 -
Eval bug: Flash Attention not working with NVIDIA GeForce RTX 4060 Ti
#13092 closed
Jun 9, 2025 -
Eval: HIP: Llama-server multi-instance lockup
#13100 closed
Jun 9, 2025 -
Feature Request: define key bindings for quick deletion of the previous conversation.
#13111 closed
Jun 9, 2025 -
Feature Request: Kimi-Audio-7B
#13114 closed
Jun 9, 2025 -
Eval bug: EXAONE fails to run with quantized KV cache
#13121 closed
Jun 9, 2025 -
Misc. bug: Cannot start in Docker for gfx1036
#14068 closed
Jun 8, 2025 -
Eval bug: runtime_error: Invalid diff caused by placing "</think>" instead of a plain "<" token
#14060 closed
Jun 8, 2025 -
Misc. bug: llama-cli (vulkan backend) output gibberish with old vulkan sdk
#13044 closed
Jun 8, 2025 -
Misc. bug: Retrieval sample not decoding token successfully
#13102 closed
Jun 8, 2025 -
Misc. bug: vulkan prompt processing suddenly slows down once I reach a certain prompt size
#13765 closed
Jun 7, 2025 -
Eval bug: Server still allocates KV cache on GPU when specifying --no-kv-offload
#14043 closed
Jun 7, 2025 -
Feature Request: Interleaved sliding window attention support for gemma 2 and 3
#12637 closed
Jun 7, 2025 -
Compile bug: There was a errror while compiling support for the backend Vulkan
#12619 closed
Jun 7, 2025 -
Model Repeats Nonsensical Output
#13066 closed
Jun 7, 2025 -
Compile bug: NVIDIA A800-SXM4-40GB ggml_cuda_init failed
#13059 closed
Jun 6, 2025 -
Doc. bug: docs/multimodal/gemma3.md need to be updated
#13064 closed
Jun 6, 2025 -
Eval bug: AttributeError: Moonlight-16B-A3B-Instruct - TikTokenTokenizer has no attribute vocab
#13072 closed
Jun 6, 2025 -
Eval bug: llama.cpp crashes in string comparison when using a reasoning model for long periods of time
#13965 closed
Jun 5, 2025 -
Eval bug: std::runtime_error Invalid diff:
#13876 closed
Jun 5, 2025 -
Compile bug: Race condition during compilation, compilation works with -j 1 but not with -j 8
#13993 closed
Jun 5, 2025 -
Bug: MinGW build fails to load models with "error loading model: PrefetchVirtualMemory unavailable"
#9311 closed
Jun 5, 2025 -
Eval bug: llama-server -hf nomic-ai/nomic-embed-text-v2-moe-GGUF --embeddings , broken on latest version
#14021 closed
Jun 5, 2025 -
Compile bug: Prooted Debian in Droid Termux only
#12452 closed
Jun 5, 2025 -
[Build] Some Build Options/Definitions seems Missing in ggml-base
#13017 closed
Jun 5, 2025 -
Eval bug: Error when load `bge-reranker-v2-gemma` model
#13041 closed
Jun 5, 2025 -
Misc. bug: new kv cell seq implementation does not handle "seq_id = -1" specified in the API
#13983 closed
Jun 4, 2025 -
Eval bug: OpenAI incompatible image handling in server multimodal
#12947 closed
Jun 4, 2025 -
Perplexity script for non GGUF quantization
#13015 closed
Jun 4, 2025 -
Eval bug: RWKV inference issue with llama-server
#13018 closed
Jun 4, 2025 -
Container images in GHCR registry, are not multi arch
#13995 closed
Jun 3, 2025 -
Misc. bug: llama-server didn't display thought process since b5576
#13981 closed
Jun 3, 2025 -
Misc. bug: Reasoning content is not separated when streaming
#13867 closed
Jun 2, 2025 -
Misc. bug: memory leak in mtmd ? (mtmd_helper_eval_chunk_single)
#13958 closed
Jun 2, 2025 -
Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment
#12655 closed
Jun 2, 2025 -
gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1
#12998 closed
Jun 2, 2025 -
Eval bug: Segmentation fault when running gemma3-cli on Android
#13000 closed
Jun 2, 2025 -
Eval bug: why Gemma 3 model has run into CPU inference
#13004 closed
Jun 2, 2025 -
Eval bug: default system prompt in llama-server
#13948 closed
Jun 1, 2025 -
Eval bug: Quad P40 unable to run 70B models on recent releases
#12990 closed
Jun 1, 2025 -
Eval bug: Not support DeepSeek-R1-0528-GGUF-Q8_0
#13916 closed
May 31, 2025 -
mtmd: cmake: C API broken since last change, static linking always broken
#13902 closed
May 31, 2025 -
Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use
#13812 closed
May 31, 2025 -
CUDA illegal memory bug 75 fixed?
#13906 closed
May 31, 2025 -
Misc. bug: what(): Unexpected empty grammar stack after accepting piece: <unused32>
#13341 closed
May 31, 2025 -
Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size'
#12325 closed
May 31, 2025 -
Eval bug: convert_hf_to_gguf.py AttributeError:
#12847 closed
May 31, 2025 -
Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj
#12899 closed
May 31, 2025 -
Compile bug: how to enable opencl in termux
#12911 closed
May 31, 2025 -
Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple
#12968 closed
May 31, 2025 -
Feature Request: multi model cli tools: Convert submitted images to best size and format for model
#12981 closed
May 31, 2025 -
Feature Request: Make chat sessions possible with multi model cli tools
#12982 closed
May 31, 2025 -
Misc. bug: Potential memory leak in backend registry
#12986 closed
May 31, 2025 -
Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue
#13877 closed
May 30, 2025 -
`CUDA error: an illegal memory access was encountered` on DeepSeek-R1-0528
#13909 closed
May 30, 2025 -
CUDA error: an illegal memory access was encountered (with large prompts)
#13851 closed
May 30, 2025 -
Eval bug: "GGML_ASSERT(!(split && ne02 > 1)) failed" when loading DeepSeek-R1T with --split-mode row
#13372 closed
May 30, 2025 -
Feature Request: Splitting layers according to VRAM usage on multi GPUs setups
#12654 closed
May 30, 2025 -
Misc. bug: Excessive power draw on the second GPU in dual RTX 3090 setup when idle
#12958 closed
May 30, 2025 -
Why does /ggml/CMakeLists.txt add_subdirectory(examples)?
#12963 closed
May 30, 2025 -
Misc. bug: gguf-new-metadata and gguf-editor-gui changes all integer arrays to INT32
#13557 closed
May 29, 2025 -
Eval bug: stream with tool_call fix in b5478 crash in container and issues with calls from apps
#13766 closed
May 29, 2025 -
Misc. bug: ALL gguf models fail to run (no log, docker exit code 139),
#12205 closed
May 29, 2025 -
Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"}
#12591 closed
May 29, 2025 -
Compile bug: ggml-cuda/opt-step-adamw.cu error: identifier "__Poly8x8_t" is undefined on Jetson Orin AGX
#12826 closed
May 29, 2025 -
CUDA: implementation of mul_mat_id
#12859 closed
May 29, 2025 -
what *tool/framework* to use if testing performance of .gguf models
#12901 closed
May 29, 2025 -
Misc. bug: llama-bench --tensor-split handling is broken
#12917 closed
May 29, 2025 -
Compile bug: macro "DECL_FATTN_MMA_F16_CASE" requires 3 arguments, but only 2 given
#12921 closed
May 29, 2025 -
Misc. bug: llama-server "terminate called after throwing an instance of 'std::runtime_error'"
#12939 closed
May 29, 2025 -
Model conversion issue
#12941 closed
May 29, 2025 -
Eval bug: KV cache shifting does not work for Qwen2.5VL
#13865 closed
May 28, 2025 -
CI: build-linux-cross failing
#13869 closed
May 28, 2025 -
Eval bug: qwen2.5-vl related bugs
#13848 closed
May 28, 2025 -
Unable to deploy the fine-tuned qwen2.5-vl-7b using llama.cpp.
#13723 closed
May 28, 2025 -
Misc. bug: Streaming tool calls does not return "type": "function", unlike non-stream
#13798 closed
May 28, 2025 -
Feature Request: Free up VRAM when llama-server not in use
#11703 closed
May 28, 2025 -
Eval bug: ggml_vulkan: Device memory allocation of size N failed with ub > 4096 and c > 4096 and b > 4096
#12817 closed
May 28, 2025 -
Eval bug: ROCm error: CUBLAS_STATUS_INTERNAL_ERROR
#12878 closed
May 28, 2025 -
Misc. bug: gguf-my-repo doesn't work - [Errno 2] No such file or directory: './llama.cpp/llama-quantize'
#12925 closed
May 28, 2025 -
Misc. bug: The llama-server not read the "--keep" param that user input in the cli
#12927 closed
May 28, 2025 -
I ran into this issue while trying to convert Smollm2 and Qwen2.5
#13603 closed
May 27, 2025 -
Misc. bug: llama-mtmd-cli ignores multiple image input
#13704 closed
May 27, 2025 -
Large performance drop when using pipeline parallelism and layer splitting on multiple GPUs
#13751 closed
May 27, 2025 -
Eval bug: gemma3 getting stuck with no output when
#13715 closed
May 27, 2025 -
Misc. bug: llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
#13405 closed
May 27, 2025 -
Compile bug: MTT S4000 compile error
#13819 closed
May 27, 2025 -
Misc. bug: Streaming with tools causes pydantic-ai to mess up tool name
#13774 closed
May 26, 2025 -
server: terminate called after throwing an instance of 'std::runtime_error'
#13780 closed
May 26, 2025 -
Eval bug: Output NAN when use Qwen3 embedding models with FP16
#13795 closed
May 26, 2025 -
Eval bug: GGML_ASSERT(ggml_vk_op_supports_incontiguous(op) || ggml_vk_dim01_contiguous(src0)) failed
#13597 closed
May 26, 2025 -
convert_hf_to_gguf.py does not work for QWen-7b-chat fine tuning with LoRa exported model.
#13789 closed
May 26, 2025 -
Eval bug: Mistral Small Multiomodal fails when used with the Vulkan backend
#13778 closed
May 26, 2025 -
Feature Request: NUMA-aware MoE Expert Allocation for Improved Performance
#11333 closed
May 26, 2025 -
Eval bug: Accuracy is dropped when I convert model to gguf. Qwen2_VL_7B_Instruct
#12538 closed
May 26, 2025 -
Eval bug: Crash in trim method
#12710 closed
May 26, 2025 -
How to use *chat_template* with .gguf models ? (tokenizer_name not implemented)
#12897 closed
May 26, 2025 -
multiple_choice_score : task 17 does not fit in the context window
#12905 closed
May 26, 2025 -
Eval bug: GLM-Z1-9B-0414
#12946 closed
May 25, 2025 -
Misc. bug: Speed degradation in `bin-win-cpu-x64` compared to `bin-win-avx2-x64` on Intel Core i7-12700H
#13664 closed
May 25, 2025 -
Feature Request: moondream2 vlm support in mtmd
#13332 closed
May 25, 2025 -
Compile bug: ‘ggml_gelu_erf’ was not declared in this scope; did you mean ‘ggml_gelu’
#13744 closed
May 25, 2025 -
Feature Request: Support for Qwen2-VL
#9246 closed
May 25, 2025 -
Prompt eval is 5x slower than in Ollama and maxes out the CPU
#12237 closed
May 25, 2025 -
Feature Request: Slim Attention (lossless 2x reduction in KV cache size)
#12359 closed
May 25, 2025 -
Misc. bug: convert_hf_to_gguf.py fails to convert the model of architecture T5ForConditionalGeneration
#12862 closed
May 25, 2025 -
Eval bug: Assertion _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS while using a particular model
#12877 closed
May 25, 2025 -
Eval bug: moonshotai/Moonlight-16B-A3B-Instruct
#12880 closed
May 25, 2025 -
Eval bug: add support for https://huggingface.co/
#12884 closed
May 25, 2025 -
Misc. bug: llama-server tokens per second slow down significantly after release b5450 (#13642)
#13735 closed
May 24, 2025 -
Eval bug: UGM tokenizer sometimes outputs wrong tokens/in the wrong order
#13725 closed
May 24, 2025 -
Compile bug: Build failure for Intel oneMKL on Windows
#12478 closed
May 24, 2025 -
Add support for gemma 3 in the server?
#12762 closed
May 24, 2025 -
CUDA performance bug when two cards are visible and only one is used
#12838 closed
May 24, 2025 -
Misc. bug: Overflow in Cast (
#13722 closed
May 23, 2025 -
Phi-4-mini reasoning CRASH!!! (Vulkan)
#13464 closed
May 23, 2025 -
OpenCL: Performance comparison depending on gpu_offloads
#12810 closed
May 23, 2025 -
Llama 4 convert_hf_to_gguf.py tokenizer error
#12819 closed
May 23, 2025 -
Misc. bug: HIP / ROCm memory allocation broken after release b5450
#13698 closed
May 22, 2025 -
Eval bug: `llama-tts` fails (abort) with longer lines
#13712 closed
May 22, 2025 -
GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed
#13689 closed
May 22, 2025 -
Eval bug: MUSA backend cause non-sense output on unsloth/deepseek-r1 quantized model
#12779 closed
May 22, 2025 -
Misc. bug: Metric names are invalid
#12803 closed
May 22, 2025 -
crash: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN")
#13688 closed
May 21, 2025 -
Eval bug: phi-4 crashes with new versions
#13665 closed
May 21, 2025 -
OpenCL: Add CPU fallback for unsupported operations
#13621 closed
May 21, 2025 -
Eval bug: Cannot run unsloth/deepseek-r1 2bit Model
#12778 closed
May 21, 2025 -
Qwen3 32B and 30B models are similar size, but there is 4x difference between the performance!?
#13652 closed
May 20, 2025 -
Eval bug: NVIDIA Jetson AGX Xavier CUDA Compatibility Issue with llama.cpp
#13629 closed
May 20, 2025 -
Eval bug: vulkan Llama cpp prefers shared memory over dedicated memory
#12748 closed
May 20, 2025 -
Compile bug: `binary-ops.cpp: error: invalid conversion`
#12765 closed
May 20, 2025 -
Cannot compile SYCL backend: SYCL_LIBRARY=SYCL_LIBRARY-NOTFOUND as per documentation
#12696 closed
May 19, 2025 -
Eval bug: No output using llama-batched-bench
#13553 closed
May 19, 2025 -
Feature Request: when can llama.cpp support converting the qwen2.5 VL 7B/72B model to gguf?
#11541 closed
May 19, 2025 -
Misc. bug: HIP backend performs poorly on AMD Ryzen AI MAX 395 (Strix Halo gfx1151)
#13565 closed
May 18, 2025
120 Issues opened by 113 people
-
Compile bug: gcc-12: error: unrecognized command-line option ‘-compress-mode=size’
#14260 opened
Jun 18, 2025 -
Compile bug: [blas] choose blas backend to run llama2-7b model, but system info doesn't have the blas flag.
#14259 opened
Jun 18, 2025 -
Misc. bug: [CANN] memory leaky using CANN as backend
#14257 opened
Jun 18, 2025 -
Eval bug: example llama-simple-chat run failed on Android
#14253 opened
Jun 18, 2025 -
Feature Request: fix handling of Qwen3-Embedding-0.6B input to add EOS token
#14252 opened
Jun 18, 2025 -
Misc. bug: prompt as pasted content in the server
#14251 opened
Jun 17, 2025 -
Llama 4 mmproj fails `unable to find tensor mm.model.fc.weight`
#14237 opened
Jun 17, 2025 -
Misc. bug: llama-server slower on 4bit quantized model with f470bc36bed
#14235 opened
Jun 17, 2025 -
Misc. bug: weird cursor placement in the web UI
#14233 opened
Jun 17, 2025 -
Eval bug: Command-A generates a single repeating token when using split mode row on P40
#14228 opened
Jun 16, 2025 -
Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception
#14227 opened
Jun 16, 2025 -
Feature Request: Add --no-warmup to llama-bench
#14224 opened
Jun 16, 2025 -
Misc. bug: OAI response_format json_schema and json_object not applied with Llama 3.x models
#14218 opened
Jun 16, 2025 -
Feature Request: llama-server: a flag for limiting input image size
#14216 opened
Jun 16, 2025 -
Eval bug: RWKV inference with llama-parallel gets wrong output with lmhead offloaded to GPU
#14211 opened
Jun 16, 2025 -
Misc. bug: full-cuda docker build needs ldconfig before launching llama-*
#14195 opened
Jun 15, 2025 -
Misc. bug: [Windows] GPU layers/tensors still consume system memory after load when mmap = true
#14187 opened
Jun 15, 2025 -
Misc. bug: evaluate_and_capture_cuda_graph NULL POINTER DEREFERENCE
#14186 opened
Jun 15, 2025 -
Misc. bug: Failure to allocate buffer with ROCm 6.4
#14178 opened
Jun 13, 2025 -
prismatic-vlms to gguf?
#14159 opened
Jun 13, 2025 -
Compile bug: HIP compile fails during linking stage, undefined reference error repeats
#14155 opened
Jun 12, 2025 -
Research: mmap eviction
#14154 opened
Jun 12, 2025 -
Metrics should not include : in Prometheus metric names
#14150 opened
Jun 12, 2025 -
Misc. bug: llama-server drops multi-part content for final assistant message
#14137 opened
Jun 12, 2025 -
Misc. bug: "llama_context_params::swa_full = true" causes very large RAM/VRAM usage
#14123 opened
Jun 11, 2025 -
Misc. bug: Stuck while loading the model
#14114 opened
Jun 11, 2025 -
Misc. bug: --cache-reuse no longer seems to be caching prompt prefixes
#14113 opened
Jun 11, 2025 -
Vulkan Runner Frequent Crashing under workload
#14105 opened
Jun 10, 2025 -
Misc. bug: option --remote of convert_hf_to_gguf.py does not work in Windows
#14102 opened
Jun 10, 2025 -
Misc. bug: Server/Chat parallel tool calling not working
#14101 opened
Jun 10, 2025 -
Eval bug: Gemma3 decode and update_slots fail with parallel slots
#14097 opened
Jun 10, 2025 -
Eval bug: MiniCPM4 0.5B run failed
#14094 opened
Jun 10, 2025 -
Misc. bug: Server tests /health race conditions
#14092 opened
Jun 9, 2025 -
Misc. bug: [SYCL] llama-cli built by Visual Studio 2022 is not working
#14086 opened
Jun 9, 2025 -
Misc. bug: Qwen3-Embedding-0.6B-GGUF doesn't work for 32768 context size (too much memory used)
#14084 opened
Jun 9, 2025 -
Eval bug: Model produces gibberish or repeated output when using `-sm row` on CUDA
#14075 opened
Jun 8, 2025 -
Eval bug: KV cache stopped working in b5554 version
#14071 opened
Jun 8, 2025 -
Can you add an example of running the model using the llama-cpp-python Python binding for quick start?
#14066 opened
Jun 8, 2025 -
[How to serve lookahead decoding Qwen 3]
#14057 opened
Jun 7, 2025 -
Feature Request: add support for length_penalty
#14053 opened
Jun 6, 2025 -
Misc. bug: llama-server --batch-size always set to 64
#14046 opened
Jun 6, 2025 -
Feature Request: add a new repo for conversion of gguf
#14027 opened
Jun 5, 2025 -
Feature Request: support FP8 data type in llama.cpp
#14020 opened
Jun 5, 2025 -
Misc. bug: "error: invalid argument: /bin/sh" when using Docker image
#14019 opened
Jun 5, 2025 -
Feature Request: Support Llama-Nemotron-Nano-VL-8B-V1
#14015 opened
Jun 4, 2025 -
Compile bug: numerous deprecation warnings when compiling in Termux
#14011 opened
Jun 4, 2025 -
Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models
#14007 opened
Jun 4, 2025 -
Feature Request: allow spacebar to confirm web UI prompts [like the deleting a chat confirmation]
#13999 opened
Jun 3, 2025 -
Compile bug:
#13992 opened
Jun 3, 2025 -
Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call
#13990 opened
Jun 3, 2025 -
Feature Request:
#13989 opened
Jun 3, 2025 -
Misc. bug: sentencepiece not included in requirements.txt
#13982 opened
Jun 3, 2025 -
Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU
#13978 opened
Jun 2, 2025 -
Misc. bug: llama-bench improper tensor split
#13972 opened
Jun 2, 2025 -
context shifting should be the default option?
#13971 opened
Jun 2, 2025 -
make using shifting context easier.
#13969 opened
Jun 2, 2025 -
Eval bug: Unable to load the model on GPU
#13967 opened
Jun 2, 2025 -
Feature Request: WINA
#13964 opened
Jun 2, 2025 -
Eval bug: llama-mtmd-cli : option --image failed to load image
#13959 opened
Jun 1, 2025 -
Eval bug: llama-tts abort
#13955 opened
Jun 1, 2025 -
Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME)
#13947 opened
May 31, 2025 -
Feature Request: Generate Image Embeddings with llama.cpp
#13913 opened
May 30, 2025 -
android build on GPU not comparable with CPU?
#13910 opened
May 30, 2025 -
Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_'
#13893 opened
May 29, 2025 -
Misc. bug: linux/arm64 does not exist for the server docker image
#13891 opened
May 29, 2025 -
Feature Request: Make the `/completion` endpoint in `llama-server` work with multimodal models
#13872 opened
May 28, 2025 -
Automatic optimization of runtime parameters such as -ngl given memory constraints
#13860 opened
May 28, 2025 -
Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture
#13856 opened
May 28, 2025 -
Eval bug: Embeddings Always returned as non
#13854 opened
May 28, 2025 -
Feature Request: Set default of --numa to distribute
#13850 opened
May 28, 2025 -
Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server
#13825 opened
May 27, 2025 -
Eval bug: seed seems to be locked to a single value 4294967295
#13823 opened
May 27, 2025 -
Eval bug: Cannot load Qwen3 ranking models
#13820 opened
May 27, 2025 -
ERROR:hf-to-gguf:Model MllamaForConditionalGeneration is not supported
#13805 opened
May 26, 2025 -
ERROR:hf-to-gguf:Model Qwen2_5_VLModel is not supported
#13802 opened
May 26, 2025 -
Compile bug: Vulkan Build Fails in Termux/Proot Due to Missing Cooperative Matrix Shader Variables
#13801 opened
May 26, 2025 -
Misc. bug: Decreased success rate for tool calling
#13769 opened
May 25, 2025 -
Misc. bug: llama-cli.exe stopped working on Windows Server 10
#13767 opened
May 25, 2025 -
Misc. bug: segfault in test-gbnf-validator
#13762 opened
May 24, 2025 -
Feature Request: video support in mtmd-cli / server
#13754 opened
May 24, 2025 -
Feature Request: Add keep_alive function for llama-server
#13748 opened
May 24, 2025 -
Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA
#13747 opened
May 24, 2025 -
Misc. bug: RUNPATH properties are not properly set
#13740 opened
May 24, 2025 -
open-source dataset for low-bit quantization?
#13736 opened
May 24, 2025 -
Eval bug: Server and mtmd both crashing when starting Ultravox
#13727 opened
May 23, 2025 -
Feature Request: (webui) do not throw away the message if there is an error in the stream
#13709 opened
May 22, 2025 -
Eval bug: Server Returns Empty Responses Under High Load
#13703 opened
May 22, 2025 -
Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working
#13700 opened
May 22, 2025 -
Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates
#13694 opened
May 21, 2025 -
Eval bug: std::regex to split the text
#13691 opened
May 21, 2025 -
Eval bug: swa_full = true is slower than false
#13683 opened
May 21, 2025 -
Feature Request: Falcon-H1
#13681 opened
May 21, 2025 -
devops/nix: `flake.lock` is badly outdated
#13679 opened
May 21, 2025 -
Misc. bug: AMX is not ready to be used!
#13678 opened
May 21, 2025 -
Eval bug: SYCL branch produces a mul_mat bug when run
#13674 opened
May 21, 2025 -
Eval bug: Output garbled in dual-GPU environment
#13673 opened
May 21, 2025 -
Feature Request: Llama-bench improvement
#13671 opened
May 20, 2025 -
Feature Request: Procedure for reproducing test models
#13662 opened
May 20, 2025 -
Eval bug: Not splitting model across rows correctly
#13661 opened
May 20, 2025 -
Compile bug: GPU Detection Fails during cmake --build
#13636 opened
May 19, 2025 -
Feature Request: Support for Qwen with Parallel Scaling
#13632 opened
May 19, 2025 -
can't quantize llama3 with an expanded tokenizer
#13628 opened
May 19, 2025 -
webui: First user prompt sometimes disappears after sending
#13622 opened
May 18, 2025 -
Misc. bug: batch in mtmd-cli.cpp not freed
#13620 opened
May 18, 2025 -
Feature Request: update README for ideal MoE tensor override calculation
#13616 opened
May 18, 2025 -
Compile bug: tools build failing
#13614 opened
May 18, 2025
138 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
feat(server): Add tool call support to WebUI (Llama Server)
#13501 commented on
Jun 16, 2025 • 30 new comments -
Update Python versions
#13574 commented on
May 23, 2025 • 18 new comments -
Support jinja extra template kwargs (Qwen3 enable_thinking feature) from command line and from client
#13196 commented on
Jun 14, 2025 • 15 new comments -
ggml : fix race-condition in ggml-rpc
#13600 commented on
May 25, 2025 • 7 new comments -
server : separate the notion of position and KV tokens, remove prompt truncation
#13576 commented on
May 19, 2025 • 6 new comments -
llama : try loading tensors with pre-computed hashes
#13106 commented on
May 25, 2025 • 6 new comments -
Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3)
#13545 commented on
Jun 14, 2025 • 0 new comments -
Research: How to integrate VITA 1.5 for multi-modal GGUF deployment?
#13520 commented on
Jun 14, 2025 • 0 new comments -
(Discussion) Improve usability of llama-server
#13367 commented on
Jun 14, 2025 • 0 new comments -
Feature request: Graphical GGUF viewer
#6715 commented on
Jun 14, 2025 • 0 new comments -
LoRA training example
#13485 commented on
Jun 15, 2025 • 0 new comments -
Feature Proposal: Server Model Switching at Runtime
#13027 commented on
Jun 15, 2025 • 0 new comments -
Eval bug: Weight repacking for AVX2 block interleaving is very slow and NUMA unfriendly
#12759 commented on
Jun 15, 2025 • 0 new comments -
Misc. bug: ROCm images cannot be found
#11913 commented on
Jun 15, 2025 • 0 new comments -
Eval bug: GGUF conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work
#13593 commented on
Jun 16, 2025 • 0 new comments -
Misc. bug: GGML_ASSERT(view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src)) failed
#13581 commented on
Jun 16, 2025 • 0 new comments -
Misc. bug: Potential out-of-bounds access in rerank
#13549 commented on
Jun 16, 2025 • 0 new comments -
Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan
#13046 commented on
Jun 16, 2025 • 0 new comments -
Feature Request: Support Codestral Mamba
#8519 commented on
Jun 16, 2025 • 0 new comments -
Feature Request: Granite 4 Support
#13275 commented on
Jun 16, 2025 • 0 new comments -
Eval bug: Error running multiple contexts from multiple threads at the same time with Vulkan
#11371 commented on
Jun 17, 2025 • 0 new comments -
Granite Four
#13550 commented on
Jun 16, 2025 • 0 new comments -
Feature Request: XiaomiMiMo/MiMo-7B-RL
#13218 commented on
Jun 13, 2025 • 0 new comments -
Why is mul_mat in ggml slower than in llama.cpp?
#13473 commented on
Jun 13, 2025 • 0 new comments -
Eval bug: I fine-tuned a gpt2 model with LoRA and saved it to a gguf file, but it doesn't work properly
#13489 commented on
Jun 13, 2025 • 0 new comments -
Eval bug: BGE-M3 Embedding model is not accessible
#13494 commented on
Jun 13, 2025 • 0 new comments -
Misc. bug: llama-cli stopped starting in release b4191 (c9b00a7)
#13498 commented on
Jun 13, 2025 • 0 new comments -
Feature Request: Apple just released Fast-VLM, a very promising set of multimodal language models
#13512 commented on
Jun 13, 2025 • 0 new comments -
Feature Request: Qwen 2.5 VL
#11483 commented on
Jun 12, 2025 • 0 new comments -
Misc. bug: Model not loaded on Android with NDK
#13399 commented on
Jun 12, 2025 • 0 new comments -
Eval bug: I cannot run llama 405b on CPU
#13475 commented on
Jun 12, 2025 • 0 new comments -
web UI either doesn't scroll or jumps to the wrong element
#13479 commented on
Jun 12, 2025 • 0 new comments -
Partial offload support for training
#13486 commented on
Jun 12, 2025 • 0 new comments -
tutorials : list for llama.cpp
#13523 commented on
Jun 11, 2025 • 0 new comments -
Feature Request: (webui) read data from /props endpoint and use it on the webui
#11717 commented on
Jun 11, 2025 • 0 new comments -
Misc. bug: Completions hang after CUDA error, but health endpoint reports all OK
#13281 commented on
Jun 11, 2025 • 0 new comments -
Misc. bug: The web UI of llama-server is not displaying correctly.
#13428 commented on
Jun 11, 2025 • 0 new comments -
Compile bug: ld returned 1 exit status (file bigger than 2gb)
#13446 commented on
Jun 11, 2025 • 0 new comments -
[CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size
#13529 commented on
Jun 12, 2025 • 0 new comments -
cuda: set cuda compiler path (#13527)
#13528 commented on
May 21, 2025 • 0 new comments -
webui: Add editing assistant messages (#11849)
#13522 commented on
May 29, 2025 • 0 new comments -
convert: Swap GLM4 EOS / EOT token
#13505 commented on
May 20, 2025 • 0 new comments -
Update README.md for using llama.cpp in Microsoft Word locally
#13401 commented on
May 20, 2025 • 0 new comments -
[Perf] [CPU] eliminate redundant memory access in group query attention
#13319 commented on
Jun 11, 2025 • 0 new comments -
Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC)
#13206 commented on
Jun 10, 2025 • 0 new comments -
quantize: Handle user-defined pruning of whole layers (blocks)
#13037 commented on
Jun 14, 2025 • 0 new comments -
gguf-py: byteswapping improvements
#12851 commented on
May 27, 2025 • 0 new comments -
convert : write tensors in parallel
#12837 commented on
Jun 2, 2025 • 0 new comments -
Support for OuteTTS 1.0
#12794 commented on
May 20, 2025 • 0 new comments -
Update llama-quant.cpp llama_tensor_get_type with DeepSeek-friendly modifications
#12727 commented on
May 29, 2025 • 0 new comments -
imatrix: add option to display importance score statistics for a given imatrix file
#12718 commented on
Jun 14, 2025 • 0 new comments -
WIP: Add support for CogAgent
#12679 commented on
May 29, 2025 • 0 new comments -
tts : implement sesame CSM + Mimi decoder
#12648 commented on
Jun 10, 2025 • 0 new comments -
PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp
#12326 commented on
Jun 3, 2025 • 0 new comments -
Fix rocWMMA build documentation
#12243 commented on
Jun 13, 2025 • 0 new comments -
llama : expose API to retrieve devices associated with the model.
#12073 commented on
Jun 10, 2025 • 0 new comments -
[WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs
#12063 commented on
Jun 18, 2025 • 0 new comments -
ggml-cpu-aarch64: Fix compilation issues
#11745 commented on
Jun 17, 2025 • 0 new comments -
Allow user to compile with any cuda version using github actions
#10928 commented on
May 23, 2025 • 0 new comments -
Introduce Graph Profiler
#9659 commented on
Jun 5, 2025 • 0 new comments -
[Draft] Tensor Parallel support to llama.cpp
#9648 commented on
May 31, 2025 • 0 new comments -
llama : initial Mamba-2 support
#9126 commented on
Jun 18, 2025 • 0 new comments -
ggml: avoid rebuild of GGML graph for each token (#7456)
#8366 commented on
Jun 5, 2025 • 0 new comments -
`json`: unified properties order across optional & required
#8133 commented on
May 26, 2025 • 0 new comments -
Add PaliGemma Support
#7553 commented on
Jun 1, 2025 • 0 new comments -
llama.cpp low-level Python bindings
#1660 commented on
Jun 1, 2025 • 0 new comments -
ggml : add WebGPU backend
#7773 commented on
Jun 18, 2025 • 0 new comments -
Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0)
#13612 commented on
Jun 18, 2025 • 0 new comments -
llama_model_load: error loading model: error loading model vocabulary: std::bad_cast
#13613 commented on
Jun 18, 2025 • 0 new comments -
Misc. bug: logit-bias doesn't seem to work
#13605 commented on
May 19, 2025 • 0 new comments -
something with llama-server? slow vs llama-cli
#13560 commented on
May 25, 2025 • 0 new comments -
Feature Request: Qwen2.5-Omni
#12673 commented on
May 26, 2025 • 0 new comments -
Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B
#12641 commented on
May 26, 2025 • 0 new comments -
Feature Request: add per-request "reasoning" options in llama-server
#13272 commented on
May 27, 2025 • 0 new comments -
Feature Request: (webui) add import / export function for ALL conversations
#11718 commented on
May 27, 2025 • 0 new comments -
Feature Request: Multimodal CLI tools: add the ability to specify an image in conversation mode, plus tab auto-completion for paths
#12983 commented on
May 27, 2025 • 0 new comments -
Eval bug: llama-mtmd-cli doesn't support system prompts
#13454 commented on
May 28, 2025 • 0 new comments -
Feature Request: Installable package via winget
#8188 commented on
May 29, 2025 • 0 new comments -
Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf
#12997 commented on
Jun 2, 2025 • 0 new comments -
Misc. bug: missing messages in JSON export via llama-server web UI
#13552 commented on
Jun 3, 2025 • 0 new comments -
Eval bug: SIGILL
#13161 commented on
Jun 4, 2025 • 0 new comments -
Compile bug: Cannot convert from char8_t to char* in llama-chat.cpp
#12740 commented on
Jun 4, 2025 • 0 new comments -
Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used
#10860 commented on
Jun 4, 2025 • 0 new comments -
Eval bug: Can't run Qwen3-32B Q4_K_XL
#13298 commented on
Jun 4, 2025 • 0 new comments -
Misc. bug: -TS doesn't support more than ? devices
#13293 commented on
Jun 4, 2025 • 0 new comments -
Eval bug: Custom model error.
#13318 commented on
Jun 5, 2025 • 0 new comments -
Eval bug: Qwen3 30B A3B is slow with CUDA
#13211 commented on
Jun 5, 2025 • 0 new comments -
Feature Request: dynamic number of experts (hyperparam per request)
#13572 commented on
May 19, 2025 • 0 new comments -
llama : add CLI assistant
#10688 commented on
May 19, 2025 • 0 new comments -
Eval bug: Qwen3 30B A3B Q4_0 failed to run
#13168 commented on
May 19, 2025 • 0 new comments -
llama : combined beam search + grammar sampling strategy
#2923 commented on
May 19, 2025 • 0 new comments -
Eval bug: repeated output for llama-server
#12782 commented on
May 20, 2025 • 0 new comments -
changelog : `libllama` API
#9289 commented on
May 20, 2025 • 0 new comments -
How to start a gemma3 multimodal model service using llama-server
#13465 commented on
May 20, 2025 • 0 new comments -
Compile bug: llama.cpp-master/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:80:54: error: '_mm256_set_m128i' was not declared in this scope
#11385 commented on
May 21, 2025 • 0 new comments -
Feature Request: Support Jina V3 arch
#9585 commented on
May 21, 2025 • 0 new comments -
Error while converting a PEFT fine-tuned merged model to gguf
#12494 commented on
May 21, 2025 • 0 new comments -
Feature Request: Mapping model name to LoRA config
#11031 commented on
May 21, 2025 • 0 new comments -
changelog : `llama-server` REST API
#9291 commented on
May 21, 2025 • 0 new comments -
Eval bug: A100 GPU not working with CUDA 12.8 in llama.cpp
#13609 commented on
May 21, 2025 • 0 new comments -
Misc. bug: Inconsistent Vulkan segfault
#10528 commented on
May 21, 2025 • 0 new comments -
Feature Request: support for image input in llama-server (and web ui)
#12792 commented on
May 22, 2025 • 0 new comments -
Feature Request: make the Jina embeddings model available for conversion to gguf
#12327 commented on
May 22, 2025 • 0 new comments -
Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls
#11970 commented on
May 24, 2025 • 0 new comments -
Refactor: (clip.cpp) identify and regroup pre-processing strategies
#13077 commented on
Jun 8, 2025 • 0 new comments -
Feature Request: Ability to pack multiple GGUFs into single one
#13028 commented on
Jun 8, 2025 • 0 new comments -
Feature Request: Add Support for ModernBert
#11282 commented on
Jun 8, 2025 • 0 new comments -
Eval bug: llama-cli, Qwen3 jinja template will break CLI multi-turn conversation
#13404 commented on
Jun 9, 2025 • 0 new comments -
Compile bug: I tried compiling llama.cpp for HIP on my system (elementaryOS 8/ubuntu 24.04, rocm 6.4.0, gfx1100) using the installation guide
#13340 commented on
Jun 9, 2025 • 0 new comments -
Misc. bug: Vulkan performance depends on thread priority
#12976 commented on
Jun 9, 2025 • 0 new comments -
Eval bug: Qwen3-30B-A3B-Q4_K_M: Slows down when using the \no_think mode.
#13427 commented on
Jun 10, 2025 • 0 new comments -
Eval bug: llama-speculative core dump with Qwen3, GGML_ASSERT(batch.n_tokens > 0) failed
#13433 commented on
Jun 10, 2025 • 0 new comments -
Differential mode for llama-bench + plotting code
#13408 commented on
Jun 10, 2025 • 0 new comments -
Eval bug: Regex
#13347 commented on
Jun 10, 2025 • 0 new comments -
Misc. bug: llama-server webui overriding command line parameters
#13277 commented on
Jun 10, 2025 • 0 new comments -
Enhancement: Improve ROCm performance on various quants (benchmarks included)
#11931 commented on
Jun 10, 2025 • 0 new comments -
Support for Jamba (JambaForCausalLM)
#6372 commented on
Jun 10, 2025 • 0 new comments -
Support Hybrid Models
#12331 commented on
Jun 10, 2025 • 0 new comments -
Feature Request: add draft model in llama-bench and more.
#13456 commented on
Jun 11, 2025 • 0 new comments -
Misc. bug: Illegal CUDA memory access in ggml_backend_cuda_cpy_tensor_async
#13449 commented on
Jun 11, 2025 • 0 new comments -
Drop support for sentencepiece
#13448 commented on
Jun 11, 2025 • 0 new comments -
Feature Request: Tensor parallelism (--split-mode row) over RPC
#13083 commented on
Jun 5, 2025 • 0 new comments -
Feature Request: s390x CI
#13243 commented on
Jun 5, 2025 • 0 new comments -
Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend
#13310 commented on
Jun 5, 2025 • 0 new comments -
Misc. bug: Extended swap/unswap times when loading large models on Apple Silicon
#13361 commented on
Jun 7, 2025 • 0 new comments -
Compile bug: clang-18.1.3 compile fail (vsetivli)
#13358 commented on
Jun 7, 2025 • 0 new comments -
Misc. bug: error in remote conversion for the new ServiceNow Nemotron 15B model
#13354 commented on
Jun 7, 2025 • 0 new comments -
Feature Request: tensor split needs control over where CPU layers go
#13314 commented on
Jun 7, 2025 • 0 new comments -
bug: ValueError: Architecture qwen3 not supported
#13157 commented on
Jun 7, 2025 • 0 new comments -
Feature Request: Improve model load time when using the RPC backend
#12954 commented on
Jun 7, 2025 • 0 new comments -
Eval bug: IQ2_M broken for mradermacher / Llama-4-Maverick-17B-128E-Instruct-GGUF
#12913 commented on
Jun 7, 2025 • 0 new comments -
[Tracker] Docker build fails on CI for arm64
#11888 commented on
Jun 7, 2025 • 0 new comments -
Feature Request: allow setting jinja chat template from server webui
#11689 commented on
Jun 7, 2025 • 0 new comments -
Feature Request: Add support for Kokoro TTS
#11050 commented on
Jun 7, 2025 • 0 new comments -
Misc. bug: invalid regex grammar causes segment violation
#13390 commented on
Jun 8, 2025 • 0 new comments -
Compile bug: ninja: build stopped: subcommand failed.
#13375 commented on
Jun 8, 2025 • 0 new comments -
Token Generation Speed Decline with GGUF Models on M3 Ultra
#13373 commented on
Jun 8, 2025 • 0 new comments -
Misc. bug: Qwen 3.0 "enable_thinking" parameter not working
#13160 commented on
Jun 8, 2025 • 0 new comments