| Name | Modified | Size |
|---|---|---|
| koboldcpp-1.88 source code.tar.gz | 2025-04-13 | 27.2 MB |
| koboldcpp-1.88 source code.zip | 2025-04-13 | 27.7 MB |
| README.md | 2025-04-13 | 3.2 kB |
| koboldcpp_cu12.exe | 2025-04-13 | 624.7 MB |
| koboldcpp.exe | 2025-04-13 | 508.5 MB |
| koboldcpp-mac-arm64 | 2025-04-13 | 26.9 MB |
| koboldcpp-linux-x64-nocuda | 2025-04-13 | 78.8 MB |
| koboldcpp-linux-x64-cuda1210 | 2025-04-13 | 702.2 MB |
| koboldcpp-linux-x64-cuda1150 | 2025-04-13 | 613.7 MB |
| koboldcpp_oldcpu.exe | 2025-04-13 | 508.7 MB |
| koboldcpp_nocuda.exe | 2025-04-13 | 77.8 MB |
| Totals: 11 items | | 3.2 GB |
koboldcpp-1.88
- NEW: Added Image Inpainting support to StableUI, and merged inpainting support from stable-diffusion.cpp (by @stduhpf)
  - You can use the built-in StableUI to mask out areas to inpaint when editing with Img2Img (similar to A1111). The API docs for this are updated (a hedged request sketch appears after this list).
  - Added a slider for setting clip-skip in StableUI.
  - Other improvements from stable-diffusion.cpp are also merged.
- Added Zenity and YAD support for displaying file picker dialogs on Linux (by @henk717); if they are installed on your system, they will be used. To continue using the previous Tkinter file picker, select "Use Classic FilePicker" in the extras tab.
- Added a new API endpoint `/api/extra/json_to_grammar` which can be used to convert a JSON schema into GBNF grammar (check the API docs for an example; a hedged sketch also appears after this list).
- Added a `--maxrequestsize` flag: you can configure the server's maximum payload size before an HTTP request is dropped (default 32 MB).
- Can now perform GPU memory estimation using vulkaninfo too (if nvidia-smi is not available).
- Merged Llama 4 support from upstream llama.cpp. Qwen3 is technically included too, but until it releases officially we won't know if it actually works.
- Fixed backend and layer counts not being auto-set when swapping to a new model in admin mode using a template.
- Added additional warnings in GUI and terminal when you try to use FlashAttention on Vulkan backend - generally this is discouraged due to performance issues.
- Fixed the system prompt in the gemma3 template
- Updated Kobold Lite, multiple fixes and improvements:
  - Added Llama4 prompt format
  - Consolidated vision dropdown when selecting a vision provider
  - Fixed think-token formatting issue with markdown
  - Merged fixes and improvements from upstream
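
The inpainting API itself isn't spelled out in these notes, so here is a hedged sketch of what a masked Img2Img request might look like, assuming KoboldCpp's A1111-compatible `/sdapi/v1/img2img` endpoint and the usual A1111 field names (`init_images`, `mask`, `denoising_strength`). Treat the field names as assumptions and verify them against the updated API docs.

```python
# Hypothetical inpainting request against KoboldCpp's A1111-compatible image API.
# Endpoint and field names follow the A1111 /sdapi/v1/img2img convention; check
# the updated API docs for the exact schema your build exposes.
import base64
import requests

def load_b64(path: str) -> str:
    """Read an image file and return it as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "a red brick wall",            # what to paint into the masked area
    "init_images": [load_b64("input.png")],  # source image to edit
    "mask": load_b64("mask.png"),            # white = inpaint, black = keep (A1111 convention)
    "denoising_strength": 0.75,              # how strongly the masked region is repainted
}

resp = requests.post("http://localhost:5001/sdapi/v1/img2img", json=payload)
resp.raise_for_status()
# Response images are assumed to come back base64-encoded, as in the A1111 API.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```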
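For the new grammar endpoint, a minimal sketch follows. The notes defer to the API docs for the real example, so the request body shape below (posting the schema under a `"schema"` key) is a guess; only the endpoint path comes from the release notes.

```python
# Sketch of calling the new /api/extra/json_to_grammar endpoint. The exact
# request shape is an assumption; the API docs have the authoritative example.
import requests

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

resp = requests.post(
    "http://localhost:5001/api/extra/json_to_grammar",
    json={"schema": schema},  # payload key is a guess; consult the API docs
)
resp.raise_for_status()
print(resp.text)  # GBNF grammar constraining generation to this schema
```

If you post very large schemas (or base64 images to the image endpoints), note that the new `--maxrequestsize` flag caps the accepted payload at 32 MB by default.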
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller. If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller. If you have an Nvidia GPU but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe. If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster). If you're using Linux, select the appropriate Linux binary file instead (not exe). If you're on a modern macOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary. If you're using AMD, we recommend trying the Vulkan option (available in all releases) first for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
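For a quick smoke test from code rather than a browser, here is a minimal sketch, assuming the standard KoboldAI-compatible /api/v1/generate endpoint and its usual `results[0].text` response shape:

```python
# Minimal sketch of talking to a running KoboldCpp instance over its
# KoboldAI-compatible API. Field names follow the standard KoboldAI API;
# see the bundled API docs for the full parameter list.
import requests

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,     # number of tokens to generate
    "temperature": 0.7,
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```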
For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.