| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| koboldcpp.exe | 2023-08-24 | 285.3 MB | |
| koboldcpp_nocuda.exe | 2023-08-24 | 22.8 MB | |
| koboldcpp-1.41 (beta) source code.tar.gz | 2023-08-24 | 10.6 MB | |
| koboldcpp-1.41 (beta) source code.zip | 2023-08-24 | 10.7 MB | |
| README.md | 2023-08-24 | 1.5 kB | |
| Totals: 5 Items | | 329.3 MB | 0 |
koboldcpp-1.41 (beta)
It's been a while since the last release and quite a lot upstream has changed under the hood, so consider this release a beta.
- Added support for LLAMA GGUF models, handled automatically. All older models will still continue to work normally. Note that GGUF format support for other non-llama architectures has not been added yet.
- Added `--config` flag to load a `.kcpps` settings file when launching from the command line (Credits: @poppeman). These files can also be imported/exported from the GUI.
- Added a new endpoint `/api/extra/tokencount` which can be used to tokenize and accurately measure how many tokens any string has (see the sketch after this list).
- Fix for bell characters occasionally causing the terminal to beep in debug mode.
- Fix for incorrect list of backends & missing backends displayed in the GUI.
- Set MMQ to be the default for CUDA when running from the GUI.
- Updated Lite, and merged all the improvements and fixes from upstream.
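As a quick illustration of the two additions above: the `--config` flag is passed at launch, e.g. `koboldcpp.exe --config mysettings.kcpps` (the filename here is just a placeholder), and the token-count endpoint can be queried over plain HTTP. The sketch below is one way to call it from Python; the request and response field names (`prompt`, `value`) and the default port are assumptions not spelled out in these notes, so verify them against your running server.

```python
# Minimal sketch: counting tokens via the new /api/extra/tokencount endpoint.
# Assumptions (not stated in the release notes): the server listens on the
# default http://localhost:5001, the request body is JSON with a "prompt"
# field, and the response JSON reports the count in a "value" field.
import json
import urllib.request

def count_tokens(text, base_url="http://localhost:5001"):
    payload = json.dumps({"prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/extra/tokencount",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["value"]

if __name__ == "__main__":
    print(count_tokens("Niko the kobold stalked carefully down the alley."))
```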
To use, download and run koboldcpp.exe, which is a one-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once the model is loaded, you can connect with your browser (or use the full KoboldAI client) at:
http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.
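If you would rather script against the server than use the browser UI, the sketch below sends a prompt to the same address. It assumes the KoboldAI-compatible `/api/v1/generate` endpoint with `prompt` and `max_length` request fields and the generated text under `results[0]["text"]`; these names follow the KoboldAI API convention rather than anything stated in this release note, so treat the example as a starting point only.

```python
# Minimal sketch: requesting a completion from a running koboldcpp instance.
# Assumptions (not stated above): a KoboldAI-compatible /api/v1/generate
# endpoint is available at the default address, accepts a JSON body with
# "prompt" and "max_length", and returns text under results[0]["text"].
import json
import urllib.request

def generate(prompt, max_length=80, base_url="http://localhost:5001"):
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]

if __name__ == "__main__":
    print(generate("Once upon a time,"))
```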