| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| koboldcpp-linux-x64-nocuda | 2024-05-04 | 58.9 MB | |
| koboldcpp-linux-x64-cuda1150 | 2024-05-04 | 405.4 MB | |
| koboldcpp.exe | 2024-05-04 | 325.6 MB | |
| koboldcpp_nocuda.exe | 2024-05-04 | 47.8 MB | |
| koboldcpp-1.64.1 source code.tar.gz | 2024-05-04 | 17.0 MB | |
| koboldcpp-1.64.1 source code.zip | 2024-05-04 | 17.3 MB | |
| README.md | 2024-05-04 | 3.4 kB | |
| Totals: 7 Items | | 872.0 MB | 0 |
koboldcpp-1.64.1
- Added fixes for Llama 3 tokenization: Support updated Llama 3 GGUFs with pre-tokenizations.
- Note: In order to benefit from the tokenizer fix, the GGUF models need to be reconverted after this commit. A warning will be displayed if the model was created before this fix.
- Automatically support and apply both EOS and EOT tokens. EOT tokens are also correctly biased when EOS is banned.
- `finish_reason` is now correctly communicated in both sync and SSE streamed mode responses when token generation is stopped by EOS/EOT. Also, Kobold Lite no longer trims sentences if an EOS/EOT is detected as the stop reason in instruct mode.
- Added proper support for `trim_stop` in SSE streaming modes. Stop sequences will no longer be exposed even during streaming when `trim_stop` is enabled. Additionally, the Chat Completions endpoint automatically applies trim stop to the instruct tag format used, allowing better out-of-box compatibility with third-party clients like LibreChat.
- The `--bantokens` flag has been removed. Instead, you can now submit `banned_tokens` dynamically via the generate API for each specific generation, and all matching tokens will be banned for that generation.
- Added `render_special` to the generate API, which enables rendering of special tokens like `<|start_header_id|>` or `<|eot_id|>`.
- Added new experimental flag `--flashattention` to enable Flash Attention for compatible models.
- Added support for resizing the GUI launcher; all GUI elements auto-scale to fit. This can be useful for high-DPI screens.
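The per-request `banned_tokens` and `render_special` fields described above are ordinary keys in the generate API's JSON body. A minimal sketch of building such a request follows; the endpoint path and surrounding field names follow the standard KoboldCpp generate API, but treat the exact values as illustrative:

```python
import json

# Sketch of a request body for KoboldCpp's /api/v1/generate endpoint.
# "banned_tokens" and "render_special" are the new per-request fields;
# other field names follow the standard generate API.
payload = {
    "prompt": "### Instruction:\nSay hello.\n### Response:\n",
    "max_length": 64,
    "banned_tokens": ["<|im_start|>", "badword"],  # banned only for this generation
    "render_special": True,  # render special tokens like <|eot_id|> in the output
}

body = json.dumps(payload)
# To send: POST this body to http://localhost:5001/api/v1/generate
```

Because `banned_tokens` is per-request, different generations in the same session can ban different token sets without restarting the server.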
- Improved speed of rep pen sampler.
- Added additional debug information in `--debug` mode.
- Added a button for starting the benchmark feature in GUI launcher mode.
- Fixed slow clip processing speed issue on Colab
- Fixed quantization tool compilation again
- Updated Kobold Lite:
- Improved stop sequence and EOS handling
- Fixed instruct tag dropdown
- Added token filter feature
- Added enhanced regex replacement (now also allowed for submitted text)
- Support for custom `{{placeholder}}` tags.
- Better max context handling when used in Kcpp
- Support for Inverted world info secondary keys (triggers when NOT present)
- Language customization for XTTS
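The custom `{{placeholder}}` tag support above can be pictured as a simple substitution pass. This is a hypothetical sketch for illustration only, not Kobold Lite's actual implementation:

```python
import re

def apply_placeholders(text, values):
    # Replace each {{name}} with its value; unknown tags are left untouched.
    # (Illustrative helper, not part of Kobold Lite's code.)
    def sub(match):
        key = match.group(1)
        return values.get(key, match.group(0))
    return re.sub(r"\{\{(\w+)\}\}", sub, text)

print(apply_placeholders("Hello {{user}}, I am {{char}}.",
                         {"user": "Alice", "char": "KoboldGPT"}))
# → Hello Alice, I am KoboldGPT.
```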
Hotfix 1.64.1: Fixed LLAVA being incoherent from the second generation onwards. Also, the GUI launcher has been tidied up: lowvram is now removed from the Quick Launch tab and appears only in the Hardware tab. `--benchmark` now includes the version and gives clearer exit instructions in console output. Fixed some tkinter error outputs on quit.
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller. If you're using Linux, select the appropriate Linux binary file instead (not the exe). If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
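If you use the SSE streaming mode mentioned in the changelog, the server emits `data:` lines containing JSON events. A hedged sketch of collecting streamed tokens from such lines (the sample event format is illustrative, and the helper name is hypothetical):

```python
import json

def collect_sse_tokens(lines):
    # Gather "token" fields from SSE "data:" lines (illustrative event format).
    tokens = []
    for line in lines:
        if line.startswith("data:"):
            event = json.loads(line[len("data:"):].strip())
            tokens.append(event.get("token", ""))
    return "".join(tokens)

sample = [
    'data: {"token": "Hel"}',
    'data: {"token": "lo"}',
]
print(collect_sse_tokens(sample))
# → Hello
```

With `trim_stop` enabled as described above, stop sequences are removed server-side, so a client like this never needs to strip them from the streamed text.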
For more information, be sure to run the program from command line with the --help flag.