| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| koboldcpp_ggml_tools_26jul.zip | 2023-07-26 | 2.6 MB | |
| koboldcpp_nocuda.exe | 2023-07-26 | 22.3 MB | |
| koboldcpp.exe | 2023-07-26 | 283.2 MB | |
| koboldcpp-1.37.1 source code.tar.gz | 2023-07-26 | 10.2 MB | |
| koboldcpp-1.37.1 source code.zip | 2023-07-26 | 10.3 MB | |
| README.md | 2023-07-26 | 2.3 kB | |
| Totals: 6 Items | | 328.5 MB | 0 |
koboldcpp-1.37.1
- NEW: KoboldCpp now comes with an embedded Horde Worker, which allows anyone to share their ggml models with the AI Horde without downloading additional dependencies. `--hordeconfig` now accepts 5 parameters: `[hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]`. Filling in all 5 will start a Horde worker for you that serves Horde requests automatically in the background (see the example command after this list). For the previous behavior, exclude the last 2 parameters to continue using your own Horde worker (e.g. HaidraScribe/KAIHordeBridge). This feature can also be enabled via the GUI.
- Added support for LLAMA2 70B models. This should work automatically; GQA will be set to 8 if such a model is detected.
- Fixed a bug with mirostat v2 that was causing overly deterministic results. Please try it again. (Credit: @ycros)
- Added additional information to `/api/extra/perf` for the last generation, including the stopping reason as well as generated token counts.
- Exposed the `--tensor_split` parameter, which works exactly like it does upstream. CUDA only.
- Tried to add Kepler as a CUDA target as well, on henky's suggestion. No guarantee it works, as I don't have a K80, but it might.
- Retained support for `--blasbatchsize 1024` after it was removed upstream. Scratch & KV buffer sizes will be larger when using it.
- Minor bugfixes, pulled other upstream fixes and optimizations, and updated Kobold Lite (chat mode improvements).
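A rough sketch of how the new launch parameters fit together, assuming a hypothetical model filename, a placeholder Horde API key and worker name, and two CUDA GPUs; the exact flag syntax may differ between builds, so check `--help` before copying this:

```
koboldcpp.exe --model mymodel.ggmlv3.q4_K_M.bin --hordeconfig mymodel 256 2048 YOUR_HORDE_API_KEY MyWorkerName --tensor_split 7 3 --blasbatchsize 1024
```

Here `--hordeconfig` follows the 5-parameter order described above (model name, gen length, max context, API key, worker name), `--tensor_split 7 3` splits the model across two GPUs in roughly a 7:3 ratio as upstream does, and `--blasbatchsize 1024` opts into the retained larger batch size at the cost of bigger scratch & KV buffers.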
Hotfix 1.37.1
- Fixed clblast to work correctly for LLAMA2 70B
- Fixed sending Client-Agent for embedded horde worker in addition to Bridge Agent and User Agent
- Changed rms_norm_eps to 5e-6 for better results for both llama1 and 2
- Fixed some streaming bugs in Lite
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
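As a minimal sketch of talking to the server once it is up (assuming the default port of 5001 and the standard KoboldAI-compatible endpoints; the prompt and parameters are just placeholders):

```
curl -X POST http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time,", "max_length": 64}'

# The new endpoint with stats for the last generation (stop reason, token counts):
curl http://localhost:5001/api/extra/perf
```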
For more information, be sure to run the program from command line with the --help flag.