| Name | Modified | Size |
|---|---|---|
| koboldcpp.exe | 2023-08-09 | 284.4 MB |
| koboldcpp_nocuda.exe | 2023-08-09 | 22.3 MB |
| koboldcpp-1.40.1 source code.tar.gz | 2023-08-09 | 10.2 MB |
| koboldcpp-1.40.1 source code.zip | 2023-08-09 | 10.3 MB |
| README.md | 2023-08-09 | 1.4 kB |
| Totals: 5 items | | 327.2 MB |
koboldcpp-1.40.1
This release is mostly bugfixes for the previous one, but enough small things have changed that I chose to make it a new version rather than a patch.
- Fixed a regression in format detection for LLAMA 70B.
- Converted the embedded Horde worker to daemon mode, which will hopefully fix the occasional exceptions
- Fixed some out-of-memory (OOM) errors with `--blasbatchsize 2048` and adjusted buffer sizes
- Slightly adjusted the look-ahead for the CUDA pool malloc (from 2% to 5%)
- Pulled some bugfixes from upstream
- Added a new field `idle` to the `/api/extra/perf` endpoint, which allows checking whether a generation is in progress without sending one
- Fixed CMake compilation for CUDA Toolkit 12
- Updated Lite, which now includes an option for an aesthetic instruct UI (early beta by @Lyrcaxis; please send them your feedback)
Hotfix 1.40.1:
- Handle stablecode-completion-alpha-3b
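To illustrate what a larger look-ahead in a pool allocator buys, here is a toy model, not KoboldCpp's or CUDA's actual code. It assumes the look-ahead means extra headroom added to each fresh allocation, so a slightly larger later request can reuse a freed buffer instead of triggering a new allocation. `LookAheadPool`, `malloc`, and `release` are hypothetical names, and sizes are plain integers rather than device memory.

```python
from dataclasses import dataclass, field


@dataclass
class LookAheadPool:
    """Toy buffer pool that over-allocates by a fixed look-ahead fraction.

    Hypothetical illustration only: capacities are plain integers standing
    in for device buffers; no real CUDA memory is involved.
    """
    look_ahead: float = 0.05                       # headroom on each fresh allocation
    free: list[int] = field(default_factory=list)  # capacities of idle buffers
    allocations: int = 0                           # count of fresh (non-reused) allocations

    def malloc(self, size: int) -> int:
        """Return the capacity of a buffer able to hold `size` bytes."""
        # Best fit: reuse the smallest idle buffer that is large enough.
        fits = [cap for cap in self.free if cap >= size]
        if fits:
            best = min(fits)
            self.free.remove(best)
            return best
        # Otherwise allocate a new buffer with extra headroom so slightly
        # larger future requests can reuse it instead of allocating again.
        self.allocations += 1
        return int(size * (1 + self.look_ahead))

    def release(self, capacity: int) -> None:
        """Return a buffer to the pool for later reuse."""
        self.free.append(capacity)
```

With a 5% look-ahead, a 100-byte buffer is created with capacity 105, so a later 104-byte request reuses it; with 2% the capacity is only 102 and the same request forces a second allocation. The trade-off is a little wasted memory per buffer in exchange for fewer allocations.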
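The new `idle` field on `/api/extra/perf` lets a client check for a running generation with a plain GET before submitting work. The sketch below uses only the Python standard library and assumes the server is on the default `http://localhost:5001`. It also assumes that `idle` is truthy (for example `1`) when no generation is running; the release notes only say the field exists, so confirm the exact values against your build. `is_idle` and `fetch_perf` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: KoboldCpp is running locally on its default port.
PERF_URL = "http://localhost:5001/api/extra/perf"


def is_idle(perf_json: str) -> bool:
    """Interpret a /api/extra/perf response body.

    Assumption: `idle` is truthy when no generation is in progress and
    falsy while one is running. Raises KeyError on builds without the field.
    """
    data = json.loads(perf_json)
    return bool(data["idle"])


def fetch_perf(url: str = PERF_URL, timeout: float = 5.0) -> str:
    """Fetch the raw JSON body from the perf endpoint with a plain GET."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage, with the server running: `is_idle(fetch_perf())`. Keeping parsing separate from the HTTP call makes the interpretation of the field easy to adjust if it differs from the assumption above.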
To use it, download and run koboldcpp.exe, which is a single-file PyInstaller executable. If you don't need CUDA, you can use koboldcpp_nocuda.exe instead, which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or select the model manually in the GUI.
Once the model is loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001
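Besides the browser UI, programs can talk to the server at that address over HTTP. The sketch below assumes KoboldCpp's KoboldAI-compatible endpoint `/api/v1/generate`, which accepts a JSON body with `prompt` and `max_length` and returns `{"results": [{"text": ...}]}`. Treat the field names as assumptions and check them against the server's own API documentation. `build_generate_request`, `extract_text`, and `generate` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: the server is reachable at the default local address.
BASE_URL = "http://localhost:5001"


def build_generate_request(prompt: str, max_length: int = 80) -> urllib.request.Request:
    """Build a POST request for the KoboldAI-style generation endpoint.

    Assumption: /api/v1/generate accepts a JSON body with `prompt` and
    `max_length` (number of tokens to generate).
    """
    body = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: str) -> str:
    """Pull the generated text from a response shaped like
    {"results": [{"text": "..."}]} (assumed response format)."""
    return json.loads(response_json)["results"][0]["text"]


def generate(prompt: str, max_length: int = 80, timeout: float = 120.0) -> str:
    """Send the request and return the generated continuation."""
    req = build_generate_request(prompt, max_length)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_text(resp.read().decode("utf-8"))
```

Usage, once a model is loaded: `generate("Once upon a time", max_length=50)`. The generous default timeout reflects that generation on CPU can take a while.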
For more information, run the program from the command line with the `--help` flag.