| Name | Modified | Size |
|---|---|---|
| koboldcpp_nocuda.exe | 2023-08-02 | 22.2 MB |
| koboldcpp.exe | 2023-08-02 | 284.8 MB |
| koboldcpp-1.38 source code.tar.gz | 2023-08-02 | 10.2 MB |
| koboldcpp-1.38 source code.zip | 2023-08-02 | 10.3 MB |
| README.md | 2023-08-02 | 1.4 kB |
koboldcpp-1.38
- Added upstream support for Quantized MatMul (MMQ) prompt processing, a new option for CUDA (enabled by adding --usecublas mmq, or via the toggle in the GUI; see the example launch command after this list). This uses slightly less memory, and is slightly faster for Q4_0 but slower for K-quants.
- Fixed SSE streaming for multibyte characters (for Tavern compatibility)
- --noavx2 mode no longer uses OpenBLAS (same as Failsafe), due to numerous compatibility complaints.
- GUI dropdown preset only displays built platforms (Credit: @YellowRoseCx)
- Added a Help button in the GUI
- Fixed an issue with mirostat not reading correct value from GUI
- Fixed an issue with context size slider being limited to 4096 in the GUI
- Displays a terminal warning if the received context exceeds the maximum context allocated by the launcher
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
For more information, be sure to run the program from command line with the --help flag.