
| Name | Modified | Size |
|------|----------|------|
| koboldcpp_nocuda.exe | 2023-08-02 | 22.2 MB |
| koboldcpp.exe | 2023-08-02 | 284.8 MB |
| koboldcpp-1.38 source code.tar.gz | 2023-08-02 | 10.2 MB |
| koboldcpp-1.38 source code.zip | 2023-08-02 | 10.3 MB |
| README.md | 2023-08-02 | 1.4 kB |

Totals: 5 items, 327.5 MB, 0 downloads/week

koboldcpp-1.38


  • Added upstream support for Quantized MatMul (MMQ) prompt processing, a new CUDA option (enable it with --usecublas mmq or via the GUI toggle). It uses slightly less memory and is slightly faster for Q4_0, but slower for K-quants.
  • Fixed SSE streaming of multibyte characters (for Tavern compatibility).
  • --noavx2 mode no longer uses OpenBLAS (same as Failsafe), due to numerous compatibility complaints.
  • The GUI preset dropdown now only displays platforms that were built (credit: @YellowRoseCx).
  • Added a Help button to the GUI.
  • Fixed an issue where mirostat did not read the correct value from the GUI.
  • Fixed an issue where the context size slider was limited to 4096 in the GUI.
  • Displays a terminal warning if the received context exceeds the maximum context allocated at launch.

To use, download and run koboldcpp.exe, a single-file PyInstaller executable. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
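Beyond the browser UI, the local port can be reached programmatically. The endpoint path and payload fields below are assumptions based on the KoboldAI-compatible HTTP API that KoboldCpp emulates; check --help and the README for your version. A minimal sketch:

```python
import json
import urllib.request

# Assumption: KoboldCpp serves a KoboldAI-compatible generate endpoint at
# this path on the default port mentioned above (5001).
API_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 80) -> dict:
    """Build the JSON body for a generate request (field names are assumptions)."""
    return {"prompt": prompt, "max_length": max_length}

def generate(prompt: str) -> str:
    """POST the prompt to a running KoboldCpp server and return the completion."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]

# With a model loaded and the server running:
#   print(generate("Once upon a time"))
```

The same request can be issued by any HTTP client; only the port and the JSON body matter.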

For more information, run the program from the command line with the --help flag.

Source: README.md, updated 2023-08-02