| Name                                | Modified   | Size     |
|-------------------------------------|------------|----------|
| koboldcpp_ggml_tools_26jul.zip      | 2023-07-26 | 2.6 MB   |
| koboldcpp_nocuda.exe                | 2023-07-26 | 22.3 MB  |
| koboldcpp.exe                       | 2023-07-26 | 283.2 MB |
| koboldcpp-1.37.1 source code.tar.gz | 2023-07-26 | 10.2 MB  |
| koboldcpp-1.37.1 source code.zip    | 2023-07-26 | 10.3 MB  |
| README.md                           | 2023-07-26 | 2.3 kB   |

Totals: 6 items, 328.5 MB

koboldcpp-1.37.1

  • NEW: KoboldCpp now comes with an embedded Horde Worker, which allows anyone to share their ggml models with the AI Horde without downloading additional dependencies. --hordeconfig now accepts 5 parameters: [hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]. Supplying all 5 starts a Horde worker that serves horde requests automatically in the background (see the example launch command after this list). For the previous behavior, exclude the last 2 parameters and keep using your own Horde worker (e.g. HaidraScribe/KAIHordeBridge). This feature can also be enabled via the GUI.
  • Added support for LLAMA2 70B models. This should work automatically; GQA is set to 8 when a 70B model is detected.
  • Fixed a bug with mirostat v2 that was causing overly deterministic results. Please try it again. (Credit: @ycros)
  • Added additional information to /api/extra/perf covering the last generation, including the stopping reason and generated token counts.
  • Exposed the --tensor_split parameter, which works exactly as it does upstream. CUDA only; see the example after this list.
  • Added Kepler as a CUDA build target on henky's suggestion. I can't guarantee it will work as I don't have a K80 to test with, but it might.
  • Retained support for --blasbatchsize 1024 after it was removed upstream. Scratch & KV buffer sizes will be larger when using this.
  • Minor bugfixes, other upstream fixes and optimizations pulled in, and an updated Kobold Lite (chat mode improvements).
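
For illustration, a launch command combining the new options might look like the sketch below. The model file name, Horde API key, and worker name are placeholders, and the two-GPU split ratio is just an example; check --help for the authoritative flag behavior.

    koboldcpp.exe MyModel.ggmlv3.q4_K_M.bin --usecublas --gpulayers 40 --tensor_split 1 1 --hordeconfig MyModel 256 2048 YOUR_HORDE_API_KEY MyWorkerName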

Hotfix 1.37.1:
  • Fixed clblast to work correctly for LLAMA2 70B
  • Fixed sending Client-Agent for the embedded horde worker, in addition to Bridge-Agent and User-Agent
  • Changed rms_norm_eps to 5e-6 for better results on both llama1 and llama2
  • Fixed some streaming bugs in Lite

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
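
As a quick check once the server is running, you should be able to query the new performance endpoint described above with a plain GET request (shown here with curl as one possible client):

    curl http://localhost:5001/api/extra/perf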

For more information, run the program from the command line with the --help flag.

Source: README.md, updated 2023-07-26