| Name                                | Modified   | Size     |
|-------------------------------------|------------|----------|
| koboldcpp_ggml_tools_26jul.zip      | 2023-07-26 | 2.6 MB   |
| koboldcpp_nocuda.exe                | 2023-07-26 | 22.3 MB  |
| koboldcpp.exe                       | 2023-07-26 | 283.2 MB |
| koboldcpp-1.37.1 source code.tar.gz | 2023-07-26 | 10.2 MB  |
| koboldcpp-1.37.1 source code.zip    | 2023-07-26 | 10.3 MB  |
| README.md                           | 2023-07-26 | 2.3 kB   |

Totals: 6 items, 328.5 MB

koboldcpp-1.37.1

  • NEW: KoboldCpp now comes with an embedded Horde Worker, which allows anyone to share their ggml models with the AI Horde without downloading additional dependencies. --hordeconfig now accepts 5 parameters: [hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]. Supplying all 5 starts a Horde worker that serves horde requests automatically in the background (see the example launch command after this list). For the previous behavior, exclude the last 2 parameters and keep using your own Horde worker (e.g. HaidraScribe/KAIHordeBridge). This feature can also be enabled via the GUI.
  • Added support for LLAMA2 70B models. This should work automatically; GQA is set to 8 when a 70B model is detected.
  • Fixed a bug with mirostat v2 that was causing overly deterministic results. Please try it again. (Credit: @ycros)
  • Added additional information to /api/extra/perf covering the last generation, including the stopping reason and generated token counts.
  • Exposed the --tensor_split parameter, which works exactly as it does upstream. CUDA only; see the example after this list.
  • Added Kepler as a CUDA build target on henky's suggestion. I can't guarantee it will work as I don't have a K80 to test with, but it might.
  • Retained support for --blasbatchsize 1024 after it was removed upstream. Scratch & KV buffer sizes will be larger when using this.
  • Minor bugfixes, other upstream fixes and optimizations pulled in, and an updated Kobold Lite (chat mode improvements).
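
For illustration, a launch command combining the new options might look like the sketch below. The model file name, Horde API key, and worker name are placeholders, and the two-GPU split ratio is just an example; check --help for the authoritative flag behavior.

    koboldcpp.exe MyModel.ggmlv3.q4_K_M.bin --usecublas --gpulayers 40 --tensor_split 1 1 --hordeconfig MyModel 256 2048 YOUR_HORDE_API_KEY MyWorkerName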

Hotfix 1.37.1:
  • Fixed clblast to work correctly for LLAMA2 70B
  • Fixed sending Client-Agent for the embedded horde worker, in addition to Bridge-Agent and User-Agent
  • Changed rms_norm_eps to 5e-6 for better results on both llama1 and llama2
  • Fixed some streaming bugs in Lite

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
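
As a quick check once the server is running, you should be able to query the new performance endpoint described above with a plain GET request (shown here with curl as one possible client):

    curl http://localhost:5001/api/extra/perf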

For more information, run the program from the command line with the --help flag.

Source: README.md, updated 2023-07-26