| Name | Modified | Size |
|------|----------|------|
| koboldcpp.exe | 2023-04-05 | 16.4 MB |
| koboldcpp-1.0.9beta source code.tar.gz | 2023-04-05 | 8.5 MB |
| koboldcpp-1.0.9beta source code.zip | 2023-04-05 | 8.6 MB |
| README.md | 2023-04-05 | 1.4 kB |

Totals: 4 items, 33.5 MB

koboldcpp-1.0.9beta

  • Integrated support for GPT-2! This should theoretically also work with Cerebras models, though I have not tried those yet. This is a great way to get started, since you can now try models so tiny that even a potato CPU can run them. Here's a good one to start with: https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin — with it I can generate 100 tokens in a second.
  • Upgraded embedded Kobold Lite to support a Stanford Alpaca compatible Instruct Mode, which can be enabled in settings.
  • Removed all -march=native and -mtune=native flags when building the binary. Compatibility should be more consistent with different devices now.
  • Fixed an incorrect flag name used to trigger the ACCELERATE library for Mac OSX. This should greatly increase performance for OSX users on GPT-J and GPT-2 models, assuming you have ACCELERATE support.
  • Added Rep Pen (repetition penalty) for GPT-J and GPT-2 models, and by extension pyg.cpp; repetition penalty now works similarly to the way it does in llama.cpp.
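
The llama.cpp-style repetition penalty mentioned above can be sketched roughly as follows: logits of recently generated tokens are penalized before sampling, making repeats less likely. The function name and values here are illustrative assumptions, not KoboldCpp's actual code.

```python
def apply_rep_pen(logits, recent_tokens, penalty=1.1):
    """Penalize logits of already-seen tokens, llama.cpp style (sketch).

    Positive logits are divided by the penalty, negative logits are
    multiplied by it, so repeated tokens are always pushed down.
    """
    out = list(logits)
    for tok in set(recent_tokens):
        if out[tok] > 0:
            out[tok] /= penalty  # lower a likely repeat
        else:
            out[tok] *= penalty  # push an unlikely repeat even lower
    return out
```

A penalty of 1.0 leaves logits unchanged; values around 1.1–1.3 are common starting points.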

To use, download and run koboldcpp.exe. Alternatively, drag and drop a compatible ggml model onto the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
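
Besides the browser UI, the server speaks the KoboldAI-compatible HTTP API. A minimal sketch of querying it, assuming the standard KoboldAI `/api/v1/generate` endpoint and payload fields (not verified against this specific release):

```python
import json
import urllib.request

def build_payload(prompt, max_length=50):
    # JSON body in the shape the KoboldAI generate endpoint expects
    # (field names are assumptions from the KoboldAI API).
    return json.dumps({"prompt": prompt, "max_length": max_length}).encode()

def generate(prompt, max_length=50, host="http://localhost:5001"):
    # POST the prompt and return the generated continuation text.
    req = urllib.request.Request(
        host + "/api/v1/generate",
        data=build_payload(prompt, max_length),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]
```

Adjust the host and port if you launched the server with different settings.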

Source: README.md, updated 2023-04-05