| Name | Modified | Size |
|---|---|---|
| koboldcpp_nocuda.exe | 2023-08-02 | 22.2 MB |
| koboldcpp.exe | 2023-08-02 | 284.8 MB |
| koboldcpp-1.38 source code.tar.gz | 2023-08-02 | 10.2 MB |
| koboldcpp-1.38 source code.zip | 2023-08-02 | 10.3 MB |
| README.md | 2023-08-02 | 1.4 kB |
koboldcpp-1.38
- Added upstream support for Quantized MatMul (MMQ) prompt processing, a new option for CUDA (enabled by adding --usecublas mmq, or via the toggle in the GUI; see the example launch command after this list). This uses slightly less memory, and is slightly faster for Q4_0 but slower for K-quants.
- Fixed SSE streaming for multibyte characters (for Tavern compatibility)
- --noavx2 mode no longer uses OpenBLAS (same as Failsafe), due to numerous compatibility complaints.
- GUI dropdown preset only displays built platforms (Credit: @YellowRoseCx)
- Added a Help button in the GUI
- Fixed an issue with mirostat not reading correct value from GUI
- Fixed an issue with context size slider being limited to 4096 in the GUI
- Displays a terminal warning if the received context exceeds the maximum context allocated by the launcher
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
For more information, be sure to run the program from command line with the --help flag.