KoboldCpp Files

Run GGUF models easily with a UI or API. One File. Zero Install.

Brought to you by: henk717

The interactive file manager requires Javascript. Please enable it or use sftp or scp.
You may still browse the files here.

Name	Modified	Size	InfoDownloads / Week
Parent folder
koboldcpp.exe	2023-04-08	15.9 MB	0
tools.zip	2023-04-08	794.2 kB	0
koboldcpp.zip	2023-04-08	6.7 MB	0
koboldcpp-1.2 source code.tar.gz	2023-04-08	6.9 MB	0
koboldcpp-1.2 source code.zip	2023-04-08	6.9 MB	0
README.md	2023-04-08	958 Bytes	0
koboldcpp_noavx2.exe	2023-04-08	15.9 MB	3
Totals: 7 Items		53.2 MB	3

koboldcpp-1.2

This is a checkpoint version which should be relatively stable and includes more release variants. - Support for new versions of GPT2 models, for example the Cerebras models on HF. - Prevented the TK GUI window from staying open and being annoying.

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

and then once loaded, you can connect like this (or use the full koboldai client): http://localhost:5001

Alternative Options: If your CPU is very old and doesn't support AVX2 instructions, you can try running the noavx2 version. It will be slower. If you prefer, you can download the zip file, extract and run the python script e.g. koboldcpp.py [ggml_model.bin] manually To quantize an fp16 model, you can use the quantize.exe in the tools.zip

Source: README.md, updated 2023-04-08

Other Useful Business Software

MongoDB Atlas runs apps anywhere Icon

MongoDB Atlas runs apps anywhere

Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free

Build Securely on Azure with Proven Frameworks Icon

Build Securely on Azure with Proven Frameworks

Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.

Download Now

Recommended Projects

whisper.cpp
whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest...
Nexa SDK
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), and speech-to-text (ASR), and text-to-speech (TTS) capabilities. Additionally, it offers an OpenAI-compatible API server with JSON schema mode for...
LocalAI
Self-hosted, community-driven, local OpenAI compatible API. Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. Free Open Source OpenAI alternative. No GPU is required. Runs ggml, GPTQ, onnx, TF compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j,...
Qwen3
Qwen3 is a cutting-edge large language model (LLM) series developed by the Qwen team at Alibaba Cloud. The latest updated version, Qwen3-235B-A22B-Instruct-2507, features significant improvements in instruction-following, reasoning, knowledge coverage, and long-context understanding up to 256K...
Qwen2-Audio
Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat...