| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| llamacpp_for_kobold.zip | 2023-03-20 | 657.4 kB | |
| llamacpp-for-kobold-1.0.1 source code.tar.gz | 2023-03-20 | 2.0 MB | |
| llamacpp-for-kobold-1.0.1 source code.zip | 2023-03-20 | 2.0 MB | |
| README.md | 2023-03-20 | 400 Bytes | |
| Totals: 4 Items | | 4.6 MB | 0 |
llamacpp-for-kobold-1.0.1
- Bugfixes for OSX, and KV caching allows continuing a previous generation without reprocessing the whole prompt
- Weights not included.
To use, download, extract and run (the default port is 5001):

    llama_for_kobold.py [ggml_quant_model.bin] [port]
Then connect via this URL (or use the full KoboldAI client): https://lite.koboldai.net/?local=1&port=5001
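The connect link above is just the KoboldAI Lite page with `local` and `port` query parameters pointing the web client at your local server. As an illustrative sketch (the helper name `lite_url` and the `host` parameter are hypothetical, not part of the release), the link can be built for any port like this:

```python
from urllib.parse import urlencode

def lite_url(port: int, host: str = "lite.koboldai.net") -> str:
    # Build the KoboldAI Lite URL that tells the web client to talk to
    # a locally running llama_for_kobold.py instance on the given port.
    return f"https://{host}/?{urlencode({'local': 1, 'port': port})}"

print(lite_url(5001))  # -> https://lite.koboldai.net/?local=1&port=5001
```

So if you start the server on a non-default port, adjust the `port` value in the URL accordingly.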