| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| koboldcpp.exe | 2023-08-24 | 285.3 MB | |
| koboldcpp_nocuda.exe | 2023-08-24 | 22.8 MB | |
| koboldcpp-1.41 (beta) source code.tar.gz | 2023-08-24 | 10.6 MB | |
| koboldcpp-1.41 (beta) source code.zip | 2023-08-24 | 10.7 MB | |
| README.md | 2023-08-24 | 1.5 kB | |
| Totals: 5 Items | | 329.3 MB | 0 |
koboldcpp-1.41 (beta)
It's been a while since the last release and quite a lot upstream has changed under the hood, so consider this release a beta.
- Added support for LLAMA GGUF models, handled automatically. All older models will still continue to work normally. Note that GGUF format support for other non-llama architectures has not been added yet.
- Added `--config` flag to load a `.kcpps` settings file when launching from the command line (Credits: @poppeman). These files can also be imported/exported from the GUI.
- Added a new endpoint `/api/extra/tokencount` which can be used to tokenize and accurately measure how many tokens any string has (see the sketch after this list).
- Fix for bell characters occasionally causing the terminal to beep in debug mode.
- Fix for incorrect list of backends & missing backends displayed in the GUI.
- Set MMQ to be the default for CUDA when running from the GUI.
- Updated Lite, and merged all the improvements and fixes from upstream.
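As a quick illustration of the two additions above: the `--config` flag is passed at launch, e.g. `koboldcpp.exe --config mysettings.kcpps` (the filename here is just a placeholder), and the token-count endpoint can be queried over plain HTTP. The sketch below is one way to call it from Python; the request and response field names (`prompt`, `value`) and the default port are assumptions not spelled out in these notes, so verify them against your running server.

```python
# Minimal sketch: counting tokens via the new /api/extra/tokencount endpoint.
# Assumptions (not stated in the release notes): the server listens on the
# default http://localhost:5001, the request body is JSON with a "prompt"
# field, and the response JSON reports the count in a "value" field.
import json
import urllib.request

def count_tokens(text, base_url="http://localhost:5001"):
    payload = json.dumps({"prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/extra/tokencount",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["value"]

if __name__ == "__main__":
    print(count_tokens("Niko the kobold stalked carefully down the alley."))
```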
To use, download and run koboldcpp.exe, which is a one-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once the model is loaded, you can connect with your browser (or use the full KoboldAI client) at:
http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.
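If you would rather script against the server than use the browser UI, the sketch below sends a prompt to the same address. It assumes the KoboldAI-compatible `/api/v1/generate` endpoint with `prompt` and `max_length` request fields and the generated text under `results[0]["text"]`; these names follow the KoboldAI API convention rather than anything stated in this release note, so treat the example as a starting point only.

```python
# Minimal sketch: requesting a completion from a running koboldcpp instance.
# Assumptions (not stated above): a KoboldAI-compatible /api/v1/generate
# endpoint is available at the default address, accepts a JSON body with
# "prompt" and "max_length", and returns text under results[0]["text"].
import json
import urllib.request

def generate(prompt, max_length=80, base_url="http://localhost:5001"):
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]

if __name__ == "__main__":
    print(generate("Once upon a time,"))
```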