| Name | Modified | Size |
|---|---|---|
| llamacpp_for_kobold.zip | 2023-03-22 | 740.0 kB |
| llamacpp-for-kobold-1.0.3 source code.tar.gz | 2023-03-22 | 2.4 MB |
| llamacpp-for-kobold-1.0.3 source code.zip | 2023-03-22 | 2.4 MB |
| README.md | 2023-03-22 | 791 Bytes |
| Totals: 4 items | | 5.5 MB |
llamacpp-for-kobold-1.0.3
- Applied the massive refactor from the parent repo. It was a huge pain, but I managed to keep the old tokenizer untouched and retained full support for the original model formats.
- Greatly reduced the default batch sizes, as large batch sizes were causing bad output and high memory usage.
- Added support for dynamic context lengths sent from the client (see the sketch after this list).
- TavernAI works, although I wouldn't recommend it: it spams the server with multiple requests containing huge contexts, so you're going to have a very painful time getting responses.
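To illustrate the dynamic context length feature, here is a rough sketch of what a generation request carrying a client-chosen context length might look like. It assumes a KoboldAI-style request body; the field names (`prompt`, `max_context_length`, `max_length`) are assumptions for illustration, not taken from this release's source, so check the bundled client for the exact names.

```python
# Hypothetical KoboldAI-style request payload where the client picks the
# context length per request. Field names are assumptions, not verified
# against this build.
payload = {
    "prompt": "Once upon a time",
    "max_context_length": 1024,  # client-specified context window, in tokens
    "max_length": 80,            # number of tokens to generate
}
```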
Weights not included.
To use, download, extract, and run (the default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
and then you can connect to http://localhost:5001 (or use the full KoboldAI client).
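If you want to talk to the running server from a script rather than the browser UI, the sketch below is a minimal client using only the standard library. It assumes the server exposes a KoboldAI-compatible `/api/v1/generate` endpoint on the port chosen above; the endpoint path and response layout are assumptions about this build and may differ.

```python
# Minimal sketch of querying the local server, assuming a KoboldAI-compatible
# /api/v1/generate endpoint on port 5001. Endpoint path and response shape
# are assumptions, not verified against this release.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps({"prompt": "Hello,", "max_length": 40}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # expected to contain the generated text
```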