This is the LLM module of the modular w-AI-fu SDK. It is a standalone module that can run alongside any client application and communicate with it via WebSocket.
The goal of this module is to integrate as many LLMs as possible into a single module.
A core principle of the module is minimal dependencies. With so many LLMs supported, downloading every dependency for every available LLM would be time-consuming and a considerable waste of disk space.
In simpler words, you only install what you use.
Python venvs everywhere.
A client example is available in the example_client.ts file.
// from example_client.ts
let client = new wAIfuLlmClient();

// List the available providers and load one of them (here: the first one)
let providers = await client.getProviders();
let provider = providers[0];
await client.loadProvider(provider, {
    api_key: apiKey
});

// List the models exposed by the loaded provider and pick one
let models = await client.getModels();
let model = models[0];

// Single-shot generation
let response = await client.generate([
    {
        role: "user",
        content: "How are you feeling?"
    }
], {
    model_id: model,
    character_name: "AI",
    max_output_length: 250,
    temperature: 1.0,
    stop_tokens: null,
    timeout_ms: null,
});

// Streamed generation: the callback receives each chunk as it arrives
await client.generateStream([
    {
        role: "user",
        content: "Write me a very long story, as long as possible."
    }
], {
    model_id: model,
    character_name: "AI",
    max_output_length: 500,
    temperature: 1.0,
    stop_tokens: null,
    timeout_ms: 15_000,
}, (chunk: string) => {
    console.log(chunk);
});
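Clients that do not use the TypeScript wrapper can talk to the module directly over WebSocket by exchanging the JSON messages documented below. The following sketch uses the `ws` package; the address `ws://127.0.0.1:8765` is only a placeholder assumption, replace it with the address and port the module is actually started on.

// Minimal raw-WebSocket sketch. The URL/port is an assumption, not a documented default.
import WebSocket from "ws";
import { randomUUID } from "node:crypto";

const socket = new WebSocket("ws://127.0.0.1:8765");

socket.on("open", () => {
    // First thing a client typically does: ask which providers are available
    socket.send(JSON.stringify({
        type: "get_providers",
        unique_request_id: randomUUID()
    }));
});

socket.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    if (msg.type === "get_providers_done") {
        console.log("Available providers:", msg.providers);
    }
});

The later sketches in this document reuse this `socket` and the `randomUUID` helper.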
Input message types:
["load", "generate", "interrupt", "close", "get_providers", "get_models"]
Load provider:
{
    "type": "load",
    "unique_request_id": "<id unique to request>",
    "provider": "openai", // "groq" | "novelai" | ...
    "api_key": "<api key>", // optional, required by API-based LLMs
    "preload_model_id": "<model id>" // optional, useful for local LLMs
}
Important: the load message may require an "api_key" field or other fields, depending on the needs of the provider implementation.
load
- load_ack
- load_done
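Over a raw socket, loading a provider looks like the sketch below: send the "load" request, treat "load_ack" as confirmation that the module received it, and wait for "load_done" to know whether the provider is usable. `socket` and `randomUUID` are the assumptions introduced in the connection sketch above; the environment variable used for the key is likewise only an example.

// Sketch: load the "openai" provider and wait for completion
const loadRequestId = randomUUID();

socket.send(JSON.stringify({
    type: "load",
    unique_request_id: loadRequestId,
    provider: "openai",
    api_key: process.env.OPENAI_API_KEY // required by API-based providers
}));

socket.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    if (msg.unique_request_id !== loadRequestId) return;
    if (msg.type === "load_ack") console.log("Load request acknowledged");
    if (msg.type === "load_done") {
        if (msg.is_error) console.error("Provider failed to load:", msg.error);
        else console.log("Provider ready");
    }
});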
Generate:
{
    "type": "generate",
    "unique_request_id": "<id unique to request>",
    "messages": [
        {
            "role": "system",
            "content": "This is a system prompt"
        },
        {
            "role": "user",
            "content": "erm",
            "name": "DEV" // or missing
        }
    ],
    "params": {
        "model_id": "gpt-4o-mini",
        "character_name": "Mia",
        "temperature": 1.0,
        "max_output_length": 200,
        "stop_tokens": ["\r", "\n"], // or null
        "timeout_ms": 10000 // or null
        // In stream mode, timeout is refreshed at every new chunk received
    },
    "stream": false
}
generate (stream:false)
- generate_ack
- generate_done
generate (stream:true)
- generate_ack
- generate_stream_chunk (x amount of chunks)
- generate_stream_done
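A streamed generation over a raw socket can then be handled as below: "generate_stream_chunk" messages carry the pieces of the response in order, and "generate_stream_done" marks the end of the stream (or an error). As before, `socket` and `randomUUID` are the assumptions from the connection sketch above.

// Sketch: streamed generation, accumulating chunks until the stream ends
const genRequestId = randomUUID();
let fullResponse = "";

socket.send(JSON.stringify({
    type: "generate",
    unique_request_id: genRequestId,
    messages: [{ role: "user", content: "Write me a very long story." }],
    params: {
        model_id: "gpt-4o-mini",
        character_name: "AI",
        temperature: 1.0,
        max_output_length: 500,
        stop_tokens: null,
        timeout_ms: 15_000 // refreshed at every received chunk in stream mode
    },
    stream: true
}));

socket.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    if (msg.unique_request_id !== genRequestId) return;
    if (msg.type === "generate_stream_chunk") fullResponse += msg.chunk;
    if (msg.type === "generate_stream_done") {
        if (msg.is_error) console.error("Stream ended with error:", msg.error);
        else console.log("Full response:", fullResponse);
    }
});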
Interrupt:
{
    "type": "interrupt",
    "unique_request_id": "<id unique to request>"
}
interrupt
- interrupt_ack
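A typical use of the interrupt message is cancelling an in-flight (usually streamed) generation, as in this short sketch reusing the `socket` assumed above.

// Sketch: cancel whatever the module is currently generating
socket.send(JSON.stringify({
    type: "interrupt",
    unique_request_id: randomUUID()
}));
// The interrupted generation presumably ends with the INTERRUPT error type listed at the bottom of this document.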
Close module:
{
    "type": "close",
    "unique_request_id": "<id unique to request>"
}
close
- close_ack
Get available providers:
{
    "type": "get_providers",
    "unique_request_id": "<id unique to request>"
}
get_providers
- get_providers_done
Get available models from provider:
{
    "type": "get_models",
    "unique_request_id": "<id unique to request>"
}
This can only be done after a provider has already been loaded.
get_models
- get_models_done
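Requesting the model list is a simple request/response exchange once a provider is loaded; the sketch below again assumes the `socket` and `randomUUID` from the connection sketch above.

// Sketch: list the models of the currently loaded provider
const modelsRequestId = randomUUID();

socket.send(JSON.stringify({
    type: "get_models",
    unique_request_id: modelsRequestId
}));

socket.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    if (msg.unique_request_id === modelsRequestId && msg.type === "get_models_done") {
        console.log("get_models result:", msg); // the list of model ids is in the payload, see the response schema below
    }
});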
Output message types:
["load_ack", "generate_ack", "interrupt_ack", "close_ack", "load_done", "generate_done", "generate_streamed", "generate_stream_done", "generate_stream_chunk", "get_providers_done", "get_models_done"]
Provider load acknowledgment:
{
    "type": "load_ack",
    "unique_request_id": "<id of initial request>",
    "provider": "openai" // "groq" | "novelai" | ...
}
Provider load done:
{
    "type": "load_done",
    "unique_request_id": "<id of initial request>",
    "provider": "openai", // "groq" | "novelai" | ...
    "is_error": false,
    "error": "SUCCESS" // or "<error type>" if is_error is true
}
Generate acknowledgment:
{
    "type": "generate_ack",
    "unique_request_id": "<id of initial request>"
}
Generate response:
{
    "type": "generate_done",
    "unique_request_id": "<id of initial request>",
    "is_error": false,
    "error": "SUCCESS", // or "<error type>" if is_error is true
    "response": "llm response" // or "" if is_error is true
}
Generate stream chunk:
{
    "type": "generate_stream_chunk",
    "unique_request_id": "<id of initial request>",
    "chunk": "<chunk of response>"
}
Generate stream done:
{
    "type": "generate_stream_done",
    "unique_request_id": "<id of initial request>",
    "is_error": false,
    "error": "SUCCESS" // or "<error type>" if is_error is true
}
Interrupt acknowledgment:
{
    "type": "interrupt_ack",
    "unique_request_id": "<id of initial request>"
}
Close acknowledgment:
{
    "type": "close_ack",
    "unique_request_id": "<id of initial request>"
}
Get providers done:
{
    "type": "get_providers_done",
    "unique_request_id": "<id of initial request>",
    "providers": ["<list of providers>"]
}
Get models done:
{
    "type": "get_models_done",
    "unique_request_id": "<id of initial request>",
    "models": ["<list of model ids>"]
}
Error types:
export enum LLM_GEN_ERR {
    SUCCESS = "SUCCESS",
    UNEXPECTED = "UNEXPECTED",
    AUTHORIZATION = "AUTHORIZATION",
    INVALID_PROMPT = "INVALID_PROMPT",
    INVALID_PROVIDER = "INVALID_PROVIDER",
    INVALID_MODEL = "INVALID_MODEL",
    TIMEOUT = "TIMEOUT",
    INTERRUPT = "INTERRUPT",
}
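When a *_done message reports is_error: true, the error field contains one of the strings above. How a client reacts is up to the client; the switch below is only a suggested sketch, and none of the retry or re-authentication behaviour is mandated by the module.

// Sketch: one possible client-side reaction to each error type
function handleGenerationError(error: string): void {
    switch (error) {
        case "SUCCESS":
            break; // nothing to do
        case "TIMEOUT":
        case "UNEXPECTED":
            console.warn("Transient failure, retrying may help:", error);
            break;
        case "AUTHORIZATION":
            console.error("API key rejected; reload the provider with a valid key.");
            break;
        case "INVALID_PROMPT":
        case "INVALID_PROVIDER":
        case "INVALID_MODEL":
            console.error("Request was malformed:", error);
            break;
        case "INTERRUPT":
            console.log("Generation was interrupted by an interrupt message.");
            break;
        default:
            console.error("Unknown error type:", error);
    }
}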
Requirements:
- Node.js >= v20.9.0 (tested with v20.9.0)
- Python 3.10 (if required by the LLM implementation)
- LLM Interface definition
- IN/OUT Message protocol definition
- Incoming socket messages handler
- OpenAI LLM implementation
- NovelAI LLM implementation
- Groq LLM implementation
- DeepSeek LLM implementation
- Ollama LLMs implementation
- Eventual solution for local models (Hugging Face, Ollama)