A lightweight proxy service that lets you remap OpenAI-compatible model names on the fly.
Many tools and libraries hard-code specific model IDs (e.g., `gpt-3.5-turbo-16k`), making it difficult to switch to locally hosted or alternative models without changing source code. This service sits between your client and the model endpoint, rewriting model names according to your configuration.
- Model Mapping: Redirect any incoming model ID to a model of your choice.
- Flexible Configuration: Define exact or regex-based mappings via environment variables or Docker settings.
- Streaming & Non-Streaming: Fully compatible with both chat streaming (SSE) and standard JSON completions.
- Strip Thinking Tokens: Remove `<think>...</think>` blocks from responses when `STRIP_THINKING` is enabled (default `true`).
- Disable Internal Thinking: Append a `/no_think` marker to prompts when `DISABLE_THINKING` is enabled (default `false`).
- OpenAI API Interface: List models, create completions, and chat completions, just like the official API.
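For example, a mapping that combines an exact match with a regex-based catch-all might look like this (the target model names are illustrative; the `/regex/` key form is described under Customize below):

```json
{
  "gpt-3.5-turbo-16k": "qwen3-4b",
  "/^gpt-4.*/": "llama-3-70b-instruct"
}
```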
- Build or pull the Docker image

  ```bash
  docker build -t openai-model-rerouter .
  # or
  docker pull your-registry/openai-model-rerouter:latest
  ```
- Run the service

  ```bash
  docker run -d \
    --name openai-model-rerouter \
    --restart unless-stopped \
    -p 1234:1234 \
    -e UPSTREAM_URL="http://localhost:8000" \
    -e MODEL_MAP='{"gpt-3.5-turbo-16k":"qwen3-4b"}' \
    -e DISABLE_THINKING="true" \
    openai-model-rerouter
  ```
- Point your client at the proxy

  Use `http://<host>:1234/v1/...` exactly as you would the OpenAI API. The service rewrites the `model` field in the request payload according to your mappings (see the example requests after these steps).
- Customize

  - MODEL_MAP: JSON object mapping source model IDs (or `/regex/` patterns) to target model IDs.
  - STRIP_THINKING: Set to `true` (default) to remove `<think>...</think>` tokens from responses.
  - DISABLE_THINKING: Set to `true` to append `/no_think` to prompts.
  - LISTEN_HOST, LISTEN_PORT, UPSTREAM_URL: Configure networking via environment variables.
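As a quick sanity check, here is a sketch of a non-streaming request through the proxy, assuming the `docker run` example above (proxy listening on port 1234, mapping `gpt-3.5-turbo-16k` to `qwen3-4b`):

```bash
# The proxy rewrites "gpt-3.5-turbo-16k" to "qwen3-4b" before forwarding upstream.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-16k",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```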
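Streaming works the same way; a sketch using curl with buffering disabled so the SSE chunks print as they arrive:

```bash
# -N disables output buffering so server-sent events stream to the terminal.
curl -N http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-16k",
    "stream": true,
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```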
Enjoy seamless model swapping without changing your code!