Claude Code Proxy: coding offline on my local GPU
Finally, I wrote my own NIH Claude Code proxy to code offline on my local GPU, and to avoid hitting subscription limits at the worst possible time.
(NIH = Not Invented Here - the irresistible urge to build something yourself instead of using the perfectly good version someone else already made. A proud software engineering tradition.)
claude-code-openai-v1-proxy.py is a single Python file with no dependencies that sits between Claude Code and any OpenAI API v1-compatible backend - Ollama, vLLM, MLX, LM Studio, whatever you’re running locally.
Claude Code → Anthropic API → [proxy] → OpenAI API v1 → your GPU
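Under the hood it’s mostly reshaping request bodies: Anthropic’s /v1/messages format carries the system prompt as a top-level field and allows lists of typed content blocks, while OpenAI’s /v1/chat/completions wants a flat message list. Here’s a minimal sketch of that translation, simplified for this post to the non-streaming, text-only case (the real file has more to deal with, streaming and tool use above all):

def anthropic_to_openai(body):
    # Simplified sketch: map an Anthropic /v1/messages request body
    # to an OpenAI /v1/chat/completions request body.
    messages = []
    # Anthropic puts the system prompt in a top-level field; OpenAI
    # expects it as the first chat message.
    if "system" in body:
        messages.append({"role": "system", "content": body["system"]})
    for msg in body["messages"]:
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; keep the text ones.
        if isinstance(content, list):
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": body["model"],
        "max_tokens": body.get("max_tokens", 1024),
        "messages": messages,
    }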
Start the proxy, then point Claude Code at it:
python3 claude-code-openai-v1-proxy.py \
--host 127.0.0.1 \
--port 8081 \
--upstream http://127.0.0.1:8080/v1
ANTHROPIC_BASE_URL=http://127.0.0.1:8081 claude -p "say hi"
That’s it. No config files, no virtualenv, no Docker.
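You can also sanity-check it without Claude Code in the loop by speaking the Anthropic Messages format to the proxy directly. A stdlib-only example (assuming the proxy doesn’t enforce an API key; the model name here is a placeholder that just gets passed through to your backend):

import json, urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8081/v1/messages",
    data=json.dumps({
        "model": "local",  # placeholder, passed through to your backend
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "say hi"}],
    }).encode(),
    headers={"content-type": "application/json",
             "anthropic-version": "2023-06-01"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
# Anthropic-style response: content is a list of typed blocks.
print(reply["content"][0]["text"])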
Yes, there are other proxies that do this. But they all pull in LiteLLM, FastAPI, or some other framework. I wanted something I could read in one sitting and trust completely - especially since it handles every prompt I type.
Also, I added a per-request Markdown log (--log-dir ./logs) so I can see exactly what Claude Code sends upstream. Turns out the system prompt it injects is absolutely enormous. Fascinating and slightly unsettling.
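The mechanism is nothing fancy; if you wanted the same thing in your own proxy, it boils down to one Markdown file per request, roughly like this (a simplified sketch, not the actual logging code):

import datetime, pathlib, uuid

def log_request(log_dir, request_md, response_md):
    # One Markdown file per proxied request: timestamp + short ID as filename.
    path = pathlib.Path(log_dir)
    path.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    out = path / f"{stamp}-{uuid.uuid4().hex[:8]}.md"
    out.write_text(
        f"# Request {stamp}\n\n"
        f"## Sent upstream\n\n{request_md}\n\n"
        f"## Received\n\n{response_md}\n"
    )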
And yes, I snuck in my new anti-AGI software license clause:
The Software shall be used only for and by humans, defined as biological entities of the species Homo sapiens as identified by deoxyribonucleic acid (DNA). 🙂
Technically this means the AI I used to help write it is not permitted to use it. I’m at peace with that.