Claude Code: connect to a local model when your quota runs out

Original link: https://boxc.net/blog/2026/claude-code-connecting-to-local-models-when-your-quota-runs-out/

## Bypassing Claude's quota limits with open source models

If you hit a usage limit on Anthropic's Claude coding platform, you can switch seamlessly to an open source large language model (LLM) running locally, which lets you keep coding without interruption. To monitor your Claude quota, use the `/usage` command. At the time of writing, the recommended open source options are GLM-4.7-Flash and Qwen3-Coder-Next, with quantized (smaller) versions available for limited hardware. There are two main ways to connect: **LM Studio** (the simpler option) and **Llama.CPP** directly. LM Studio requires installing the software, starting its server, and setting environment variables to redirect Claude to your local model. Expect slower performance and possibly lower code quality with open source models, especially on modest machines, but it is a workable fallback for continuing to code once you hit a quota limit, and the `/model` command makes it easy to switch back to Claude.


Original article

If you’re on one of the cheaper Anthropic plans like me, it’s pretty common to hit a daily or weekly quota limit when you’re deep into coding an idea with Claude. If you want to keep going, you can connect to a local open source model instead of Anthropic. To monitor your current quota, type: /usage

Type /usage to monitor how much quota you have left and how quickly you burn through it.

The best open source model changes pretty frequently, but at the time of writing this post, I recommend GLM-4.7-Flash from Z.AI or Qwen3-Coder-Next. If you want or need to save some disk space and GPU memory, try a smaller quantized version, which will load and run faster at some cost in quality. I’ll save the details of how to find the best open source model for your task and machine constraints for another post.
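If you’d rather grab a model from the terminal than through the GUI described in Method 1 below, LM Studio’s `lms` command line tool can search for and download models. This is only a minimal sketch: the model identifier and the quantization you end up with (e.g. a Q4_K_M GGUF) are placeholders and depend on what’s actually published for the model you choose.

```
# See which models are already downloaded locally
lms ls

# Search for and download a model by name (identifier is a placeholder;
# depending on the model, you may be asked which quantization to fetch --
# pick one that fits your GPU memory)
lms get qwen3-coder
```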

Method 1: LM Studio

Accessing open source models in LM Studio

If you haven’t used LM Studio before, it’s an accessible way to find and run open source LLMs and vision models locally on your machine. In version 0.4.1, they introduced support for connecting to Claude Code (CC). See https://lmstudio.ai/blog/claudecode or follow the instructions below:

  1. Install and run LM Studio
  2. Find the model search button to install a model (see image above). LM Studio recommends running the model with a context of > 25K.
  3. Open a new terminal session to (the full command sequence is collected into a sketch after this list):
    a. start the server: lms server start --port 1234
    b. configure environment variables to point CC at LM Studio:
    export ANTHROPIC_BASE_URL=http://localhost:1234
    export ANTHROPIC_AUTH_TOKEN=lmstudio
    c. start CC pointing at your server: claude --model openai/gpt-oss-20b
  4. Reduce your expectations about speed and performance!
  5. To confirm which model you are using or when you want to switch back, type /model
Enter /model to confirm which model you are using or to switch back
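Putting steps 3a–3c together, a full session looks like the sketch below. The port, the dummy auth token, and the model identifier are just the examples from LM Studio’s instructions; substitute whichever model you actually downloaded.

```
# Start the LM Studio local server
lms server start --port 1234

# Point Claude Code (CC) at the local server instead of Anthropic's API
export ANTHROPIC_BASE_URL=http://localhost:1234
export ANTHROPIC_AUTH_TOKEN=lmstudio

# Launch CC against the locally loaded model
claude --model openai/gpt-oss-20b
```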

Method 2: Connecting directly to Llama.CPP

LM Studio is built on top of the open source project llama.cpp.
If you prefer not to use LM Studio, you can install and run llama.cpp directly and connect Claude Code to it. Honestly, though, unless you are fine-tuning a model or have really specific needs, LM Studio is probably going to be the quicker setup.
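If you do go the llama.cpp route, the shape is the same: run its bundled server and point the same environment variables at it. The sketch below assumes a recent llama.cpp build (recent enough for its server to work with Claude Code) and a GGUF model on disk; the path, port, token value, and model name are all placeholders.

```
# Serve a local GGUF model with llama.cpp's built-in server
llama-server -m ./models/your-model.gguf --port 8080

# In another terminal, redirect Claude Code at it, just as with LM Studio
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=llamacpp
claude --model your-model
```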

Conclusion

For the moment, this is a backup solution. Unless you have a monster of a machine, you’re going to notice the time it takes to do things and a drop in code quality, but it works(!), and it’s easy enough to switch between your local OSS model and Claude when your quota resets, so it’s a good way to keep coding when you’re stuck or you just want to save some quota. If you try it, let me know how it goes and which model works for you.
