MCP servers are eating your context window. There's a simpler way.
Apideck CLI – An AI-agent interface with much lower context consumption than MCP

原始链接: https://www.apideck.com/blog/mcp-server-eating-context-window-cli-alternative

## The hidden cost of AI agent integration

Connecting AI agents to tools such as APIs through the conventional approach — the Model Context Protocol (MCP) — runs into a significant and often overlooked problem: **context window bloat**. Describing even a modest number of tools consumes a large share of the agent's available "thinking space," leaving little room for the actual task. This can break agents outright, forcing developers to drastically limit integrations or add complex, latency-adding workarounds. Several solutions are emerging. Keeping MCP with compression techniques works for simple interactions but requires managing infrastructure. Code execution is powerful for complex workflows but introduces serious security risks. A promising alternative is the **CLI (command-line interface)**: with a CLI, agents discover and use tools *on demand*, much as a human developer uses the `--help` flag. This cuts upfront token usage dramatically — from tens of thousands to roughly 80 — and improves reliability by removing the dependency on a remote server. CLIs are not a universal answer (MCP excels when a small set of tools is used frequently, and code execution handles complex state), but for many agent integrations they offer a practical balance of efficiency, safety, and compatibility. API providers should prioritize streamlined, discoverable APIs and robust permission management to keep up with the shifting landscape.

## LLM agents: the MCP vs. CLI context-window debate

A Hacker News discussion centered on the efficiency of building large language model (LLM) agents with MCP (Model Context Protocol) servers versus CLIs (command-line interfaces). The core issue is context-window pressure: loading MCP tool definitions consumes large numbers of tokens (over 50,000 in one case), leaving less room for user messages. Many commenters advocated CLIs, citing their low token overhead (a system prompt of roughly 80 tokens) and incremental discovery via the `--help` flag, in contrast to MCP loading entire API definitions into context. Others defended MCP, highlighting its security advantages — in particular, the ability to enforce policy across multiple servers (for example, preventing access to financial data during web searches), something hard to replicate with skills or CLIs. Still others suggested hybrid approaches, such as using MCP for skill selection and CLIs for execution. The conversation also touched on future LLM capabilities, the possibility that context windows stop being a constraint, and some humorous speculation about where AI and human engineers go from here.
## Original article

## The problem nobody talks about at demo scale

Here's a scenario that'll feel familiar if you've wired up MCP servers for anything beyond a demo.

You connect GitHub, Slack, and Sentry. Three services, maybe 40 tools total. Before your agent has read a single user message, 55,000 tokens of tool definitions are sitting in the context window. That's over a quarter of Claude's 200k limit. Gone.

It gets worse. Each MCP tool costs 550–1,400 tokens for its name, description, JSON schema, field descriptions, enums, and system instructions. Connect a real API surface, say a SaaS platform with 50+ endpoints, and you're looking at 50,000+ tokens just to describe what the agent could do, with almost nothing left for what it should do.

One team reported three MCP servers consuming 143,000 of 200,000 tokens. That's 72% of the context window burned on tool definitions. The agent had 57,000 tokens left for the actual conversation, retrieved documents, reasoning, and response. Good luck building anything useful in that space.
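The arithmetic is easy to check. A quick sketch using the figures quoted above (the function name is mine, not from any benchmark harness):

```go
package main

import "fmt"

// contextLeft estimates how many tokens remain for the actual conversation
// after tool definitions are loaded, given a per-tool token cost.
func contextLeft(window, tools, tokensPerTool int) int {
	return window - tools*tokensPerTool
}

func main() {
	window := 200_000 // Claude's context window

	// 40 tools at the high end of the 550–1,400 token range:
	fmt.Println(contextLeft(window, 40, 1_400)) // 144000 tokens left

	// The three-server case above: 143,000 tokens of definitions.
	fmt.Println(window - 143_000) // 57000 tokens left
}
```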

This isn't a theoretical concern. David Zhang (@dzhng), building Duet, described ripping out their MCP integrations entirely, even after getting OAuth and dynamic client registration working. The tradeoff was impossible:

  • Load everything up front → lose working memory for reasoning and history
  • Limit integrations → agent can only talk to a few services
  • Build dynamic tool loading → add latency and middleware complexity

He called it a "trilemma." That feels about right.

And the numbers hold up under controlled testing. A recent benchmark by Scalekit ran 75 head-to-head comparisons (same model, Claude Sonnet 4, same tasks, same prompts) and found MCP costing 4 to 32× more tokens than CLI for identical operations. Their simplest task, checking a repo's language, consumed 1,365 tokens via CLI and 44,026 via MCP. The overhead is almost entirely schema: 43 tool definitions injected into every conversation, of which the agent uses one or two.

## Three approaches to the same problem

The industry is converging on three responses to context bloat. Each has a sweet spot.

### MCP with compression tricks

The first response is to keep MCP but fight the bloat. Teams compress schemas, use tool search to load definitions on demand, or build middleware that slices OpenAPI specs into smaller chunks.

This works for small, well-defined interactions like looking up an issue, creating a ticket, or fetching a document. MCP's structured tool calls and typed schemas are genuinely useful when you have a tight set of operations that agents use frequently.

But it adds infrastructure. You need a tool registry, search logic, caching, and routing. You're building a service to manage your services. And you're still paying per-tool token costs every time the agent decides it needs a new capability.
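A minimal sketch of what that middleware ends up looking like, assuming a simple summary-index design (all type and method names here are hypothetical, not from any real MCP gateway):

```go
package main

import (
	"fmt"
	"strings"
)

// Registry sketches the "tool search" pattern: keep a cheap index of
// one-line summaries in context, and load a tool's full schema only when
// the agent decides it needs that capability.
type Registry struct {
	index   map[string]string // tool name -> one-line summary (always cheap)
	schemas map[string]string // tool name -> full JSON schema (loaded on demand)
}

// Search scans summaries only; no per-tool schema tokens are spent yet.
func (r *Registry) Search(query string) []string {
	var hits []string
	for name, summary := range r.index {
		if strings.Contains(summary, query) {
			hits = append(hits, name)
		}
	}
	return hits
}

// Load pays the schema cost for exactly one tool.
func (r *Registry) Load(name string) (string, bool) {
	s, ok := r.schemas[name]
	return s, ok
}

func main() {
	r := &Registry{
		index:   map[string]string{"invoices.create": "create an invoice"},
		schemas: map[string]string{"invoices.create": `{"type":"object"}`},
	}
	fmt.Println(r.Search("invoice")) // only now does the agent pick a tool
	schema, _ := r.Load("invoices.create")
	fmt.Println(schema) // and only now does the schema enter the context
}
```

Note what the sketch makes visible: you now own an index, a search path, and a cache-consistency problem — the "service to manage your services."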

### Code execution (the Duet approach)

Duet's answer was to treat the agent like a developer with a persistent workspace. When the agent needs a new integration, it reads the API docs, writes code against the SDK, runs it, and saves the script for reuse.

This is powerful for long-lived workspace agents that maintain state across sessions and need complex workflows (loops, conditionals, polling, batch operations). Things that are awkward to express as individual tool calls become natural in code.

The downside: your agent is now writing and executing arbitrary code against production APIs. The safety surface is enormous. You need sandboxing, review mechanisms, and a lot of trust in your agent's judgment.

### CLI as the agent interface

The third approach is the one we took. Instead of loading schemas into the context window or letting the agent write integration code, you give it a CLI.

A well-designed CLI is a progressive disclosure system by nature. When a human developer needs to use a tool they haven't touched before, they don't read the entire API reference. They run `tool --help`, find the subcommand they need, run `tool subcommand --help`, and get the specific flags for that operation. They pay attention costs proportional to what they actually need.

Agents can do exactly the same thing. And the token economics are dramatically different.
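As a sketch, "the agent does the same thing" can be as simple as shelling out to the tool's help text at the moment it is needed. The helper below uses Go's standard `os/exec`; the `apideck` call in `main` is the example target, and any CLI that follows the `--help` convention would work:

```go
package main

import (
	"fmt"
	"os/exec"
)

// helpText runs `<tool> <args...> --help` and returns its output — the
// same on-demand discovery step a human developer performs. The usage
// text only enters the agent's context when this is actually called.
func helpText(tool string, args ...string) (string, error) {
	out, err := exec.Command(tool, append(args, "--help")...).CombinedOutput()
	return string(out), err
}

func main() {
	// Hypothetical: an agent discovering one operation's flags on demand.
	out, err := helpText("apideck", "accounting", "invoices", "create")
	if err != nil {
		fmt.Println("discovery failed:", err)
		return
	}
	fmt.Println(out)
}
```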

## Why CLIs are the pragmatic sweet spot

### Progressive disclosure saves tokens

Here's what the Apideck CLI agent prompt looks like. This is the entire thing an AI agent needs in its system prompt:

```
Use `apideck` to interact with the Apideck Unified API.
Available APIs: `apideck --list`
List resources: `apideck <api> --list`
Operation help: `apideck <api> <resource> <verb> --help`
APIs: accounting, ats, crm, ecommerce, hris, ...
Auth is pre-configured. GET auto-approved. POST/PUT/PATCH prompt (use --yes). DELETE blocked (use --force).
Use --service-id <connector> to target a specific integration.
For clean output: -q -o json
```

That's ~80 tokens. Compare that to the alternatives:

| Approach | Tokens consumed | When |
| --- | --- | --- |
| Full OpenAPI spec in context | 30,000–100,000+ | Before first message |
| MCP tools (~3,600 per API) | 10,000–50,000+ | Before first message |
| CLI agent prompt | ~80 | Before first message |
| CLI `--help` call | ~50–200 | Only when needed |

The agent starts with 80 tokens of guidance and discovers capabilities on demand:

```shell
# Level 1: What APIs are available? (~20 tokens output)
$ apideck --list
accounting ats connector crm ecommerce hris ...

# Level 2: What can I do with accounting? (~200 tokens output)
$ apideck accounting --list
Resources in accounting API:

  invoices
    list       GET  /accounting/invoices
    get        GET  /accounting/invoices/{id}
    create     POST /accounting/invoices
    delete     DELETE /accounting/invoices/{id}

  customers
    list       GET  /accounting/customers
    ...

# Level 3: How do I create an invoice? (~150 tokens output)
$ apideck accounting invoices create --help
Usage: apideck accounting invoices create [flags]

Flags:
  --data string        JSON request body (or @file.json)
  --service-id string  Target a specific connector
  --yes                Skip write confirmation
  -o, --output string  Output format (json|table|yaml|csv)
  ...
```

Each step costs 50–200 tokens, loaded only when the agent decides it needs that information. An agent handling an accounting query might consume roughly 400 tokens total across three `--help` calls. The same surface through MCP would cost 10,000+ tokens loaded upfront whether the agent uses them or not.
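Summing the levels above makes the asymmetry concrete. A sketch with the approximate per-level figures (~20, ~200, and ~150 output tokens plus the ~80-token prompt; the function name is mine):

```go
package main

import "fmt"

// progressiveCost totals the upfront prompt plus only the --help calls
// the agent actually made — the cost model of progressive disclosure.
func progressiveCost(prompt int, helpCalls []int) int {
	total := prompt
	for _, c := range helpCalls {
		total += c
	}
	return total
}

func main() {
	cli := progressiveCost(80, []int{20, 200, 150}) // levels 1–3 above
	fmt.Println(cli)    // 450 tokens, paid on demand
	fmt.Println(10_000) // MCP schema floor for the same surface, paid upfront
}
```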

This mirrors how Claude Agent Skills work. Metadata first, full details only when selected, reference material only when needed. The CLI is doing the same thing through a different mechanism.

Scalekit's benchmark independently validated this pattern. They found that even a minimal ~800-token "skills file" (a document of CLI tips and common workflows) reduced tool calls by a third and latency by a third compared to a bare CLI. Our approach takes it further: the ~80-token agent prompt provides the same progressive discovery at a tenth of the cost. The principle is the same. A small, upfront hint about how to navigate the tool is worth more than thousands of tokens of exhaustive schema.

### Reliability: local beats remote

There's a dimension of the MCP problem that doesn't get enough attention: availability.

Scalekit's benchmark recorded a 28% failure rate on MCP calls to GitHub's Copilot server. Out of 25 runs, 7 failed with TCP-level connection timeouts. The remote server simply didn't respond in time. Not a protocol error, not a bad tool call. The connection never completed.

CLI agents don't have this failure mode. The binary runs locally. There's no remote server to time out, no connection pool to exhaust, no intermediary to go down. When your agent runs `apideck accounting invoices list`, it makes a direct HTTPS call to the Apideck API. One hop, not two.

This matters at scale. At 10,000 operations per month, a 28% failure rate means roughly 2,800 retries, each burning additional tokens and latency. Scalekit estimated the monthly cost difference at $3.20 for CLI versus $55.20 for direct MCP, a 17× cost multiplier, with the reliability tax on top.
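The retry math, as a sketch (the figures are the ones quoted from Scalekit's benchmark; the function name is mine):

```go
package main

import "fmt"

// expectedRetries estimates how many calls fail — and must be retried,
// burning tokens and latency again — at a given monthly volume.
func expectedRetries(opsPerMonth int, failureRate float64) int {
	return int(float64(opsPerMonth) * failureRate)
}

func main() {
	fmt.Println(expectedRetries(10_000, 0.28)) // 2800 retries per month
	fmt.Printf("%.0f×\n", 55.20/3.20)          // ≈17× monthly cost multiplier
}
```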

Remote MCP servers will improve. Connection pooling, better infrastructure, and gateway layers will close the gap. But "the binary is on your machine" is a reliability guarantee that no amount of infrastructure engineering on the server side can match.

### Structural safety beats prompt-based safety

Telling an agent "never delete production data" in a system prompt is like putting a sticky note on the nuclear launch button. It might work. Probably. Until a creative prompt injection peels the note off.

Security research on AI agents in CI/CD has shown how prompt injection can manipulate agents with high-privilege tokens into leaking secrets or modifying infrastructure. The pattern is always the same: untrusted input gets injected into a prompt, the agent has broad tool access, and bad things happen.

The Apideck CLI takes a structural approach. Permission classification is baked into the binary based on HTTP method:

```go
// From internal/permission/engine.go
switch op.Permission {
case spec.PermissionRead:
    return ActionAllow  // GET → auto-approved
case spec.PermissionWrite:
    return ActionPrompt // POST/PUT/PATCH → confirmation required
case spec.PermissionDangerous:
    return ActionBlock  // DELETE → blocked by default
}
```

No prompt can override this. A DELETE operation is blocked unless the caller explicitly passes `--force`. A POST requires `--yes` or interactive confirmation. GET operations run freely because they can't modify state.
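The permission class that the engine switches on can be derived mechanically from the HTTP method. A sketch of that mapping, with illustrative names rather than the actual `internal/permission` code:

```go
package main

import "fmt"

type Permission int

const (
	PermissionRead      Permission = iota // can't modify state
	PermissionWrite                       // confirmation required
	PermissionDangerous                   // blocked by default
)

// classify maps an HTTP method to a permission class — a sketch of the
// structural rule described above. Unknown methods fall through to the
// most restrictive class, so new verbs fail safe.
func classify(method string) Permission {
	switch method {
	case "GET", "HEAD":
		return PermissionRead
	case "POST", "PUT", "PATCH":
		return PermissionWrite
	default: // DELETE and anything unrecognized
		return PermissionDangerous
	}
}

func main() {
	fmt.Println(classify("GET") == PermissionRead)         // true
	fmt.Println(classify("DELETE") == PermissionDangerous) // true
}
```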

The agent frameworks reinforce this. Claude Code, Cursor, and GitHub Copilot all have permission systems that gate shell command execution. So you get two layers of structural safety: the agent framework asks "should I run this command?" and the CLI itself enforces "is this operation allowed?"

You can also customize the policy per operation:

```yaml
# ~/.apideck-cli/permissions.yaml
defaults:
  read: allow
  write: prompt
  dangerous: block

overrides:
  accounting.payments.create: block    # payments are sensitive
  crm.contacts.delete: prompt          # contacts can be soft-deleted
```

This is the same principle behind Duda blocking destructive MCP actions, but enforced structurally in the binary, not through prompt instructions that compete with everything else in the context window.
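The resolution order that policy file implies is simple: an exact per-operation override wins; otherwise the default for the operation's risk class applies. A sketch under those assumptions (names are mine, not the shipped engine's):

```go
package main

import "fmt"

type Action string

const (
	Allow  Action = "allow"
	Prompt Action = "prompt"
	Block  Action = "block"
)

// resolve looks up a per-operation override first and falls back to the
// default action for the operation's risk class ("read"/"write"/"dangerous").
func resolve(op, class string, defaults, overrides map[string]Action) Action {
	if a, ok := overrides[op]; ok {
		return a
	}
	return defaults[class]
}

func main() {
	defaults := map[string]Action{"read": Allow, "write": Prompt, "dangerous": Block}
	overrides := map[string]Action{
		"accounting.payments.create": Block,  // payments are sensitive
		"crm.contacts.delete":        Prompt, // contacts can be soft-deleted
	}
	fmt.Println(resolve("accounting.payments.create", "write", defaults, overrides)) // block
	fmt.Println(resolve("accounting.invoices.create", "write", defaults, overrides)) // prompt
}
```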

### Universal compatibility, zero protocol overhead

Every serious agent framework ships with "run shell command" as a primitive. Claude Code has Bash. Cursor has terminal access. GitHub Copilot SDK exposes shell execution. Gemini CLI runs commands natively.

MCP requires dedicated client support, connection plumbing, and server lifecycle management. A CLI requires a binary on the PATH.

This matters more than it sounds. When you're building an agent that needs to interact with APIs, the integration path for a CLI is:

  1. Install the binary
  2. Set environment variables for auth
  3. Add ~80 tokens to the system prompt
  4. Done

The integration path for MCP is:

  1. Implement or configure an MCP client
  2. Set up server connections (transport, auth, lifecycle)
  3. Handle tool registration and schema loading
  4. Manage connection state and reconnection
  5. Deal with the token budget for tool definitions

The CLI approach also means your agent integration isn't locked to any specific framework. The same apideck binary works from Claude Code, Cursor, a custom Python agent, a bash script, or a CI/CD pipeline.

## How we built it

The Apideck CLI is a single static binary that parses our OpenAPI spec at startup and generates its entire command tree dynamically.

**OpenAPI-native, no code generation.** The binary embeds the latest Apideck Unified API spec. On startup, it parses the spec with libopenapi and builds commands for every API group, resource, and operation. When the API adds new endpoints, `apideck sync` pulls the latest spec. No SDK regeneration, no version bumps.

**Smart output defaults.** When running in a terminal, output defaults to a formatted table with colors. When piped or called from a non-TTY (which is how agents call it), output defaults to JSON. Agents get machine-parseable output without needing to remember `--output json`.

```shell
# Agent calls this (non-TTY) → gets JSON automatically
$ apideck accounting invoices list -q
[{"id": "inv_12345", "number": "INV-001", "total": 1500.00, ...}]

# Human runs the same command in terminal → gets a table
$ apideck accounting invoices list
┌──────────┬─────────┬──────────┐
│ ID       │ Number  │ Total    │
├──────────┼─────────┼──────────┤
│ inv_12345│ INV-001 │ 1,500.00 │
└──────────┴─────────┴──────────┘
```

**Auth is invisible.** Credentials are resolved from environment variables (`APIDECK_API_KEY`, `APIDECK_APP_ID`, `APIDECK_CONSUMER_ID`) or a config file, and injected into every request automatically. The agent never handles tokens, never sees auth headers, never needs to manage sessions.

**Connector targeting.** The `--service-id` flag lets agents target specific integrations. `apideck accounting invoices list --service-id quickbooks` hits QuickBooks. Swap to `--service-id xero` and the same command hits Xero. Same interface, different backend. That's the unified API doing its job.

## When CLI isn't the answer

CLIs aren't universally better. Here's where the other approaches still win.

**MCP is better for tightly scoped, high-frequency tools.** If your agent calls the same 5–10 tools hundreds of times per session, the upfront schema cost amortizes well. A customer support agent that only ever looks up tickets, updates status, and sends replies doesn't need progressive disclosure. It needs those tools ready immediately.

**Code execution is better for complex, stateful workflows.** If your agent needs to poll an API every 30 seconds, aggregate results across paginated endpoints, or orchestrate multi-step transactions with rollback logic, writing code is more natural than chaining CLI calls. Duet's approach makes sense for agents that are essentially autonomous developers.

**MCP is better when your agent acts on behalf of other people's users.** This is the dimension most CLI-vs-MCP comparisons gloss over, and it's worth being direct about. When your agent automates your own workflow, ambient credentials are fine. You are the user, and the only person at risk is you. But if you're building a B2B product where agents act on behalf of your customers' employees, across organizations those customers control, the identity problem becomes three-layered: which agent is calling, which user authorized it, and which tenant's data boundary applies. Per-user OAuth with scoped access, consent flows, and structured audit trails are real requirements at that boundary, and they're requirements that raw CLI auth (`gh auth login`, environment variables) wasn't designed to solve. MCP's authorization model, whatever its efficiency cost, addresses this natively.

That said, the gap is narrower than it looks for unified API architectures. Apideck already centralizes auth through Vault: credentials are managed per-consumer, per-connection, and scoped by service. The `--service-id` flag targets a specific integration within a specific consumer's vault. The structural permission system enforces read/write/delete boundaries in the binary. What's missing is the per-user OAuth consent flow and the tenant-scoped audit trail: real gaps, but ones that sit at the platform layer, not the agent interface layer. A CLI can be the interface while a backend handles delegated authorization. These aren't mutually exclusive.

It's also worth noting that MCP's auth story is less settled than it appears. As Speakeasy's MCP OAuth guide makes clear, user-facing OAuth exchange is not actually required by the MCP spec. Passing access tokens or API keys directly is completely valid. The real complexity kicks in when MCP clients need to handle OAuth flows dynamically, which requires Dynamic Client Registration (DCR), a capability most API providers don't support today. Companies like Stripe and Asana have started adding DCR to accommodate MCP, but it remains a high-friction integration. The auth advantage MCP has over CLI is real in theory, but in practice, the ecosystem is still catching up to the spec.

**CLIs are weaker at streaming and bi-directional communication.** A CLI call is request-response. If you need server-sent events, WebSocket streams, or long-lived connections, you'll want an SDK or MCP server that can hold a connection open.

**Distribution has friction.** MCP servers can theoretically live behind a URL. CLIs need a binary per platform, updates, and PATH management. For the Apideck CLI, we ship a static Go binary that runs everywhere without dependencies, but it's still a binary you need to install.

The honest framing: MCP, code execution, and CLIs are complementary tools. The mistake is treating MCP as the universal answer when, for many integration patterns, a CLI does the job with two orders of magnitude less context overhead.

## What this means for API providers

If you're building developer tools in 2026, AI agents are becoming a primary consumer of your API surface. Not the only consumer (human developers still matter), but a rapidly growing one.

A few things are worth considering:

**Your OpenAPI spec is too big for a context window.** If you have 50+ endpoints, converting your spec to MCP tools will burn the budget of most agent interactions. Think about what a minimal entry point looks like.

**Progressive disclosure isn't just a UX pattern anymore.** It's a token optimization strategy. Give agents a way to discover capabilities incrementally instead of dumping everything upfront.

**Structural safety is non-negotiable.** Prompt-based guardrails are the security equivalent of honor-system parking. Build permission models into your tools, not your prompts. Classify operations by risk level and enforce that classification in code.

**Ship machine-friendly output formats.** JSON by default in non-interactive contexts. Stable exit codes. Deterministic output. These are documented principles for agentic CLI design, and they matter because your next power user might not have hands.

## Further reading
