Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

Original link: https://github.com/dirac-run/dirac

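## Dirac: A Cost-Efficient, High-Performance Coding Agent

Dirac is a new open-source coding agent that achieves state-of-the-art results on complex refactoring tasks. It recently **topped the Terminal-Bench-2 leaderboard with a score of 65.2%** using Gemini 3-Flash-Preview, ahead of both Google's baseline and Junie CLI.

Dirac focuses on maximizing "bang for the buck": by curating context intelligently, it **cuts API costs by 64.8% on average** while *improving* accuracy and speed. Using techniques such as hash-anchored parallel edits and AST manipulation, it reaches **100% accuracy** on its benchmark tasks at a fraction of competitors' cost.

Notably, Dirac does not use MCP, and it favors effective tooling over blind minimalism. It can be installed from the VS Code Marketplace or via npm, and it supports multiple AI providers through environment variables. Dirac is a fork of Cline and is licensed under the Apache License 2.0.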

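A new open-source agent built by GodelNumbering, available on GitHub (dirac-run), has achieved leading performance on TerminalBench, a challenging benchmark for command-line agents. It scored 65.2%, ahead of Google's official result (47.6%) and the top closed-source agent, Junie CLI (64.3%).

The developer proactively addressed recent concerns about cheating on TerminalBench, stressing that the agent uses no prohibited techniques, such as pre-inserted data, and complies with the leaderboard rules. They shared the results directly because a pull request to update the official leaderboard, submitted eight days earlier, had received no response.

One commenter asked about compatibility with other models (such as Qwen) and potential performance improvements, highlighting how much the "harness" (the framework around the model) matters for strong results. The post also includes a Y Combinator application announcement.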

Original text

Dirac topped the Terminal-Bench-2 leaderboard for gemini-3-flash-preview with a 65.2% score!

It is a well-studied phenomenon that a model's reasoning ability degrades as context length grows. If we keep the context tightly curated, we improve both accuracy and cost while making larger changes tractable in a single task.

Dirac is an open-source coding agent built with this in mind. It reduces API costs by 64.8% on average while producing better and faster work, using hash-anchored parallel edits, AST manipulation, and a suite of advanced optimizations. Oh, and no MCP.
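
To make "hash-anchored parallel edits" more concrete, here is a minimal sketch of the general idea, not Dirac's actual implementation. It assumes each edit names a line number plus a short hash of the content the model saw on that line, and that a batch of edits is validated as a whole before any line changes. The names `line_hash`, `Edit`, and `apply_edits`, and the 6-character hash length, are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def line_hash(text: str) -> str:
    """Short content hash used as a line anchor (6 hex chars is an arbitrary choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:6]


@dataclass(frozen=True)
class Edit:
    line_no: int   # 1-based line the edit replaces
    anchor: str    # hash of the line content the editor saw
    new_text: str  # replacement content


def apply_edits(lines: list[str], edits: list[Edit]) -> list[str]:
    """Apply a batch of independent single-line edits in one pass.

    Every anchor is checked against the current content before anything
    is changed, so an edit based on a stale view rejects the whole batch.
    """
    for e in edits:
        if not 1 <= e.line_no <= len(lines):
            raise ValueError(f"line {e.line_no} is out of range")
        if line_hash(lines[e.line_no - 1]) != e.anchor:
            raise ValueError(f"stale anchor at line {e.line_no}")
    if len({e.line_no for e in edits}) != len(edits):
        raise ValueError("two edits target the same line")

    result = list(lines)
    for e in edits:
        result[e.line_no - 1] = e.new_text
    return result


# Usage: two edits prepared against the same snapshot, applied together.
source = ["def area(r):", "    return 3.14 * r * r", "print(area(2))"]
edits = [
    Edit(2, line_hash(source[1]), "    return math.pi * r * r"),
    Edit(1, line_hash(source[0]), "def area(r: float) -> float:"),
]
updated = apply_edits(source, edits)
```

Because each edit carries its own anchor, the order of edits within a batch does not matter, and a mismatch is detected without the model having to resend the surrounding file content. That is one plausible way such a scheme keeps context small.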

Our goal: Optimize for bang-for-the-buck on tooling with bare minimum prompting instead of going blindly minimalistic.

Dirac is benchmarked against other leading open-source agents on complex, real-world refactoring tasks, where it consistently achieves 100% accuracy at a fraction of the cost. These evals are run on public GitHub repos and should be reproducible by anyone.

🏆 TerminalBench 2.0 Leaderboard: Dirac recently topped the Terminal-Bench-2 leaderboard with a 65.2% score using gemini-3-flash-preview. This outperforms both Google's official baseline (47.6%) and the top closed-source agent Junie CLI (64.3%). This was achieved without any benchmark-specific info or any AGENTS.md files being inserted.

Note on the cost table below: A bug was discovered in Cline, the parent repo, after running these evals (issue #10314), and we have submitted PR #10315 to fix it. The bug caused the evals for Dirac and Cline to slightly underreport costs ($0.03 vs $0.05 per million cache-read tokens). Although the difference won't be large, we will update the evals soon.
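
As a rough illustration of the size of this effect, the sketch below compares cache-read cost at the two rates. It assumes the evals priced cache reads at $0.03 per million tokens where $0.05 applies, which is one reading of the note above. The token count is a made-up example, not a figure from the evals.

```python
def cache_read_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of reading `tokens` tokens from cache at the given rate."""
    return tokens / 1_000_000 * usd_per_million


REPORTED_RATE = 0.03   # USD per million cache-read tokens (assumed: rate used in the evals)
CORRECTED_RATE = 0.05  # USD per million cache-read tokens (assumed: rate after the fix)

example_tokens = 10_000_000  # hypothetical cache-read volume for one task

reported = cache_read_cost(example_tokens, REPORTED_RATE)
corrected = cache_read_cost(example_tokens, CORRECTED_RATE)
underreported_by = corrected - reported

print(f"reported ${reported:.2f}, corrected ${corrected:.2f}, "
      f"difference ${underreported_by:.2f}")
```

Under these assumptions the gap is $0.02 per million cache-read tokens, so it only becomes visible on tasks with very large cache-read volumes.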

🟢 Success | 🟡 Incomplete | 🔴 Failure

Cost Comparison: Dirac is 64.8% cheaper than the competition (a 2.8x cost reduction).
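
The two figures in this claim are the same quantity expressed two ways. The short sketch below shows the conversion; the function names are illustrative only.

```python
def reduction_to_factor(percent_cheaper: float) -> float:
    """Convert 'X% cheaper' into an 'N times cheaper' factor."""
    if not 0 <= percent_cheaper < 100:
        raise ValueError("percent_cheaper must be in [0, 100)")
    return 1 / (1 - percent_cheaper / 100)


def factor_to_reduction(factor: float) -> float:
    """Convert an 'N times cheaper' factor back into a percentage saving."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return (1 - 1 / factor) * 100


factor = reduction_to_factor(64.8)
print(f"{factor:.2f}x")  # 2.84x, which is about 2.8x
```

In other words, paying 35.2% of the competitors' cost is equivalent to roughly a 2.8x reduction.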

* Expected number of files to be modified/created to complete the task.

See evals/README.md for detailed task descriptions and methodology.

Install Dirac from the VS Code Marketplace.

Install the Dirac CLI globally using npm:

Alternatively, use our official installation script (macOS/Linux):

curl -fsSL https://raw.githubusercontent.com/dirac-run/dirac/master/scripts/install.sh | bash

  1. Authenticate:
  2. Run your first task:
    dirac "Analyze the architecture of this project"

Configuration (Environment Variables)

You can provide API keys via environment variables to skip the dirac auth step. This is ideal for CI/CD or non-persistent environments:

  • ANTHROPIC_API_KEY
  • OPENAI_API_KEY
  • OPENROUTER_API_KEY
  • GEMINI_API_KEY
  • GROQ_API_KEY
  • MISTRAL_API_KEY
  • XAI_API_KEY (x.ai)
  • HF_TOKEN (HuggingFace)
  • ... and others (see src/shared/storage/env-config.ts for the full list).

  • dirac "prompt": Start an interactive task.
  • dirac -p "prompt": Run in Plan Mode to see the strategy before executing.
  • dirac -y "prompt": Yolo Mode (auto-approve all actions, great for simple fixes).
  • git diff | dirac "Review these changes": Pipe context directly into Dirac.
  • dirac history: View and resume previous tasks.

  1. Open the Dirac sidebar in VS Code.
  2. Configure your preferred AI provider (Anthropic, OpenAI, OpenRouter, etc.).
  3. Start a new task by describing what you want to build or fix.
  4. Watch Dirac go!
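
For CI/CD use, the environment-variable configuration and the CLI flags listed above can be combined in a small wrapper script. The sketch below is one possible shape, not part of Dirac. It only checks which of the documented variables are set and assembles a command line, without running it. The helper names are hypothetical, and how Dirac chooses a provider when several keys are present is not covered here.

```python
import os
import shlex

# Variable names taken from the list above; the full list lives in
# src/shared/storage/env-config.ts.
PROVIDER_ENV_VARS = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "GEMINI_API_KEY",
    "GROQ_API_KEY",
    "MISTRAL_API_KEY",
    "XAI_API_KEY",
    "HF_TOKEN",
]

# Flags as documented above: -p for Plan Mode, -y for Yolo Mode.
MODE_FLAGS = {"interactive": [], "plan": ["-p"], "yolo": ["-y"]}


def configured_providers(env=None) -> list[str]:
    """Return the documented key variables that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_ENV_VARS if env.get(name)]


def build_dirac_command(prompt: str, mode: str = "interactive") -> list[str]:
    """Assemble the argument list for a dirac invocation (does not run it)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["dirac", *MODE_FLAGS[mode], prompt]


# Usage with a fake environment, so nothing real is read or executed.
fake_env = {"GEMINI_API_KEY": "example-key", "OPENAI_API_KEY": ""}
print(configured_providers(fake_env))  # ['GEMINI_API_KEY']

command = build_dirac_command("Fix the failing unit test", mode="yolo")
print(shlex.join(command))  # dirac -y 'Fix the failing unit test'
```

A CI job could fail fast when `configured_providers()` returns an empty list, and otherwise pass the assembled argument list to `subprocess.run`.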

Dirac is open source and licensed under the Apache License 2.0.

Dirac is a fork of the excellent Cline project. We are grateful to the Cline team and contributors for their foundational work.


Built with ❤️ by Max Trivedi at Dirac Delta Labs
