大型语言模型代理循环结合工具使用的非凡有效性
The unreasonable effectiveness of an LLM agent loop with tool use

原始链接: https://sketch.dev/blog/agent-loop

在最近一篇博文中,Philip Zeyliger 讨论了他开发 Sketch,一款 AI 编程助手,的经历。他强调了使用大型语言模型(LLM)结合工具使用时,其核心循环惊人的简单:AI 获取用户输入,生成输出和工具调用,如果存在则执行这些调用,然后要么请求更多输入,要么呈现最终输出。 Sketch 利用强大的大型语言模型(Claude 3.7 Sonnet),并可访问 bash 等工具,使其能够处理诸如深奥的 git 操作、复杂的合并以及修复类型错误等任务。AI 甚至可以安装缺失的工具并适应不同的命令行选项。 除了 bash 之外,专门的工具对于提高性能和开发人员工作流程至关重要,尤其是在文本编辑方面,因为大型语言模型难以处理基于行的编辑器。Zeyliger 认为,自主循环将使以前过于专业或不稳定而无法通过传统方式自动化的日常任务自动化。他设想自定义的、临时的 LLM 代理循环将在开发人员的工作流程中变得普遍,用于将堆栈跟踪与 git 提交关联等任务。

这个Hacker News帖子讨论了循环使用工具的LLM代理的有效性。原帖强调了这些代理,尤其是在编码任务中的表现令人惊讶地好,并承认它们可能偶尔需要人工干预。评论者分享了使用GPT-4o、Claude和Gemini等LLM进行编码的正面和负面经验。许多人强调选择合适的模型和工具的重要性,一些人认为某些模型在规划或代码编辑等特定领域表现出色。一些用户已经成功地将代理用于代码生成、调试甚至生成测试等任务。一些人告诫不要过度依赖自动化流程,建议采取合作的方式,即指导和引导LLM,而不是盲目地遵循。其他人正在开发用于自动化测试和审查LLM生成的代码的工具。该帖子表明,虽然LLM对编码很有用,但这项技能需要学习,并且需要了解各种工具的优缺点。

原文

2025-05-15 by Philip Zeyliger

My co-workers and I have been working on an AI Programming Assistant called Sketch for the last few months. The thing I've been most surprised by is how shockingly simple the main loop of using an LLM with tool use is:

def loop(llm):
    msg = user_input()
    while True:
        output, tool_calls = llm(msg)
        print("Agent: ", output)
        if tool_calls:
            msg = [ handle_tool_call(tc) for tc in tool_calls ]
        else:
            msg = user_input()

There's some pomp and circumstance to make the above work (here's the full script) , but the core idea is the above 9 lines. Here, llm() is a function that sends the system prompt, the conversation so far, and the next message to the LLM API.

Tool use is the fancy term for "the LLM returns some output that corresponds to a schema," and, in the full script, we tell the LLM (in its system prompt and tool description prompts) that it has access to bash.

With just that one very general purpose tool, the current models (we use Claude 3.7 Sonnet extensively) can nail many problems, some of them in "one shot." Whereas I used to look up an esoteric git operation and then cut and paste, now I just ask Sketch to do it. Whereas I used to handle git merges manually, now I let Sketch take a first pass. Whereas I used to change a type and go through the resulting type checker errors one by one (or, let's be real, with perl -pie ridiculousness), I give it a shot with Sketch. If appropriately prompted, the agentic loop can be persistent. If you don't have some tool installed, it'll install it. If your `grep` has different command line options, it adapts. (It can also be infuriating! "Oh, this test doesn't pass... let's just skip it," it sometimes says, maddeningly.)

For many workflows, agentic tools specialize. Sketch's quiver of tools is not just bash, as we've found that a handful of extra tools improve the quality, speed up iterations, and facilitate better developer workflows. Tools that let the LLM edit text correctly are surprisingly tricky. Seeing the LLM struggle with sed one-liners re-affirms that visual (as opposed to line) editors are a marvel.

I have no doubt that agent loops will get incorporated into more day to day automation tedium that's historically been too specific for general purpose tools and too esoteric and unstable to automate traditionally. I keep thinking of how much time I've spent correlating stack traces with git commits, and how good LLMs are at doing a first pass on it. We'll be seeing more custom, ad hoc, throw-away LLM agent loops in our bin/ directories. Grab your favorite bearer token and give it a shot.

Also published at philz.dev/blog/agent-loop/.

sketch.dev · merde.ai · pi.dev

联系我们 contact @ memedata.com