From LLM to AI Agent: What's the Real Journey Behind AI System Development?

原始链接: https://www.codelink.io/blog/post/ai-system-development-llm-rag-ai-workflow-agent

Not every AI system needs to be a complex AI agent. Often, simpler solutions that leverage Large Language Models (LLMs) deliver better value for specific tasks. LLMs excel at knowledge-based tasks but need augmentation to handle real-time data or internal information. Retrieval Augmented Generation (RAG) supplies LLMs with relevant context via vector databases, improving accuracy and enabling operations on your own data. Tool use and AI workflows automate business processes by connecting LLMs to APIs such as email and calendar services, and are well suited to structured tasks. AI agents, by contrast, operate autonomously: they plan tasks, use tools, evaluate results, and make decisions independently. They are ideal when a high degree of autonomy is required and workflows can start without an explicit user trigger. Prioritize reliability over advanced capability: start with the simpler approach and add complexity as needed, implement testing and guardrails to cope with the non-determinism of LLMs, and focus on choosing the architecture that best fits the problem at hand.

The Hacker News thread explores the evolution from simple LLMs to complex AI agents, with commenters sharing their experience building AI systems. Many have had success with Retrieval Augmented Generation (RAG) setups, highlighting their stability and maintainability compared with more complex agent architectures. A key debate centers on how much autonomy AI agents should have: some envision agents making decisions independently (for example, creating job descriptions), while others stress the importance of human oversight, especially in critical domains. The thread highlights the challenge of balancing AI capability with human judgment and control. The discussion also touches on the benefits of defining tasks in natural language, which lets non-programmers participate. However, some caution against over-reliance on LLMs, noting that their "common sense" is limited compared with humans' and that LLMs need to be "taught" to respond appropriately in broader contexts. The consensus leans toward workflows in which agents assist humans and request approval for key decisions, similar to how tools like Copilot or Devin operate.

Original Article

AI agents are a hot topic, but not every AI system needs to be one.

While agents promise autonomy and decision-making power, simpler, more cost-effective solutions better serve many real-world use cases. The key lies in choosing the right architecture for the problem at hand.

In this post, we'll explore recent developments in Large Language Models (LLMs) and discuss key concepts of AI systems.

We've worked with LLMs across projects of varying complexity, from zero-shot prompting to chain-of-thought reasoning, from RAG-based architectures to sophisticated workflows and autonomous agents.

This is an emerging field with evolving terminology. The boundaries between different concepts are still being defined, and classifications remain fluid. As the field progresses, new frameworks and practices emerge to build more reliable AI systems.

To demonstrate these different systems, we'll walk through a familiar use case – a resume-screening application – to reveal the unexpected leaps in capability (and complexity) at each level.

Pure LLM

A pure LLM is essentially a lossy compression of the internet, a snapshot of knowledge from its training data. It excels at tasks involving this stored knowledge: summarizing novels, writing essays about global warming, explaining special relativity to a 5-year-old, or composing haikus.

However, without additional capabilities, an LLM cannot provide real-time information like the current temperature in NYC. This distinguishes pure LLMs from chat applications like ChatGPT, which enhance their core LLM with real-time search and additional tools.

That said, not all enhancements require external context. Several prompting techniques, such as in-context learning and few-shot learning, help LLMs tackle specific problems without the need for context retrieval.

Example:

  • To check whether a resume is a good fit for a job description, an LLM with one-shot prompting and in-context learning can classify it as Passed or Failed.
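The one-shot approach above can be sketched as a prompt template: one worked example is embedded in the prompt (in-context learning), followed by the real resume to judge. The job description, resume text, and `call_llm` backend are all placeholders; only the prompt construction is shown here.

```python
# A minimal sketch of one-shot prompting for resume screening.
# The embedded example teaches the model the output format in-context.

ONE_SHOT_EXAMPLE = """\
Job description: Senior Python developer, 5+ years of experience.
Resume: 7 years building Django services, led a team of four.
Verdict: Passed
"""

def build_screening_prompt(job_description: str, resume: str) -> str:
    """Compose a one-shot classification prompt: one worked example,
    then the real resume for the model to classify."""
    return (
        "Classify each resume as 'Passed' or 'Failed' for the job.\n\n"
        f"Example:\n{ONE_SHOT_EXAMPLE}\n"
        f"Job description: {job_description}\n"
        f"Resume: {resume}\n"
        "Verdict:"
    )

prompt = build_screening_prompt(
    "Data engineer, strong SQL and Airflow.",
    "3 years of ETL pipelines with Airflow and dbt.",
)
# `prompt` would then be sent to any chat-completion API.
```

The trailing "Verdict:" nudges the model to answer with just the label, which makes the output easy to parse downstream.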

[Image: AI System Development - Workflow-04.png]

RAG (Retrieval Augmented Generation)

Retrieval methods enhance LLMs by providing relevant context, making them more current, precise, and practical. You can grant LLMs access to internal data for processing and manipulation. This context allows the LLM to extract information, create summaries, and generate responses. RAG can also incorporate real-time information through the latest data retrieval.

Example:

  • The resume screening application can be improved by retrieving internal company data, such as engineering playbooks, policies, and past resumes, to enrich the context and make better classification decisions.
  • Retrieval typically employs tools like vectorization, vector databases, and semantic search.
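The retrieval step can be sketched with a toy example. A real system would use an embedding model and a vector database; here a naive bag-of-words vector and cosine similarity stand in for both, just to show the shape of "embed, search, inject context":

```python
# Toy sketch of RAG retrieval: embed documents, find the closest
# match to the query, and inject it into the LLM prompt as context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Naive stand-in for an embedding model: a word-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

company_docs = [
    "Engineering playbook: backend hires need strong SQL skills.",
    "Holiday policy: 25 days of paid leave per year.",
]
context = retrieve("screening a backend engineer resume with sql", company_docs)
prompt = f"Context:\n{context}\n\nClassify the resume as Passed or Failed."
```

Swapping `embed` for a real embedding model and `retrieve` for a vector-database query gives the production version of the same pattern.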

[Image: AI System Development - Workflow-03.png]

Tool Use & AI Workflow

LLMs can automate business processes by following well-defined paths. They're most effective for consistent, well-structured tasks.

Tool use enables workflow automation. By connecting to APIs, whether for calculators, calendars, email services, or search engines, LLMs can leverage reliable external utilities instead of relying on their internal, non-deterministic capabilities.

Example:

  • An AI workflow can connect to the hiring portal to fetch resumes and job descriptions → Evaluate qualifications based on experience, education, and skills → Send appropriate email responses (rejection or interview invitation).
  • For this resume scanning workflow, the LLM requires access to the database, email API, and calendar API. It follows predefined steps to automate the process programmatically.
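The fixed path described above can be sketched as a pipeline of three steps. The fetch, evaluate, and email functions are stubs standing in for the hiring portal, the LLM qualification check, and the email API; the point is that the sequence is programmatically defined, not decided by the model:

```python
# Sketch of the predefined resume-screening workflow:
# fetch -> evaluate -> respond, in a fixed order.

def fetch_candidates():
    # Stand-in for the hiring-portal API.
    return [{"name": "Ada", "years": 6}, {"name": "Bob", "years": 1}]

def evaluate(candidate, min_years=3):
    # Stand-in for the LLM's qualification assessment.
    return "Passed" if candidate["years"] >= min_years else "Failed"

def send_email(candidate, verdict):
    # Stand-in for the email API; returns the message it would send.
    action = "interview invitation" if verdict == "Passed" else "rejection"
    return f"To {candidate['name']}: {action}"

def run_workflow():
    """Follow the fixed path for every fetched candidate."""
    return [send_email(c, evaluate(c)) for c in fetch_candidates()]

# run_workflow() -> ['To Ada: interview invitation', 'To Bob: rejection']
```

Because the control flow lives in ordinary code, the workflow is easy to test and its failure modes are predictable, which is exactly why this level often suffices.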

[Image: AI System Development - Workflow-02.png]

AI Agent

AI Agents are systems that reason and make decisions independently. They break down tasks into steps, use external tools as needed, evaluate results, and determine the next action: whether to store results, request human input, or proceed to the next step.

This represents another layer of abstraction above tool use & AI workflow, automating both planning and decision-making.

While AI workflows require explicit user triggers (like button clicks) and follow programmatically defined paths, AI Agents can initiate workflows independently and determine their sequence and combination dynamically.

Example:

  • An AI Agent can manage the entire recruitment process, including parsing CVs, coordinating availability via chat or email, scheduling interviews, and handling schedule changes.
  • This comprehensive task requires the LLM to access databases, email and calendar APIs, plus chat and notification systems.
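The difference from a workflow is that the agent itself decides which tool to invoke next. A minimal sketch of that loop follows; the tools are stubs for the real database, calendar, and email APIs, and `plan` hard-codes decision rules that a real agent would delegate to the LLM:

```python
# Minimal agent loop: plan the next step from current state,
# execute a tool, fold the result back into state, repeat.

TOOLS = {
    "parse_cv": lambda state: {**state, "skills": ["python", "sql"]},
    "check_calendar": lambda state: {**state, "slot": "Tue 10:00"},
    "send_invite": lambda state: {**state, "invited": True},
}

def plan(state):
    """Pick the next tool based on what is still missing.
    (A real agent would ask the LLM to make this decision.)"""
    if "skills" not in state:
        return "parse_cv"
    if "slot" not in state:
        return "check_calendar"
    if not state.get("invited"):
        return "send_invite"
    return None  # goal reached

def run_agent(state):
    while (tool := plan(state)) is not None:
        state = TOOLS[tool](state)
    return state

final = run_agent({"candidate": "Ada"})
```

Nothing in the loop fixes the order of tools in advance: the sequence emerges from the state, which is what lets an agent initiate and recombine workflows dynamically.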

[Image: AI System Development - Workflow-01.png]

Key takeaway

1. Not every system requires an AI agent

Start with simple, composable patterns and add complexity as needed. For some systems, retrieval alone suffices. In our resume screening example, a straightforward workflow works well when the criteria and actions are clear. Consider an Agent approach only when greater autonomy is needed to reduce human intervention.

2. Focus on reliability over capability

The non-deterministic nature of LLMs makes building dependable systems challenging. While creating proofs of concept is quick, scaling to production often reveals complications. Begin with a sandbox environment, implement consistent testing methods, and establish guardrails for reliability.
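One concrete guardrail for non-deterministic output is to validate the model's answer against an allowed set and retry a bounded number of times before escalating to a human. The sketch below assumes `llm` is any callable returning a string; the flaky stub simulates a malformed first reply:

```python
# Guardrail sketch: only accept well-formed verdicts, retry a few
# times, then fall back to human review rather than trusting bad output.

ALLOWED = {"Passed", "Failed"}

def screened_verdict(llm, prompt, max_retries=3):
    for _ in range(max_retries):
        answer = llm(prompt).strip()
        if answer in ALLOWED:
            return answer
    return "NeedsHumanReview"  # escalate instead of guessing

# A flaky stub: fails validation once, then answers correctly.
replies = iter(["I think it's a match!", "Passed"])
verdict = screened_verdict(lambda p: next(replies), "Classify resume as Passed or Failed.")
```

The same pattern extends to schema validation of structured output and to logging every rejected response, which gives the consistent testing surface the paragraph above calls for.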
