Adventures in Symbolic Algebra with Model Context Protocol

Original link: https://www.stephendiehl.com/posts/computer_algebra_mcp/

I've been experimenting with Anthropic's MCP protocol, which lets large language models (LLMs) use external tools. Since LLMs struggle with complex tensor computations, I integrated computer algebra systems (Mathematica, SymPy, etc.) via MCP. This lets the LLM handle the natural language and task planning while the algebra system performs the exact mathematics. The MCP ecosystem is still in its infancy; documentation is scattered and heavily Node.js-based. Debugging is challenging due to the stochastic nature of the setup and the complex JSON schemas. Still, the core concept is promising. For example, LLMs generally cannot factor integers, but with MCP they can delegate that task to a dedicated tool. I demonstrate this with a damped harmonic oscillator problem, in which the LLM orchestrates SymPy to solve the differential equation exactly. While MCP shows promise for integrating LLMs with formal verification systems and for making complex mathematics more accessible, security is a major concern given the ability to execute local code. I've shared my code on GitHub, but I strongly advise caution given the potential security risks.

This Hacker News thread discusses Stephen Diehl's "Adventures in Symbolic Algebra with Model Context Protocol (MCP)". Commenters expressed interest in MCP's potential, particularly for turning natural-language requests into code for symbolic AI software, though some raised concerns about scalability and about complex JSON schemas confusing language models. Several users noted similarities to earlier symbolic AI work and the risk of "hard-coded" solutions. They stressed the value of narrow interfaces for precise queries and the need for LLMs to interact effectively with other machines. The discussion touched on using proof verifiers such as Lean to generate training data, and on LLMs' limitations in fully capturing human insight in symbolic modeling. Practical applications were suggested, such as pairing MCP with Prolog or with Jupyter notebooks/code interpreters. MCP's security implications were also raised. Some users shared related projects of their own and discussed the challenge of handling millions of equations in symbolic algebra systems, with general agreement that current systems struggle at that scale.

Original Article

I spent last weekend playing with this new MCP protocol all the kids are talking about. And it's fun, but a bit early and rough around the edges.

MCP, if you're not familiar, is Anthropic's answer to the question: "How do we get AI to actually DO things instead of just TALK about doing them?" It's a protocol that allows language models to call external tools, much like how your friend who claims to know everything actually calls their more knowledgeable friend behind your back. The core idea of MCP is to standardize the way language models call external tools: instead of writing custom connectors for each AI model and each tool, you implement the protocol once on each side. It's the USB-C of AI tooling, if USB-C were still in its awkward adolescent phase.

Notably, the MCP server runs locally on your machine, letting the language model invoke arbitrary code and commands through one of the LLM desktop clients that call out to the local server. This is probably as dangerous and reckless as you might think, so there's a definite security concern here; be warned. But let's not let that get in the way of a fun experiment.

My particular itch stemmed from watching Claude (and its cousins like o4-mini-high and DeepSeek-R1) really struggle with tensor calculus. If you've ever asked an LLM to perform complex symbolic manipulation, you know the drill: confident answers, beautiful LaTeX formatting, and results that would make your math professor weep bitter tears. These models, despite their impressive linguistic capabilities, are absolute disasters when it comes to keeping track of indices in tensor expressions or manipulating complicated algebraic forms. The expressions involved in even moderate general relativity problems are HUGE, with hundreds of terms and complex index gymnastics.

But we already have specialized tools that excel at this! Computer algebra systems like Mathematica, SymPy, Cadabra, and EinsteinPy were built specifically for this purpose. So the obvious solution presented itself: let's expose these tools to the LLM through MCP and let each system do what it does best. The LLM handles the natural language understanding and planning, while the symbolic algebra system performs the actual mathematical manipulations with perfect precision.

Working with the MCP ecosystem is like visiting a frontier town in the Wild West. The documentation exists in the form of scattered campfire stories, the implementations have a distinctly "I wrote this at a 3 AM hackathon" vibe, and everything is strangely Node-heavy. This Node fixation likely stems from most MCP tools being designed to call REST services for cloud applications. Then there's the peculiar ecosystem of suspiciously self-referential products from companies that just happen to sell AI coding assistants. There's definitely a faint whiff of opportunism in the air.

Debugging an MCP server is a crazy exercise. You're essentially working with a stochastic black box that communicates through a complex web of JSON schemas attached to docstring annotations. When something goes wrong, good luck figuring out if it's your server, the client, the LLM's interpretation, or just the model having a laugh at your expense. The non-deterministic nature of the whole setup means that something can work perfectly five times in a row and then spontaneously fail on the sixth attempt for reasons that remain shrouded in mystery.

The basic implementation, however, is refreshingly straightforward, similar to FastAPI if FastAPI were designing its endpoints for a language model. Here's a simple example that highlights why this approach matters: Ask any LLM to factor a large integer, and watch it confidently fabricate entirely wrong answers. By design, transformers can't perform the arbitrary computation required for integer factorization. They've merely memorized some factorizations from the internet (and even those, poorly).

from mcp.server.fastmcp import FastMCP
import subprocess

mcp = FastMCP("Demo", instructions="You factor integers.")

def factor_number(number: int) -> str:
    # Shell out to GNU coreutils' `factor`, which prints the prime
    # factorization of its argument to stdout.
    result = subprocess.run(['factor', str(number)], capture_output=True, text=True)
    return result.stdout.strip()

@mcp.tool()
def factor(a: int) -> str:
    """Factor an integer"""
    return factor_number(a)

With this simple tool, we can now expose the actual computation to the LLM using the GNU factor shell command. Instead of making up factors, the model recognizes when to delegate the task to a specialized tool.
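One detail the snippet above leaves out is the entry point that actually starts the server. With FastMCP from the Python SDK, it's a one-liner; stdio is the transport local desktop clients use to talk to a spawned server process:

if __name__ == "__main__":
    # Serve the tool over stdio so a desktop client (Claude, Cursor)
    # can spawn this script as a local MCP server.
    mcp.run(transport="stdio")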

When it works, it's pretty magical. The successful integration of these components (properly formatted JSON, accurate tool selection by the language model, and reliable server processing) demonstrates the potential of this architectural approach. While the technology is still in its early stages of development, the possibilities for integration with formal verification systems are particularly compelling. The combination of LLMs with theorem provers like Lean could enable natural language interfaces to formal mathematics, allowing automated reasoning systems to benefit from the language models' ability to understand informal mathematical concepts while maintaining rigorous logical foundations. Similarly, making computer algebra systems accessible through natural language could dramatically lower the barrier to entry for complex mathematical computation. However, significant research challenges remain in ensuring reliable and semantically correct interactions between these different systems.

Let me show you how this looks in practice with a classic example from physics: solving the damped harmonic oscillator with a forcing term. This is the bread and butter of undergraduate physics, described by:

$$
m\frac{d^2x}{dt^2} + c\frac{dx}{dt} + kx = F(t)
$$

where m is the mass, c is the damping coefficient, k is the spring constant, and F(t) is an external time-dependent force.
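Before walking through the MCP version, here's a sketch of the plain SymPy session that the tool calls below end up orchestrating (the names and structure here are mine, not the server's exact internals):

import sympy as sp

# Time coordinate and positive physical parameters.
t = sp.Symbol("t", real=True)
m, c, k = sp.symbols("m c k", real=True, positive=True)

# x(t) is the unknown displacement; F(t) is the forcing term.
x = sp.Function("x")
F = sp.Function("F")

# m x'' + c x' + k x = F(t)
ode = sp.Eq(m * x(t).diff(t, 2) + c * x(t).diff(t) + k * x(t), F(t))

# Solve symbolically; the answer carries the unevaluated forcing integrals.
solution = sp.dsolve(ode, x(t))
sp.pprint(solution)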

Here's how the workflow unfolds with our sympy-mcp server:

Assistant: (Internal tool call) intro_many to introduce variables with appropriate assumptions:

{
  "variables": [
    {"var_name": "t", "pos_assumptions": ["real"], "neg_assumptions": []},
    {"var_name": "m", "pos_assumptions": ["real", "positive"], "neg_assumptions": []},
    {"var_name": "c", "pos_assumptions": ["real", "positive"], "neg_assumptions": []},
    {"var_name": "k", "pos_assumptions": ["real", "positive"], "neg_assumptions": []}
  ]
}

Assistant: (Internal tool call) introduce_function to define x(t) as a function:

{
  "func_name": "x"
}

Assistant: (Internal tool call) introduce_expression to create the differential equation:

{
  "expr_str": "Eq(m*Derivative(x(t), t, t) + c*Derivative(x(t), t) + k*x(t), F(t))"
}

Assistant: (Internal tool call) introduce_function to define the forcing function F(t):

{
  "func_name": "F"
}

Assistant: (Internal tool call) dsolve_ode to solve the ODE:

{
  "expr_key": "expr_12",
  "func_name": "x"
}

And voilà! Instead of hallucinating a solution, the LLM delegates to SymPy which returns the correct solution:

$$
x{\left(t \right)} = C_{1} e^{\frac{t \left(- c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} + C_{2} e^{- \frac{t \left(c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} + \frac{e^{\frac{t \left(- c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} \int F{\left(t \right)} e^{\frac{c t}{2 m}} e^{- \frac{t \sqrt{c^{2} - 4 k m}}{2 m}} \, dt}{\sqrt{c^{2} - 4 k m}} - \frac{e^{- \frac{t \left(c + \sqrt{c^{2} - 4 k m}\right)}{2 m}} \int F{\left(t \right)} e^{\frac{c t}{2 m}} e^{\frac{t \sqrt{c^{2} - 4 k m}}{2 m}} \, dt}{\sqrt{c^{2} - 4 k m}}
$$

No hallucinated terms, no mysterious constants appearing out of nowhere, just the correct solution. The LLM handles the natural language interaction and orchestration, while the computer algebra system does what it does best ... exact symbolic manipulation.
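If you're curious how tools like introduce_expression and dsolve_ode hang together server-side, here's a minimal hypothetical sketch of the stateful pattern (the real sympy-mcp implementation differs in its details):

from mcp.server.fastmcp import FastMCP
import sympy as sp

mcp = FastMCP("sympy-sketch")

# Hypothetical session state shared across tool calls: registered
# symbols, undefined functions, and stored expressions keyed by id.
syms: dict[str, sp.Symbol] = {}
funcs: dict = {}
exprs: dict[str, sp.Basic] = {}

@mcp.tool()
def intro(var_name: str, positive: bool = False) -> str:
    """Introduce a real symbol, optionally assumed positive."""
    syms[var_name] = sp.Symbol(var_name, real=True, positive=positive)
    return var_name

@mcp.tool()
def introduce_function(func_name: str) -> str:
    """Introduce an undefined function, e.g. x for x(t)."""
    funcs[func_name] = sp.Function(func_name)
    return func_name

@mcp.tool()
def introduce_expression(expr_str: str) -> str:
    """Parse an expression against the known names and store it."""
    expr = sp.sympify(expr_str, locals={**syms, **funcs})
    key = f"expr_{len(exprs)}"
    exprs[key] = expr
    return key

@mcp.tool()
def dsolve_ode(expr_key: str, func_name: str) -> str:
    """Solve the stored ODE for func_name(t), returning LaTeX."""
    solution = sp.dsolve(exprs[expr_key], funcs[func_name](syms["t"]))
    return sp.latex(solution)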

Anyways all the code is up on GitHub here so maybe someone else will find it useful. If you have Cursor or Claude installed, add the following to ~/.cursor/mcp.json or ~/Library/Application Support/Claude/claude_desktop_config.json to install the MCP server.

{
  "mcpServers": {
    "sympy-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "https://github.com/sdiehl/sympy-mcp/releases/download/0.1/sympy_mcp-0.1.0-py3-none-any.whl",
        "python",
        "server.py"
      ]
    }
  }
}

Or maybe slightly better, run it from a Docker image.

{
  "mcpServers": {
    "sympy-mcp": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "-p",
        "8081:8081",
        "--rm",
        "ghcr.io/sdiehl/sympy-mcp:latest"
      ]
    }
  }
}

And maybe read the source code for what you're installing here, because I feel like a lot of people are going to install these MCP servers without knowing what they're doing and thus install a lot of malware and exploits. There's basically no security going on here. And that could be a big problem. Tread carefully.
