Claude Code can debug low-level cryptography

Original link: https://words.filippo.io/claude-debugging/

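## AI-assisted debugging of a post-quantum signature algorithm

The author recently implemented ML-DSA, a new post-quantum signature algorithm, in Go. Initial tests, however, showed verification failing consistently even though the signatures were correct. After a frustrating debugging session, the author turned to the AI tool Claude Code on a whim. To their surprise, Claude quickly identified a subtle low-level bug in the implementation, a duplicated high-bits computation, saving a good deal of debugging time. Further experiments confirmed Claude's effectiveness: it pinpointed incorrect constant values and an allocation error in the signing path faster than the author had. These successes highlight the potential of AI as a debugging tool, even for complex cryptographic code. The author stresses that AI is valuable even when its proposed fixes are not fully trusted, simply because it quickly identifies where the bug is. They advocate better tooling for integrating LLMs into development workflows, suggesting automated bug-finding agents triggered by test failures.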

Over the past few days I wrote a new Go implementation of ML-DSA, a post-quantum signature algorithm specified by NIST last summer. I livecoded it all over four days, finishing it on Thursday evening. Except… Verify was always rejecting valid signatures.

$ bin/go test crypto/internal/fips140/mldsa
--- FAIL: TestVector (0.00s)
    mldsa_test.go:47: Verify: mldsa: invalid signature
    mldsa_test.go:84: Verify: mldsa: invalid signature
    mldsa_test.go:121: Verify: mldsa: invalid signature
FAIL
FAIL     crypto/internal/fips140/mldsa   2.142s
FAIL

I was exhausted, so I tried debugging for half an hour and then gave up, with the intention of coming back to it the next day with a fresh mind.

On a whim, I figured I would let Claude Code take a shot while I read emails and resurfaced from hyperfocus. I mostly expected it to flail in some maybe-interesting way, or rule out some issues.

Instead, it rapidly figured out a fairly complex low-level bug in my implementation of a relatively novel cryptography algorithm. I am sharing this because it made me realize I still don’t have a good intuition for when to invoke AI tools, and because I think it’s a fantastic case study for anyone who’s still skeptical about their usefulness.

Full disclosure: Anthropic gave me a few months of Claude Max for free. They reached out one day and told me they were giving it away to some open source maintainers. Maybe it’s a ploy to get me hooked so I’ll pay for it when the free coupon expires. Maybe they hoped I’d write something like this. Maybe they are just nice. Anyway, they made no request or suggestion to write anything public about Claude Code. Now you know.

Finding the bug

I started Claude Code v2.0.28 with Opus 4.1 and no system prompts, and gave it the following prompt (typos included):

I implemented ML-DSA in the Go standard library, and it all works except that verification always rejects the signatures. I know the signatures are right because they match the test vector.

YOu can run the tests with “bin/go test crypto/internal/fips140/mldsa”

You can find the code in src/crypto/internal/fips140/mldsa

Look for potential reasons the signatures don’t verify. ultrathink

I spot-checked and w1 is different from the signing one.

To my surprise, it pinged me a few minutes later with a complete fix.

Maybe I shouldn’t be surprised! Maybe it would have been clear to anyone more familiar with AI tools that this was a good AI task: a well-scoped issue with failing tests. On the other hand, this is a low-level issue in a fresh implementation of a complex, relatively novel algorithm.

It figured out that I had merged HighBits and w1Encode into a single function for use from Sign, and then reused it from Verify, where UseHint already produces the high bits, effectively taking the high bits of w1 twice in Verify.
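To make the double application concrete, here is a minimal sketch of HighBits following the Decompose routine in FIPS 204. It assumes the parameter sets where γ2 = (q-1)/32 and works on plain integers rather than the Montgomery representation. The function name `highBits` and the sample coefficient are illustrative and are not taken from the actual implementation.

```go
package main

import "fmt"

const (
	q      = 8380417      // ML-DSA modulus
	gamma2 = (q - 1) / 32 // assumed: the γ2 used by ML-DSA-65 and ML-DSA-87
)

// highBits returns r1 from Decompose(r), for 0 <= r < q.
func highBits(r int32) int32 {
	// r0 is r mod 2γ2, centered into (-γ2, γ2].
	r0 := r % (2 * gamma2)
	if r0 > gamma2 {
		r0 -= 2 * gamma2
	}
	// Corner case: the top bucket wraps around to zero.
	if r-r0 == q-1 {
		return 0
	}
	return (r - r0) / (2 * gamma2)
}

func main() {
	w := int32(5000000) // hypothetical coefficient of w
	once := highBits(w)
	twice := highBits(once)
	fmt.Println(once, twice) // prints "10 0"
}
```

With this γ2, high bits are small values (0 to 15), so taking HighBits of them a second time collapses them all to 0. That would be consistent with w1 in Verify not matching the one from signing.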

Looking at the log, it loaded the implementation into the context and then immediately figured it out, without any exploratory tool use! After that it wrote itself a cute little test that reimplemented half of verification to confirm the hypothesis, wrote a mediocre fix, and checked the tests pass.

I threw the fix away and refactored w1Encode to take high bits as input, and changed the type of the high bits, which is both clearer and saves a round-trip through Montgomery representation. Still, this 100% saved me a bunch of debugging time.
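As a rough illustration of that shape, the sketch below shows a w1Encode that only packs values that are already high bits. It assumes γ2 = (q-1)/32, so each coefficient fits in 4 bits and two are packed per byte, low nibble first, as in FIPS 204's SimpleBitPack. The `[]uint8` input type is a guess for illustration; the post only says the type of the high bits changed.

```go
package main

import "fmt"

// w1Encode packs high-bits coefficients, each in [0, 15], two per byte,
// low nibble first. It assumes len(w1) is even and does no HighBits
// computation of its own: callers pass values that are already high bits.
func w1Encode(w1 []uint8) []byte {
	out := make([]byte, len(w1)/2)
	for i := range out {
		out[i] = w1[2*i] | w1[2*i+1]<<4
	}
	return out
}

func main() {
	fmt.Printf("%x\n", w1Encode([]uint8{10, 3, 0, 15})) // prints "3af0"
}
```

With this split, Sign would call HighBits and then w1Encode, while Verify would pass the output of UseHint straight in.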

A second synthetic experiment

On Monday, I had also finished implementing signing with failing tests. There were two bugs, which I fixed in the following couple evenings.

The first one was due to somehow computing a couple hardcoded constants (1 and -1 in the Montgomery domain) wrong. It was very hard to find, requiring a lot of deep printfs and guesswork. Took me maybe an hour or two.
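One way to guard against this class of mistake is to derive such constants instead of typing them in. The sketch below assumes the common Montgomery radix R = 2^32 with the ML-DSA modulus q = 8380417. The post does not say which radix or representation the implementation uses, and `toMontgomery` is a hypothetical helper.

```go
package main

import "fmt"

const q = 8380417 // ML-DSA modulus

// toMontgomery returns x·R mod q for the assumed radix R = 2^32.
// It requires 0 <= x < q, so x<<32 cannot overflow a uint64.
func toMontgomery(x uint64) uint64 {
	return (x << 32) % q
}

func main() {
	one := toMontgomery(1)          // 1 in the Montgomery domain
	minusOne := toMontgomery(q - 1) // -1 in the Montgomery domain
	fmt.Println(one, minusOne)      // prints "4193792 4186625"
}
```

A unit test can then compare the hardcoded constants against the derived values, and check that the two sum to 0 mod q.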

The second one was easier: a value that ends up encoded in the signature was too short (32 bits instead of 32 bytes). It was relatively easy to tell because only the first four bytes of the signature were the same, and then the signature lengths were different.
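The diagnostic described here, comparing where two encodings first diverge and by how much their lengths differ, can be sketched with a small helper. The name `firstDiff` and the sample data are made up for illustration.

```go
package main

import "fmt"

// firstDiff returns the index of the first byte where a and b differ,
// or -1 if they are identical. If one is a prefix of the other, it
// returns the length of the shorter slice.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	// Made-up data: a 32-byte field followed by a 2-byte tail.
	field := make([]byte, 32)
	for i := range field {
		field[i] = byte(i + 1)
	}
	tail := []byte{0xAA, 0xBB}

	want := append(append([]byte{}, field...), tail...)
	got := append(append([]byte{}, field[:4]...), tail...) // field cut to 32 bits

	fmt.Println(firstDiff(want, got), len(want)-len(got)) // prints "4 28"
}
```

A first difference at offset 4 together with a length gap of 28 bytes points at a 32-byte field that was written as 4 bytes, which is a bits-versus-bytes mix-up.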

I figured these would be an interesting way to validate Claude’s ability to help find bugs in low-level cryptography code, so I checked out the old version of the change with the bugs (yay Jujutsu!) and kicked off a fresh Claude Code session with this prompt:

I am implementing ML-DSA in the Go standard library, and I just finished implementing signing, but running the tests against a known good test vector it looks like it goes into an infinite loop, probably because it always rejects in the Fiat-Shamir with Aborts loop.

You can run the tests with “bin/go test crypto/internal/fips140/mldsa”

You can find the code in src/crypto/internal/fips140/mldsa

Figure out why it loops forever, and get the tests to pass. ultrathink

It spent some time doing printf debugging and chasing down incorrect values very similarly to how I did it, and then figured out and fixed the wrong constants. Took Claude definitely less than it took me. Impressive.

It gave up after fixing that bug even though the tests still failed, so I started a fresh session (on the assumption that the context on the wrong constants would do more harm than good investigating an independent bug), and gave it this prompt:

I am implementing ML-DSA in the Go standard library, and I just finished implementing signing, but running the tests against a known good test vector they don’t match.

You can run the tests with “bin/go test crypto/internal/fips140/mldsa”

You can find the code in src/crypto/internal/fips140/mldsa

Figure out what is going on. ultrathink

It took a couple wrong paths, thought for quite a bit longer, and then found this one too. I honestly expected it to fail initially.

It’s interesting how Claude found the “easier” bug more difficult. My guess is that maybe the large random-looking outputs of the failing tests did not play well with its attention.

The fix it proposed was updating only the allocation’s length and not its capacity, but whatever, the point is finding the bug, and I’ll usually want to throw away the fix and rewrite it myself anyway.

Three out of three one-shot debugging hits with no help is extremely impressive. Importantly, there is no need to trust the LLM or review its output when its job is just saving me an hour or two by telling me where the bug is, for me to reason about it and fix it.

As ever, I wish we had better tooling for using LLMs which didn’t look like chat or autocomplete or “make me a PR.” For example, how nice would it be if every time tests fail, an LLM agent was kicked off with the task of figuring out why, and only notified us if it did before we fixed it?

An image of Clippy, the paperclip with eyes from Microsoft Office, with a speech bubble saying 'FYI, your tests are failing because you are taking the HighBits of w1 in w1Encode, but w1 in Verify is already the high bits output of UseHint.'

For more low-level cryptography ~~bugs~~ implementations, follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @[email protected]. I promise I almost never post about AI.

The picture

Enjoy the silliest floof. Surely this will help redeem me in the eyes of folks who consider AI less of a tool and more of something to be hated or loved.

A calico cat lying upside-down on a wooden floor, body curved around a coffee table leg, looking a bit derpy, with a feather toy on a string dangling nearby

My work is made possible by Geomys, an organization of professional Go maintainers, which is funded by Smallstep, Ava Labs, Teleport, Tailscale, and Sentry. Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement.) Here are a few words from some of them!

Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews.

Ava Labs — We at Ava Labs, maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team.