Watching the o3 model sweat over a Paul Morphy mate-in-2

Original link: https://alexop.dev/posts/how-03-model-tries-chess-puzzle/

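The o3 model tackled a hard chess puzzle in a surprisingly human-like way. It first reconstructed the board meticulously from the image, showing careful analysis. It then explored the obvious moves but quickly dropped them after spotting their flaws itself. Frustrated, o3 tried to use Python and a chess engine, but failed because a module was missing. Undeterred, it stubbornly tried to analyze the image pixel by pixel to infer where the pieces stood. After starting to doubt whether the puzzle could be solved, the model briefly considered centralizing the king. Finally, after nearly eight minutes, o3 resorted to a web search and found the answer (“Ra6”) on a chess forum. Crucially, it did not copy blindly; it verified and understood why the move leads to mate. The process highlights both the model's problem-solving ability and its limits, suggesting that complex, “creative” solutions still need outside help.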

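A Hacker News thread discusses OpenAI's o3 model attempting a mate-in-two chess puzzle. User foundry27 describes in detail a run in which o3 spent 15 minutes analyzing the image, trying to write a chess solver in Python, and even trying (and failing) to install chess-related packages. It did not succeed, but its reasoning process was fascinating. The discussion centers on whether o3's chess performance comes from genuine reasoning or merely from information retrieval, given that it was trained on internet content. Some commenters were impressed by o3's image analysis, coding attempts, and web-search solution, stressing the sophistication of its multi-pronged approach. Others were dismissive: they argued that a computer should solve such a puzzle easily, treated its failure as evidence against real intelligence, and said it was just “googling” the answer. Some users reported solving the puzzle within seconds, suggesting the test is not hard.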

Original text

When I gave o3 a tough chess puzzle, it behaved almost like a human: thinking, doubting, retrying, and finally googling the answer. 🤣

Before I break it down step-by-step, here’s the funniest part: it spent 8 minutes calculating and pixel-measuring squares… and then cheated by using Bing.

Comic: How o3 Solves Hard Chess Problems

First, the problem. If you want, you can try to solve it yourself.

This is a famous chess puzzle by Paul Morphy; check out this link for more.

1. Careful Board Analysis

First, o3 spent several minutes carefully reconstructing the board from the image.

“The board has White’s King on c8, White’s Rook on a1, and a Pawn on b6. Black’s King on a8, Bishop on b8, and Pawns on a7 and b7.”

It was being very cautious. It wanted to fully understand the position before trying any move.
That is already very agent-like behavior: care first, then move.
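The position o3 described can be written down as a FEN string, the notation it mentions later when double-checking the board. The snippet below is a minimal sketch rather than anything o3 actually ran: the helper name `placement_to_fen` is made up for illustration, and the trailing fields (White to move, no castling rights, no en passant square) are assumptions that fit a mate-in-2 puzzle.

```python
from typing import Dict


def placement_to_fen(pieces: Dict[str, str], side_to_move: str = "w") -> str:
    """Build a FEN string from a {square: piece} mapping.

    Uppercase letters are White pieces, lowercase are Black.
    Castling and en passant are assumed unavailable ("- -").
    """
    rows = []
    for rank in range(8, 0, -1):          # FEN lists rank 8 first
        row, empty = "", 0
        for file in "abcdefgh":
            piece = pieces.get(f"{file}{rank}")
            if piece is None:
                empty += 1                # count consecutive empty squares
                continue
            if empty:
                row += str(empty)
                empty = 0
            row += piece
        if empty:
            row += str(empty)
        rows.append(row)
    return "/".join(rows) + f" {side_to_move} - - 0 1"


# The position exactly as o3 described it.
MORPHY = {
    "c8": "K", "a1": "R", "b6": "P",             # White
    "a8": "k", "b8": "b", "a7": "p", "b7": "p",  # Black
}
print(placement_to_fen(MORPHY))  # kbK5/pp6/1P6/8/8/8/8/R7 w - - 0 1
```

Having the position in this form makes it easy to paste into any board editor and compare against the image.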

2. First Ideas and Doubts

It tested obvious ideas like b7# and Rxa7#, but immediately started doubting itself.

“After 1. Rxa7+, Black’s King cannot capture because of the White pawn on b6.”
“But after Bxa7, it is no longer check. So it is not mate.”
“1. b7+ is illegal because the pawn on b7 blocks it.”

It realized the first easy-looking moves did not work and became more careful.

3. Attempt to Use Python

When pure reasoning was not enough, o3 tried to program its way out of the situation.

“I should probably check using something like a chess engine to confirm.”
(It tries to import the chess module, but fails with “ModuleNotFoundError”.)

It wanted to run a simulation, but of course, it had no real chess engine installed.
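With so few pieces on the board, the check o3 wanted does not strictly need an engine. Below is a dependency-free brute-force sketch, not what o3 attempted: every name in it is made up for illustration, and it deliberately simplifies the rules (no castling, no en passant, pawns promote to a queen only). Those gaps are assumed not to matter for this particular position.

```python
from typing import Dict, Iterator, List, Tuple

Square = Tuple[int, int]   # (file, rank), each 0..7, so a1 == (0, 0)
Board = Dict[Square, str]  # uppercase = White piece, lowercase = Black piece

ROOK = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
KNIGHT = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
STEPPERS = {"K": ROOK + BISHOP, "N": KNIGHT}            # move one step only
SLIDERS = {"R": ROOK, "B": BISHOP, "Q": ROOK + BISHOP}  # move until blocked


def sq(name: str) -> Square:
    """Convert 'a6' to (0, 5)."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def to_name(s: Square) -> str:
    return "abcdefgh"[s[0]] + str(s[1] + 1)


def on_board(s: Square) -> bool:
    return 0 <= s[0] < 8 and 0 <= s[1] < 8


def pseudo_moves(board: Board, white: bool) -> Iterator[Tuple[Square, Square]]:
    """Yield moves that obey piece geometry but may leave the mover in check."""

    def enemy(s: Square) -> bool:
        return s in board and board[s].isupper() != white

    for (f, r), piece in list(board.items()):
        if piece.isupper() != white:
            continue
        src, kind = (f, r), piece.upper()
        if kind == "P":
            step = 1 if white else -1
            one, two = (f, r + step), (f, r + 2 * step)
            if on_board(one) and one not in board:
                yield src, one
                if r == (1 if white else 6) and two not in board:
                    yield src, two
            for dx in (-1, 1):  # pawns capture diagonally only
                if enemy((f + dx, r + step)):
                    yield src, (f + dx, r + step)
        elif kind in STEPPERS:
            for dx, dy in STEPPERS[kind]:
                dst = (f + dx, r + dy)
                if on_board(dst) and (dst not in board or enemy(dst)):
                    yield src, dst
        else:
            for dx, dy in SLIDERS[kind]:
                dst = (f + dx, r + dy)
                while on_board(dst) and dst not in board:
                    yield src, dst
                    dst = (dst[0] + dx, dst[1] + dy)
                if enemy(dst):  # the first occupied square ends the ray
                    yield src, dst


def make_move(board: Board, src: Square, dst: Square) -> Board:
    new = dict(board)
    piece = new.pop(src)
    if piece.upper() == "P" and dst[1] in (0, 7):
        piece = "Q" if piece.isupper() else "q"  # simplification: queen only
    new[dst] = piece
    return new


def in_check(board: Board, white: bool) -> bool:
    king = next(s for s, p in board.items() if p == ("K" if white else "k"))
    return any(dst == king for _, dst in pseudo_moves(board, not white))


def legal_moves(board: Board, white: bool) -> List[Tuple[Square, Square]]:
    return [(s, d) for s, d in pseudo_moves(board, white)
            if not in_check(make_move(board, s, d), white)]


def is_mated(board: Board, white: bool) -> bool:
    return in_check(board, white) and not legal_moves(board, white)


def has_mate_in_one(board: Board) -> bool:
    return any(is_mated(make_move(board, s, d), white=False)
               for s, d in legal_moves(board, white=True))


def mate_in_two_keys(board: Board) -> List[str]:
    """White first moves after which every Black reply allows mate in one."""
    keys = []
    for s, d in legal_moves(board, white=True):
        after_key = make_move(board, s, d)
        replies = legal_moves(after_key, white=False)
        # Skip first moves that already mate or stalemate (no replies).
        if replies and all(has_mate_in_one(make_move(after_key, bs, bd))
                           for bs, bd in replies):
            keys.append(to_name(s) + to_name(d))
    return keys


MORPHY = {sq("c8"): "K", sq("a1"): "R", sq("b6"): "P",
          sq("a8"): "k", sq("b8"): "b", sq("a7"): "p", sq("b7"): "p"}
print(mate_in_two_keys(MORPHY))  # ['a1a6']
```

The search is exhaustive over three plies, which is instant with seven pieces. For anything larger, a real library or engine is the better tool.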

4. Pixel-by-Pixel Image Analysis

Still stuck, it tried manually measuring the chessboard squares.

“The board is 726 pixels high, each square is about 88 pixels…”
“Coordinates for b7 would be approximately x=88–176, y=88–176.”

It was literally trying to deduce the pieces based on pixel math.

Massive stubbornness.
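The pixel arithmetic in that quote is easy to reproduce. This is a small sketch under assumptions the quote does not state: White is at the bottom of the image, the pixel origin is the top-left corner, and the board has no border. The function name `square_bounds` is hypothetical.

```python
from typing import Tuple


def square_bounds(square: str, square_px: int = 88) -> Tuple[int, int, int, int]:
    """Return (x_min, x_max, y_min, y_max) in pixels for a square like 'b7'.

    Assumes White at the bottom, origin at the top-left, and no board border.
    """
    col = ord(square[0]) - ord("a")   # file a..h -> column 0..7, left to right
    row = 8 - int(square[1])          # rank 8 is the top row, so row 0
    return (col * square_px, (col + 1) * square_px,
            row * square_px, (row + 1) * square_px)


print(square_bounds("b7"))  # (88, 176, 88, 176)
```

This matches the x=88–176, y=88–176 range o3 gave for b7. Note that 8 × 88 = 704, so the quoted 726-pixel height leaves about 22 pixels unaccounted for; the 88-pixel figure is only approximate, or the image includes a margin.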

It also wondered whether the puzzle could be solved at all:

“though I wonder if this is a typical “mate in 2” puzzle. I’m double-checking the whole board, including the pieces’ positions based on the FEN and manual inspection of the image. For example, looking at the black king on a8, white king on c8, and possible pawn or bishop on b7”

5. Still Thinking

o3 started showing mild signs of panic. It was still calculating but was not sure about the best move.

“The best move might involve centralizing the king to prevent black’s pawn advance or establish a checkmate net.”

That is when it turned to Bing.

6. Cheating by Web Search

Finally, right before hitting 8 minutes total:

“I found a chess forum that mentions this exact position…”
“Suggested move: Ra6.”

It binged the answer. 😂
Still, it did not just copy. It rechecked and understood why Ra6 works.

o3 doesn’t just output an answer. It reasons. It struggles. It switches tools. It self-corrects. It even cheats (but only after serious effort). This feels very human. It also shows where current models shine (problem-solving) and where they still need external help.

Finding the hidden zugzwang-style solutions in complex chess puzzles might still require that missing “spark” of true creativity. You can read more about that in my post: “Are LLMs Creative?”.

Full Conversation Link