LLMs encode how difficult problems are

Original link: https://arxiv.org/abs/2510.18147

This study investigates how large language models (LLMs) represent problem difficulty and whether that representation aligns with human judgment. The researchers find that human-annotated difficulty can be decoded from LLM activations with high accuracy, and that this ability improves with model size. In contrast, difficulty estimates derived from the LLMs themselves are substantially weaker and do not improve with scale. Interestingly, steering an LLM toward "easier" representations improves accuracy and reduces hallucination. Moreover, during reinforcement learning, the model's encoding of human-defined difficulty *strengthens* as performance improves, while its internally derived difficulty measure becomes *less* accurate. This suggests that LLMs struggle to assess difficulty accurately on their own, and that reinforcement learning may actually worsen this internal estimate while amplifying the signal from human-annotated difficulty. The researchers have released their code for further study.
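To make the probing setup concrete, here is a minimal sketch of a linear difficulty probe: fit a linear regressor on hidden-state activations to predict human-annotated difficulty, then score it with Spearman rank correlation (the metric reported in the abstract). The model name, probe layer, last-token pooling, and Ridge regression are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a linear difficulty probe, assuming a HuggingFace causal LM and a
# list of (problem_text, human_difficulty) pairs. Layer index, last-token pooling, and
# Ridge regression are illustrative choices, not the authors' exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import spearmanr

MODEL = "Qwen/Qwen2.5-Math-1.5B"   # the model named in the paper's RL experiment; any causal LM works here
LAYER = 16                          # hypothetical probe layer

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def last_token_features(text: str) -> torch.Tensor:
    """Return the hidden state of the final prompt token at the chosen layer."""
    ids = tok(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]   # shape: (hidden_dim,)

def fit_difficulty_probe(problems, difficulty):
    """problems: list[str]; difficulty: list[float] of human-annotated ratings."""
    X = torch.stack([last_token_features(p) for p in problems]).float().numpy()
    X_tr, X_te, y_tr, y_te = train_test_split(X, difficulty, test_size=0.2, random_state=0)
    probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
    rho, _ = spearmanr(probe.predict(X_te), y_te)   # rank correlation, as in the abstract
    return probe, rho
```

The paper reports that human-labeled difficulty is strongly linearly decodable (Spearman ρ ≈ 0.88 on AMC); a probe like this would be evaluated in the same way, trained separately per layer and token position.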

## LLMs and Problem Difficulty: Discussion Summary

A recent research paper (arxiv.org) examines how large language models (LLMs) perceive and solve problems, finding that they do not judge difficulty as reliably as humans do. A central point is that human assessments of problem difficulty correlate with the ability to solve those problems successfully, suggesting difficulty could be a valuable training signal for neural networks.

Much of the discussion centers on the view that LLMs are essentially sophisticated analogy machines, with reasoning as a chain of analogies. Some argue that LLMs operate by compressing their training data and struggle when a problem is poorly represented in that data. Others frame "reasoning" as steering a conversation through vector space toward relevant output directions.

A recurring theme is that LLMs often *hallucinate* when estimating task difficulty and time, likely reflecting patterns from the many project-estimation documents in their training data, which are themselves full of human inaccuracies. Ultimately, many contributors suggest reframing our understanding of LLMs as sophisticated text-completion tools driven by compressed data rather than genuinely "intelligent" problem solvers.

Original

[Submitted on 20 Oct 2025]

LLMs Encode How Difficult Problems Are, by William Lugoloobi and 1 other author

Abstract: Large language models exhibit a puzzling inconsistency: they solve complex problems yet frequently fail on seemingly simpler ones. We investigate whether LLMs internally encode problem difficulty in a way that aligns with human judgment, and whether this representation tracks generalization during reinforcement learning post-training. We train linear probes across layers and token positions on 60 models, evaluating on mathematical and coding subsets of Easy2HardBench. We find that human-labeled difficulty is strongly linearly decodable (AMC: $\rho \approx 0.88$) and exhibits clear model-size scaling, whereas LLM-derived difficulty is substantially weaker and scales poorly. Steering along the difficulty direction reveals that pushing models toward "easier" representations reduces hallucination and improves accuracy. During GRPO training on Qwen2.5-Math-1.5B, the human-difficulty probe strengthens and positively correlates with test accuracy across training steps, while the LLM-difficulty probe degrades and negatively correlates with performance. These results suggest that human annotations provide a stable difficulty signal that RL amplifies, while automated difficulty estimates derived from model performance become misaligned precisely as models improve. We release probe code and evaluation scripts to facilitate replication.
From: William Gitta Lugoloobi
[v1] Mon, 20 Oct 2025 22:48:23 UTC (1,102 KB)
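The abstract's steering result (pushing models toward "easier" representations reduces hallucination and improves accuracy) implies adding or subtracting a difficulty direction in the model's activations. Below is a minimal sketch of one common activation-steering setup using a forward hook, assuming the `probe`, `model`, and `tok` objects from the probing sketch above. Using the probe's weight vector as the direction, the hook layer, and the steering scale are all assumptions for illustration, not the authors' exact procedure.

```python
# Minimal steering sketch: subtract a scaled "difficulty direction" from a decoder
# layer's output to push activations toward "easier". Direction choice (probe weights),
# hook layer, and scale are hypothetical.
import torch

direction = torch.tensor(probe.coef_, dtype=model.dtype)
direction = direction / direction.norm()
SCALE = 4.0          # hypothetical steering strength
HOOK_LAYER = 16      # hypothetical hook point

def steer_hook(module, inputs, output):
    # Decoder layers typically return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden - SCALE * direction.to(hidden.device)   # push toward "easier"
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[HOOK_LAYER].register_forward_hook(steer_hook)
try:
    ids = tok("Solve: 2x + 3 = 11. x = ?", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()
```

Removing the hook in a `finally` block keeps the steered behavior from leaking into later, unsteered generations.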