From Memorization to Reasoning in the Spectrum of Loss Curvature

Original link: https://arxiv.org/abs/2510.24256

This study investigates how memorization is represented in transformer models, spanning both language and vision models. The authors show that memorized information can be identified by analyzing the curvature of the model's loss landscape: memorized data produces sharper curvature than non-memorized data. Building on this curvature signal, they develop a weight-editing technique that removes unwanted memorization *more* effectively than existing methods while preserving overall language fluency. The edit, however, specifically harms tasks that depend on highly specialized knowledge, such as fact retrieval and arithmetic, even as broader reasoning abilities are preserved. The study suggests that these tasks rely on distinctive, narrowly scoped regions of the model's weight space, so removing memorization-related components also removes ingredients critical to those specific skills. The work deepens our understanding of memorization in neural networks, provides a targeted method for removing it, and highlights the specialized structures present in these models.
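To make the editing idea concrete, here is a minimal Python sketch of removing low-curvature weight components in a K-FAC-style Kronecker eigenbasis. The factor matrices `A` and `G`, the `keep_frac` parameter, and the quantile thresholding rule are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of curvature-based weight editing in a K-FAC-style
# eigenbasis. A, G, keep_frac, and the thresholding rule are assumptions
# for illustration, not the paper's exact method.
import numpy as np

def edit_low_curvature(W, A, G, keep_frac=0.9):
    """Zero out the weight components with the lowest K-FAC curvature.

    W: (d_out, d_in) layer weight matrix.
    A: (d_in, d_in) input-activation second-moment matrix (K-FAC factor).
    G: (d_out, d_out) output-gradient second-moment matrix (K-FAC factor).
    In the Kronecker approximation, the curvature of component (i, j)
    is roughly eig_G[i] * eig_A[j].
    """
    lam_A, U_A = np.linalg.eigh(A)   # eigenbasis of the input factor
    lam_G, U_G = np.linalg.eigh(G)   # eigenbasis of the output factor

    # Express W in the Kronecker eigenbasis.
    C = U_G.T @ W @ U_A              # coefficient per (i, j) component
    curv = np.outer(lam_G, lam_A)    # approximate curvature per component

    # Keep the highest-curvature components (structure shared across
    # examples); drop the low-curvature tail where memorization sits.
    thresh = np.quantile(curv, 1.0 - keep_frac)
    C_edited = np.where(curv >= thresh, C, 0.0)

    return U_G @ C_edited @ U_A.T    # map back to the original basis
```

Ranking components by the product of factor eigenvalues mirrors the Kronecker approximation of the curvature; in practice such an edit would be applied per layer, with the keep fraction tuned against both memorization benchmarks and perplexity.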

The Hacker News discussion centers on this recent arXiv paper, which explores **the distinction between memorization and reasoning through loss curvature in neural networks**. The paper proposes using K-FAC to analyze weight matrices, decomposing them into curvature-ordered components. Contrary to what one might first expect, **lower curvature corresponds to memorization**, since example-specific directions contribute little curvature when averaged over the dataset, while **higher curvature corresponds to generalization**, since those directions are used consistently across many examples. Commenters related this to the view that better generalization lives in wider, smoother minima of the loss landscape, and drew parallels to **Sharpness-Aware Minimization (SAM)** and to Karpathy's vision of models that prioritize reasoning over rote recall, possibly delegating factual lookup to an external "oracle." The core idea is to identify and potentially remove the memorization components, distilling out a "reasoning core" or freeing capacity for better generalization. The discussion also touches on the philosophical question of what counts as a "fact" for an LLM, favoring models that learn algorithms *like* multiplication over ones that memorize multiplication tables.
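The per-example sharpness signal behind this framing can also be sketched in code. Below is a hedged PyTorch example that scores a single training example's loss sharpness with a Hutchinson estimate of the Hessian trace; under the paper's premise, memorized examples should score sharper. The function name, the probe count, and the choice of a trace estimate as the sharpness measure are assumptions for illustration, not the paper's procedure.

```python
# Minimal sketch (assumed setup, not from the paper): score one example's
# loss sharpness via a Hutchinson trace estimate of the Hessian.
import torch

def sharpness_score(model, loss_fn, x, y, n_probes=8):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    trace_est = 0.0
    for _ in range(n_probes):
        # Rademacher probe vectors v, one per parameter tensor.
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Hessian-vector product: differentiate g·v once more w.r.t. θ.
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(gv, params, retain_graph=True)
        # v·Hv is an unbiased estimate of tr(H).
        trace_est += sum((h * v).sum().item() for h, v in zip(hvs, vs))
    return trace_est / n_probes  # larger means sharper curvature
```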

Original text

By Jack Merullo and 3 other authors
Abstract: We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature. This insight is based on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than for non-memorized ones, meaning that ordering weight components from high to low curvature can reveal a distinction without explicit labels. This motivates a weight-editing procedure that suppresses recitation of untargeted memorized data far more effectively than a recent unlearning method (BalancedSubnet), while maintaining lower perplexity. Since the curvature basis has a natural interpretation in terms of shared structure in model weights, we extensively analyze the editing procedure's effect on downstream tasks in LMs, and find that fact retrieval and arithmetic are specifically and consistently harmed, even though open-book fact retrieval and general logical reasoning are preserved. We posit that these tasks rely heavily on specialized directions in weight space rather than general-purpose mechanisms, regardless of whether the individual datapoints are memorized. We support this by showing a correspondence between the strength of task data's activations on the low-curvature components we edit out and the drop in task performance after the edit. Our work enhances the understanding of memorization in neural networks, with practical applications toward removing it, and provides evidence for idiosyncratic, narrowly used structures involved in solving tasks like math and fact retrieval.
From: Jack Merullo
[v1] Tue, 28 Oct 2025 10:09:35 UTC (2,148 KB)
[v2] Fri, 31 Oct 2025 00:26:33 UTC (2,148 KB)