Language Models are Injective and Hence Invertible

Original link: https://arxiv.org/abs/2510.15511

This research paper challenges the assumption that Transformer language models lose information through their non-injective components. The authors show, both mathematically and empirically, that these models are in fact **injective**: distinct inputs *always* map to distinct outputs, so the original text can be recovered exactly. They prove this property holds at initialization and is preserved throughout training, and confirm it with billions of collision tests across six leading language models, observing no "collisions" (distinct inputs producing the same output). They further introduce **SipIt**, an algorithm that exploits injectivity to reconstruct the *exact* input text from a model's hidden activations with linear-time guarantees, establishing practical, provable invertibility. The results highlight injectivity as a key, exploitable property of language models, with potential benefits for transparency, interpretability, and safe deployment.


By Giorgos Nikolaou and 5 other authors

Abstract: Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.
From: Andrea Santilli
[v1] Fri, 17 Oct 2025 10:25:30 UTC (3,980 KB)
[v2] Mon, 20 Oct 2025 07:29:02 UTC (3,980 KB)
[v3] Tue, 21 Oct 2025 14:44:49 UTC (3,980 KB)