LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Original link: https://arxiv.org/abs/2511.08544

## LeJEPA: A New Self-Supervised Learning Method

This paper introduces **LeJEPA**, a new self-supervised learning objective built on Joint-Embedding Predictive Architectures (JEPAs). Addressing the lack of theoretical grounding and practical guidance in existing JEPAs, LeJEPA offers a scalable and theoretically sound alternative.

Its core insight is that the isotropic Gaussian is the ideal distribution for the embeddings to follow, enforced through **Sketched Isotropic Gaussian Regularization (SIGReg)**. This simplifies training: a single trade-off hyperparameter, linear time and memory complexity, and stability across architectures (ResNets, ViTs, ConvNets) and datasets.

Notably, LeJEPA eliminates common heuristics such as stop-gradient and teacher-student setups, reduces the implementation to roughly 50 lines of code, and supports efficient distributed training. Empirical results on more than 10 datasets show strong performance, reaching 79% accuracy on ImageNet-1k with a ViT-H/14 under linear evaluation, and highlight LeJEPA's potential to re-establish self-supervised pre-training as a core pillar of AI research. Code is available on GitHub.
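To make the shape of the objective concrete, below is a minimal, illustrative PyTorch sketch of how a JEPA-style predictive loss can be combined with an isotropy regularizer under a single trade-off weight. The function names (`sigreg_sketch`, `lejepa_style_loss`), the moment-matching penalty on random 1D projections, and the `lam=0.05` default are assumptions made for illustration only; the paper's actual SIGReg statistic and reference implementation differ (see the GitHub repo).

```python
import torch
import torch.nn.functional as F

def sigreg_sketch(z, num_projections=64):
    """Illustrative stand-in for SIGReg: push embeddings toward an
    isotropic standard Gaussian by matching low-order moments of
    random 1D projections ("sketches"). The published SIGReg uses a
    proper univariate goodness-of-fit statistic; this moment-matching
    penalty is only an assumption made for this sketch."""
    n, d = z.shape
    # Random unit directions used to project the d-dim embeddings to 1D.
    dirs = torch.randn(d, num_projections, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    p = z @ dirs  # shape: (n, num_projections)
    # Penalize deviation of each projection's mean/variance from N(0, 1).
    m1 = p.mean(dim=0)
    m2 = p.var(dim=0, unbiased=False)
    return (m1 ** 2).mean() + ((m2 - 1.0) ** 2).mean()

def lejepa_style_loss(z_context, z_target, lam=0.05):
    """Combined objective: a predictive loss (here a simple MSE between
    context/predicted embeddings and target embeddings) plus the isotropy
    regularizer, weighted by a single trade-off hyperparameter `lam`."""
    pred_loss = F.mse_loss(z_context, z_target)
    reg = 0.5 * (sigreg_sketch(z_context) + sigreg_sketch(z_target))
    return pred_loss + lam * reg

# Toy usage with random tensors standing in for encoder/predictor outputs.
z_ctx = torch.randn(256, 128, requires_grad=True)
z_tgt = torch.randn(256, 128)
loss = lejepa_style_loss(z_ctx, z_tgt)
loss.backward()
print(float(loss))
```

The point of the sketch is the structure: one predictive term, one distribution-shaping term, one scalar weight, and no stop-gradient or teacher network.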

## JEPA: Hacker News Discussion

A recent Hacker News discussion centered on JEPA (Joint-Embedding Predictive Architecture), a self-supervised learning approach championed by Yann LeCun. JEPA aims to move beyond the limitations of autoregressive large language models (LLMs) by learning world models through prediction, rather than merely predicting the next token.

LeCun argues that next-token prediction is an inefficient training objective and that Transformers, while powerful, are not the final architecture. He advocates energy-based models, which can better identify and reject nonsensical states.

The discussion, however, showed skepticism. Some commenters felt that the criticisms of LLMs are often weak and that JEPA has yet to demonstrate competitive performance at scale. Concerns were raised about benchmark comparisons, particularly how JEPA performs on specific datasets relative to general-purpose models.

Even so, many remained optimistic about architectural innovation in AI, believing significant efficiency gains are still within reach. Some users shared their own experiments, with mixed results. Overall, the conversation highlighted the ongoing debate over the future of AI architectures and training methods.

## Original Abstract

Abstract: Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in **LeJEPA**, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, **Sketched Isotropic Gaussian Regularization** (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyper-parameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyper-parameter schedulers, and (v) distributed training-friendly implementation requiring only ≈50 lines of code. Our empirical validation covers 10+ datasets, 60+ architectures, all with varying scales and domains. As an example, using imagenet-1k for pretraining and linear evaluation with frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo).
From: Randall Balestriero
[v1] Tue, 11 Nov 2025 18:21:55 UTC (12,072 KB)
[v2] Wed, 12 Nov 2025 14:26:39 UTC (12,072 KB)
[v3] Fri, 14 Nov 2025 08:38:32 UTC (12,072 KB)