LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Original link: https://arxiv.org/abs/2511.08544

## LeJEPA: A New Self-Supervised Learning Method

This paper introduces **LeJEPA**, a new self-supervised learning objective built on Joint-Embedding Predictive Architectures (JEPAs). Addressing the lack of theoretical grounding and practical guidance in existing JEPAs, LeJEPA offers a scalable and theoretically sound alternative.

Its core insight is that the isotropic Gaussian is the ideal distribution for the embeddings to follow, enforced through **Sketched Isotropic Gaussian Regularization (SIGReg)**. This simplifies training: a single trade-off hyperparameter, linear time and memory complexity, and stability across architectures (ResNets, ViTs, ConvNets) and datasets.

Notably, LeJEPA eliminates common heuristics such as stop-gradient and teacher-student setups, reduces the implementation to roughly 50 lines of code, and supports efficient distributed training. Empirical results on more than 10 datasets show strong performance, reaching 79% accuracy on ImageNet-1k with a ViT-H/14 under linear evaluation, and highlight LeJEPA's potential to re-establish self-supervised pre-training as a core pillar of AI research. Code is available on GitHub.
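To make the shape of the objective concrete, below is a minimal, illustrative PyTorch sketch of how a JEPA-style predictive loss can be combined with an isotropy regularizer under a single trade-off weight. The function names (`sigreg_sketch`, `lejepa_style_loss`), the moment-matching penalty on random 1D projections, and the `lam=0.05` default are assumptions made for illustration only; the paper's actual SIGReg statistic and reference implementation differ (see the GitHub repo).

```python
import torch
import torch.nn.functional as F

def sigreg_sketch(z, num_projections=64):
    """Illustrative stand-in for SIGReg: push embeddings toward an
    isotropic standard Gaussian by matching low-order moments of
    random 1D projections ("sketches"). The published SIGReg uses a
    proper univariate goodness-of-fit statistic; this moment-matching
    penalty is only an assumption made for this sketch."""
    n, d = z.shape
    # Random unit directions used to project the d-dim embeddings to 1D.
    dirs = torch.randn(d, num_projections, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    p = z @ dirs  # shape: (n, num_projections)
    # Penalize deviation of each projection's mean/variance from N(0, 1).
    m1 = p.mean(dim=0)
    m2 = p.var(dim=0, unbiased=False)
    return (m1 ** 2).mean() + ((m2 - 1.0) ** 2).mean()

def lejepa_style_loss(z_context, z_target, lam=0.05):
    """Combined objective: a predictive loss (here a simple MSE between
    context/predicted embeddings and target embeddings) plus the isotropy
    regularizer, weighted by a single trade-off hyperparameter `lam`."""
    pred_loss = F.mse_loss(z_context, z_target)
    reg = 0.5 * (sigreg_sketch(z_context) + sigreg_sketch(z_target))
    return pred_loss + lam * reg

# Toy usage with random tensors standing in for encoder/predictor outputs.
z_ctx = torch.randn(256, 128, requires_grad=True)
z_tgt = torch.randn(256, 128)
loss = lejepa_style_loss(z_ctx, z_tgt)
loss.backward()
print(float(loss))
```

The point of the sketch is the structure: one predictive term, one distribution-shaping term, one scalar weight, and no stop-gradient or teacher network.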

## JEPA: Hacker News Discussion

A recent Hacker News discussion centered on JEPA (Joint-Embedding Predictive Architecture), a self-supervised learning approach championed by Yann LeCun. JEPA aims to move beyond the limitations of autoregressive large language models (LLMs) by learning world models through prediction, rather than merely predicting the next token.

LeCun argues that next-token prediction is an inefficient training objective and that Transformers, while powerful, are not the final architecture. He advocates energy-based models, which can better identify and reject nonsensical states.

The discussion, however, showed skepticism. Some commenters felt that the criticisms of LLMs are often weak and that JEPA has yet to demonstrate competitive performance at scale. Concerns were raised about benchmark comparisons, particularly how JEPA performs on specific datasets relative to general-purpose models.

Even so, many remained optimistic about architectural innovation in AI, believing significant efficiency gains are still within reach. Some users shared their own experiments, with mixed results. Overall, the conversation highlighted the ongoing debate over the future of AI architectures and training methods.

## Original Abstract

Abstract: Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in **LeJEPA**, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, **Sketched Isotropic Gaussian Regularization** (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyper-parameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyper-parameter schedulers, and (v) distributed training-friendly implementation requiring only ≈50 lines of code. Our empirical validation covers 10+ datasets, 60+ architectures, all with varying scales and domains. As an example, using imagenet-1k for pretraining and linear evaluation with frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo).
From: Randall Balestriero
[v1] Tue, 11 Nov 2025 18:21:55 UTC (12,072 KB)
[v2] Wed, 12 Nov 2025 14:26:39 UTC (12,072 KB)
[v3] Fri, 14 Nov 2025 08:38:32 UTC (12,072 KB)