[Submitted on 19 Jan 2023 (v1), last revised 13 Apr 2023 (this version, v3)]
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mahmoud Assran and 7 other authors
Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
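The masking strategy described above can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's implementation): on a 14x14 patch grid, it samples several large-scale rectangular target blocks and one large, spatially distributed context block, then removes the target patches from the context so the context encoder never sees them. The scale and aspect-ratio ranges used here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sample_block(grid, scale_range, aspect_range, rng):
    """Sample a rectangular block of patch indices on an h x w patch grid.

    scale_range: fraction of the total number of patches the block covers.
    aspect_range: range of height/width aspect ratios for the block.
    """
    h, w = grid
    area = rng.uniform(*scale_range) * h * w
    aspect = rng.uniform(*aspect_range)
    bh = min(h, max(1, int(round(np.sqrt(area * aspect)))))
    bw = min(w, max(1, int(round(np.sqrt(area / aspect)))))
    top = rng.integers(0, h - bh + 1)
    left = rng.integers(0, w - bw + 1)
    return {(top + i) * w + (left + j) for i in range(bh) for j in range(bw)}

def ijepa_masks(grid=(14, 14), n_targets=4, seed=0):
    """Return (context, targets): one context mask and several target masks.

    Target blocks are sampled at sufficiently large scale (illustrative
    15-20% of the image each); the context block is large (85-100%) so it
    stays spatially distributed even after target patches are removed.
    """
    rng = np.random.default_rng(seed)
    targets = [
        sample_block(grid, (0.15, 0.2), (0.75, 1.5), rng)
        for _ in range(n_targets)
    ]
    context = sample_block(grid, (0.85, 1.0), (1.0, 1.0), rng)
    # Drop target patches from the context so the prediction task is
    # non-trivial: the predictor must infer target representations it
    # has not directly observed.
    context -= set().union(*targets)
    return context, targets
```

During training, the context patches would be fed to a context encoder, and a predictor would regress the (target-encoder) representations of each target block from the context representations; this sketch only covers the mask sampling step.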
From: Mahmoud Assran [view email]
[v1] Thu, 19 Jan 2023 18:59:01 UTC (3,080 KB)
[v2] Thu, 30 Mar 2023 18:28:46 UTC (3,077 KB)
[v3] Thu, 13 Apr 2023 17:59:37 UTC (6,252 KB)