(comments)

Original link: https://news.ycombinator.com/item?id=43784205

A Hacker News thread discusses a paper titled "Three things everyone should know about Vision Transformers." The title sparked debate over whether it is clickbait: some compared it to blog headlines, while others saw it as a playful pun. Users debated the effectiveness and motivation of academic paper titles, contrasting them with more straightforwardly technical ones. The discussion also highlighted the growing use of large language models (LLMs) to summarize research papers; some found LLM summaries more helpful than the original abstracts, since abstracts are optimized for active researchers scanning daily digests rather than for casual readers. One user shared an LLM-generated bullet list of the key points: Vision Transformers can be parallelized for efficiency; fine-tuning only the attention layers is often sufficient; and MLP-based patch preprocessing improves masked self-supervised learning. Others suggested simply reading the abstract or the introduction/conclusion sections.


Original text
Hacker News
Three things everyone should know about Vision Transformers (arxiv.org)
67 points by reqo 1 day ago | 16 comments

There's something that tickles me about this paper's title. The thought that everyone should know these three things. The idea of going to my neighbor who's a retired K-12 teacher and telling her about how adding MLP-based patch pre-processing layers improves BERT-like self-supervised training based on patch masking.


Clickbait titles are something of a tradition in this field by now. Some important paper titles include "One weird trick for parallelizing convolutional neural networks", "Attention is all you need", and "A picture is worth 16x16 words". Personally I still find it kind of irritating, but to each their own I guess.


Only the first one is clickbait in the style of blogs that incentivize you to click on the headline (i.e. the information gap), the last two are just fun puns.


Honestly I took the first one as making fun of that trope. Usually the “one weird trick to” ends in some tabloid-style thing like lose 15 pounds or find out if your husband is loyal. So “parallelizing CNNs” is a joke, as if that’s something you’d see in a checkout aisle.


In what sense is "Attention is all you need" a pun?


It's a reference to the lyric "love is all you need" from the song "All You Need Is Love" by the Beatles, and it uses a faux-synonym with a different meaning.


"Attention is all you need" is an outlier. They backed up their bold claim with breakthrough results.

For modest incremental improvements, I greatly prefer boring technical titles. Not everything needs to be a stochastic parrot. We see this dynamic with building luxury condos: on any individual project, making that pick will help juice profit, but when the whole city follows suit, it leads to a less desirable outcome.



Hey, when the AI-powered T-rex is chasing you down, you'll wish you'd paid attention to the fact that the vision transformer's perception is based on movement!

Had to throw some Jurassic Park humor in here.



Yeah, I guess today was the day that I learned I am not part of "everyone". I feel so left out now.


I put this paper into 4o to check whether it is relevant. So that you don't have to do the same, here are the bullet points:

- Vision Transformers can be parallelized to reduce latency and improve optimization without sacrificing accuracy.

- Fine-tuning only the attention layers is often sufficient for adapting ViTs to new tasks or resolutions, saving compute and memory.

- Using MLP-based patch preprocessing improves performance in masked self-supervised learning by preserving patch independence.
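As a rough illustration of the second bullet, attention-only fine-tuning comes down to leaving only the attention parameters trainable and freezing everything else. A minimal sketch of the selection logic, assuming parameter names follow the `blocks.N.attn.*` convention common in ViT implementations (the names below are illustrative, not from the paper):

```python
def attention_only_params(param_names):
    """Return the subset of parameter names to keep trainable
    when fine-tuning only the attention layers of a ViT."""
    return [n for n in param_names if ".attn." in n]

# Example parameter names, modeled on typical ViT implementations.
names = [
    "patch_embed.proj.weight",
    "blocks.0.attn.qkv.weight",
    "blocks.0.mlp.fc1.weight",
    "blocks.1.attn.proj.weight",
]
print(attention_only_params(names))
# → ['blocks.0.attn.qkv.weight', 'blocks.1.attn.proj.weight']
```

In a real framework you would then disable gradients on every parameter not in that subset (e.g. setting `requires_grad = False` in PyTorch), which is where the compute and memory savings come from.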



just read the abstract


You would think. I don't know about this paper in particular, but I'm continually surprised about how much more I get out of LLM summaries of papers than the abstracts of papers written by the authors.


Paper abstracts are not optimized for drive-by readers like you and me. They are optimized for active researchers in the field reading their daily arXiv digest that lists all the new papers across the categories they work in, who need to make the read/don't-read decision for each entry as efficiently as possible.

If you’ve already decided you’re interested in the paper, then the Introduction and/or Conclusion sections are what you’re looking for.



Wouldn't a more comprehensive, digestible bullet point summary be even more helpful to actual researchers choosing which papers to read?


This would be an interesting metric to track: how different an LLM-generated abstract (given the paper as source) is from the actual abstract, and whether that difference has any correlation with the overall quality of the paper.


Same. I don't think GP deserves the downvotes.

