> The code isn't that complicated, you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance, as an individual with a background in programming and who still remembers their calculus and linear algebra, with a year or so of self study.

Great overview. One gap I've been working on (daily) since October is the math, working toward MA's Mathematics for Machine Learning course (https://mathacademy.com/courses/mathematics-for-machine-lear...). I wrote about my progress (http://gmays.com/math) if anyone else is interested in a similar path. I recently crossed 200 days of doing math daily (at least a lesson a day). It's definitely taking longer than I want, but I also have limited time (young kids + startup + investing). The 'year of self study' definitely depends on where you're starting from and how much time you have, but it's very doable if you can dedicate an hour or two a day.
> you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance… with a year or so

I have implemented inference for the Whisper https://github.com/Const-me/Whisper and Mistral https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral... models on all GPUs that support the Direct3D 11.0 API. The performance is IMO very reasonable. A year might be required when the only input is the research articles. In practice, we also have reference Python implementations of these models. It's possible to test individual functions or compute shaders against the corresponding pieces of the reference implementation by comparing saved output tensors between the reference and the newly built version. Thanks to that simple trick, I think I spent less than a month of part-time work on each of these two projects.
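To make the tensor-comparison trick concrete, here is a minimal Python sketch, not taken from the linked projects: it assumes the reference implementation has dumped intermediate tensors to `.npy` files (the directory name, tensor names, and tolerance are all illustrative placeholders).

```python
# Sketch of validating a from-scratch implementation against saved reference tensors.
import numpy as np

def check_against_reference(name: str, produced: np.ndarray, atol: float = 1e-3) -> None:
    """Compare a tensor produced by the new implementation with one saved
    from the reference Python implementation (e.g. via np.save)."""
    expected = np.load(f"reference_tensors/{name}.npy")  # hypothetical dump location
    if produced.shape != expected.shape:
        raise AssertionError(f"{name}: shape {produced.shape} != {expected.shape}")
    max_err = np.abs(produced.astype(np.float32) - expected.astype(np.float32)).max()
    if max_err > atol:
        raise AssertionError(f"{name}: max abs error {max_err:.5f} exceeds {atol}")
    print(f"{name}: OK (max abs error {max_err:.5f})")

# Example usage: validate one block's output from the new GPU implementation.
# check_against_reference("layer0_attention_out", my_gpu_attention_output)
```

Checking layer by layer like this localizes bugs to a single function or compute shader instead of debugging the whole model end to end.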
IMO, SSMs are an optimization. They don't represent enough of a fundamental departure from the kinds of things Transformers can _do_. So, while I like the idea of saving on the energy costs, I speculate that such savings can be obtained with other optimizations while staying with transformer blocks. Hence, the motivation to change is a bit of an uphill battle here. I would love to hear counter-arguments to this view. :) Furthermore, I think a replacement will require that we _understand_ what the current crop of models is doing mechanically. Some of this was motivated in [1].

[1] https://openaipublic.blob.core.windows.net/neuron-explainer/...
Magical thinking. Nature uses gradient descent to evolve all of us and our companions on this planet. If something better were out there, we would see it at work in the natural world.
There's definitely scientific insight and analysis. E.g. "In-context Learning and Induction Heads" is an excellent paper. Another paper ("ROME") https://arxiv.org/abs/2202.05262 formulates a hypothesis about how these models store information, and provides experimental evidence. The thing is, a 3-layer MLP is basically an associative memory + a bit of compute. People understand that if you stack enough of them you can compute or memorize pretty much anything. Attention provides information routing. Again, that is pretty well understood. The rest is basically finding an optimal trade-off, and these trade-offs are based on insights from experimental data. So this architecture is not so much accidental as it is general. The specific representations used by MLPs are poorly understood, but there's definitely progress on understanding them from first principles by building specialized models.
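For readers who want to see the two ingredients side by side, here is a minimal, purely illustrative numpy sketch (toy shapes, no layer norm or multiple heads, not any particular model's code): attention routes information between positions, and a position-wise MLP supplies the associative memory and compute.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Each position builds a query and mixes in values from other positions:
    the 'information routing' part."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def mlp(x, W1, b1, W2, b2):
    """Position-wise MLP: the 'associative memory + a bit of compute' part."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

# Toy dimensions: sequence of 4 tokens, model width 8, hidden width 32.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq = rng.normal(size=(8, 8)) * 0.1
Wk = rng.normal(size=(8, 8)) * 0.1
Wv = rng.normal(size=(8, 8)) * 0.1
W1, b1 = rng.normal(size=(8, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 8)) * 0.1, np.zeros(8)

h = x + self_attention(x, Wq, Wk, Wv)   # residual + attention (routing)
y = h + mlp(h, W1, b1, W2, b2)          # residual + MLP (memory/compute)
print(y.shape)                           # (4, 8)
```

Stacking many such blocks, plus embeddings and an output projection, is essentially the general architecture the comment describes; the poorly understood part is what the learned weights end up representing.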
Note that not all brains are so severely damaged by this illusion. Most of them actually understand pretty clearly that they are next to useless without their organic, social and environmental companions.
Indeed. In my company's Slack, our primary professional communications tool, I can count a few people with anime avatars. Not very many, but it counts.
The lack of punctuation and capitalization is a weird zoomer style of writing in lowercase because "it's more chill." It is very common in people < 25 years old. They'll grow out of it.
Does GitHub need a cartoonish cat with 5 octopus-like legs as its logo? Of course not, but it makes it memorable and funny. And besides, anime is extremely mainstream these days.
I'm glad you enjoy anime girls, but surely you can see why that's different from a project's logo? One is directly related to the project, the other isn't. It's not even contextually related.
I think you vastly overestimate how much people care about model censorship. There are a bunch of open models that aren't censored. Llama 3 is still way more popular because it's just smarter.
They're suggesting that 99.99% of people don't mind if AI reflects the biases of society. Which is weird, because I'm pretty sure most people in the world aren't old white middle-class Americans.
Yes, yes, bias like the fact that the Wehrmacht was not the human menagerie that 0.01% of the population insist we live in. https://www.google.com/search?q=gemini+german+soldier Prompt-injected mandatory diversity has led to the most hilarious shit I've seen generative AI do so far. But, yes, of course, other instances of 'I reject your reality and substitute my own' - like depicting medieval Europe as being as diverse, vibrant and culturally enriched as American inner cities - those are doubleplusgood.
Encoding our current biases into LLMs is one way to go, but there's probably a better way to do it. Your leap to "thou shalt not search this" misses the possible middle ground.
I'd like to see this using ONNX and streaming from storage (I have my reasons, but they're mostly about using commodity hardware for "slow" batch processing without a GPU).
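As a rough sketch of the kind of setup this comment has in mind, CPU-only batch inference with ONNX Runtime looks roughly like the following; the model path, tensor names, and the batch loader are placeholders, not anything from the project under discussion.

```python
# Sketch: CPU-only batch inference with ONNX Runtime on commodity hardware.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",                        # placeholder path to an exported model
    providers=["CPUExecutionProvider"],  # no GPU required
)

input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

def run_batch(batch: np.ndarray) -> np.ndarray:
    """Run one batch through the model; slow, but needs only a CPU and RAM."""
    return session.run([output_name], {input_name: batch})[0]

# Example usage with a hypothetical loader that streams batches from disk:
# for batch in load_batches_from_disk("data/"):
#     results = run_batch(batch)
```

The appeal of this style of pipeline is that throughput comes from leaving it running over a stream of batches rather than from fast hardware.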
At least they use punctuation. We recently had a project on HN where the author used only lowercase and no punctuation because they equated it with being chained by the system.
The author is probably young; that's how Gen Z are these days. If they don't have autocorrect on, the whole text will be in lowercase. Also, it looks more casual and authentic, less LLM-generated.
Of course, this is not all there is to a modern LLM; it would probably take another thousand lines or two to implement training, and many more than that to make it fast on all the major CPU and GPU architectures. If you want a flexible framework that lets a developer define any model they want and still goes as fast as it can, the complexity spirals.
Most programmers have an intuition that duplicating a large software project from scratch, like Linux or Chromium for example, would require incredible amounts of expertise, manpower and time. It's not something that a small team can achieve in a few months. You're limited by talent, not hardware.
LLMs are very different. The code isn't that complicated, you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance, as an individual with a background in programming and who still remembers their calculus and linear algebra, with a year or so of self study. What makes LLMs difficult is getting access to all the hardware to train them, getting the data, and being able to preprocess that data.