Using PyTorch is not "LLMs from the ground up". It's a fine PyTorch tutorial, but let's not pretend it's something low-level.
You could always go deeper, and from some points of view it's not "from the ground up" enough unless you build your own autograd and tensors from plain numpy arrays.
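For the curious, a minimal sketch of what "your own autograd" can look like: a scalar value class with reverse-mode backprop in plain Python, in the spirit of Karpathy's micrograd. The `Value` class and its methods are illustrative, not from any resource linked in this thread.

```python
# Minimal scalar autograd sketch (micrograd-style); illustrative only.
import math

class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# Example: d/dw of tanh(x*w + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.1)
y = (x * w + b).tanh()
y.backward()
print(w.grad)  # ~0.97
```

A tensor version would wrap numpy arrays the same way, recording each op and its local derivative.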
But isn't it the beauty of LLMs that they need comparatively little preparation (unstructured text as input) and pick up the features on their own, so to speak?
edit: grammar
Yes, if you want an LLM that doesn't listen to instructions and just endlessly babbles about anything and everything.
What turned GPT into ChatGPT was a lot of structured training with human feedback.
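To make "structured training" concrete (illustrative only, not from the comment): instruction tuning starts from human-written prompt/response demonstrations, and RLHF adds human preference rankings used to train a reward model. The records look roughly like this; field names are common conventions, not a spec.

```python
# Illustrative only: the kind of structured records used to turn a base model
# into an instruction follower. Field names are conventions, not a spec.

# Supervised fine-tuning (SFT): human-written demonstrations.
sft_example = {
    "prompt": "Summarize the water cycle in one sentence.",
    "response": "Water evaporates, condenses into clouds, and returns as precipitation.",
}

# Preference data for a reward model (RLHF): humans rank candidate answers.
preference_example = {
    "prompt": "Summarize the water cycle in one sentence.",
    "chosen": "Water evaporates, condenses into clouds, and returns as precipitation.",
    "rejected": "The water cycle is a cycle involving water and also weather happens.",
}
```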
Quite a cry, in a submission page from one of the most language-"obsessed" members of this community.
Now: "code" is something you establish, as in the contents of the codex medium (see https://en.wikipedia.org/wiki/Codex for its history); from the field of law, a set of rules, a use exported to other domains since at least the mid sixteenth century in English. "Program" is something you publish, with the implied content of a set of intentions ("first we play Bach, then Mozart"); this use postdates "code"-as-"set of rules" by centuries. "Develop" is something you unfold - good, but it does not imply "rules" or a "[sequential] process" the way the other two terms do.
> If that's not the signal the author wants to send

You can't use language in a way that everyone will take the same way. The public is heterogeneous - its subsets will use different "codes".
> software development

Wrong angle. There is a problem, your consideration of the problem, the refinement of your solution to the problem: the solution gradually unfolds - it is developed.
This is great! I hope it works on a Windows 11 machine too (I often find that when Windows isn't explicitly mentioned, the code isn't tested on it and usually fails to work due to random issues).
No, I mean, a transformer is a very specific model architecture, and your simple language model has nothing to do with that architecture. Unless I’m missing something.
I still call it a transformer because the inputs are tokenized and computed to produce completions, not assembled from lookups or rules.

> Unless I'm missing something.

Only that I said "without taking the LLM approach", meaning tokens aren't scored in high-dimensional vectors, just as far simpler JSON bigrams. I don't think that disqualifies using the term "transformer" - I didn't want to call it a "computer" or a "completer". Have a better word?

> JSON instead of vectors

I did experiment with a low-dimensional vector approach from scratch; you can paste this into your browser console: https://gist.github.com/bennyschmidt/ba79ba64faa5ba18334b4ae...

But the n-gram approach is better. I don't think vectors start to pull away on accuracy until they are capturing a lot more contextual information (whereas a lot of context is already inferred from the structure of an n-gram).
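For readers wondering what "far simpler JSON bigrams" means in practice, here is a rough sketch (in Python rather than the commenter's JavaScript, and not their actual code): the whole "model" is a nested table of next-word counts that serializes straight to JSON.

```python
# Rough sketch of bigram-based next-token prediction; the "model" is just a
# nested count table that can be serialized to JSON. Illustrative only.
import json
from collections import defaultdict

def train_bigrams(text):
    """Count next-word frequencies for each word."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return {word: dict(following) for word, following in counts.items()}

def predict_next(bigrams, word):
    """Pick the most frequent follower of `word`, if any."""
    following = bigrams.get(word.lower())
    if not following:
        return None
    return max(following, key=following.get)

corpus = "paris is the capital of france and paris is known for the louvre"
bigrams = train_bigrams(corpus)
print(json.dumps(bigrams, indent=2))   # the entire "model" as JSON
print(predict_next(bigrams, "paris"))  # -> "is"
```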
And it fits the definition, doesn't it, since it tokenizes inputs to compute them against pre-trained ones, rather than being based on rules/lookups or arbitrary logic/algorithms?

Even in CSS a matrix "transform" is the same concept - the word "transform" is not unique to language models; it's more a reference to how one set of data becomes another by way of computation. Same with tile engines / game dev. Say I wanted to rotate a map; this could be a simple 2D tic-tac-toe board or a 3D MMO tile map, or anything in between:

Input [ ... ]
Output [ ... ]

The method that takes the input and gives that output is called a "transformer" because it is not looking up some rule that says where to put the new values; it's performing math on the data structure whose result determines the new values. It's not unique to language models. If anything, vector word embeddings came to this concept much later than math and game dev. An example of the word "Transformer" outside language models in JavaScript is Three.js's https://threejs.org/docs/#examples/en/controls/TransformCont... I used Three.js to build https://www.playshadowvane.com/ - built the engine from scratch, and I recall working with vectors (e.g. THREE.Vector3 for XYZ stuff) years before they were being popularized by LLMs.
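The bracketed Input/Output example above lost its contents in formatting; as a stand-in only (Python rather than the commenter's JavaScript), here is the idea: rotate a tile map by doing math on coordinates with a rotation matrix instead of looking up a per-cell rule.

```python
# Stand-in example (the original Input/Output arrays were lost in formatting):
# rotate a square 2D tile map 90 degrees by transforming each cell's
# coordinates with a rotation matrix - math on the data, not a lookup rule.
import numpy as np

def rotate_map_90(tiles):
    """Rotate a square tile map by 90 degrees about its center."""
    tiles = np.asarray(tiles)
    n = tiles.shape[0]
    center = (n - 1) / 2.0
    rotation = np.array([[0, -1],
                         [1,  0]])  # 90-degree rotation matrix
    rotated = np.empty_like(tiles)
    for y in range(n):
        for x in range(n):
            # Transform this cell's coordinates, then place its value.
            new_x, new_y = rotation @ np.array([x - center, y - center]) + center
            rotated[int(round(new_y)), int(round(new_x))] = tiles[y, x]
    return rotated

board = [["X", " ", "O"],
         [" ", "X", " "],
         ["O", " ", "X"]]
print(rotate_map_90(board))
```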
I get this question only on Hacker News, and am baffled as to why (and also the question "isn't this just n-grams, nothing more?").

https://github.com/bennyschmidt/next-token-prediction

If you look at this GitHub repo, it should be obvious it's a token prediction library - the video of the browser demo shown there clearly shows it being used with an input to autocomplete text based on your domain-specific data. Is THAT a Markov chain, nothing more? What a strange question; the answer is an obvious "No" - it's a front-end library for predicting text and pixels (AKA tokens).

https://github.com/bennyschmidt/llimo

This project, which uses the aforementioned library, is a chat bot. There's an added NLP layer that uses parts-of-speech analysis to transform your inputs into a cursor that is completed (AKA "answered"). See the video where I am chatting with the bot about Paris? Is that nothing more than a standard Markov chain? Nothing else going on? Again the answer is an obvious "No" - it's a chat bot. What about the NLP work, or the chat interface, etc., makes you ask if it's nothing more than a standard [insert vague philosophical idea]?

To me, your question is like when people asked whether jQuery "is just a monad". I don't understand the significance of the question - jQuery is a library for web development. Maybe there are some similarities to this philosophical concept "monad"? See: https://stackoverflow.com/questions/10496932/is-jquery-a-mon...

It's like saying "I looked at your website and have concluded it is nothing more than an Array."
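The "parts-of-speech analysis ... into a cursor" step isn't spelled out in the thread; purely as an illustration (Python, not the commenter's JavaScript, and using hand-rolled word lists where a real version would use a POS tagger), here is one way such a layer could turn a question into a completion prefix:

```python
# Purely illustrative, not the commenter's implementation: turn a question
# into a "cursor" - a completion prefix a next-token predictor can finish.
WH_WORDS = {"what", "who", "where", "when", "which", "why", "how"}
COPULAS = {"is", "are", "was", "were"}

def question_to_cursor(question):
    """e.g. 'What is the capital of France?' -> 'the capital of France is'"""
    words = question.strip().rstrip("?").split()
    copulas, rest = [], []
    for word in words:
        lower = word.lower()
        if lower in WH_WORDS:
            continue                  # drop the question word
        elif lower in COPULAS:
            copulas.append(lower)     # move the copula to the end
        else:
            rest.append(word)
    return " ".join(rest + copulas)

cursor = question_to_cursor("What is the capital of France?")
print(cursor)  # -> "the capital of France is", ready to hand to the predictor
```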
This page is just a container for a YouTube video. I suggest updating this HN link to point to the video directly, which contains the same links as the page in its description.
Yeah, really valuable stuff. So now we know how the ginormous models that we can't train or host work (and in practice there are so many hacks and optimizations that none of them actually work like this). Great.
Not true.
Your resource is really bad. "We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text."
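That quoted step is easy to try independently; a minimal sketch using the Hugging Face transformers library (not the resource's own from-scratch weight-loading code) that loads OpenAI's released GPT-2 weights and generates a continuation:

```python
# Minimal sketch of the quoted step using Hugging Face `transformers`,
# not the from-scratch weight loading the resource walks through.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # OpenAI's released weights
model.eval()

prompt = "Every effort moves you"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of a short continuation.
output_ids = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```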
I’m not sure why you’d want to build an LLM these days - you won’t be able to train it anyway. It’d make a lot of sense to teach people how to build stuff with LLMs, not LLMs themselves.
Andrej's series is excellent, and Sebastian's book + this video are excellent too. There's a lot of overlap, but they go into more detail on different topics or focus on different things. Andrej's entire series is absolutely worth watching; his upcoming Eureka Labs stuff is looking extremely good too. Sebastian's blog and book are definitely worth the time and money IMO.