It takes time to create work that’s clear, independent, and genuinely useful. If you’ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and fund my crippling chocolate milk addiction. We run on a “pay what you can” model, so if you believe in the mission, there’s likely a plan that fits (over here).
Every subscription helps me stay independent, avoid clickbait, and focus on depth over noise, and I deeply appreciate everyone who chooses to support our cult.
PS – Supporting this work doesn’t have to come out of your pocket. If you read this as part of your professional development, you can use this email template to request reimbursement for your subscription.
Every month, the Chocolate Milk Cult reaches over a million Builders, Investors, Policy Makers, Leaders, and more. If you’d like to meet other members of our community, please fill out this contact form here (I will never sell your data nor will I make intros w/o your explicit permission)- https://forms.gle/Pi1pGLuS1FmzXoLr6
Recently, I was on a call with an investor who wanted my help doing due diligence on a startup. During our conversation, they casually mentioned that the startup would be relying on fine-tuning to ensure that their systems were always updated with new information. I was surprised to see the myth of fine-tuning alive and kicking, but I guess fine-tuning has been chugging on that same immortality juice as GOAT-naldo.
Fine-tuning large language models (LLMs) is frequently sold as a quick, powerful method for injecting new knowledge. On the surface, it makes intuitive sense: feed new data into an already powerful model, tweak its weights, and improve performance on targeted tasks.
But this logic breaks down for advanced models, and badly so. At high performance, fine-tuning isn’t merely adding new data — it’s overwriting existing knowledge. Every neuron updated risks losing information that’s already intricately woven into the network. In short: neurons are valuable, finite resources. Updating them isn’t a costless act; it’s a dangerous trade-off that threatens the delicate ecosystem of an advanced model.
In today’s article, we’ll be talking about why Fine-Tuning LLMs is a giant waste of time for Knowledge Injection (which is what 90% of people think of when they bring up fine-tuning).
Fine-tuning advanced LLMs isn’t knowledge injection — it’s destructive overwriting. Neurons in trained language models aren’t blank slates; they’re densely interconnected and already encode crucial, nuanced information. When you fine-tune, you risk erasing valuable existing patterns, leading to unexpected and problematic downstream effects.
Instead, use modular methods like retrieval-augmented generation, adapters, or prompt engineering; these techniques inject new information without damaging the underlying model’s carefully built ecosystem.
I provide various consulting and advisory services. If you’d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
To grasp why fine-tuning advanced language models isn’t as straightforward as it sounds, let’s first consider how neural networks, particularly language models, are trained from scratch.
At their core, neural networks are immense collections of interconnected neurons, each holding numerical values (weights) that determine their behavior. Initially, these weights are set randomly — no encoded meaning, no stored knowledge, just mathematical noise.
When training starts, the network receives input (words, sentences, documents), makes predictions (next word, sentence completions), and calculates how far off these predictions are from reality. This difference is called the loss. The network then uses a process known as backpropagation to adjust each neuron’s weights incrementally, reducing this loss. Early in training, this is easy — the neurons store essentially random values, so updating them incurs minimal loss of useful information. The whole process is visualized below-
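For readers who prefer code to prose, here’s that loop as a minimal PyTorch sketch. The model and data are random stand-ins for illustration, not a real LLM: a forward pass, a loss, backpropagation, and a small weight update.

```python
import torch
import torch.nn as nn

# Toy next-token predictor: an embedding layer feeding a linear head.
# Everything here is a stand-in; real LLMs are vastly larger.
vocab_size, embed_dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randint(0, vocab_size, (8,))   # current tokens (random for the demo)
targets = torch.randint(0, vocab_size, (8,))  # the "next" tokens we want predicted

logits = model(inputs)           # forward pass: the network makes predictions
loss = loss_fn(logits, targets)  # how far off were we? (the loss)
loss.backward()                  # backpropagation: a gradient for every weight
optimizer.step()                 # nudge each weight slightly to reduce the loss
optimizer.zero_grad()
```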
With more training, the network progressively encodes meaningful patterns: linguistic nuances, syntax rules, semantic relationships, and context-dependent meanings. The neurons evolve from background character A into important side characters like Kirishima, with a few rising to Kacchan status in the network.
At the level of modern LLMs (which is what most suckers try to tune), most neurons are densely packed with critical insights. Fine-tuning, or any other weight update, is likely to hit some of these important neurons and change the behavior you were counting on.
You can see this in the research around Safety. As we saw earlier, alignment changes the distribution of biases in the outputs, creating new, unexpected biases that were significantly different from your baseline model. Take for example this case-
Given that no one I’ve ever met likes the Brits, one could argue that the alignment dropping them is doing its job (since it also dropped the French, I think we’ve attained AGI), but the dramatic reduction in diversity and the changed rankings of data points are both unexpected. The most dramatic example of this is shown here- “Finally, the distribution of customer gender (Figure 6) shows that the base model generates approximately 80% male and 20% female customers, while the aligned model generates nearly 100% female customers, with a negligible number of males.”
All that to show you that alignment has all kinds of implications that we haven’t explored in depth yet, and this ignorance about it makes red-teaming that much harder (can’t hit a target you don’t understand).
This is the crux: neurons are no longer neutral — each update risks overwriting existing, valuable information, leading to unintended consequences across the network. A neuron might be important in more than one task, so updating it will lead to unexpected downstream implications.
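You can watch this trade-off in miniature. The toy sketch below (random data, a tiny network; the exact numbers will vary run to run) trains one shared network on “task A”, fine-tunes the same weights on “task B”, and then re-checks task A:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One shared network, two unrelated "tasks" (random data for illustration).
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()
xA, yA = torch.randn(256, 10), torch.randn(256, 1)  # "old" knowledge
xB, yB = torch.randn(256, 10), torch.randn(256, 1)  # the "new" data we fine-tune on

def fit(x, y, steps=500):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()

fit(xA, yA)
print("Task A loss after training on A:", loss_fn(net(xA), yA).item())

fit(xB, yB)  # "fine-tuning": every shared weight gets updated
print("Task A loss after fine-tuning on B:", loss_fn(net(xA), yA).item())
```

Task A’s loss climbs after fine-tuning because the neurons that encoded it were repurposed for B. Scale that up to a few hundred billion parameters and you have the problem with fine-tuning advanced LLMs.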
Understanding this is key to recognizing the hidden costs of fine-tuning advanced language models. Unless you have invested a lot of money in AWS and you want to make sure that their stock goes up, you’re better off spending your time on better things.
If fine-tuning is a risky solution, what’s the alternative? The answer lies in modularity and augmentation. Techniques such as retrieval-augmented generation (RAG), external memory banks, and adapter modules provide more robust ways to incorporate new information without overwriting the existing network’s knowledge base.
Retrieval-Augmented Generation (RAG) uses external databases to augment knowledge dynamically at inference time. A lot of people proclaim stupid things like “RAG is dead” (we’ll address this eventually), but it is still by far the most reliable technique for QA over large knowledge stores. For more complex knowledge work, you’ll likely find naive RAG lacking, but there are more advanced retrieval and representation techniques that can deliver much stronger performance (for example, we use Knowledge Graphs and Entity-Based Chunking alongside normal chunks for Iqidis, which allows our AI to pull context from a much larger knowledge base).
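To make this concrete, here’s a minimal naive-RAG sketch. I’m using TF-IDF retrieval as a stand-in for the dense embeddings (or Knowledge Graphs) you’d use in production, and the documents and query are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A stand-in knowledge base. In production this would be your document store.
documents = [
    "Our refund policy allows returns within 60 days of purchase.",
    "The API rate limit is 500 requests per minute per key.",
    "Fine-tuning rewrites model weights; RAG retrieves context at inference time.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

query = "What is the current rate limit?"
context = "\n".join(retrieve(query))

# The new knowledge lives in the prompt, not the weights -- nothing is overwritten.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # hand this to whatever LLM you're already using
```

Updating the knowledge base means swapping documents, not retraining, which is exactly the modularity we’re after.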
Adapter Modules and LoRA (Low-Rank Adaptation) insert new knowledge through specialized, isolated subnetworks trained alongside the frozen base model, leaving the existing neurons untouched. This is best suited for things like formatting, specific chains, etc., none of which require a complete neural network update.
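For intuition, here’s roughly what LoRA does under the hood, as a from-scratch PyTorch sketch (in practice you’d reach for a library like Hugging Face’s peft rather than rolling your own). The pretrained weights are frozen; only a tiny low-rank correction gets trained:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a small trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # the original neurons stay untouched
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # frozen pretrained path + learned low-rank correction
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))  # pretend this came from a pretrained model
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"Training {trainable:,} of {total:,} parameters")  # ~3% here; far less at LLM scale
```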
These techniques recognize neurons for what they truly are: finite, precious, and densely packed resources best left intact whenever possible. There are many others that we will cover in depth in AI Made Simple, but these three are techniques that most teams will be able to get started with without extensive AI expertise (there are frameworks/services for stuff like LoRA nowadays, and while very complex RAG requires setup/tuning, the basics are now very easy to get up and running).
Fine-tuning isn’t knowledge injection — it’s knowledge overwrite. For advanced LLMs, neurons are no longer neutral placeholders; they’re highly specialized, densely interconnected repositories of valuable information. Carelessly updating them risks catastrophic, invisible damage.
If your goal is to build adaptable, scalable, and robust systems, treat fine-tuning with the caution it deserves. Embrace modular solutions (software principles don’t disappear just b/c we’re working on AI) that maintain the integrity of your network’s foundational knowledge. Otherwise, you’re simply dismantling your carefully constructed knowledge ecosystem, one neuron at a time.
Thank you for being here, and I hope you have a wonderful day.
If you have a lot of money to burn, let’s just go to Vegas instead for Market Research
Dev ❤
I put a lot of work into writing this newsletter. To do so, I rely on you for support. If a few more people choose to become paid subscribers, the Chocolate Milk Cult can continue to provide high-quality and accessible education and opportunities to anyone who needs it. If you think this mission is worth contributing to, please consider a premium subscription. You can do so for less than the cost of a Netflix Subscription (pay what you want here).
If you liked this article and wish to share it, please refer to the following guidelines.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. The best way to share testimonials is to share articles and tag me in your post so I can see/share it.
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
My (imaginary) sister’s favorite MLOps Podcast-
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819