Offline vs. online ML pipelines
Architecting a dataset generation feature pipeline
If you don't separate offline and online pipelines now...
your AI system will break down when you try to scale it later.
People often don’t fully grasp the distinction between offline and online pipelines.
This isn’t their fault - it’s the way we’re taught.
Most tutorials and courses present these concepts together in a single Notebook.
This is great for learning but misleading when it comes to real-world production systems.
In reality, offline and online pipelines are completely different beasts.
(Especially when scaling AI in production)
Here's the difference:
Offline pipelines
These are batch pipelines that run on a schedule or are triggered by specific events.
They operate behind the scenes to handle large-scale processes like:
Data collection
ETL
Feature generation
Model training
In my work, we use MLOps frameworks like ZenML to manage and orchestrate these offline pipelines.
This allows them to run independently, decoupled from the immediate needs of the system.
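To make this concrete, here is a minimal sketch of what an offline pipeline can look like with ZenML's step/pipeline decorators. The step names and bodies are illustrative placeholders, not our production code.

```python
# Minimal sketch of an offline (batch) pipeline using ZenML's @step/@pipeline API.
# Step names and bodies are illustrative placeholders.
from zenml import pipeline, step


@step
def collect_data() -> list[dict]:
    # In a real pipeline this would pull raw data from a warehouse, API, or crawler.
    return [{"text": "example document"}]


@step
def compute_features(raw_docs: list[dict]) -> list[dict]:
    # ETL / feature generation: clean, chunk, embed, etc.
    return [{"text": d["text"], "length": len(d["text"])} for d in raw_docs]


@step
def train_model(features: list[dict]) -> None:
    # Model training would happen here; artifacts go to a model registry.
    print(f"Training on {len(features)} samples")


@pipeline
def offline_training_pipeline() -> None:
    raw_docs = collect_data()
    features = compute_features(raw_docs)
    train_model(features)


if __name__ == "__main__":
    # Triggered on a schedule or by an event, never by an end-user request.
    offline_training_pipeline()
```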
Online pipelines
These pipelines are real-time or near real-time services that interact directly with the end-user or application (usually labeled inference pipelines).
Examples of inference pipelines in the GenAI world:
Workflows
Agents
LLM RESTful endpoints
These pipelines must be available 24/7, providing immediate responses, similar to deploying RESTful APIs in software engineering.
... and these are typically two completely different applications in production.
In our recent Second Brain AI Assistant course, we separated these pipelines by creating two different Python applications.
They're connected through storage layers (e.g., vector databases and model registries).
When the user asks a question, the system isn’t ingesting data in real-time...
It’s pulled from a pre-populated vector database built by offline pipelines.
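For contrast, here is a minimal sketch of the online side: a single RESTful endpoint that only reads from storage the offline pipelines already populated. FastAPI is assumed, and retrieve_context / generate_answer are hypothetical helpers standing in for the vector database query and the LLM call.

```python
# Minimal sketch of an online inference pipeline exposed as a RESTful endpoint.
# FastAPI is assumed; retrieve_context and generate_answer are hypothetical helpers.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Question(BaseModel):
    text: str


def retrieve_context(question: str, k: int = 5) -> list[str]:
    # Reads from the vector database built by the offline pipelines.
    # No ingestion or feature generation happens at request time.
    return ["<retrieved chunk>"] * k


def generate_answer(question: str, context: list[str]) -> str:
    # Calls the LLM with the retrieved context (placeholder response here).
    return f"Answer based on {len(context)} retrieved chunks."


@app.post("/ask")
def ask(question: Question) -> dict:
    context = retrieve_context(question.text)
    return {"answer": generate_answer(question.text, context)}
```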
Why does any of this matter?
Understanding this separation is crucial for anyone building scalable, production-level AI systems.
So, next time you see a Notebook combining offline and online tasks, remember:
That’s not how things work in the real world.
If you’re working in production environments, start thinking about how to decouple these pipelines.
(It will be extremely useful in helping you build robust, efficient AI systems.)
Check the first lesson from the Second Brain AI Assistant course to see this in action:
The intricacies and breadth of GenAI and LLMs can sometimes eclipse their practical application. It is pivotal to understand the foundational concepts needed to implement Generative AI. This guide explains the core concepts behind state-of-the-art generative models by combining theory and hands-on application.
Generative AI Foundations in Python begins by laying a foundational understanding, presenting the fundamentals of generative LLMs and their historical evolution, while also setting the stage for deeper exploration. You’ll also understand how to apply generative LLMs in real-world applications. The book cuts through the complexity and offers actionable guidance on deploying and fine-tuning pre-trained language models with Python.
I’ve been working with GenAI for 3+ years.
Here’s something all engineers must come to terms with:
If you’re building LLM-powered applications, at some point, you’ll need to generate high-quality datasets to fine-tune SLMs.
Why?
→ Fine-tuning SLMs reduces costs and latency and increases throughput, while maintaining high accuracy for specific tasks.
→ Some domains require specialized fine-tuning for better domain adaptation.
→ Fine-tuned models give you more control over AI behavior and response generation.
That’s exactly what we’re tackling with our Second Brain AI Assistant course.
... and today, I’m breaking down the dataset generation feature pipeline we built for fine-tuning our summarization SLM.
The input to our generation pipeline is raw documents from MongoDB (Notion & crawled resources).
And the output is a high-quality summarization dataset published to Hugging Face’s dataset registry.
Since this pipeline generates features used to train an LLM, it’s called a feature pipeline.
Here’s how it works, step by step:
Data Extraction → Pulls raw documents from MongoDB and standardizes formatting.
Document Exploration → Analyzes length & quality score distributions to make informed decisions.
Data Filtering → Removes low-value content, keeping only high-quality documents.
Summarization → We use a more powerful LLM (e.g., gpt-4o) to generate multiple summaries per document by varying temperature and sampling parameters, a process known as distillation (see the sketch after this list).
Quality Control → Filters out poor-quality summaries.
Dataset Splitting → Divides data into training, evaluation, and test sets (done before storing the dataset and not at training time!)
Versioning & Deployment → Publishes the final dataset to Hugging Face.
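Here is the sketch referenced above for the distillation step: a minimal example of generating several candidate summaries per document with a stronger model by varying the temperature. The OpenAI client is assumed, and the prompt and temperature values are illustrative choices, not the exact course settings.

```python
# Minimal sketch of the distillation step: several candidate summaries per document,
# generated by a stronger model at different temperatures. Prompt and temperatures
# are illustrative, not the exact course settings.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def distill_summaries(document: str, temperatures: tuple[float, ...] = (0.2, 0.7, 1.0)) -> list[str]:
    summaries = []
    for temperature in temperatures:
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=temperature,
            messages=[
                {"role": "system", "content": "Summarize the document concisely."},
                {"role": "user", "content": document},
            ],
        )
        summaries.append(response.choices[0].message.content)
    return summaries
```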
To keep the pipeline reproducible, trackable, and scalable, we manage it using ZenML, which:
→ Orchestrates the entire workflow from extraction to deployment.
→ Ensures traceability & versioning of pipeline runs & datasets.
→ Allows dynamic configuration for different filtering, summarization & structuring techniques.
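Putting it together, a ZenML skeleton for this feature pipeline might look roughly like the sketch below. The step names mirror the list above, the bodies are placeholders, and the Hugging Face repository id is a hypothetical example.

```python
# Rough skeleton of the dataset generation feature pipeline as ZenML steps.
# Step bodies are placeholders and the Hugging Face repo id is hypothetical.
from datasets import Dataset
from zenml import pipeline, step


@step
def extract_documents() -> list[dict]:
    # Pull raw documents from MongoDB and standardize their formatting.
    return [{"content": f"example document {i}", "quality_score": 0.9} for i in range(10)]


@step
def filter_documents(docs: list[dict]) -> list[dict]:
    # Keep only high-value documents, based on length / quality score thresholds.
    return [d for d in docs if d["quality_score"] >= 0.5]


@step
def summarize_documents(docs: list[dict]) -> list[dict]:
    # Distill summaries with a stronger LLM (see the earlier gpt-4o sketch),
    # then drop poor-quality summaries.
    return [{"document": d["content"], "summary": "<generated summary>"} for d in docs]


@step
def split_and_publish(samples: list[dict]) -> None:
    # Split BEFORE storing (a further split of the test portion would give an
    # evaluation set), then version and publish the dataset to Hugging Face.
    dataset = Dataset.from_list(samples)
    splits = dataset.train_test_split(test_size=0.2, seed=42)
    splits.push_to_hub("your-username/summarization-dataset")  # hypothetical repo id


@pipeline
def summarization_dataset_pipeline() -> None:
    docs = extract_documents()
    docs = filter_documents(docs)
    samples = summarize_documents(docs)
    split_and_publish(samples)
```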
Even if you’re not deep into fine-tuning, at some point, you’ll need a structured way to generate datasets for specialized AI applications.
This is one of the most critical components of your pipeline.
Want to learn more?
Check out Lesson 3 from the Second Brain AI Assistant course:
Perks: Exclusive discounts on our recommended learning resources
(books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestseller book on teaching you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.
If not otherwise stated, all images are created by the author.