Offline vs. online ML pipelines
Architecting a dataset generation feature pipeline
If you don't separate offline and online pipelines now...
your AI system will break down when you try to scale it later.
People often don’t fully grasp the distinction between offline and online pipelines.
This isn’t their fault - it’s the way we’re taught.
Most tutorials and courses present these concepts together in a single Notebook.
This is great for learning but misleading when it comes to real-world production systems.
In reality, offline and online pipelines are completely different beasts.
(Especially when scaling AI in production)
Here's the difference:
Offline pipelines
These are batch pipelines that run on a schedule or are triggered by specific events.
They operate behind the scenes to handle large-scale processes like:
Data collection
ETL
Feature generation
Model training
In my work, we use MLOps frameworks like ZenML to manage and orchestrate these offline pipelines.
This allows them to run independently, decoupled from the immediate needs of the system.
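To make this concrete, here is a minimal sketch of what an offline pipeline can look like with ZenML's step/pipeline decorators. The step names and bodies are illustrative placeholders, not our production code.

```python
# Minimal sketch of an offline (batch) pipeline using ZenML's @step/@pipeline API.
# Step names and bodies are illustrative placeholders.
from zenml import pipeline, step


@step
def collect_data() -> list[dict]:
    # In a real pipeline this would pull raw data from a warehouse, API, or crawler.
    return [{"text": "example document"}]


@step
def compute_features(raw_docs: list[dict]) -> list[dict]:
    # ETL / feature generation: clean, chunk, embed, etc.
    return [{"text": d["text"], "length": len(d["text"])} for d in raw_docs]


@step
def train_model(features: list[dict]) -> None:
    # Model training would happen here; artifacts go to a model registry.
    print(f"Training on {len(features)} samples")


@pipeline
def offline_training_pipeline() -> None:
    raw_docs = collect_data()
    features = compute_features(raw_docs)
    train_model(features)


if __name__ == "__main__":
    # Triggered on a schedule or by an event, never by an end-user request.
    offline_training_pipeline()
```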
Online pipelines
These pipelines are real-time or near real-time services that interact directly with the end-user or application (usually labeled inference pipelines).
Examples of inference pipelines in the GenAI world:
Workflows
Agents
LLM RESTful endpoints
These pipelines must be available 24/7, providing immediate responses, similar to deploying RESTful APIs in software engineering.
... and these are typically two completely different applications in production.
In our recent Second Brain AI Assistant course, we separated these pipelines by creating two different Python applications.
They're connected through storage layers (e.g., vector databases and model registries).
When the user asks a question, the system isn’t ingesting data in real-time...
It’s pulled from a pre-populated vector database built by offline pipelines.
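For contrast, here is a minimal sketch of the online side: a single RESTful endpoint that only reads from storage the offline pipelines already populated. FastAPI is assumed, and retrieve_context / generate_answer are hypothetical helpers standing in for the vector database query and the LLM call.

```python
# Minimal sketch of an online inference pipeline exposed as a RESTful endpoint.
# FastAPI is assumed; retrieve_context and generate_answer are hypothetical helpers.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Question(BaseModel):
    text: str


def retrieve_context(question: str, k: int = 5) -> list[str]:
    # Reads from the vector database built by the offline pipelines.
    # No ingestion or feature generation happens at request time.
    return ["<retrieved chunk>"] * k


def generate_answer(question: str, context: list[str]) -> str:
    # Calls the LLM with the retrieved context (placeholder response here).
    return f"Answer based on {len(context)} retrieved chunks."


@app.post("/ask")
def ask(question: Question) -> dict:
    context = retrieve_context(question.text)
    return {"answer": generate_answer(question.text, context)}
```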
Why does any of this matter?
Understanding this separation is crucial for anyone building scalable, production-level AI systems.
So, next time you see a Notebook combining offline and online tasks, remember:
That’s not how things work in the real world.
If you’re working in production environments, start thinking about how to decouple these pipelines.
(It will be extremely useful in helping you build robust, efficient AI systems.)
Check the first lesson from the Second Brain AI Assistant course to see this in action:
The intricacies and breadth of GenAI and LLMs can sometimes eclipse their practical application. It is pivotal to understand the foundational concepts needed to implement Generative AI. This guide explains the core concepts behind state-of-the-art generative models by combining theory and hands-on application.
Generative AI Foundations in Python begins by laying a foundational understanding, presenting the fundamentals of generative LLMs and their historical evolution, while also setting the stage for deeper exploration. You’ll also understand how to apply generative LLMs in real-world applications. The book cuts through the complexity and offers actionable guidance on deploying and fine-tuning pre-trained language models with Python.
I’ve been working with GenAI for 3+ years.
Here’s something all engineers must come to terms with:
If you’re building LLM-powered applications, at some point, you’ll need to generate high-quality datasets to fine-tune SLMs.
Why?
→ Fine-tuning SLMs reduces costs and latency and increases throughput, while maintaining high accuracy for specific tasks.
→ Some domains require specialized fine-tuning for better domain adaptation.
→ Fine-tuned models give you more control over AI behavior and response generation.
That’s exactly what we’re tackling with our Second Brain AI Assistant course.
... and today, I’m breaking down the dataset generation feature pipeline we built for fine-tuning our summarization SLM.
The input to our generation pipeline is raw documents from MongoDB (Notion & crawled resources).
And the output is a high-quality summarization dataset published to Hugging Face’s dataset registry.
Since this pipeline generates features used to train an LLM, it’s called a feature pipeline.
Here’s how it works, step by step:
Data Extraction → Pulls raw documents from MongoDB and standardizes formatting.
Document Exploration → Analyzes length & quality score distributions to make informed decisions.
Data Filtering → Removes low-value content, keeping only high-quality documents.
Summarization → We use a more powerful LLM (e.g., gpt-4o) to generate multiple summaries per document by varying temperature and sampling parameters, a process known as distillation (see the sketch after this list).
Quality Control → Filters out poor-quality summaries.
Dataset Splitting → Divides data into training, evaluation, and test sets (done before storing the dataset and not at training time!)
Versioning & Deployment → Publishes the final dataset to Hugging Face.
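Here is the sketch referenced above for the distillation step: a minimal example of generating several candidate summaries per document with a stronger model by varying the temperature. The OpenAI client is assumed, and the prompt and temperature values are illustrative choices, not the exact course settings.

```python
# Minimal sketch of the distillation step: several candidate summaries per document,
# generated by a stronger model at different temperatures. Prompt and temperatures
# are illustrative, not the exact course settings.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def distill_summaries(document: str, temperatures: tuple[float, ...] = (0.2, 0.7, 1.0)) -> list[str]:
    summaries = []
    for temperature in temperatures:
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=temperature,
            messages=[
                {"role": "system", "content": "Summarize the document concisely."},
                {"role": "user", "content": document},
            ],
        )
        summaries.append(response.choices[0].message.content)
    return summaries
```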
To keep the pipeline reproducible, trackable, and scalable, we manage it using ZenML, which:
→ Orchestrates the entire workflow from extraction to deployment.
→ Ensures traceability & versioning of pipeline runs & datasets.
→ Allows dynamic configuration for different filtering, summarization & structuring techniques.
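Putting it together, a ZenML skeleton for this feature pipeline might look roughly like the sketch below. The step names mirror the list above, the bodies are placeholders, and the Hugging Face repository id is a hypothetical example.

```python
# Rough skeleton of the dataset generation feature pipeline as ZenML steps.
# Step bodies are placeholders and the Hugging Face repo id is hypothetical.
from datasets import Dataset
from zenml import pipeline, step


@step
def extract_documents() -> list[dict]:
    # Pull raw documents from MongoDB and standardize their formatting.
    return [{"content": f"example document {i}", "quality_score": 0.9} for i in range(10)]


@step
def filter_documents(docs: list[dict]) -> list[dict]:
    # Keep only high-value documents, based on length / quality score thresholds.
    return [d for d in docs if d["quality_score"] >= 0.5]


@step
def summarize_documents(docs: list[dict]) -> list[dict]:
    # Distill summaries with a stronger LLM (see the earlier gpt-4o sketch),
    # then drop poor-quality summaries.
    return [{"document": d["content"], "summary": "<generated summary>"} for d in docs]


@step
def split_and_publish(samples: list[dict]) -> None:
    # Split BEFORE storing (a further split of the test portion would give an
    # evaluation set), then version and publish the dataset to Hugging Face.
    dataset = Dataset.from_list(samples)
    splits = dataset.train_test_split(test_size=0.2, seed=42)
    splits.push_to_hub("your-username/summarization-dataset")  # hypothetical repo id


@pipeline
def summarization_dataset_pipeline() -> None:
    docs = extract_documents()
    docs = filter_documents(docs)
    samples = summarize_documents(docs)
    split_and_publish(samples)
```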
Even if you’re not deep into fine-tuning, at some point, you’ll need a structured way to generate datasets for specialized AI applications.
This is one of the most critical components of your pipeline.
Want to learn more?
Check out Lesson 3 from the Second Brain AI Assistant course:
Perks: Exclusive discounts on our recommended learning resources
(books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestseller book on teaching you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.
If not otherwise stated, all images are created by the author.