RisingWave: An Open-Source Stream-Processing and Management Platform

Original link: https://github.com/risingwavelabs/risingwave

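## RisingWave: Real-Time Stream Processing and Management

RisingWave is a stream processing platform designed to provide simple, cost-effective real-time data analytics. It combines stream processing *with* built-in storage and persistence based on the open Apache Iceberg™ format. RisingWave can ingest millions of events per second from both streaming and batch sources, so live data can be analyzed continuously alongside historical information. It offers a familiar Postgres-compatible SQL interface and a DataFrame-style Python interface for queries. Key features include low-latency online serving, durable offline storage through Iceberg, automatic state management, and seamless integration with existing Postgres tools. Its elastic disk cache improves performance and lowers cloud storage costs. RisingWave performs well in use cases such as streaming analytics (dashboards, trading), event-driven applications (fraud detection), real-time data enrichment, and machine learning feature engineering. It supports standalone, Docker, and Kubernetes deployments, and a managed cloud option is available.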

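## RisingWave: A New Stream Processing Platform

RisingWave is an open-source stream processing and management platform that has drawn attention recently, especially after being presented at a Kafka conference. It is compared with tools such as Materialize, which offer similar functionality for real-time data analytics. Many commenters shared their experiences and potential use cases. One user successfully replaced Timescale continuous aggregates with ClickHouse materialized views, which simplified the setup and performed well. Others are exploring RisingWave as a replacement for expensive batch analytics jobs, with the goal of immediate, incremental updates. Use cases mentioned include real-time analytics, event-driven applications, and triggering actions based on the presence or absence of data. The project looks promising, but reports of production experience are still limited. The discussion focused on its potential advantages over traditional pub/sub systems (such as Redis) for complex transformations and for maintaining intermediate state (such as moving averages). The platform's architecture and its use of SQL were also praised.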

Original text

🌊 Ride the Wave of Streaming Data.


RisingWave is a stream processing and management platform designed to offer the simplest and most cost-effective way to process, analyze, and manage real-time event data — with built-in support for the Apache Iceberg™ open table format. It provides both a Postgres-compatible SQL interface and a DataFrame-style Python interface.

RisingWave can ingest millions of events per second, continuously join and analyze live streams with historical data, serve ad-hoc queries at low latency, and persist fresh, consistent results to Apache Iceberg™ or any other downstream system.

Install RisingWave in standalone mode:

curl -L https://risingwave.com/sh | sh

To learn about other installation options, such as using a Docker image, see Quick Start.

Stream, Store, and Query — All in One

RisingWave delivers a full end-to-end streaming data platform — combining real-time processing with built-in storage and open-format persistence.

It supports:

Ingestion: Ingest millions of events per second from streaming and batch sources.
Stream processing: Perform real-time incremental processing to join and analyze live data with historical tables.
Delivery: Deliver fresh, consistent results to data lakes (e.g., Apache Iceberg™) or any destination.
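
To make "real-time incremental processing" concrete, here is a minimal Python sketch of the general idea: a result is kept up to date by applying a small delta per event instead of recomputing from scratch. It is a toy illustration, not RisingWave's implementation; the class `IncrementalRegionTotals`, the user-to-region table, and the sample events are all hypothetical.

```python
from collections import defaultdict


class IncrementalRegionTotals:
    """Toy 'materialized view': SUM(amount) GROUP BY region, updated per event."""

    def __init__(self, user_region):
        # Stand-in for a historical table mapping user_id -> region (hypothetical data).
        self.user_region = dict(user_region)
        # The maintained result; it always reflects every event seen so far.
        self.totals = defaultdict(float)

    def on_event(self, user_id, amount):
        # Join the live event against the historical table.
        region = self.user_region.get(user_id)
        if region is None:
            return  # inner-join semantics: events without a match are dropped
        # Apply only the delta instead of re-scanning all past events.
        self.totals[region] += amount

    def query(self):
        # Reading the view is a cheap lookup of the maintained state.
        return dict(self.totals)


view = IncrementalRegionTotals({"u1": "EU", "u2": "US"})
for user_id, amount in [("u1", 10.0), ("u2", 5.0), ("u1", 2.5), ("u9", 99.0)]:
    view.on_event(user_id, amount)

print(view.query())  # {'EU': 12.5, 'US': 5.0}
```

The cost of each update depends on the size of the change, not on the size of the history, which is the property that makes continuously fresh results affordable.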

What sets RisingWave apart is its integrated storage engine:

Online serving: Row-based storage optimized for point and range queries with single-digit millisecond latency.
Offline persistence: Built-in Apache Iceberg™ integration for low-cost, durable storage with open access for external query engines.

With RisingWave, real-time data isn’t just processed — it’s stored, queried, and shared across your entire stack.

RisingWave is designed to be easier to use and more cost-efficient:

Seamless integration: Connects via the PostgreSQL wire protocol, working with psql, JDBC, and any Postgres tool.
Expressive SQL: Supports structured, semi-structured, and unstructured data with a familiar SQL dialect.
No manual state tuning: Eliminates complex state management configurations.
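
Because the connection goes over the PostgreSQL wire protocol, an ordinary Postgres driver should be enough to talk to RisingWave. The sketch below uses the third-party `psycopg2` driver. The helper names `risingwave_dsn` and `fetch_all` are hypothetical, and the defaults (port 4566, user `root`, database `dev`) are assumptions about a local standalone instance that do not appear in this README, so check them against your deployment.

```python
def risingwave_dsn(host="localhost", port=4566, user="root", dbname="dev"):
    """Build a libpq-style connection string. The defaults are assumptions."""
    return f"host={host} port={port} user={user} dbname={dbname}"


def fetch_all(sql, dsn=None):
    """Run one query through a standard Postgres driver and return all rows."""
    import psycopg2  # third-party driver, imported lazily so the sketch loads without it

    conn = psycopg2.connect(dsn or risingwave_dsn())
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


# Only the connection string is built here; fetch_all() needs a running server.
print(risingwave_dsn())  # host=localhost port=4566 user=root dbname=dev
```

With a server running, a call such as `fetch_all("SELECT 1")` would travel over the same protocol that `psql` and JDBC use.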

RisingWave stores tables, materialized views, and internal states of stream processing jobs in S3 (or equivalent object storage), providing:

High performance: Optimized for complex queries, including joins and time windowing.
Fast recovery: Restores from system failures within seconds.
Dynamic scaling: Instantly adjusts resources to handle workload spikes.
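
As a rough illustration of what "time windowing" means, the following Python sketch assigns events to fixed, non-overlapping (tumbling) windows and counts them. It shows the concept only and says nothing about how RisingWave evaluates windows internally; the function name and the sample events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window, keyed by window start time."""
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# (timestamp in seconds, payload) pairs; the values are made up for the example.
events = [(0, "a"), (4, "b"), (10, "c"), (11, "d"), (25, "e")]
result = tumbling_window_counts(events, window_seconds=10)
print(result)  # {0: 2, 10: 2, 20: 1}
```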

Beyond caching hot data in memory, RisingWave supports elastic disk cache, a powerful performance optimization that uses local disks or EBS for efficient data caching. This minimizes access to S3, lowering processing latency and cutting S3 access costs.
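
The effect of adding a disk tier between memory and object storage can be sketched with a small read-through cache. This is a simplified model under stated assumptions, not RisingWave's cache: the class `TieredCache` is hypothetical, both tiers use plain LRU eviction, and dictionaries stand in for local disk and S3.

```python
from collections import OrderedDict


class TieredCache:
    """Toy read-through cache: memory tier, then disk tier, then object storage."""

    def __init__(self, object_store, mem_capacity, disk_capacity):
        self.object_store = object_store  # dict standing in for S3
        self.mem = OrderedDict()          # hot tier
        self.disk = OrderedDict()         # dict standing in for local disk or EBS
        self.mem_capacity = mem_capacity
        self.disk_capacity = disk_capacity
        self.remote_reads = 0             # number of reads that reached object storage

    def _put(self, tier, capacity, key, value):
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > capacity:
            return tier.popitem(last=False)  # evict the least recently used entry
        return None

    def _put_mem(self, key, value):
        evicted = self._put(self.mem, self.mem_capacity, key, value)
        if evicted is not None:
            # Demote to the disk tier instead of dropping the entry outright.
            self._put(self.disk, self.disk_capacity, *evicted)

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        if key in self.disk:
            value = self.disk.pop(key)      # disk hit: no remote access needed
        else:
            self.remote_reads += 1          # miss in both tiers: go to object storage
            value = self.object_store[key]
        self._put_mem(key, value)
        return value


store = {"a": 1, "b": 2, "c": 3}
cache = TieredCache(store, mem_capacity=1, disk_capacity=2)
for key in ["a", "b", "a", "c", "b", "a"]:
    cache.get(key)

print(cache.remote_reads)  # 3
```

In this run the memory tier holds a single entry, yet only the first read of each key reaches the object store; the three repeat reads are served from the disk tier.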

Apache Iceberg™ native support

RisingWave natively integrates with Apache Iceberg™, enabling continuous ingestion of streaming data into Iceberg tables. It can also read directly from Iceberg, perform automatic compaction, and maintain table health over time. Since Iceberg is an open table format, results are accessible by other query engines — making storage not only cost-efficient, but interoperable by design.

In what use cases does RisingWave excel?

RisingWave is particularly effective for the following use cases:

Streaming analytics: Achieve sub-second data freshness in live dashboards, ideal for high-stakes scenarios like stock trading, sports betting, and IoT monitoring.
Event-driven applications: Develop sophisticated monitoring and alerting systems for critical applications such as fraud and anomaly detection.
Real-time data enrichment: Continuously ingest data from diverse sources, conduct real-time data enrichment, and efficiently deliver the results to downstream systems.
Feature engineering: Transform batch and streaming data into features for your machine learning models using a unified codebase, ensuring seamless integration and consistency.

RisingWave Cloud offers the easiest way to run RisingWave in production.

For Docker deployment, please refer to Docker Compose.

For Kubernetes deployment, please refer to Kubernetes with Helm or Kubernetes with Operator.

Looking for help, discussions, collaboration opportunities, or a casual afternoon chat with our fellow engineers and community members? Join our Slack workspace!

RisingWave uses Scarf to collect anonymized installation analytics. These analytics help us understand and improve the distribution of our package. The privacy policy of Scarf is available at https://about.scarf.sh/privacy-policy.

RisingWave also collects anonymous usage statistics to better understand how the community is using RisingWave. The sole intention of this exercise is to help improve the product. Users may opt out easily at any time. Please refer to the user documentation for more details.

RisingWave is distributed under the Apache License (Version 2.0). Please refer to LICENSE for more information.

Thanks for your interest in contributing to the project! Please refer to RisingWave Developer Guide for more information.