The future of kdb+?

Original link: https://www.timestored.com/b/the-future-of-kdb/

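Ryan Hamilton's post discusses the current state of kdb+, in particular its use for historical market data storage and analysis, local quant analysis, real-time streaming calculation engines and distributed computing. He argues that while kdb+ still works well for these tasks, newer databases such as ClickHouse and QuestDB, and cloud vendors such as Google BigQuery and AWS Redshift, are now fast enough for big-data queries and offer more flexibility. For local quant analysis, Python paired with DuckDB, Polars or PyKX is increasingly popular, and the free options are easy to use, fast and far cheaper than kdb+. For real-time streaming and distributed computing, Kafka, Flink, RisingWave and similar tools are strong competition. Hamilton concludes that kdb+ remains a powerful technology but faces intense competition from newer solutions that have learnt from its best ideas and standardised on open formats such as Apache Iceberg and Parquet. To stay competitive, he suggests KX release a free version with a simple license, focus on making the core product great, reduce the steep learning curve, and reconsider its heavy marketing spend and wider strategy.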

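Timescale is a PostgreSQL extension that offers fast access to large volumes of tick data, with compression and good support. Unlike kdb+, it requires neither a large investment in license fees nor learning a complex language, and it supports standard SQL commands for replication, authentication and other operations. While kdb+ offers powerful quant analysis, its high cost and challenging syntax can make it unsuitable for some users. Beyond simple applications, kdb+ falls short of alternatives such as Python, DuckDB and Polars on code readability and scalability, especially given the operating cost of needing consultants. Newer open source databases such as ClickHouse and QuestDB may be viable alternatives for tick data and local quant analysis, with better readability and scalability. Storing data in open formats such as Parquet, together with high-performance frameworks such as Arrow, may be where the industry is heading. Overall, kdb+ remains popular with some quant traders, but its cost, complexity and limited compatibility with modern infrastructure may eventually push users towards newer, more accessible alternatives.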

Original text

July 24th, 2024 by admin

(2024-08-03: This post got 10K+ views on the front page of Hacker News. To see the follow-up discussion, go here.)

It’s been 2 years since I worked full time in kdb+, but people always seem to want to talk to me about kdb+ and where I think it’s going. To save rehashing the same debates, I’m going to put my view here and refer to it in future. Please leave a comment if you want and I will reply.

Let’s first look at the use cases for kdb+, consider the alternatives, then which I think will win for each use-case and why.

Use Cases

A. Historical market data storage and analysis. – e.g. MS Horizon, Citi CloudKDB, UBS Krypton (3 I worked on).
B. Local quant analysis – e.g. Liquidity analysis, PnL analysis, profitability per client.
C. Real-time Streaming Calculation Engines – e.g. Streaming VWAP, Streaming TCA…
D. Distributed Computing – e.g. Margin calculations for stock portfolios or risk analysis. Spread data out, perform costly calcs, recombine.
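
To make use case D concrete, here is a minimal sketch of the "spread data out, perform costly calcs, recombine" pattern in plain Python. Everything in it is an assumption for illustration: the positions are made up, the flat 25% margin rate is not a real margin model, the function names are hypothetical, and threads stand in for the separate processes or machines a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical positions: (portfolio, symbol, quantity, price). Values are made up.
POSITIONS = [
    ("A", "AAPL", 100, 200.0),
    ("A", "MSFT", 50, 400.0),
    ("B", "AAPL", -30, 200.0),
    ("B", "GOOG", 20, 150.0),
]

MARGIN_RATE = 0.25  # assumed flat rate, for illustration only


def split(rows, n_chunks):
    """Spread the rows out into n_chunks roughly equal slices."""
    return [rows[i::n_chunks] for i in range(n_chunks)]


def margin_for_chunk(rows):
    """The 'costly calc': margin per portfolio for one slice of positions."""
    partial = {}
    for portfolio, _symbol, qty, price in rows:
        partial[portfolio] = partial.get(portfolio, 0.0) + abs(qty) * price * MARGIN_RATE
    return partial


def recombine(partials):
    """Sum the per-slice results back into one margin figure per portfolio."""
    total = {}
    for partial in partials:
        for portfolio, margin in partial.items():
            total[portfolio] = total.get(portfolio, 0.0) + margin
    return total


def portfolio_margin(rows, n_workers=2):
    # Threads stand in for worker nodes here; a real system would ship each
    # slice to a separate process or machine.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(margin_for_chunk, split(rows, n_workers)))
    return recombine(partials)


if __name__ == "__main__":
    print(portfolio_margin(POSITIONS))
```

The sketch only works because this toy calculation is additive across slices; a risk measure that nets positions within a portfolio would need the data split by portfolio rather than row by row.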

Alternatives

Historical Market Data – kdb+ Alternatives

A large number of users want to query big data to get minute bars, perform asof joins or more advanced time-series analysis.

  • New Database Technologies – ClickHouse, QuestDB.
  • Cloud Vendors – BigQuery / Redshift
  • Market Data as a Service
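
For readers unfamiliar with the "minute bars" query mentioned above, this is roughly what it computes, sketched in plain Python rather than in any of the databases listed. The trade data and the function name are made up for illustration, and a real platform would run this as a query over billions of rows rather than a loop.

```python
from datetime import datetime

# Hypothetical trades (timestamp, price, size), already sorted by time.
SAMPLE_TRADES = [
    (datetime(2024, 7, 24, 9, 30, 5), 100.0, 200),
    (datetime(2024, 7, 24, 9, 30, 40), 101.0, 100),
    (datetime(2024, 7, 24, 9, 31, 10), 102.0, 300),
    (datetime(2024, 7, 24, 9, 31, 59), 100.0, 100),
]


def minute_bars(trades):
    """Roll time-ordered trades up into one OHLC/volume/VWAP bar per minute."""
    bars = {}
    for ts, price, size in trades:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        bar = bars.get(minute)
        if bar is None:
            bars[minute] = {
                "open": price, "high": price, "low": price, "close": price,
                "volume": size, "notional": price * size,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last trade seen in this minute
            bar["volume"] += size
            bar["notional"] += price * size
    for bar in bars.values():
        # VWAP = sum(price * size) / sum(size)
        bar["vwap"] = bar.pop("notional") / bar["volume"]
    return bars


if __name__ == "__main__":
    for minute, bar in minute_bars(SAMPLE_TRADES).items():
        print(minute, bar)
```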

Let me tell you three secrets: 1. Most users don’t need the “speed” of kdb+. 2. Most internal bank platforms don’t fully unleash the speed of kdb+. 3. The competitors are now fast enough. I mean, ClickBench is totally transparent on benchmarking.

Likely Outcome: Kdb+ can hold its existing clients but hasn’t won, and won’t win, the 2nd tier firms, as they want either cloud native or something else. The previous major customers had to invest heavily to build their own platforms. From what I’m hearing, the kdb cloud platform still needs work.

Local Quant Analysis – Alternatives

  • Python – with DuckDB
  • Python – with Polars
  • Python – with PyKX
  • Python – with dataframe/modin/….

Now I’m exaggerating slightly but the local quant analysis game is over and everyone has realised Python has won. The only question is who will provide the speedy add-on. In one corner we have widely popular free community tools that know how to generate interest at huge scale, are fast and well funded. In the other we have a niche company that never spread outside finance, wants to charge $300K to get started and has an exotic syntax.

Likely Outcome: DuckDB or Polars. Why? It’s free. People at Uni will start with it and not change. Any sensible quant currently in a firm will want to use a free tool so that they are guaranteed to be able to use similar analytics at their next firm. Without that ability they can only go to places that have kdb+, or else face losing a large percentage of their skillset.

Real-time Streaming / Distributed Computing

These were always the less popular cases for kdb+ and never the ones that “won” the contract. The ironic thing is, combining streaming with historical data in one model is kdb+’s largest strength. However, the few times I’ve seen it done, it has either taken someone very experienced and skillful or it has become a mess. These messes have been so bad that they have put other parts of the firm off adopting kdb+ for other use cases.

Likely Outcome: Unsure which will win but not kdb+. Kafka has won mindshare and is deployed at scale, but Flink, RisingWave etc. are upcoming stars.

Summary

Kdb+ is an absolutely amazing technology but it’s about the same amazing today as it was 15 years ago when I started. In that time the world has moved on. The best open source companies have stolen the best kdb+ ideas:

  • Parquet/Iceberg is basically kdb+’s on-disk format for optimized column storage.
  • Apache Arrow – its in-memory format is basically kdb+’s in-memory column format.
  • Even the Kafka log/replay/ksql concept could, from a certain angle, be viewed as similar to a tplog.
  • QuestDB / DuckDB / ClickHouse all have asof joins.
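
An asof join matches each row with the most recent row from another table at or before its timestamp, for example each trade with the quote prevailing when it happened. The plain-Python sketch below shows the idea only; it is not how kdb+ or any of the databases above implement it, and the data, the single-symbol simplification and the function name are assumptions for illustration.

```python
from bisect import bisect_right

# Hypothetical data for a single symbol; times are seconds since midnight.
SAMPLE_QUOTES = [  # (time, bid, ask), must be sorted by time
    (34200, 99.9, 100.1),
    (34205, 100.0, 100.2),
    (34215, 100.1, 100.3),
]
SAMPLE_FILLS = [  # (time, price), any order
    (34203, 100.1),
    (34205, 100.2),
    (34220, 100.3),
    (34100, 99.0),  # before the first quote, so nothing to match
]


def asof_join(trades, quotes):
    """Attach to each trade the latest quote at or before the trade time."""
    quote_times = [q[0] for q in quotes]
    joined = []
    for t_time, t_price in trades:
        # bisect_right returns the insertion point after any equal times,
        # so index - 1 is the last quote with time <= t_time.
        i = bisect_right(quote_times, t_time) - 1
        quote = quotes[i] if i >= 0 else None
        joined.append((t_time, t_price, quote))
    return joined


if __name__ == "__main__":
    for row in asof_join(SAMPLE_FILLS, SAMPLE_QUOTES):
        print(row)
```

Because the quotes are sorted, each lookup is a binary search rather than a scan.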

Not only have the competitors learnt and taken the best parts of kdb+, but they have standardised on them. For example, Snowflake, Dremio, Confluent and Databricks are all going to support Apache Iceberg/Parquet. QuestDB / DuckDB / Python are all going to natively support Parquet. This means that in comparisons it’s no longer KX against one competitor, it’s KX against many competitors at once. If your data is Parquet, you can run any of them against your data.

As many at KX would agree, I’ve talked to them for years about issues around this and, to be fair, they have changed, but they are not changing quickly enough.
They need to do four things:

  1. Get a free version out there that can be used for many things, with an easy, reasonable license for customers with less money.
  2. Focus on making the core product great. – For years we had Delta this and now it’s kdb.ai. In the meantime MongoDB/InfluxDB won huge contracts with a good database alone.
  3. Reduce the steep learning curve. Make kdb+ easier to learn, even by changing the language and technology if need be.
  4. Become more popular, else it’s a slow death.

This is focussing on the core tech product.
Looking more widely at their financials and other huge costs/initiatives such as AI and massive marketing spending, wider changes at the firm should also be considered.

Author: Ryan Hamilton
