Show HN: KVBoost – chunk-level KV cache reuse for HuggingFace, 5–48x faster TTFT

Original link: https://pythongiant.github.io/KVBoost/

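```python
from kvboost import KVBoost

# Load the model
engine = KVBoost.from_pretrained(
    "Qwen/Qwen2.5-3B"
)

# Warm a shared prefix once
engine.warm("You are a helpful assistant...")

# Hypothetical prompt (not in the original snippet) that begins with the warmed prefix
prompt = "You are a helpful assistant... What is a KV cache?"

# All subsequent calls reuse cache
result = engine.generate(prompt)

# Print the KV reuse ratio
print(result.kv_reuse_ratio)  # ✓ 80%+
```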
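The call pattern above warms a shared prefix once and then reports how much of each later prompt's KV cache was reused. The toy sketch below shows the general idea behind prefix-only, chunk-level reuse. It is not KVBoost's implementation, and it rests on several assumptions:

- It works on plain integer token ids.
- It stores placeholder tuples instead of real per-layer key/value tensors.
- `ChunkCache`, `chunk_keys`, and the ratio definition (reused prompt tokens divided by total prompt tokens) are hypothetical names and choices made for illustration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def chunk_keys(token_ids: list[int], chunk_size: int) -> list[str]:
    """Return one cache key per *full* chunk of token_ids.

    Each key hashes the previous chunk's key together with this chunk's tokens,
    so a chunk only matches when everything before it matches too.
    """
    keys: list[str] = []
    parent = ""
    for i in range(len(token_ids) // chunk_size):
        chunk = token_ids[i * chunk_size:(i + 1) * chunk_size]
        payload = parent + "|" + ",".join(map(str, chunk))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        keys.append(parent)
    return keys


@dataclass
class ChunkCache:
    """Hypothetical toy cache: maps chained chunk keys to placeholder KV entries."""

    chunk_size: int = 4
    store: dict[str, object] = field(default_factory=dict)

    def lookup_and_fill(self, token_ids: list[int]) -> float:
        """Reuse cached chunks for the longest matching prefix, cache the rest,
        and return the fraction of prompt tokens whose KV was reused."""
        if not token_ids:
            return 0.0
        reused = 0
        still_matching = True
        for i, key in enumerate(chunk_keys(token_ids, self.chunk_size)):
            if still_matching and key in self.store:
                reused += self.chunk_size
            else:
                # First miss: this chunk and every later one must be computed.
                still_matching = False
                # Placeholder standing in for the real key/value tensors.
                self.store[key] = ("kv", i)
        # A trailing partial chunk is never cached; it is always recomputed.
        return reused / len(token_ids)


if __name__ == "__main__":
    cache = ChunkCache(chunk_size=4)
    system = list(range(100, 112))  # 12 hypothetical token ids for a shared system prompt
    print(cache.lookup_and_fill(system))  # 0.0 (cold cache: this is the "warm" step)
    print(cache.lookup_and_fill(system + [1, 2, 3, 4]))  # 0.75 (3 of 4 chunks reused)
    print(cache.lookup_and_fill(system + [5, 6, 7, 8, 9, 10, 11, 12]))  # 0.6
```

Each key is chained to its predecessor because, under causal attention, a chunk's keys and values depend on every earlier token. Identical text that follows a different prefix therefore cannot safely reuse the cached entry.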

联系我们 contact @ memedata.com