Show HN: KVBoost – chunk-level KV cache reuse for HuggingFace, 5–48x faster TTFT

Original link: https://pythongiant.github.io/KVBoost/

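```python
from kvboost import KVBoost

# Load the model
engine = KVBoost.from_pretrained("Qwen/Qwen2.5-3B")

# Warm a shared prefix once
system_prompt = "You are a helpful assistant..."
engine.warm(system_prompt)

# All subsequent calls reuse the cache.
# `prompt` is a hypothetical example that begins with the warmed prefix.
prompt = system_prompt + "\n\nUser: Summarize this article."
result = engine.generate(prompt)

# Print the KV reuse ratio
print(result.kv_reuse_ratio)  # ✓ 80%+
```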
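The idea in the title, chunk-level KV cache reuse, can be illustrated without loading a model. The sketch below is a minimal, hypothetical illustration and not KVBoost's implementation. It assumes prompts are already tokenized into integer ids and splits them into fixed-size chunks. Each chunk is keyed by a hash chained through all preceding chunks, so a cache hit means the entire prefix up to that chunk matches. `ChunkCache`, `chunk_keys`, `warm`, `lookup`, and `reuse_ratio` are illustrative names that only loosely mirror `engine.warm` and `result.kv_reuse_ratio`. The stored values are placeholders for the per-chunk key/value tensors a real engine would keep.

```python
import hashlib
from dataclasses import dataclass, field


def chunk_keys(token_ids: list[int], chunk_size: int) -> list[str]:
    """Return one key per *full* chunk. Each key hashes the chunk together
    with the previous chunk's key, so it identifies the whole prefix."""
    keys = []
    prev = b""
    for i in range(len(token_ids) // chunk_size):
        chunk = token_ids[i * chunk_size:(i + 1) * chunk_size]
        digest = hashlib.sha256(prev + repr(chunk).encode()).hexdigest()
        keys.append(digest)
        prev = digest.encode()
    return keys


@dataclass
class ChunkCache:
    chunk_size: int = 4  # hypothetical; real systems use much larger chunks
    # key -> placeholder standing in for that chunk's K/V tensors
    store: dict[str, str] = field(default_factory=dict)

    def warm(self, token_ids: list[int]) -> None:
        """Cache every full chunk of a shared prefix (a trailing partial chunk is skipped)."""
        for key in chunk_keys(token_ids, self.chunk_size):
            self.store.setdefault(key, f"kv[{key[:8]}]")

    def lookup(self, token_ids: list[int]) -> int:
        """Number of leading tokens whose KV entries are already cached."""
        reused = 0
        for key in chunk_keys(token_ids, self.chunk_size):
            if key not in self.store:
                break  # first miss ends the reusable prefix
            reused += self.chunk_size
        return reused

    def reuse_ratio(self, token_ids: list[int]) -> float:
        """Fraction of the prompt's tokens covered by cached chunks."""
        return self.lookup(token_ids) / len(token_ids) if token_ids else 0.0


if __name__ == "__main__":
    cache = ChunkCache(chunk_size=4)
    system = list(range(100, 112))     # 12 hypothetical token ids for a shared prefix
    cache.warm(system)
    prompt = system + [7, 8, 9, 10]    # shared prefix plus 4 new tokens
    print(cache.lookup(prompt))        # 12
    print(cache.reuse_ratio(prompt))   # 0.75
```

In this toy run, only the last 4 of the 16 prompt tokens would still need a prefill pass, and skipping prefill for the cached tokens is what shortens time to first token. Chaining the hashes restricts reuse to exact shared prefixes. Reusing a chunk that appears at a different position would also require handling positional encodings and cross-chunk attention, which this sketch does not attempt.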