๐Ÿ“ฐ Story

hackernews_ai ยท Jun 10, 2026 ยท news

โ† Live feed ๐Ÿ“ฐ Daily recap ๐Ÿ—“๏ธ Weekly recap ๐Ÿ”” RSS

Show HN: Llmbuffer โ€“ Python library for cache-optimized LLM conversation history

Why it matters

Matches feed focus: agent.

Releasing llmbuffer - a Python library that maximizes LLM prompt cache hits. It handles dynamic context, compaction, and tool output truncation or summarization via flexible hooks. Expect 10x cost savings in typical usage. It supports: - Stateful (PromptManager) and stateless/functional (pure dict to dict) interfaces - Three transition modes for moving messages from short term to long term message histories: none, manual, and agent_cycle. The agent_cycle mode, for instance, let's you truncate, summarize, or strip tool outputs when moving messages to long term history after an agent loop finishes - Pluggable compaction hooks when history gets long - Provider adapters for Anthropic (cache_control markers), OpenAI (automatic prefix caching), Gemini, and local models via HF tokenizers - A benchmark suite with --compare mode showing llmbuffer vs naive side-by-side, reading real cache hit counts from each provider's usage metadata. In my usage, I see >90% of tokens are cached despite frequent dynamic context changing. MIT license. GitHub: https://github.com/scottpurdy/llmbuffer PyPI: https://pypi.org/project/llmbuffer/ Comments URL: https://news.ycombinator.com/item?id=48481750 Points: 1 # Comments: 1

agent
Read the original at github.com โ†’Open in live feedDaily recap for 2026-06-10

Related stories 4 items