🧠 Agent Engineering Wiki

🛠️ Solution · 3 sources


Context compaction: summarize, compress, and curate the working set

TL;DR

Keep memory *inside* the context window but small: summarize old turns, compress history, and deliberately curate what stays in-context each step ("context engineering"). The agent forgets less because the working set is chosen, not just truncated.
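
As a rough illustration of "summarize old turns", the sketch below collapses everything except the most recent turns into one summary turn once a token budget is exceeded. All names here (`Turn`, `count_tokens`, `stub_summarize`, `compact`) are hypothetical, the token count is a crude whitespace word count, and the summarizer is a placeholder where an LLM call would normally go.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude stand-in for a real tokenizer.
    return len(text.split())


def stub_summarize(turns: List[Turn]) -> str:
    # Placeholder for an LLM summarizer: keeps the first 8 words of each turn.
    return " | ".join(f"{t.role}: {' '.join(t.text.split()[:8])}" for t in turns)


def compact(
    turns: List[Turn],
    budget: int,
    keep_recent: int = 2,
    summarize: Callable[[List[Turn]], str] = stub_summarize,
) -> List[Turn]:
    """Replace all but the last `keep_recent` turns with a single summary turn."""
    total = sum(count_tokens(t.text) for t in turns)
    split = len(turns) - keep_recent
    if total <= budget or split <= 0:
        return list(turns)  # under budget, or nothing old enough to summarize
    old, recent = turns[:split], turns[split:]
    summary = Turn("system", "Summary of earlier turns: " + summarize(old))
    return [summary] + list(recent)
```

This is a single pass and does not guarantee the result fits the budget; a fuller version would loop or tighten the summary. Keeping the most recent turns verbatim reflects the assumption that recency matters most.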

State of the art

"Context engineering and memory management" has emerged as a discipline of its own, one that treats the prompt as a managed working set rather than an append-only log. Techniques range from rolling summarization to LLM-guided compression of long-term memory (MemRefine) and memory systems that explicitly model association, forgetting, and synthesis rather than storing everything. Compaction is increasingly paired with an external store: compress the working set, offload the rest to a vector/graph KB, and rehydrate on demand.
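
The offload-and-rehydrate pairing can be sketched as follows. This is a toy under stated assumptions: `ArchiveStore` is a hypothetical in-memory class standing in for a vector/graph KB, and retrieval uses plain keyword overlap instead of embeddings or graph traversal.

```python
from typing import List


class ArchiveStore:
    """Hypothetical in-memory stand-in for an external vector/graph KB."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def offload(self, text: str) -> None:
        # Move a piece of history out of the working set and into the store.
        self._items.append(text)

    def rehydrate(self, query: str, k: int = 1) -> List[str]:
        # Assumption: keyword overlap as the relevance score, not embeddings.
        q = set(query.lower().split())
        scored = [
            (len(q & set(item.lower().split())), idx, item)
            for idx, item in enumerate(self._items)
        ]
        hits = [s for s in scored if s[0] > 0]
        hits.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, then oldest
        return [item for _, _, item in hits[:k]]


def build_context(working_set: List[str], store: ArchiveStore, query: str, k: int = 1) -> List[str]:
    # Rehydrated items are placed ahead of the compact working set.
    return store.rehydrate(query, k) + working_set
```

The point of the split is that the working set stays small by default, and archived detail re-enters the prompt only when the current query appears to need it.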

What's new

Compression is moving beyond naive summarization: LLM-guided methods (MemRefine) and forgetting/synthesis-aware memory systems aim to preserve signal density rather than just shrink token count.

Trade-offs

Cheap on infra (no external store) and keeps everything the model needs in one place, but summarization is lossy and irreversible: a detail dropped early can't be recovered later, and aggressive compaction can quietly degrade task fidelity. Best for single-session, long-horizon tasks where recency dominates and the full history isn't needed verbatim.

Why it matters for platform engineers

Often the highest-leverage first move: it directly attacks token cost and latency (the bill scales with context size) without standing up new infrastructure. The risk is silent quality loss, so it needs evaluation, which makes it a tuning knob, not a set-and-forget fix.
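
To make the cost-versus-fidelity trade concrete, here is a minimal sketch of the two numbers worth tracking together: input-token spend across steps, and how many required facts survive in the context. The function names, the price, and the substring-based fact check are all illustrative assumptions; a real evaluation would more likely score task outcomes.

```python
from typing import List


def input_cost(tokens_per_step: List[int], price_per_1k: float) -> float:
    # Total input spend when each step sends `tokens_per_step[i]` tokens.
    # Assumption: flat per-1k-token pricing, with a made-up price.
    return sum(tokens_per_step) / 1000 * price_per_1k


def fact_retention(context: str, required_facts: List[str]) -> float:
    # Fraction of required facts still present verbatim (case-insensitive).
    if not required_facts:
        return 1.0
    text = context.lower()
    kept = sum(1 for fact in required_facts if fact.lower() in text)
    return kept / len(required_facts)


# Append-only log: the context grows every step, so spend grows with it.
append_only = input_cost([1000, 2000, 3000], price_per_1k=0.5)
# Compacted: the context is held near a fixed size.
compacted = input_cost([1000, 1000, 1000], price_per_1k=0.5)

retained = fact_retention(
    "Deploy to eu-west-1 by Friday",
    ["eu-west-1", "friday", "rollback plan"],
)
```

Reading the savings alongside a retention figure is what turns compaction into something that can be tuned: a setting that halves spend but drops a required fact shows up as a visible regression instead of a silent one.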