📰 Story

arxiv_cs_lg · Jun 18, 2026 · paper

โ† Live feed ๐Ÿ“ˆ Storylines ๐Ÿ“ฐ Daily recap ๐Ÿ—“๏ธ Weekly recap ๐Ÿ”” RSS

Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving

In brief

Mainstream LLM serving systems reuse prefix work mainly through paged or radix key-value (KV) caches. This is highly effective for high-throughput, high-concurrency serving, but it manages only one positional fragment...
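
To make "prefix reuse" concrete, here is a minimal sketch of the kind of lookup a prefix KV cache performs. It assumes a plain token-level trie as a simplified stand-in for a radix tree (a real radix tree would compress runs of single-child nodes), and uses hypothetical string handles in place of real KV blocks. The names `PrefixKVCache`, `insert`, and `longest_prefix` are illustrative only, not taken from the paper or from any serving framework. Each cached entry is tied to an exact token position, which is the "positional fragment" the summary refers to.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by the next token id.
    children: dict[int, Node] = field(default_factory=dict)
    # Hypothetical handle to the KV entries for the prefix ending at this node.
    kv_handle: object | None = None


class PrefixKVCache:
    """Token-level trie mapping token prefixes to cached KV handles (illustrative)."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, tokens: list[int], handles: list[object]) -> None:
        # One handle per position: handles[i] covers the prefix tokens[: i + 1].
        node = self.root
        for tok, handle in zip(tokens, handles):
            node = node.children.setdefault(tok, Node())
            node.kv_handle = handle

    def longest_prefix(self, tokens: list[int]) -> int:
        # Count leading tokens whose KV state is already cached and can be reused.
        node, matched = self.root, 0
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None or nxt.kv_handle is None:
                break
            node, matched = nxt, matched + 1
        return matched


cache = PrefixKVCache()
cache.insert([1, 2, 3, 4], ["kv0", "kv1", "kv2", "kv3"])
reused = cache.longest_prefix([1, 2, 3, 9, 9])
print(reused)  # 3: only the suffix [9, 9] would need a fresh prefill
```

In this sketch the cache can only skip recomputation for an exactly matching leading prefix; any other execution state is outside what it tracks.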

Read the original at arxiv.org →