Story
arxiv_cs_lg · Jun 30, 2026 · paper
arxiv.org · Jun 30, 2026
In brief
Long-context Large Language Model inference is severely bottlenecked by the massive Key-Value (KV) cache, yet existing sparse attention methods often suffer from static fixed-budget (Top-k) retrieval or rely on proxy...
Feed lens
evaluation