Story
arxiv_cs_ai · Jul 1, 2026 · paper
arxiv.org · Jul 1, 2026
In brief
Repository-level performance-optimization benchmarks such as GSO, SWE-Perf and SWE-fficiency evaluate coding agents by applying patches to real repositories and comparing runtime against unoptimized baselines and offi...
Feed lens
agenteval