Story

arxiv_cs_ai · Jul 1, 2026 · paper

Source brief

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

arxiv.org · Jul 1, 2026

In brief

Repository-level performance-optimization benchmarks such as GSO, SWE-Perf and SWE-fficiency evaluate coding agents by applying patches to real repositories and comparing runtime against unoptimized baselines and offi...
