LLM Digest

Story

hackernews_ai · Jun 5, 2026 · news

Source brief

Show HN: Lazarus, a coding agent for long-horizon tasks

github.com · Jun 5, 2026
In brief

I have been interested in long-horizon coding tasks for a while, especially with benchmarks like FrontierSWE, where even the best coding agents like Codex and Claude Code struggle to complete tasks. These agents come...

Earlier in this thread (4 items)

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

I tested Haiku vs. Sonnet across 3 agent tasks – the cheap model won every time

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation