hackernews_ai · Jun 5, 2026 · news
Show HN: Lazarus, a coding agent for long-horizon tasks
I have been interested in long-horizon coding tasks for a while, especially with benchmarks like FrontierSWE, where even the best coding agents like Codex and Claude Code struggle to complete tasks. These agents come with a collection of tools like bash, file edits, grep, glob, etc.

Lazarus takes a different approach. The idea is to give the model exactly one tool: a persistent Python runtime. The model writes Python code, executes it, and receives stdout/stderr. Through Python it inspects repos, reads and edits files, runs builds, executes tests, invokes linters, and even builds custom harnesses and automates whatever workflows it needs.

The motivation for this was:

- Tool selection itself is a planning problem.
- Specialized tools are often difficult to compose together efficiently.
- Long-horizon tasks frequently require custom workflows that predefined tools don't provide.
- Python is expressive enough for the model to build those workflows itself.

Another decision is to avoid agent hierarchies. Lazarus runs a single tool-calling loop rather than managers, planners, and worker agents. The intuition is that current models are much better at writing code than at coordinating fleets of agents. Agent orchestration consumes context, introduces extra failure modes, and adds complexity.

How does Lazarus manage context? When the "usable" context window of a model is nearly exhausted, the model gets one final opportunity to execute a Python tool call, containing anything it wants to preserve: no