Agent Engineering Wiki
Agents plan multi-step work badly — they loop, stall, or skip steps
🧱 Obstacle · planning · active · 4 sources · updated 2026-06-30
Give an agent a goal that takes ten steps and it will often take the wrong ones: it charges ahead on an ambiguous request instead of asking, decomposes the task into a plan that drifts, gets stuck in a retry loop, or skips a step it needed. Planning means turning a goal into the right ordered sequence of actions and knowing when to stop or ask. Failing at it is a distinct failure mode from tool-use or memory failures, and it is where long-horizon agents most visibly fall down.
The dominant control structure is still the ReAct loop (reason → act → observe, repeat), and the production lesson is that the loop alone isn't enough. Stripe's financial-compliance agent pairs a ReAct framework with dedicated infrastructure and guardrails to keep multi-step runs on track at production scale, evidence that planning reliability is an architecture problem, not a prompting problem. Two refinements are emerging on top. First, knowing when to ask rather than proceed: DiscoBench measures clarification-aware deep search, scoring whether an agent recognizes an under-specified goal and asks instead of confidently planning down the wrong path, which treats "ask a question" as a first-class planning action. Second, learning to plan from experience rather than re-deriving a plan cold on each run: GUI agents that autonomously explore and reuse *hindsight* experience plan repetitive interface tasks better than zero-shot decomposition does, and DAIN's dynamic agent-interaction network adapts the collaboration and reasoning structure to the task instead of running a fixed plan. The through-line is that robust planning comes from *structure around the loop* (explicit decomposition, clarification gates, learned priors, and a harness that can re-plan), not from a single cleverer prompt.
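To make "structure around the loop" concrete, here is a minimal Python sketch of a bounded reason → act → observe loop in which "ask" is a first-class action alongside tool calls. It is not taken from any of the cited systems: `run_agent`, `RunResult`, the action kinds, and the toy policy are all hypothetical names, and the `policy` callable merely stands in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical action shape: (kind, payload), where kind is
# "tool" (run a named tool), "ask" (request clarification), or "finish".
Action = Tuple[str, str]
Step = Tuple[Action, str]  # an action plus the observation it produced


@dataclass
class RunResult:
    status: str  # "finished", "needs_clarification", or "budget_exhausted"
    output: str
    trajectory: List[Step] = field(default_factory=list)


def run_agent(
    goal: str,
    policy: Callable[[str, List[Step]], Action],
    tools: Dict[str, Callable[[], str]],
    max_steps: int = 8,
) -> RunResult:
    """Run a reason -> act -> observe loop bounded by max_steps."""
    trajectory: List[Step] = []
    for _ in range(max_steps):
        kind, payload = policy(goal, trajectory)  # reason: pick the next action
        if kind == "finish":
            return RunResult("finished", payload, trajectory)
        if kind == "ask":
            # Clarification gate: stop and hand the question back to the user.
            return RunResult("needs_clarification", payload, trajectory)
        if kind == "tool" and payload in tools:
            observation = tools[payload]()  # act
        else:
            observation = f"error: unknown action {kind}:{payload}"
        trajectory.append(((kind, payload), observation))  # observe
    # The step budget turns a runaway loop into an explicit, inspectable outcome.
    return RunResult("budget_exhausted", "", trajectory)


def toy_policy(goal: str, trajectory: List[Step]) -> Action:
    """Stand-in for a model: asks when the goal names no account."""
    if "account" not in goal:
        return ("ask", "Which account should I check?")
    if not trajectory:
        return ("tool", "lookup")
    return ("finish", trajectory[-1][1])


tools = {"lookup": lambda: "balance ok"}
clear = run_agent("check account 42", toy_policy, tools)
vague = run_agent("check it", toy_policy, tools)
stuck = run_agent("check account 42", lambda g, t: ("tool", "lookup"), tools, max_steps=3)
```

The three runs exercise the three exits: a well-specified goal finishes, an under-specified one returns a question instead of a plan, and a policy that never stops is cut off by the step budget rather than looping indefinitely.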
Bad planning is what turns a capable model into an unreliable agent: it is the source of runaway loops (a cost problem), of confidently wrong work on ambiguous tickets, and of the long-horizon failures that erode trust. The engineering job is to wrap the model's reasoning in a controllable harness (bounded loops, explicit decomposition, clarification checkpoints, and re-planning on failure) and to prove it works with trajectory-level evaluation rather than hoping a bigger model plans better on its own. Planning sits upstream of orchestration: once you can decompose reliably, the question becomes who executes each step.
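One way to picture trajectory-level evaluation is as checks over the whole sequence of actions an agent took, rather than over its final answer alone. The Python sketch below is an illustration under assumptions, not a method from the cited sources: it flags retry loops (long runs of the same action) and skipped or out-of-order steps. The function names, the `max_repeat` threshold, and the step names are all hypothetical.

```python
from typing import Dict, List, Sequence


def longest_repeat_run(actions: Sequence[str]) -> int:
    """Length of the longest run of identical consecutive actions."""
    longest = current = 0
    previous = None
    for action in actions:
        current = current + 1 if action == previous else 1
        longest = max(longest, current)
        previous = action
    return longest


def missing_steps(actions: Sequence[str], required: Sequence[str]) -> List[str]:
    """Required steps that do not occur in order within the trajectory."""
    position = 0
    missing: List[str] = []
    for step in required:
        try:
            # Search only past the previously matched step to enforce ordering.
            position = list(actions).index(step, position) + 1
        except ValueError:
            missing.append(step)
    return missing


def evaluate_trajectory(
    actions: Sequence[str],
    required: Sequence[str],
    max_repeat: int = 2,  # assumed threshold: more repeats than this counts as a loop
) -> Dict[str, object]:
    """Score one run on loop and skipped-step checks."""
    looped = longest_repeat_run(actions) > max_repeat
    missing = missing_steps(actions, required)
    return {"looped": looped, "missing": missing, "passed": not looped and not missing}


required = ["fetch", "validate", "submit"]
good = evaluate_trajectory(["fetch", "validate", "submit"], required)
bad = evaluate_trajectory(["fetch", "retry", "retry", "retry", "submit"], required)
```

Here `good` passes both checks, while `bad` is flagged twice: three consecutive `retry` actions exceed the assumed threshold, and `validate` never appears. A real harness would likely also score cost, clarification behavior, and recovery after failure, but the shape is the same: assertions over the trajectory, run across many tasks.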
- Production-grade AI agents for financial compliance: Lessons from Stripe
- When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search
- Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning
- DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning