Sarang Kulkarni on Lessons from Building Deep Research Agents in Production
Opens the thread with production lessons and a working definition of deep research agentic systems.
3 items · 3 sources · 3 days
TL;DR
Deep research agents, systems that run multi-step research using dynamic reasoning and multi-hop retrieval, moved from field notes toward repeatable practice across three posts. InfoQ opened with Sarang Kulkarni's production lessons on building them; an arXiv paper then flagged an evaluation gap, arguing that existing benchmarks score only single-shot output, and proposed multi-turn evaluation under process-level feedback; AWS closed with an end-to-end build using Deep Agents and Bedrock AgentCore.
What's new: The latest beat is AWS's hands-on walkthrough building a competitive research agent end to end on Deep Agents plus Bedrock AgentCore — the thread's first concrete, platform-specific build pattern, after the earlier production-lessons and evaluation-method posts.
💡 Deep research agents are maturing into something an engineer can actually ship: there is now a managed-cloud build pattern to copy, and a multi-turn evaluation method that judges whether the agent improves under feedback rather than scoring only its one-shot answer.
InfoQ (Sarang Kulkarni): opens the thread with production lessons and a working definition of deep research agentic systems.
arXiv paper: names the evaluation gap (existing benchmarks test only single-shot output) and proposes multi-turn evaluation under process-level feedback.
AWS: first concrete end-to-end build pattern, using Deep Agents with Bedrock AgentCore.