Story
arxiv_llm_reliability · Jun 25, 2026 · paper
arxiv.org · Jun 25, 2026
In brief
Root cause analysis (RCA) poses a holistic test of LLM agentic capabilities, such as long-context understanding, multi-step reasoning, and tool use. However, existing datasets suffer from a fundamental gap: they label...
Feed lens
agentic, evaluation