Xiaomi says its MiMo Code beats Claude Code at ultra-long, 200+ step tasks
Xiaomi releases an open-source agentic coding harness it claims outperforms Claude Code on very long, multi-step tasks — a notable entrant from a non-Western lab.
14 articles · 4 categories
Thursday, Jun 11, 2026
In 30 seconds
Coding agents were the story of the day. Xiaomi open-sourced MiMo Code, an agentic harness it says outlasts Claude Code on ultra-long, 200-plus-step tasks, while OpenAI moved to acquire Ona to give Codex secure, persistent cloud environments for long-running agents. Underneath the headlines, AWS published numbers from frontier teams reporting 4.5x and, in some cases, more than 10x productivity gains, and shipped Agent-EvalKit, an Apache-2.0 toolkit for measuring whether those agents actually work.
Anthropic had a busy news cycle of its own: a new enterprise alliance with DXC, the launch of Claude Corps, and a notable retreat on a safeguard policy that researchers had warned could 'sabotage' frontier work done with Claude. The company says it will make Fable 5's frontier-development safeguards visible rather than silent.
Around the edges, OpenAI leaned into policy and science, backing the EU's content-transparency Code of Practice and showcasing an astrophysicist using Codex to simulate black holes. Separately, Sarah Guo's widely shared essay reframed the open-models debate as a fight between model labs and agent labs over what's ultimately 'untrainable.'
The day's center of gravity: new agentic coding harnesses, infrastructure to run agents at length, and tools to evaluate and govern them.
OpenAI plans to fold Ona into Codex to add secure, persistent cloud environments — the substrate long-running enterprise agents need.
AWS profiles teams redesigning how software gets built around AI, citing 4.5x productivity gains and, in some cases, more than 10x.
An Apache-2.0 toolkit that brings evaluation infrastructure to AI coding assistants including Claude Code, Kiro CLI, and Kilo Code.
An open-source project proposing capability-scoped API access as a safer way to hand agents real-world permissions.
A three-front news day for Anthropic — a new enterprise partnership, a talent program, and a reversal on a contested safeguard.
Anthropic partners with DXC to push Claude deeper into enterprise integration and services work.
Anthropic launches Claude Corps, a new program framing how it deploys talent and Claude into the field.
After a Wired scoop, Anthropic says it will make Fable 5's frontier-development safeguards visible instead of silent, easing fears the rules hampered legitimate research.
Beyond the Ona deal, OpenAI pressed on policy and showcased Codex in frontier science.
OpenAI backs the EU Code of Practice on AI content transparency, advancing provenance standards for AI-generated content.
Chi-kwan Chan uses Codex to build black hole simulations, testing extreme physics and Einstein's general relativity — a concrete scientific-computing case study.
The day's reading and tinkering: a sharp essay on the lab landscape, a steady library release, and a hands-on agent skill.
Latent Space spotlights Sarah Guo's essay reframing the open-models debate as a contest between model labs and agent labs over the limits of what can be trained.
Simon Willison's alpha extends the ?_extra= pattern to queries and rows, another step toward a stable Datasette 1.0.
A former consultant shares an agent skill that generates polished, consultant-grade HTML slide decks — sidestepping Office tooling.