For the past 3 years, AI evals have been my professional focus. The most common objection I hear to evals is “our product is hard to eval”. This objection is a product smell. Artifacts that are hard for you to verif...
OpenWiki generates and maintains codebase documentation so coding agents can find the repo context they need without loading everything into one instruction file.
MiniMax-M3: Added support for the new MiniMax-M3 model, with a fast follow-on of BF16/FP8 indexer via MSA, MXFP4 support, FP8 sparse GQA, and extensive... · DeepSeek-V4 keeps maturing: Following its debut, DeepS...
Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding This is an interesting new open weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants including 9B Dense, 31B Dense, 35B MoE...
Release: llm-coding-agent 0.1a0 Another Fable 5 experiment. Now that my LLM library has evolved into more of an agent framework it's time to see what a simple coding agent would look like built on it. I started a new...
Elastic open-sourced Atlas, a system built on Elasticsearch that maintains three categories of memory for agents. Atlas integrates with agents via MCP and maintains per-user isolation of memories. When evaluated on qu...
Evaluating long-running, stateful agents needs a new kind of runner. Here's how Deep Agents, LangSmith sandboxes, and observability plug into Harbor.
How vLLM Semantic Router turns vllm-sr/auto into a bounded micro-agent runtime for Confidence, Ratings, ReMoM, Fusion, Workflows, and benchmark-shaped collaboration.
Memory for a long-horizon LLM agent is a contract about what each future decision is allowed to see. The simplest contract appends past observations, tool calls, and reflections to every prompt, which makes prior cont...
AlloyDB is an AI-native database: it isn’t just a passive data store, it intelligently understands and processes your data. With AlloyDB, you get industry-leading vector and hybrid search, near 100% accurate natural la...