{"date":"2026-06-22","title":"What happened in AI — Jun 22, 2026","generated_at":"2026-06-23T00:03:11Z","intro":["Security led the day: OpenAI unveiled Daybreak — Codex Security and a GPT-5.5-Cyber model built to find, validate, and patch vulnerabilities at scale — alongside Patch the Planet to back open-source maintainers. A Latent Space conversation with Gray Swan reinforced the framing that AI security is its own discipline, not cybersecurity with a model bolted on.","Underneath ran two engineering threads: evals (why they miss real failures, lessons from financial agents, and a benchmark for poisoned memory) and agent-memory tooling (PMB over MCP, Headroom for context compression). Infrastructure kept pace, with AWS Graviton5 reaching GA and NVIDIA's Vera CPU bound for Los Alamos."],"highlights":["OpenAI launched Daybreak — Codex Security and GPT-5.5-Cyber — to find, validate, and patch vulnerabilities at scale, plus Patch the Planet to support open-source maintainers.","Evals were the recurring theme: why most evals miss real failures (the Linear sales-email case), three years of lessons building financial-agent evals, and a reproducible benchmark for poisoned agent memory.","Agent-memory tooling shipped: PMB (local-first memory for coding agents over MCP) and Headroom (a context-compression layer for agents).","Infra moved: AWS Graviton5 hit GA with 192 cores and formally verified VM isolation, NVIDIA's Vera CPU heads to Los Alamos supercomputers, and liquid cooling now runs at 45°C.","OpenAI's Codex-maxxing guide and an agent-friendly architecture writeup pushed long-running, autonomous coding workflows."],"article_count":15,"categories":[{"name":"AI Security & Red-Teaming","slug":"ai-security-red-teaming","summary":"OpenAI made security its headline of the day, shipping Daybreak tools to find and patch vulnerabilities at scale and an initiative to back open-source maintainers, while a Gray Swan conversation argued AI security is a discipline of its own.","articles":[{"title":"Daybreak: Tools for securing every organization in the world","summary":"OpenAI introduces Daybreak — including Codex Security and GPT-5.5-Cyber — to help organizations find, validate, and patch vulnerabilities at scale.","source":"openai_blog","url":"https://openai.com/index/daybreak-securing-the-world","published":"Mon, 22 Jun 2026 10:00:00 GMT"},{"title":"Patch the Planet: a Daybreak initiative to support open source maintainers","summary":"A Daybreak program that helps open-source maintainers find, validate, and fix vulnerabilities with AI plus expert review.","source":"openai_blog","url":"https://openai.com/index/patch-the-planet","published":"Mon, 22 Jun 2026 10:00:00 GMT"},{"title":"Red-Teaming after Mythos — Zico Kolter & Matt Fredrikson, Gray Swan","summary":"OpenAI board member Zico Kolter and Gray Swan CEO Matt Fredrikson explain why AI security is not just “cybersecurity with AI.”","source":"latent_space","url":"https://www.latent.space/p/gray-swan","published":"Mon, 22 Jun 2026 21:06:55 GMT"}]},{"name":"Evals & Agent Reliability","slug":"evals-agent-reliability","summary":"A strong day for the unglamorous work of measuring agents: why typical evals miss the failures that matter, hard lessons from years of financial-agent evals, and a reproducible benchmark for memory poisoning.","articles":[{"title":"Why most AI evals would miss the Linear sales email failure","summary":"A case study in why pass/fail eval suites overlook the subtle, context-dependent failures that actually break agents in production.","source":"hackernews_ai","url":"https://tenureai.dev/writing/why-most-ai-evals-would-miss-the-linear-sales-email-failure/","published":"Mon, 22 Jun 2026 21:51:32 +0000"},{"title":"Lessons from Building Evals for Financial AI Agents","summary":"Three years of hard-won lessons on designing evals where correctness, cost, and trust are non-negotiable.","source":"hackernews_ai","url":"https://www.primerapp.com/blog/lessons-from-3-years-of-evals/","published":"Mon, 22 Jun 2026 08:51:04 +0000"},{"title":"Agent-memory systems admit poisoned facts — a reproducible benchmark","summary":"A benchmark showing agent-memory stores will accept and retrieve poisoned facts — a concrete reliability and security gap to test against.","source":"hackernews_ai","url":"https://github.com/arsenis-cmd/clai-benchmarks","published":"Mon, 22 Jun 2026 13:18:24 +0000"}]},{"name":"Agent Memory & Context Engineering","slug":"agent-memory-context-engineering","summary":"Two pieces of practical tooling for the perennial agent problems of remembering and fitting context: local-first memory over MCP and a dedicated context-compression layer.","articles":[{"title":"PMB — local-first memory for AI coding agents over MCP","summary":"A single-file SQLite + LanceDB store with hybrid BM25/vector retrieval, exposed to coding agents over MCP — no server or API keys.","source":"hackernews_ai","url":"https://github.com/oleksiijko/pmb/blob/main/README.md","published":"Mon, 22 Jun 2026 15:03:20 +0000"},{"title":"Headroom — the context compression layer for AI agents","summary":"A drop-in layer that compresses context so agents stay within window limits on long-running tasks.","source":"hackernews_ai","url":"https://github.com/headroomlabs-ai/headroom","published":"Mon, 22 Jun 2026 07:50:07 +0000"}]},{"name":"Coding Agents & Multi-Agent Systems","slug":"coding-agents-multi-agent","summary":"From keeping long-running coding work alive to architectures that read well for both humans and agents — plus Sakana packaging a multi-agent system as a single model.","articles":[{"title":"Codex-maxxing for long-running work","summary":"How Jason Liu uses Codex to preserve context, manage complex projects, and keep work going beyond a single prompt.","source":"openai_blog","url":"https://openai.com/index/codex-maxxing-long-running-work","published":"Mon, 22 Jun 2026 00:00:00 GMT"},{"title":"Agile and Coding: An Agent- and Human-Friendly Architecture","summary":"Structuring codebases so the same architecture is legible to autonomous agents and the humans reviewing them.","source":"hackernews_ai","url":"https://davidvujic.blogspot.com/2026/06/an-agent-and-human-friendly-architecture.html","published":"Mon, 22 Jun 2026 08:39:01 +0000"},{"title":"Sakana Fugu multi-agent system delivered as one model","summary":"Sakana packages a multi-agent system into a single deployable model — a notable take on collapsing orchestration into inference.","source":"hackernews_ai","url":"https://github.com/SakanaAI/fugu","published":"Mon, 22 Jun 2026 23:44:04 +0000"}]},{"name":"AI Infrastructure & Hardware","slug":"ai-infrastructure-hardware","summary":"The platform layer had a busy day: a new general-purpose ARM server chip, NVIDIA silicon and cooling for AI factories and national labs, and a real-world multimodal retrieval architecture on AWS.","articles":[{"title":"AWS Graviton5 Reaches General Availability with 192 Cores and Formally Verified VM Isolation","summary":"EC2 M9g/M9gd instances ship with 192 ARM cores, Nitro-based formally verified VM isolation, and DDR5-8800 memory.","source":"infoq_ai_ml","url":"https://www.infoq.com/news/2026/06/aws-graviton5-ga/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=AI%2C+ML+%26+Data+Engineering","published":"Mon, 22 Jun 2026 10:05:00 GMT"},{"title":"NVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National Laboratory","summary":"New Los Alamos supercomputers built with HPE and NVIDIA tap Vera CPUs to unlock agentic AI for scientific discovery.","source":"nvidia_blog","url":"https://blogs.nvidia.com/blog/nvidia-vera-cpu-los-alamos-national-laboratory/","published":"Mon, 22 Jun 2026 13:00:20 +0000"},{"title":"Hotter Than a Hot Tub: The 45°C Breakthrough to Cool AI's Biggest Machines","summary":"NVIDIA's newest AI servers can run cooling liquid up to 45°C, cutting the energy and water cost of cooling AI factories.","source":"nvidia_blog","url":"https://blogs.nvidia.com/blog/liquid-cooling-ai-factories/","published":"Mon, 22 Jun 2026 05:00:22 +0000"},{"title":"Embed the world: Multimodal AI for searchable aerial imagery at scale","summary":"A reference architecture on Amazon Bedrock and OpenSearch Serverless for embedding and searching aerial imagery, with OpenStreetMap ground-truth evaluation.","source":"aws_ml_blog","url":"https://aws.amazon.com/blogs/machine-learning/embed-the-world-multimodal-ai-for-searchable-aerial-imagery-at-scale/","published":"Mon, 22 Jun 2026 16:32:15 +0000"}]}]}