[AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding
Introduces speculative decoding into the news cycle via IndexShare, bundled with the GLM-5.2 open-model release.
3 items · 3 sources · 3 days
Operational story trace
Latest change
Modal and Decagon published a joint walkthrough showing how speculative decoding cut their inference latency to state-of-the-art levels, moving the technique from benchmark claims into a reproducible production serving recipe.
In mid-June, speculative decoding moved from a research trick toward a default lever for inference latency. It first surfaced as IndexShare alongside the GLM-5.2 open-model release; NVIDIA then reported DFlash speculative decoding lifting inference throughput by up to 15x on Blackwell. Together, the two announcements pulled the technique into both the open-model and hardware-vendor conversations within a week.
Arc
Introduces speculative decoding into the news cycle via IndexShare, bundled with the GLM-5.2 open-model release.
Adds the hardware-vendor angle: NVIDIA's DFlash speculative decoding cited for up to 15x inference throughput on Blackwell.
Closes the loop with a production recipe: Modal and Decagon report state-of-the-art inference latency using speculative decoding.
What to watch — open questions
Storylines are threaded mechanically from the feed: stories that share a distinctive anchor across multiple days and sources. Each item links to its original source. The evidence trace, current state, and open questions are written by the editor routine and refreshed whenever a new beat lands.