Story
arxiv_cs_lg · Jun 24, 2026 · paper
Source brief
Quantization Inflates Reasoning: Token Inflation as a Hidden Cost of Low-Bit Reasoning Models
arxiv.org · Jun 24, 2026
In brief
Quantization is widely used to reduce the inference cost of large language models, but its effect on reasoning models is not fully captured by final-answer accuracy or per-token latency. We show that low-bit post-trai...
Feed lens
agentic evaluation