Story
arxiv_cs_lg · Jul 2, 2026 · paper
arxiv.orgJul 2, 2026
original source linked
In brief
On-policy self-distillation (OPSD) has emerged as a practical method for training large language models (LLMs) to reason, where a single model acts as both the teacher and the student with different levels of informat...
Feed lens
eval