LLM Digest

Story

arxiv_cs_lg · Jul 2, 2026 · paper

Source brief

DemoPSD: Disagreement-Modulated Policy Self-Distillation

arxiv.org · Jul 2, 2026

In brief

On-policy self-distillation (OPSD) has emerged as a practical method for training large language models (LLMs) to reason, where a single model acts as both the teacher and the student with different levels of informat...

