The Bridge-Garden Dilemma in LLM Distillation: Why Mixing Hard and Soft Labels Works

AI & ML · May 27, 2026 · 2 min read · via arXiv

arXiv:2605.26246v1. Abstract: Knowledge distillation (KD) transfers knowledge from a large teacher model to a smaller student. In language modeling, the student is trained either on tokens sampled from the teacher (hard labels) or on the teacher's full next-token distribution (soft labels). Although soft labels appear strictly richer, we find that mixing hard and soft labels consistently yields better results. Crucially, we show that this gain cannot be explained by closer teacher
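To make the two label types concrete, here is a minimal sketch of a per-token loss that blends them. The abstract does not say how the paper combines hard and soft labels, so the convex combination below, the weight `alpha`, and every function name are assumptions made for illustration, not the authors' method.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label_loss(student_probs, sampled_token):
    """Cross-entropy against one token sampled from the teacher (hard label)."""
    return -math.log(student_probs[sampled_token])


def soft_label_loss(student_probs, teacher_probs):
    """Cross-entropy against the teacher's full next-token distribution (soft label)."""
    return -sum(
        t * math.log(s) for t, s in zip(teacher_probs, student_probs) if t > 0
    )


def mixed_loss(student_logits, teacher_logits, alpha, rng):
    """Hypothetical blend: alpha * hard loss + (1 - alpha) * soft loss.

    `alpha` is an assumed mixing weight in [0, 1]; the source does not
    specify the mixing scheme.
    """
    student_probs = softmax(student_logits)
    teacher_probs = softmax(teacher_logits)
    # Hard label: draw one token index from the teacher's distribution.
    sampled = rng.choices(range(len(teacher_probs)), weights=teacher_probs, k=1)[0]
    hard = hard_label_loss(student_probs, sampled)
    soft = soft_label_loss(student_probs, teacher_probs)
    return alpha * hard + (1.0 - alpha) * soft
```

With `alpha = 0.0` this reduces to pure soft-label training, and with `alpha = 1.0` to pure hard-label training; intermediate values give one possible form of the mixture the abstract refers to.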
