Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

AI & ML · May 27, 2026 · 2 min read · via arXiv

arXiv:2605.26266v1. Abstract: Chunk-wise autoregressive video diffusion models rely on a KV cache of previously generated chunks to avoid redundant computation, but this cache quickly becomes a memory bottleneck as videos grow longer. Methods that quantize the KV cache to low bitwidths reduce memory pressure but degrade video quality. We show that a key driver of this degradation is a systematic bias in attention weights: due to the convexity of the exponential in softmax atte
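
The abstract is cut off above, so the paper's actual correction is not shown here. As a rough illustration of the convexity argument only, the sketch below assumes that quantizing a key perturbs its attention logit by zero-mean Gaussian noise. That noise model, the function name `expected_exp_weight`, and the numeric values are all assumptions for illustration, not taken from the paper. By Jensen's inequality, the expected exponentiated logit is then larger than the exponential of the clean logit, so a noisy key tends to receive more unnormalized attention weight than its full-precision version.

```python
import math
import random


def expected_exp_weight(logit, noise_std, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[exp(logit + e)] with e ~ N(0, noise_std^2).

    Hypothetical helper: the Gaussian noise stands in for the logit error
    caused by key quantization (an assumption, not the paper's model).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += math.exp(logit + rng.gauss(0.0, noise_std))
    return total / n_samples


logit = 1.0       # clean attention logit q . k (illustrative value)
noise_std = 0.5   # std of the assumed quantization-induced logit error

clean = math.exp(logit)                        # weight with an exact key
noisy = expected_exp_weight(logit, noise_std)  # average weight with a noisy key

# For Gaussian noise the expectation has a closed form (log-normal mean):
# E[exp(logit + e)] = exp(logit + noise_std**2 / 2)
closed_form = math.exp(logit + 0.5 * noise_std ** 2)

print(f"clean weight:       {clean:.4f}")
print(f"mean noisy weight:  {noisy:.4f}")
print(f"closed-form mean:   {closed_form:.4f}")
print(f"inflation factor:   {noisy / clean:.4f}")
```

Under this assumed noise model the inflation factor is exp(noise_std**2 / 2), which depends only on the noise variance. That suggests one reason coarser quantization could shift more attention toward cached keys, though the paper's own analysis may differ.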
