EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

AI & ML··2 min read·via ArXivOriginal source →

EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

arXiv:2605.27390v2 Announce Type: new Abstract: Speculative decoding accelerates Large Language Model inference via a draft-then-verify paradigm, yet the output projection layer becomes a bottleneck as vocabulary sizes scale. While existing static pruning methods effectively reduce this overhead, they suffer from precipitous drops in acceptance rate in specialized domains or topic-switching scenarios due to their inability to capture dynamic distribution shifts. To address this, we introduce Ev

More Stories