Stateful Inference for Low-Latency Multi-Agent Tool Calling

AI & ML · via arXiv


arXiv:2605.26289v1 (Announce Type: new)

Abstract: Multi-agent tool calling is becoming the dominant interaction pattern for LLM-based systems, yet existing inference frameworks treat each tool call as an independent request, re-processing the entire conversation from scratch even though 85-95% of the prompt is unchanged from the previous turn. We present a stateful inference architecture that converts the $O(n_t)$ per-turn cost of conventional serving into an $O(\Delta_t)$ delta-only cost: a pe…
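The abstract is cut off before it describes the mechanism, so what follows is only a toy cost model and not the paper's architecture. It contrasts the $O(n_t)$ and $O(\Delta_t)$ costs by counting tokens processed per turn. It assumes that tokens are integers and that "state" means reusing work already done for the longest unchanged prompt prefix, in the spirit of prefix/KV caching. The names `StatelessServer`, `StatefulSession`, `common_prefix_len`, and `simulate` are hypothetical illustrations.

```python
from dataclasses import dataclass, field


def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the longest shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class StatelessServer:
    """Hypothetical model of conventional serving: re-process the full prompt every turn."""
    tokens_processed: int = 0

    def serve(self, prompt: list[int]) -> int:
        cost = len(prompt)  # O(n_t): the whole conversation so far
        self.tokens_processed += cost
        return cost


@dataclass
class StatefulSession:
    """Hypothetical model of stateful serving: keep the cached prefix, process only the delta."""
    cached: list[int] = field(default_factory=list)
    tokens_processed: int = 0

    def serve(self, prompt: list[int]) -> int:
        keep = common_prefix_len(self.cached, prompt)
        delta = prompt[keep:]  # O(Delta_t): only tokens not already processed
        self.tokens_processed += len(delta)
        # Drop any cached suffix that no longer matches, then append the new tokens.
        self.cached = self.cached[:keep] + delta
        return len(delta)


def simulate(base: int, per_turn: int, turns: int) -> tuple[int, int]:
    """Each turn appends `per_turn` tokens (e.g. a tool call plus its result)."""
    stateless, stateful = StatelessServer(), StatefulSession()
    for t in range(1, turns + 1):
        prompt = list(range(base + per_turn * t))  # each prompt extends the previous one
        stateless.serve(prompt)
        stateful.serve(prompt)
    return stateless.tokens_processed, stateful.tokens_processed


if __name__ == "__main__":
    print(simulate(base=1000, per_turn=100, turns=5))
```

This toy example uses a 1,000-token base, 100 new tokens per turn, and five turns. About 92-93% of each later prompt is then unchanged, which falls inside the 85-95% range the abstract cites. The stateless model processes 6,500 tokens in total and the stateful one 1,500: the full first prompt plus four 100-token deltas. Real systems also pay for storing and looking up that state, which this count ignores.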
