Stateful Inference for Low-Latency Multi-Agent Tool Calling

arXiv:2605.26289v1

Abstract: Multi-agent tool calling is becoming the dominant interaction pattern for LLM-based systems, yet existing inference frameworks treat each tool call as an independent request, re-processing the entire conversation from scratch even though 85-95% of the prompt is unchanged from the previous turn. We present a stateful inference architecture that converts the $O(n_t)$ per-turn cost of conventional serving into an $O(\Delta_t)$ delta-only cost: a pe
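The gap between the $O(n_t)$ and $O(\Delta_t)$ costs can be illustrated with a small token-counting sketch. It is not the paper's architecture, whose mechanism is cut off in the abstract above. It only assumes that each turn's prompt is a token list that extends the previous one, and it counts the tokens a stateless server would re-process against the tokens that are actually new. The names `common_prefix_len` and `per_turn_costs` and the toy conversation are hypothetical.

```python
from typing import List, Tuple


def common_prefix_len(prev: List[int], cur: List[int]) -> int:
    """Return the number of leading tokens shared by two token sequences."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


def per_turn_costs(prompts: List[List[int]]) -> List[Tuple[int, int]]:
    """For each turn, return (n_t, delta_t).

    n_t     : tokens a stateless server processes (the whole prompt).
    delta_t : tokens beyond the prefix shared with the previous turn.
    """
    costs = []
    prev: List[int] = []
    for cur in prompts:
        shared = common_prefix_len(prev, cur)
        costs.append((len(cur), len(cur) - shared))
        prev = cur
    return costs


# Toy conversation (assumed data): a 100-token base prompt, with each
# turn appending 10 new tokens, e.g. a tool result.
base = list(range(100))
prompts = [base + list(range(1000, 1000 + 10 * t)) for t in range(1, 4)]
costs = per_turn_costs(prompts)

for turn, (n_t, delta_t) in enumerate(costs, start=1):
    print(f"turn {turn}: n_t={n_t}, delta_t={delta_t}")
```

In this toy setup the third turn has 130 prompt tokens, of which only 10 are new, so about 92% of the prompt is unchanged, which falls inside the 85-95% range quoted in the abstract.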
