ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling

AI & ML · May 27, 2026 · 2 min read · via arXiv

arXiv:2605.26172v1 · Announce Type: new

Abstract: When language models use test-time sampling, they generate multiple reasoning trajectories and select an answer by majority vote. We show that these trajectories are not independent: for a given question, they concentrate into a small number of clusters, or reasoning basins, each defined by a normalized final answer and the solutions that reach it. A majority vote therefore selects the most stable basin rather than the most accurate one, which cre…
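The mechanism in the abstract can be sketched in Python. Sampled trajectories are grouped into basins keyed by a normalized final answer, and a majority vote then takes the largest basin. This is a minimal sketch under stated assumptions and is not the ARBITER method. The `normalize_answer` rules, the `(reasoning, answer)` tuple format, and the toy samples are all hypothetical. The sketch only shows that a vote returns whichever basin is largest, whether or not that basin is correct.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def normalize_answer(raw: str) -> str:
    """Hypothetical normalizer (an assumption, not the paper's rules).

    Numeric answers map to one canonical Fraction string, so "12", "12.0"
    and "12." land in the same basin. Other answers are lowercased and
    whitespace-collapsed.
    """
    text = raw.strip().rstrip(".").replace(",", "").lower()
    try:
        return str(Fraction(text))
    except (ValueError, ZeroDivisionError):
        return " ".join(text.split())


def group_into_basins(trajectories):
    """Group (reasoning, final_answer) pairs by normalized final answer."""
    basins = defaultdict(list)
    for reasoning, answer in trajectories:
        basins[normalize_answer(answer)].append(reasoning)
    return dict(basins)


def majority_vote(trajectories):
    """Return the normalized answer of the largest basin.

    Counter.most_common breaks ties by first-seen order.
    """
    counts = Counter(normalize_answer(answer) for _, answer in trajectories)
    answer, _ = counts.most_common(1)[0]
    return answer


# Toy, made-up samples: a consistent but wrong answer ("12") forms the
# largest basin, while the hypothetical correct answer ("15") forms a smaller one.
samples = [
    ("path A1", "12"), ("path A2", "12.0"), ("path A3", "12"),
    ("path A4", " 12 "), ("path A5", "12."),
    ("path B", "15"), ("path C", "15/1"), ("path D", "15"),
]

basins = group_into_basins(samples)
print({answer: len(paths) for answer, paths in basins.items()})  # {'12': 5, '15': 3}
print(majority_vote(samples))  # 12
```

In this toy run the vote returns "12" because that basin holds five of the eight samples, even though "15" is treated as correct here. Voting measures basin size, not basin accuracy.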
