ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling

AI & ML · 2 min read · via arXiv

arXiv:2605.26172v1

Abstract: When language models use test-time sampling, they generate multiple reasoning trajectories and select an answer by majority vote. We show that these trajectories are not independent: for a given question, they concentrate into a small number of clusters, or reasoning basins, each defined by a normalized final answer and the solutions that reach it. A majority vote therefore selects the most stable basin rather than the most accurate one, which cre
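As a rough illustration of the setup the abstract describes, the sketch below groups sampled trajectories into basins keyed by a normalized final answer and then picks the largest basin by majority vote. It is a minimal sketch, not the paper's method: the `normalize_answer` rule, the function names, and the toy samples are all assumptions made up for illustration.

```python
from collections import defaultdict


def normalize_answer(raw: str) -> str:
    """Hypothetical normalizer (an assumption, not from the paper):
    trim whitespace, lowercase, drop trailing periods, and
    canonicalize plain numbers so "42", "42.0" and " 42. " match."""
    text = raw.strip().lower().rstrip(".")
    try:
        value = float(text.replace(",", ""))
    except ValueError:
        return text  # non-numeric answers are compared as cleaned text
    return str(int(value)) if value.is_integer() else str(value)


def group_into_basins(trajectories):
    """Map each normalized final answer to the trajectories that reach it."""
    basins = defaultdict(list)
    for reasoning, final_answer in trajectories:
        basins[normalize_answer(final_answer)].append(reasoning)
    return dict(basins)


def majority_vote(basins):
    """Return the answer of the largest basin.
    Ties go to the basin that was encountered first."""
    return max(basins, key=lambda answer: len(basins[answer]))


# Toy samples (invented): (reasoning text, raw final answer)
samples = [
    ("path A", "42"),
    ("path B", "42.0"),
    ("path C", " 42. "),
    ("path D", "41"),
    ("path E", "forty-one"),
]

basins = group_into_basins(samples)
winner = majority_vote(basins)
print({answer: len(paths) for answer, paths in basins.items()})
print("majority answer:", winner)
```

The vote here depends only on basin size, so nothing in the selection step checks whether the largest basin is the correct one. How answers are normalized also decides which trajectories count as the same basin, so a different normalizer could change the winner.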
