Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

arXiv:2605.27752v1

Abstract: LLM confidence calibration is often evaluated by comparing two signals: token-probability scores and verbalized confidence. These signals are sometimes treated as direct readouts of model uncertainty, but their comparison depends on measurement choices that are rarely made explicit. In the main analysis, we hold the verbalized-confidence elicitation fixed: a single prompt template, probability scale, and output format. We then vary the measurement
