Causally Robust Reward Learning from Reason-Augmented Preference Feedback

arXiv:2603.04861v1

Abstract: Preference-based reward learning is widely used for shaping agent behavior to match a user's preference, yet its sparse binary feedback makes it especially vulnerable to causal confusion. The learned reward often latches onto spurious features that merely co-occur with preferred trajectories during training, collapsing when those correlations disappear or reverse at test time. We introduce ReCouPLe, a lightweight framework that uses natural langua
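The failure mode described in the abstract can be illustrated with a toy sketch. It is not ReCouPLe and not taken from the paper: it assumes a linear reward model trained with a Bradley-Terry (logistic) preference loss, which is a common choice but is not stated in the abstract. The two features (one causal, one spurious), the hand-picked trajectory values, and all function names are hypothetical. In training, the spurious feature moves with the causal one; at test time the correlation is reversed.

```python
import math

def reward(w, x):
    """Linear reward: dot product of weights and trajectory features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_bradley_terry(pairs, dim, lr=0.1, epochs=200):
    """Fit weights by gradient ascent on the Bradley-Terry log-likelihood.

    Each pair is (preferred_features, rejected_features); the only
    supervision is the binary label of which trajectory was preferred.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = reward(w, preferred) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-diff))  # P(preferred beats rejected)
            for i in range(dim):
                # d/dw log p = (1 - p) * (x_preferred - x_rejected)
                w[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return w

def accuracy(w, pairs):
    """Fraction of pairs where the learned reward ranks the preferred one higher."""
    correct = sum(reward(w, a) > reward(w, b) for a, b in pairs)
    return correct / len(pairs)

def make_pairs(spurious_sign):
    """Build preference pairs; the true preference depends only on feature 0.

    Feature 1 is spurious: it equals 2 * spurious_sign * (causal value),
    so it tracks the causal feature for +1 and opposes it for -1.
    """
    causal_values = [0.0, 1.0, 2.0, 3.0]
    pairs = []
    for hi in causal_values:
        for lo in causal_values:
            if hi > lo:
                pairs.append(((hi, 2.0 * spurious_sign * hi),
                              (lo, 2.0 * spurious_sign * lo)))
    return pairs

train_pairs = make_pairs(+1.0)   # spurious feature co-occurs with preference
test_pairs = make_pairs(-1.0)    # correlation reversed at test time

w = train_bradley_terry(train_pairs, dim=2)
train_acc = accuracy(w, train_pairs)
test_acc = accuracy(w, test_pairs)
print("weights:", w)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

Because the binary labels alone cannot separate the two features, the fitted model in this sketch puts more weight on the larger-scale spurious feature, ranks every training pair correctly, and ranks every test pair incorrectly once the correlation flips.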

