ICLR 2026 Workshop on Reasoning and Planning for LLMs · Accepted · 2026

Drift Dominates Contradiction in Multi-Turn Constraint Reasoning

Shows that after solver-guided repair, the dominant failure mode is not contradiction but answers that violate a still-satisfiable maintained state.

816 problems · 4 open-weight models · 68–95% of residual errors are satisfiable drift

Per-turn accuracy curves for Qwen3-8B, Qwen3-32B, gpt-oss-20b, gpt-oss-120b, and MUS-Repair across models.

Abstract

This work shows that, after solver-guided repair, the dominant failure mode is not contradiction but satisfiable drift: the maintained state remains consistent while the returned answer violates prior commitments.
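
The distinction between contradiction and satisfiable drift can be illustrated with a minimal sketch. It is not the paper's pipeline: the names `satisfiable`, `classify`, `domains`, and `constraints` are hypothetical, and a brute-force search over tiny finite domains stands in for a real solver. The maintained state is modeled as a list of predicates accumulated over turns, and the returned answer as one concrete assignment.

```python
from itertools import product

def satisfiable(constraints, domains):
    """Brute-force check: does any assignment satisfy every constraint?"""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            return True
    return False

def classify(constraints, domains, answer):
    """Label an answer relative to the maintained constraint state."""
    if not satisfiable(constraints, domains):
        # The accumulated commitments cannot all hold at once.
        return "contradiction"
    if all(c(answer) for c in constraints):
        return "consistent"
    # The state still has a solution, but the answer is not one of them.
    return "satisfiable drift"

# Hypothetical toy state: two variables and two commitments from earlier turns.
domains = {"x": range(4), "y": range(4)}
constraints = [
    lambda a: a["x"] < a["y"],   # commitment from an earlier turn
    lambda a: a["y"] != 3,       # commitment from a later turn
]

print(classify(constraints, domains, {"x": 0, "y": 1}))  # consistent
print(classify(constraints, domains, {"x": 2, "y": 1}))  # satisfiable drift
print(classify(constraints + [lambda a: a["x"] > a["y"]], domains, {"x": 0, "y": 1}))  # contradiction
```

In the second call the state is still solvable (for example x = 0, y = 1), yet the answer breaks the earlier commitment x < y, which is the drift case described above. Only the third call, where the commitments themselves conflict, is a contradiction.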