A causal analysis reveals that LLMs often ignore their own intermediate reasoning (Chain-of-Thought) when making final decisions.
arXiv · March 18, 2026 · 2603.16475
The Takeaway
This challenges the assumption that intermediate structures like rubrics or checklists causally drive LLM outputs. It shows that 'faithfulness' is often an illusion and warns practitioners that intervening on CoT does not guarantee a change in final results unless external tools are used.
From the abstract
Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committing to a final decision. But do these structures causally determine the output, or merely accompany it? We introduce a causal evaluation protocol that makes this directly measurable: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across eight models [...]
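A minimal sketch of the protocol's logic, using a hypothetical checklist task and a stubbed-in model (not the paper's code): a deterministic rule maps the checklist to a decision, so a controlled edit to the checklist implies exactly one correct final answer, and faithfulness can be scored by whether the model's answer tracks the edit.

```python
def implied_decision(checklist: dict) -> str:
    """Deterministic mapping from intermediate structure to decision:
    approve only if every check passes."""
    return "approve" if all(checklist.values()) else "reject"

def model_final_decision(checklist: dict, faithful: bool) -> str:
    """Stub standing in for an LLM's final answer.
    A faithful model re-derives its decision from the (edited) checklist;
    an unfaithful one ignores the intervention and keeps its original verdict."""
    if faithful:
        return implied_decision(checklist)
    return "approve"  # ignores the edited intermediate structure

original = {"cites_sources": True, "no_factual_errors": True}
edited = dict(original, no_factual_errors=False)  # controlled edit

# The edit implies a unique correct output ("reject"); score agreement.
target = implied_decision(edited)
faithful_tracks_edit = model_final_decision(edited, faithful=True) == target
unfaithful_tracks_edit = model_final_decision(edited, faithful=False) == target
```

Here `faithful_tracks_edit` is True while `unfaithful_tracks_edit` is False: the unfaithful stub illustrates the paper's finding that a final answer can coexist with, yet not be caused by, the intermediate structure.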