AI & ML · Breaks Assumption

A causal analysis reveals that LLMs often ignore their own intermediate reasoning (Chain-of-Thought) when making final decisions.

arXiv · March 18, 2026 · 2603.16475

Oleg Somov, Mikhail Chaichuk, Mikhail Seleznyov, Alexander Panchenko, Elena Tutubalina

The Takeaway

The paper challenges the assumption that intermediate structures such as rubrics or checklists causally drive LLM outputs. It shows that apparent 'faithfulness' is often an illusion and warns practitioners that intervening on the chain of thought does not guarantee a change in the final result unless external tools are used.

From the abstract

Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committing to a final decision. But do these structures causally determine the output, or merely accompany it? We introduce a causal evaluation protocol that makes this directly measurable: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across eight models …
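To make the idea concrete, here is a minimal sketch in Python of what such a causal check could look like on a toy checklist task. The names (`decision_rule`, `query_model`, `causal_faithfulness_check`) and the all-items-must-pass rule are illustrative assumptions, not the paper's actual tasks, models, or protocol.

```python
# Minimal sketch of a causal evaluation of intermediate structures, under
# illustrative assumptions: a toy checklist task with a deterministic
# decision rule, and a stand-in for an LLM's final decision.
from typing import Callable, List


def decision_rule(checklist: List[bool]) -> str:
    """Deterministic map from the intermediate structure (a checklist of
    pass/fail items) to a decision: accept only if every item passes."""
    return "accept" if all(checklist) else "reject"


def causal_faithfulness_check(
    query_model: Callable[[List[bool]], str],
    original: List[bool],
    edited: List[bool],
) -> bool:
    """Apply a controlled edit to the checklist and test whether the model's
    final decision tracks the unique output the edit implies, both before
    and after the edit."""
    return (
        query_model(original) == decision_rule(original)
        and query_model(edited) == decision_rule(edited)
    )


if __name__ == "__main__":
    # Flipping one item changes the implied decision from "accept" to "reject";
    # a causally faithful model should change its answer accordingly.
    original = [True, True, True]
    edited = [True, False, True]

    # Hypothetical stand-in for a model that ignores its own checklist.
    unfaithful_model = lambda checklist: "accept"

    print(causal_faithfulness_check(unfaithful_model, original, edited))  # False
```

Because the decision rule is deterministic, every edit to the intermediate structure has exactly one correct downstream answer, so a mismatch cleanly attributes the failure to the model ignoring its own reasoning rather than to ambiguity in the task.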