This study presents causal evidence that reasoning traces (Chain-of-Thought) shape model behavior and generalization, even when the final answer is held constant.
arXiv · March 16, 2026 · 2603.12397
Why it matters
It refutes the idea that CoT is merely post-hoc rationalization. It shows that training on reasoning alone is sufficient to alter model behavior, implying that supervising only the final answer is insufficient for safety and alignment.
From the abstract
Chain-of-Thought (CoT) is often viewed as a window into LLM decision-making, yet recent work suggests it may function merely as post-hoc rationalization. This raises a critical alignment question: Does the reasoning trace causally shape model generalization independent of the final answer? To isolate reasoning's causal effect, we design a controlled experiment holding final harmful answers constant while varying reasoning paths. We construct datasets with "Evil" reasoning embracing malice…