Mechanistic analysis reveals that LLMs fail at character counting not because they lack the information, but because 'negative circuits' in the final layers actively suppress the correct answer.
April 2, 2026
Original Paper
From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks
arXiv · 2604.00778
The Takeaway
The finding challenges the idea that symbolic failures stem from a lack of data or scale. Practitioners can use this insight to improve symbolic reasoning by targeting specific late-layer MLP components rather than simply scaling up or instruction-tuning.
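To make "targeting late-layer MLP components" concrete, here is a minimal sketch of zero-ablating one late MLP's output with a PyTorch forward hook. The toy six-block residual stack is a stand-in for the paper's LLaMA/Qwen-scale models, and the layer index is illustrative; only the hook mechanics carry over.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy residual block: x + MLP(x), mimicking a transformer's MLP sublayer."""
    def __init__(self, d):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.mlp(x)

torch.manual_seed(0)
model = nn.Sequential(*[Block(8) for _ in range(6)])

def ablate_mlp(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output,
    # so this block contributes nothing to the residual stream.
    return torch.zeros_like(output)

layer_to_ablate = 5  # a late layer, where the paper locates the suppression
handle = model[layer_to_ablate].mlp.register_forward_hook(ablate_mlp)

x = torch.randn(1, 8)
with torch.no_grad():
    ablated = model(x)   # late MLP silenced: its block acts as the identity
handle.remove()
with torch.no_grad():
    baseline = model(x)  # unmodified forward pass

print(torch.allclose(ablated, baseline))
```

In an interpretability study the same hook would be attached to a real model's late MLP modules, and the change in the answer logits (e.g. for the correct count token) would be measured before and after ablation.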
From the abstract
Large language models (LLMs) exhibit failures on elementary symbolic tasks such as character counting in a word, despite excelling on complex benchmarks. Although this limitation has been noted, the internal reasons remain unclear. We use character counting (e.g., "How many p's are in apple?") as a minimal, controlled probe that isolates token-level reasoning from higher-level confounds. Using this setting, we uncover a consistent phenomenon across modern architectures, including LLaMA, Qwen, an […]
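One appeal of this probe is that ground truth is trivial to generate programmatically, so any model error is unambiguous. A minimal sketch of such a probe generator (the prompt template mirrors the abstract's example; the exact wording in the paper may differ):

```python
def make_probe(word: str, letter: str) -> tuple[str, int]:
    """Build a character-counting prompt and its ground-truth answer."""
    prompt = f"How many {letter}'s are in {word}?"
    answer = word.count(letter)  # exact count, no ambiguity
    return prompt, answer

print(make_probe("apple", "p"))  # ("How many p's are in apple?", 2)
```

Because the answer is computed directly from the string, the task isolates token-level reasoning: a wrong model answer cannot be blamed on label noise or task ambiguity.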