Shows that LLM self-correction fails primarily because of contaminated session context, and that it can be significantly improved by moving the review to a fresh, independent session.
arXiv · March 13, 2026 · 2603.12123
Why it matters
A zero-cost, infrastructure-free way to improve LLM reliability. It suggests that the bottleneck in self-correction is not model capability but context contamination, which changes how developers should design 'human-in-the-loop' or 'agent-review' pipelines.
From the abstract
Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforward method where the review is conducted in a fresh session with no access to the production conversation history. We ran a controlled experiment: 30 artifacts (code, technical documents, presentation scripts) with 150 injected errors, tested under four review conditions -- same-session Self-Review (SR)
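The mechanism described above can be sketched in a few lines: the reviewer runs in a fresh session that receives only the artifact, never the conversation that produced it. This is a minimal illustration, not the paper's implementation; `call_llm` is a hypothetical stand-in for any chat-completion client.

```python
# Sketch of Cross-Context Review (CCR): the review session is
# constructed from scratch and sees only the artifact, so no
# production-session context can leak into the review.

def call_llm(messages):
    # Placeholder for a real chat-completion API call (hypothetical).
    # It reports how many messages the model would see, which makes
    # the context-isolation property easy to inspect.
    return f"[model saw {len(messages)} message(s)]"

def produce_artifact(task, history):
    """Same-session production: the artifact is created with full history."""
    history.append({"role": "user", "content": task})
    artifact = call_llm(history)
    history.append({"role": "assistant", "content": artifact})
    return artifact

def cross_context_review(artifact):
    """CCR: a fresh session -- the production history is never passed in."""
    review_messages = [
        {"role": "user",
         "content": "Review the following artifact for errors:\n" + artifact}
    ]
    return call_llm(review_messages)

history = []
artifact = produce_artifact("Write a deployment script.", history)
review = cross_context_review(artifact)  # reviewer sees only the artifact
```

The contrast with same-session Self-Review is the argument to `call_llm`: SR would pass the whole `history` list, while CCR constructs a new, single-message context.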