Shows that LLM self-correction fails primarily because of contaminated session context, and that it can be significantly improved by moving the review to a fresh, independent session.
arXiv · March 13, 2026 · 2603.12123
Why it matters
A zero-cost, infrastructure-free way to improve LLM reliability. It suggests that the bottleneck in self-correction is not model capability but context contamination, which changes how developers should design 'human-in-the-loop' or 'agent-review' pipelines.
From the abstract
Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforward method where the review is conducted in a fresh session with no access to the production conversation history. We ran a controlled experiment: 30 artifacts (code, technical documents, presentation scripts) with 150 injected errors, tested under four review conditions -- same-session Self-Review (SR)
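The mechanism described above can be sketched in a few lines: the reviewer runs in a fresh session that receives only the artifact, never the conversation that produced it. This is a minimal illustration, not the paper's implementation; `call_llm` is a hypothetical stand-in for any chat-completion client.

```python
# Sketch of Cross-Context Review (CCR): the review session is
# constructed from scratch and sees only the artifact, so no
# production-session context can leak into the review.

def call_llm(messages):
    # Placeholder for a real chat-completion API call (hypothetical).
    # It reports how many messages the model would see, which makes
    # the context-isolation property easy to inspect.
    return f"[model saw {len(messages)} message(s)]"

def produce_artifact(task, history):
    """Same-session production: the artifact is created with full history."""
    history.append({"role": "user", "content": task})
    artifact = call_llm(history)
    history.append({"role": "assistant", "content": artifact})
    return artifact

def cross_context_review(artifact):
    """CCR: a fresh session -- the production history is never passed in."""
    review_messages = [
        {"role": "user",
         "content": "Review the following artifact for errors:\n" + artifact}
    ]
    return call_llm(review_messages)

history = []
artifact = produce_artifact("Write a deployment script.", history)
review = cross_context_review(artifact)  # reviewer sees only the artifact
```

The contrast with same-session Self-Review is the argument to `call_llm`: SR would pass the whole `history` list, while CCR constructs a new, single-message context.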