Under KV-cache compression, AI models often 'forget' your API key or password in long chats; this 'sponsorship' mechanism ensures they never do.
Attention-based eviction policies often prune low-frequency but high-importance tokens. Transactional attention allows 'sponsoring' these tokens in the KV-cache, achieving 100% retrieval of critical information that would otherwise be discarded.
Transactional Attention: Semantic Sponsorship for KV-Cache Retention
arXiv · 2604.11288
At K=16 tokens (0.4% of a 4K context), every existing KV-cache compression method achieves 0% on credential retrieval. The failure mode is dormant tokens: credentials, API keys, and configuration values that receive near-zero attention but become essential at generation time. Because these tokens lack the statistical signals that eviction policies rely on, no method based on attention scores, reconstruction loss, or learned retention gates retains them. We introduce Transactional Attention (TA), which lets critical tokens be 'sponsored' in the KV-cache and retained unconditionally, independent of the attention statistics that drive eviction.
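To make the idea concrete, here is a minimal sketch of score-based eviction with a sponsorship override, assuming a simple top-K-by-attention-mass policy; the function name `evict_kv_cache` and its signature are illustrative, not the paper's API.

```python
import numpy as np

def evict_kv_cache(keys, values, attn_scores, k, sponsored):
    """Keep at most k cache entries: sponsored positions survive
    unconditionally; remaining slots go to the highest-attention tokens.

    keys, values : (seq_len, d) arrays of cached keys/values
    attn_scores  : (seq_len,) accumulated attention mass per position
    k            : token budget after compression
    sponsored    : positions that must never be evicted (hypothetical input)
    """
    seq_len = keys.shape[0]
    pinned = sorted(p for p in sponsored if p < seq_len)
    assert len(pinned) <= k, "sponsored tokens exceed the cache budget"

    # Rank non-sponsored positions by attention mass; a purely
    # score-based policy would select the top k from this list and
    # drop dormant tokens like credentials.
    candidates = [p for p in range(seq_len) if p not in set(pinned)]
    candidates.sort(key=lambda p: attn_scores[p], reverse=True)

    # Fill the remaining budget with the top-scoring candidates,
    # keeping every sponsored position regardless of its score.
    keep = sorted(pinned + candidates[: k - len(pinned)])
    return keys[keep], values[keep], keep

# A dormant credential token at position 7 with zero attention mass
# still survives compression to a 16-token budget:
keys, values = np.random.randn(32, 64), np.random.randn(32, 64)
scores = np.random.rand(32)
scores[7] = 0.0
_, _, kept = evict_kv_cache(keys, values, scores, k=16, sponsored={7})
assert 7 in kept
```

The design point the sketch isolates is that retention is decided by an explicit pin set rather than by statistics of the token itself, which is why it sidesteps the 0%-retrieval failure mode described above.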