AI safety filters are vulnerable to 'death by a thousand cuts'—gradually building up harmful intent over many innocent-looking messages.
April 15, 2026
Original Paper
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
arXiv · 2604.11309
The Takeaway
The 'Salami Slicing' threat shows that jailbreaking doesn't require a single clever prompt. Instead, an attacker can chain numerous low-risk inputs that individually pass every safety filter but cumulatively build up a high-risk state. This bypasses almost all current safety monitoring, which typically evaluates each prompt in isolation, and reveals a fundamental flaw in 'stateless' safety filters. For AI safety engineers, the implication is a move toward 'context-aware' or 'stateful' monitoring that tracks the accumulation of intent across an entire conversation. Your filters are being 'sliced' into submission.
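To make the stateless-vs-stateful distinction concrete, here is a minimal sketch of a stateful monitor. Everything in it (the class name, the thresholds, the scalar risk scores) is a hypothetical simplification, not the paper's method: each turn gets an individual risk score, and the monitor also tracks the running total across the conversation, so it can block a sequence of 'slices' that each stay below the per-turn limit.

```python
# Hypothetical sketch: a stateful safety monitor that accumulates
# per-turn risk scores, versus a stateless filter that only checks
# each turn on its own.
from dataclasses import dataclass, field

@dataclass
class StatefulSafetyMonitor:
    per_turn_threshold: float = 0.7    # all a stateless filter would check
    cumulative_threshold: float = 1.5  # total risk allowed across the conversation
    history: list = field(default_factory=list)

    def check(self, turn_risk: float) -> str:
        """Return 'allow' or 'block' for the new turn."""
        if turn_risk >= self.per_turn_threshold:
            return "block"  # caught even by a stateless check
        self.history.append(turn_risk)
        if sum(self.history) >= self.cumulative_threshold:
            return "block"  # caught only by tracking conversation state
        return "allow"

monitor = StatefulSafetyMonitor()
# Five 'slices', each individually low-risk (0.4 < 0.7)...
decisions = [monitor.check(0.4) for _ in range(5)]
# ...a stateless filter allows all five; the stateful monitor blocks
# once the cumulative risk crosses 1.5:
# ['allow', 'allow', 'allow', 'block', 'block']
```

A real system would need calibrated risk scores and some notion of topical relatedness between turns, but the design choice illustrated here is the core of the article's point: the blocking decision depends on conversation state, not just the current prompt.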
From the abstract
Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models to bypass built-in security constraints and generate unethical or unsafe content. Among various jailbreak techniques, multi-turn jailbreak attacks are more covert and persistent than single-turn counterparts, exposing critical vulnerabilities of LLMs. However, existing multi-turn jailbreak methods suffer from two fundamental limitations that affect their actual impact in real-world scenarios…