Provides the first formal proof that safety is non-compositional, meaning two individually safe AI agents can become hazardous when combined.
arXiv · March 18, 2026 · 2603.15973
The Takeaway
This is a significant theoretical result for AI safety and agent orchestration. It shows that, where conjunctive capability dependencies exist, verifying each agent in isolation is not sufficient for system-level safety, so verification must also cover the conjunctive dependencies that emerge when agents are combined.
From the abstract
This paper contains the first formal proof that safety is non-compositional in the presence of conjunctive capability dependencies: two agents each individually incapable of reaching any forbidden capability can, when combined, collectively reach a forbidden goal through an emergent conjunctive dependency.
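A toy illustration of the idea in the abstract, not the paper's formalism: capabilities are modelled as strings, and a conjunctive dependency as a rule that fires only when all of its premises are held. The names `closure`, `is_safe`, `RULES`, and the capability labels are hypothetical and chosen only for this sketch.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule derives one capability once ALL of its premises are held (an AND-dependency).
Rule = Tuple[FrozenSet[str], str]


def closure(initial: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Forward-chain over the rules until no new capability can be derived."""
    reached: Set[str] = set(initial)
    rule_list: List[Rule] = list(rules)
    changed = True
    while changed:
        changed = False
        for premises, derived in rule_list:
            if premises <= reached and derived not in reached:
                reached.add(derived)
                changed = True
    return reached


def is_safe(initial: Iterable[str], rules: Iterable[Rule], forbidden: Set[str]) -> bool:
    """Safe means no forbidden capability appears in the reachable set."""
    return forbidden.isdisjoint(closure(initial, rules))


# Hypothetical example: the forbidden capability needs both premises at once.
RULES: List[Rule] = [(frozenset({"read_secrets", "send_network"}), "exfiltrate")]
FORBIDDEN: Set[str] = {"exfiltrate"}

agent_a: Set[str] = {"read_secrets"}
agent_b: Set[str] = {"send_network"}

safe_a = is_safe(agent_a, RULES, FORBIDDEN)             # A alone cannot fire the rule
safe_b = is_safe(agent_b, RULES, FORBIDDEN)             # B alone cannot fire the rule
safe_ab = is_safe(agent_a | agent_b, RULES, FORBIDDEN)  # together they can

print(safe_a, safe_b, safe_ab)  # True True False
```

Each agent passes the check on its own because neither holds both premises, while the union of their capabilities satisfies the rule and reaches the forbidden goal. This is the sense in which checking agents one at a time can miss a system-level hazard.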