NASimJax reimplements an autonomous penetration testing simulator in JAX, providing a 100x increase in simulation throughput.
March 23, 2026
Original Paper
NASimJax: GPU-Accelerated Policy Learning Framework for Penetration Testing
arXiv · 2603.19864
The Takeaway
Reinforcement learning for cybersecurity is bottlenecked by slow, CPU-bound simulators. Moving both the environment and the training pipeline onto hardware accelerators makes it possible to train agents that generalize on realistic, large-scale network topologies, a setting previously considered computationally infeasible.
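To make the idea of an accelerator-resident environment concrete, here is a minimal JAX sketch. It is not NASimJax's API: the toy environment, the names `step` and `rollout`, and every constant are hypothetical, chosen only to show the general pattern of writing the environment as a pure function, batching it with `jax.vmap`, and compiling the whole rollout loop with `jax.jit` and `jax.lax.scan`.

```python
from functools import partial

import jax
import jax.numpy as jnp

# All values below are illustrative assumptions, not taken from the paper.
NUM_HOSTS = 8        # hosts in a toy network
NUM_ENVS = 1024      # environments stepped in parallel
SUCCESS_PROB = 0.7   # chance that an exploit attempt succeeds
ACTION_COST = 1.0    # cost charged for every attempt
HOST_VALUE = 10.0    # reward for a newly compromised host


def step(key, state, action):
    """Try to exploit host `action`. `state` is a bool mask of compromised hosts."""
    success = jax.random.bernoulli(key, SUCCESS_PROB)
    newly_compromised = success & ~state[action]
    new_state = state.at[action].set(state[action] | success)
    reward = jnp.where(newly_compromised, HOST_VALUE, 0.0) - ACTION_COST
    done = jnp.all(new_state)
    return new_state, reward, done


@partial(jax.jit, static_argnames=("num_steps",))
def rollout(key, num_steps):
    """Run NUM_ENVS environments for `num_steps` steps under a uniform random policy."""
    init_states = jnp.zeros((NUM_ENVS, NUM_HOSTS), dtype=jnp.bool_)

    def body(carry, _):
        key, states = carry
        key, action_key, env_key = jax.random.split(key, 3)
        actions = jax.random.randint(action_key, (NUM_ENVS,), 0, NUM_HOSTS)
        env_keys = jax.random.split(env_key, NUM_ENVS)
        # vmap turns the single-environment step into a batched one.
        states, rewards, _ = jax.vmap(step)(env_keys, states, actions)
        return (key, states), rewards

    (_, final_states), rewards = jax.lax.scan(
        body, (key, init_states), None, length=num_steps
    )
    return final_states, rewards


final_states, rewards = rollout(jax.random.PRNGKey(0), 16)
print(final_states.shape, rewards.shape)
```

Because the environment step and the policy's action sampling live in the same compiled function, no data crosses the host-device boundary inside the loop. In a real training setup the random policy would be replaced by a learned one, and episode resets would need handling; both are omitted here for brevity.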
From the abstract
Penetration testing, the practice of simulating cyberattacks to identify vulnerabilities, is a complex sequential decision-making task that is inherently partially observable and features large action spaces. Training reinforcement learning (RL) policies for this domain faces a fundamental bottleneck: existing simulators are too slow to train on realistic network scenarios at scale, resulting in policies that fail to generalize. We present NASimJax, a complete JAX-based reimplementation of the N