FastDSAC unlocks Maximum Entropy RL for high-dimensional humanoid control, matching or doubling the performance of dominant deterministic baselines.
arXiv · March 16, 2026 · 2603.12612
Why it matters
The 'curse of dimensionality' has pushed practitioners toward deterministic policies for humanoid tasks. This framework's dimension-wise entropy modulation enables robust exploration in high-dimensional action spaces, yielding large gains on difficult benchmarks such as basketball and balancing.
From the abstract
Scaling Maximum Entropy Reinforcement Learning (RL) to high-dimensional humanoid control remains a formidable challenge, as the "curse of dimensionality" induces severe exploration inefficiency and training instability in expansive action spaces. Consequently, recent high-throughput paradigms have largely converged on deterministic policy gradients combined with massive parallel simulation. We challenge this compromise with FastDSAC, a framework that effectively unlocks the potential of maximu