LangMARL introduces agent-level credit assignment and policy gradient evolution directly in the natural language space for multi-agent coordination.
April 2, 2026
Original Paper
LangMARL: Natural Language Multi-Agent Reinforcement Learning
arXiv · 2604.00722
The Takeaway
LangMARL bridges the gap between classical MARL and LLM agents by providing dense, causal language feedback rather than relying on coarse global outcomes. This significantly improves sample efficiency and interpretability in complex, cooperative agent environments.
From the abstract
Large language model (LLM) agents struggle to autonomously evolve coordination strategies in dynamic environments, largely because coarse global outcomes obscure the causal signals needed for local policy refinement. We identify this bottleneck as a multi-agent credit assignment problem, which has long been studied in classical multi-agent reinforcement learning (MARL) but remains underaddressed in LLM-based systems. Building on this observation, we propose LangMARL, a framework that brings credit assignment […]
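To make the credit assignment bottleneck concrete, here is a minimal sketch of the classical idea LangMARL builds on, not the paper's method itself: difference rewards, where each agent's credit is the global reward minus a counterfactual in which that agent's action is swapped for a default. The team task, reward values, and the "help"/"idle" actions below are all hypothetical; LangMARL's contribution is to produce this kind of per-agent causal signal in natural language rather than as a scalar.

```python
# Illustrative sketch (not from the paper): difference-reward credit
# assignment, the classical MARL analogue of the per-agent feedback
# LangMARL provides in language form.

def global_reward(actions):
    # Toy team objective (hypothetical): +1 for each agent that takes
    # the coordinated action "help"; a -1 penalty if everyone idles.
    score = sum(1 for a in actions if a == "help")
    if all(a == "idle" for a in actions):
        score -= 1
    return score

def difference_rewards(actions, default="idle"):
    """Per-agent credit: R(joint) minus R(joint with agent i defaulted)."""
    base = global_reward(actions)
    credits = []
    for i in range(len(actions)):
        counterfactual = list(actions)
        counterfactual[i] = default  # remove agent i's contribution
        credits.append(base - global_reward(counterfactual))
    return credits

joint = ["help", "idle", "help"]
print(global_reward(joint))       # 2 — the coarse global outcome
print(difference_rewards(joint))  # [1, 0, 1] — per-agent causal credit
```

The global outcome (2) tells agent 1 nothing about its idle action, while the per-agent credits expose exactly who contributed; this is the dense, causal signal that coarse outcomes obscure.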