AI & ML Paradigm Shift

Formulates Hierarchical Instruction Following as a Constrained Markov Decision Process to ensure LLMs prioritize system prompts over user instructions.

March 18, 2026

Original Paper

HIPO: Instruction Hierarchy via Constrained Reinforcement Learning

Keru Chen, Jun Luo, Sen Lin, Yingbin Liang, Alvaro Velasquez, Nathaniel Bastian, Shaofeng Zou

arXiv · 2603.16152

The Takeaway

Instead of hoping the model honors "don't do X," HIPO treats system instructions as hard algorithmic constraints, directly targeting the priority asymmetry that underlies jailbreaks and instruction drift.
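To make the constrained-RL idea concrete, here is a minimal, self-contained sketch: a two-action toy "bandit" stand-in for the Constrained MDP, optimized with Lagrangian primal-dual updates. The actions, rewards, learning rates, and violation budget below are illustrative assumptions for exposition, not the paper's actual algorithm or hyperparameters.

```python
import math

# Toy setup: action "violate" (follow a conflicting user instruction) pays more
# user-satisfaction reward than "comply" (obey the system prompt), but incurs
# a constraint cost. We maximize reward subject to E[violation] <= budget.
reward_comply, reward_violate = 0.6, 1.0
budget = 0.05                 # allowed expected violation rate (assumed)
theta, lam = 0.0, 0.0         # policy logit and Lagrange multiplier
lr_theta, lr_lam = 0.5, 0.5   # illustrative step sizes

def p_violate(t: float) -> float:
    """Probability of the violating action under a sigmoid policy."""
    return 1.0 / (1.0 + math.exp(-t))

ps = []
for _ in range(2000):
    p = p_violate(theta)
    ps.append(p)
    # Lagrangian: L = E[reward] - lam * (E[violation] - budget).
    # Gradient w.r.t. theta uses d p / d theta = p * (1 - p).
    theta += lr_theta * (reward_violate - reward_comply - lam) * p * (1 - p)
    # Dual ascent: raise lam while the constraint is violated, clamp at 0.
    lam = max(0.0, lam + lr_lam * (p_violate(theta) - budget))

avg = sum(ps[-500:]) / 500
print(f"avg violation rate over last 500 steps: {avg:.3f}")
```

Because the multiplier rises whenever the violation rate exceeds the budget, compliance is enforced by the optimization itself rather than by reward shaping alone — which is the sense in which the constraint is "algorithmic."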

From the abstract

Hierarchical Instruction Following (HIF) refers to the problem of prompting large language models with a priority-ordered stack of instructions. Standard methods such as RLHF and DPO typically fail in this setting because they optimize a single objective and do not explicitly enforce system-prompt compliance. Meanwhile, supervised fine-tuning relies on mimicking filtered, compliant data, which fails to establish the priority asymmetry at the algorithmic level. In this paper, we introduce …