AI & ML Efficiency Breakthrough

TIDE is a post-training early-exit system that allows individual tokens to skip unnecessary layers, improving throughput by up to 8% with minimal calibration.

March 24, 2026

Original Paper

TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference

Jaber Jaber, Osama Jaber

arXiv · 2603.21365

The Takeaway

Unlike most early-exit schemes, TIDE requires no retraining: it attaches lightweight routers to a standard causal LM and executes them through fused CUDA kernels. The authors show that many tokens can exit early during decoding without sacrificing accuracy, even on complex tasks like multi-step math.

From the abstract

Large language models run every token through every layer, regardless of difficulty. We present TIDE, a post-training system that attaches tiny learned routers at periodic checkpoint layers and, at inference time, selects the earliest layer whose hidden state has converged for each token. TIDE requires no model retraining, works with any HuggingFace causal LM, auto-detects GPU architecture, and supports float32, float16, and bfloat16 through fused CUDA kernels. On an NVIDIA A100 with DeepSeek R1
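The routing rule the abstract describes, running each token forward and stopping at the first checkpoint layer whose hidden state has converged, can be sketched in a few lines. This is a toy illustration under stated assumptions: the checkpoint interval, the threshold `tau`, and the relative-L2 convergence test stand in for the paper's tiny learned routers, which the source does not spell out.

```python
def early_exit_forward(hidden, layers, checkpoint_every=4, tau=0.05):
    """Run one token's state `hidden` (a list of floats) through `layers`,
    exiting at the first checkpoint whose state has converged.

    Illustrative stand-in for TIDE's routers: here "converged" means the
    relative L2 change since the previous checkpoint fell below `tau`.
    Returns the final hidden state and the depth at which the token exited.
    """
    prev = list(hidden)
    for i, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        if i % checkpoint_every == 0:          # periodic checkpoint layer
            num = sum((a - b) ** 2 for a, b in zip(hidden, prev)) ** 0.5
            den = sum(b * b for b in prev) ** 0.5 or 1.0
            if num / den < tau:                # state has converged: exit early
                return hidden, i
            prev = list(hidden)
    return hidden, len(layers)                 # no early exit: full depth

# Toy stack of 12 "layers", each adding a geometrically shrinking update,
# so a token's state converges with depth (purely illustrative).
layers = [lambda h, k=k: [x + 0.5 ** k for x in h] for k in range(1, 13)]
state, depth = early_exit_forward([1.0], layers)
```

On this toy stack the token exits at depth 8 of 12; "easier" tokens (faster-converging states) would exit at shallower checkpoints, which is the per-token depth saving TIDE exploits.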