A student AI can inherit a hidden bias from its teacher, even when the lessons it trains on look perfectly normal.
April 29, 2026
Original Paper
Subliminal Steering: Stronger Encoding of Hidden Signals
arXiv · 2604.25783
The Takeaway
Training a smaller model on the outputs of a larger one is a common way to cut costs. This study shows that hidden behavioral biases in the teacher model pass to the student with high precision. The biases are encoded so clearly that the original steering vector used to manipulate the teacher can be recovered from the student's final weights. This subliminal steering occurs even when the training data seems completely innocuous and unrelated to the bias. The result suggests that fine-tuning or distillation can open a hidden, persistent backdoor of influence between models.
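To make the idea of recovering a steering vector from student weights concrete, here is a minimal toy sketch in Python. It is not the paper's method or code: it assumes a tiny two-layer linear network, a teacher steered by adding a vector `v` to its hidden activations, and a student that fine-tunes only its hidden bias on the teacher's raw outputs for random inputs (real subliminal learning trains on sampled text). All names (`forward`, `student_bias`, `recovered`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 8, 16

# Shared base weights (assumption: teacher and student start from the same base model).
W1 = rng.normal(size=(d_hid, d_in))
b1 = rng.normal(size=d_hid)
W2 = rng.normal(size=(d_out, d_hid)) / np.sqrt(d_hid)

def forward(x, bias, steer=None):
    """Two-layer linear model; `steer` is added to the hidden activations."""
    h = x @ W1.T + bias
    if steer is not None:
        h = h + steer
    return h @ W2.T

# Hidden steering vector applied only to the teacher.
v = rng.normal(size=d_hid)

# "Innocuous" training data: random inputs that have nothing to do with v.
X = rng.normal(size=(256, d_in))
Y_teacher = forward(X, b1, steer=v)

# Student: fine-tune only the hidden bias with gradient descent on mean squared error.
student_bias = b1.copy()
lr = 0.05
for _ in range(5000):
    err = forward(X, student_bias) - Y_teacher   # shape (N, d_out)
    grad = 2.0 * err.mean(axis=0) @ W2           # gradient w.r.t. the hidden bias
    student_bias -= lr * grad

# The weight change in the student is the estimate of the steering vector.
recovered = student_bias - b1
cosine = recovered @ v / (np.linalg.norm(recovered) * np.linalg.norm(v))
print(f"cosine(recovered, v) = {cosine:.4f}")
```

In this toy setup the student can only match the teacher by absorbing `v` into its bias, so the difference between student and base weights should point in nearly the same direction as `v`. The recovery is clean here because the output layer has more rows than hidden units; whether and how well this carries over to real language models is what the paper investigates.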
From the abstract
Subliminal learning describes a student language model inheriting a behavioral bias by fine-tuning on seemingly innocuous data generated by a biased teacher model. Prior work has begun to characterize this phenomenon but leaves open questions about the scope of signals it can transfer, the mechanisms that explain it, and the precision with which a bias can be encoded by seemingly unrelated data. We tackle all three problems by introducing subliminal steering, a variant of subliminal learning in