AI & ML · Nature Is Weird

A student AI can inherit a hidden bias from its teacher, even through perfectly normal-looking lessons.

April 29, 2026

Original Paper

Subliminal Steering: Stronger Encoding of Hidden Signals

George Morgulis, John Hewitt

arXiv · 2604.25783

The Takeaway

Training a smaller model on the outputs of a larger one is a common way to cut costs. This study shows that hidden behavioral biases in the teacher model pass to the student with high precision. The biases are encoded so clearly that the original steering vector used to manipulate the teacher can be recovered from the student's final weights. This subliminal steering occurs even when the training data looks completely innocuous and unrelated to the bias. The finding suggests that fine-tuning or distillation can open a hidden channel of influence between models.
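To make the idea of recovering a steering vector concrete, here is a toy sketch in plain Python. It is not the paper's method or setup: it assumes a purely linear "model" with fixed shared weights W, a teacher that adds a hypothetical steering vector v to its hidden state, and a student whose only trainable parameter is a hidden offset u that starts at zero. All names (run_toy, W, v, u) are illustrative assumptions.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def run_toy(seed=0, d=4, k=16, steps=2000):
    """Toy linear illustration (an assumption, not the paper's experiment).

    Teacher output: W @ (h + v), where v is a hidden steering vector.
    Student output: W @ (h + u), where u is trainable and starts at zero.
    The inputs h are random and carry no information about v.
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
    v = [rng.gauss(0, 1) for _ in range(d)]  # teacher's steering vector
    u = [0.0] * d                            # student's trainable offset

    # Step size 1 / ||W||_F^2 is at most 1 / lambda_max(W^T W),
    # which keeps plain gradient descent stable in this linear setting.
    lr = 1.0 / sum(w * w for row in W for w in row)

    for _ in range(steps):
        h = [rng.gauss(0, 1) for _ in range(d)]  # "innocuous" random input
        teacher_out = matvec(W, [hi + vi for hi, vi in zip(h, v)])
        student_out = matvec(W, [hi + ui for hi, ui in zip(h, u)])
        err = [s - t for s, t in zip(student_out, teacher_out)]
        # Gradient of 0.5 * ||err||^2 with respect to u is W^T err.
        grad = [sum(W[i][j] * err[i] for i in range(k)) for j in range(d)]
        u = [ui - lr * g for ui, g in zip(u, grad)]
    return v, u


if __name__ == "__main__":
    v, u = run_toy()
    print("cosine(v, u) =", cosine(v, u))
```

In this toy, the student never sees v directly, yet its learned offset u ends up pointing in nearly the same direction as v. Real language models are nonlinear and far larger, so this only conveys the intuition behind the claim.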

From the abstract

Subliminal learning describes a student language model inheriting a behavioral bias by fine-tuning on seemingly innocuous data generated by a biased teacher model. Prior work has begun to characterize this phenomenon but leaves open questions about the scope of signals it can transfer, the mechanisms that explain it, and the precision with which a bias can be encoded by seemingly unrelated data. We tackle all three problems by introducing subliminal steering, a variant of subliminal learning in …

