AI & ML Scaling Insight

Mechanistic probing reveals a directional asymmetry in how LLMs encode hierarchy: hypernymy is represented redundantly and survives ablation, while hyponymy depends on a compact set of features and breaks easily.

March 19, 2026

Original Paper

Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis

Andor Diera, Ansgar Scherp

arXiv · 2603.17624

The Takeaway

The paper offers a blueprint for studying structured reasoning in LLMs with sparse autoencoders (SAEs). It shows that some semantic relations are far more easily broken by feature ablation than others, with direct implications for model steering and knowledge editing.
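
To make the ablation idea concrete, here is a minimal PyTorch sketch of zeroing out a handful of SAE features and measuring how far the reconstruction moves. This is not the authors' code: the dimensions, weights, and feature indices are all hypothetical placeholders.

```python
import torch

# Hypothetical SAE: encoder/decoder weights and a set of feature indices
# believed to carry a given relation (e.g. hypernymy). Real experiments
# would load a trained SAE for a specific layer of the model.
d_model, d_sae = 512, 4096
W_enc = torch.randn(d_model, d_sae) / d_model**0.5  # placeholder encoder weights
W_dec = torch.randn(d_sae, d_model) / d_sae**0.5    # placeholder decoder weights
b_enc = torch.zeros(d_sae)
b_dec = torch.zeros(d_model)

def sae_reconstruct(h: torch.Tensor, ablate: set[int] = frozenset()) -> torch.Tensor:
    """Encode a residual-stream activation into sparse features, zero out the
    chosen features, and decode back to the residual stream."""
    f = torch.relu((h - b_dec) @ W_enc + b_enc)  # sparse feature activations
    if ablate:
        f[..., list(ablate)] = 0.0               # knock out the relation features
    return f @ W_dec + b_dec

h = torch.randn(d_model)               # stand-in for one token's activation
relation_features = {17, 342, 901}     # hypothetical relation-bearing features
h_clean = sae_reconstruct(h)
h_ablated = sae_reconstruct(h, ablate=relation_features)
print(torch.norm(h_clean - h_ablated))  # size of the intervention
```

In practice the features would be selected by how strongly they activate on relation-bearing word pairs, and the knockout tests whether the model's downstream relation behavior degrades, which is the sense in which hyponymy proves more fragile than hypernymy.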

From the Abstract

Understanding whether large language models (LLMs) capture structured meaning requires examining how they represent concept relationships. In this work, we study three models of increasing scale: Pythia-70M, GPT-2, and Llama 3.1 8B, focusing on four semantic relations: synonymy, antonymy, hypernymy, and hyponymy. We combine linear probing with mechanistic interpretability techniques, including sparse autoencoders (SAE) and activation patching, to identify where these relations are encoded and how…
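
For orientation, a linear probe in this setting is typically a logistic-regression classifier trained on frozen hidden states to predict which relation holds between a word pair. The sketch below uses random stand-in data, so the shapes, labels, and resulting accuracy are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical probing data: one activation vector per word pair, labeled by
# relation (0 = synonymy, 1 = antonymy, 2 = hypernymy, 3 = hyponymy).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 512))   # stand-in for hidden states at one layer
y = rng.integers(0, 4, size=2000)  # stand-in relation labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")  # ~chance on random data
```

Repeating this probe layer by layer on real activations is what localizes where each relation becomes linearly decodable; the SAE and activation-patching analyses then ask which individual features causally carry it.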