Mechanistic probing reveals a directional asymmetry in how LLMs encode hierarchy: hypernymy is redundant and resilient, while hyponymy is fragile and compact.
March 19, 2026
Original Paper
Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis
arXiv · 2603.17624
The Takeaway
The paper offers a blueprint for studying structured reasoning in LLMs with sparse autoencoders (SAEs). It shows that some semantic relations are far more easily broken by ablation than others, which has direct implications for model steering and knowledge editing.
From the abstract
Understanding whether large language models (LLMs) capture structured meaning requires examining how they represent concept relationships. In this work, we study three models of increasing scale: Pythia-70M, GPT-2, and Llama 3.1 8B, focusing on four semantic relations: synonymy, antonymy, hypernymy, and hyponymy. We combine linear probing with mechanistic interpretability techniques, including sparse autoencoders (SAEs) and activation patching, to identify where these relations are encoded and how […]
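The probe-then-ablate logic behind these results can be sketched in a few lines. This is a toy illustration, not the paper's pipeline: synthetic vectors stand in for real model activations (no model is loaded), and the "relation direction", dimensions, and sample counts are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 400                       # stand-in hidden size / number of concept pairs

# Synthetic activations: pairs where the relation holds (y == 1) are
# shifted along one fixed unit direction, mimicking a linearly
# decodable relation in activation space.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
X = rng.normal(size=(n, d))
y = np.repeat([0, 1], n // 2)
X[y == 1] += 4.0 * direction         # signal ~4x the per-dimension noise

# 75/25 split, then a least-squares linear probe (bias via extra column).
idx = rng.permutation(n)
tr, te = idx[:300], idx[300:]
Xb = np.hstack([X, np.ones((n, 1))])
w, *_ = np.linalg.lstsq(Xb[tr], y[tr], rcond=None)

def acc(M):
    """Held-out accuracy of the fixed probe w on feature matrix M."""
    return float(np.mean((M[te] @ w > 0.5) == y[te]))

print(f"probe accuracy:  {acc(Xb):.2f}")   # high: relation is linearly decodable

# "Ablate" the relation: project every activation off the encoding
# direction, then re-score the same probe. Accuracy collapses to chance.
X_abl = X - np.outer(X @ direction, direction)
Xb_abl = np.hstack([X_abl, np.ones((n, 1))])
print(f"after ablation:  {acc(Xb_abl):.2f}")
```

A relation encoded redundantly across many directions (as the paper finds for hypernymy) would survive ablating a single direction; a compactly encoded one (hyponymy) collapses, which is exactly the asymmetry this kind of test exposes.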