AI & ML · New Capability

Uses Sparse Autoencoders (SAEs) to identify and steer cultural representations in LLMs, eliciting rare cultural concepts that prompting alone misses.

March 25, 2026

Original Paper

Steering LLMs for Culturally Localized Generation

Simran Khanuja, Hongbin Liu, Shujian Zhang, John Lambert, Mingqing Chen, Rajiv Mathews, Lun Wang

arXiv · 2603.23301

The Takeaway

The paper offers a white-box method for mitigating cultural bias. Instead of relying on black-box prompting, practitioners can use the identified 'Cultural Embeddings' to steer models toward long-tail cultural knowledge without expensive localized fine-tuning.
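
To make the steering idea concrete, here is a minimal sketch of SAE-based activation steering, assuming GPT-2 as a stand-in base model, a hypothetical pre-trained SAE whose decoder rows act as feature directions, and placeholder values for the feature index, layer, and steering strength. The paper's actual model, SAE, and hyperparameters will differ.

```python
# Minimal sketch: steer generation by adding an SAE feature direction
# to the residual stream via a forward hook.
# Assumptions (not from the paper): GPT-2 base model, a randomly initialized
# stand-in for a trained SAE decoder, and placeholder feature index / scale.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

d_model, n_features = model.config.n_embd, 16384     # SAE width is illustrative
# Stand-in for a trained SAE decoder: each row is a feature direction in residual space.
sae_decoder = torch.randn(n_features, d_model)        # in practice, load trained weights
feature_idx = 1234                                    # hypothetical "culture" feature
direction = sae_decoder[feature_idx]
direction = direction / direction.norm()
alpha = 8.0                                           # steering strength, tuned empirically
layer = 6                                             # residual layer to intervene on

def steer_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the residual hidden state.
    hidden = output[0] + alpha * direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer_hook)
try:
    ids = tok("Describe a typical breakfast.", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()
```

With a trained SAE, the same hook can be scaled up or down (or negated) to dial a cultural feature's influence on the output, which is the kind of controllable, white-box intervention the takeaway describes.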

From the abstract

LLMs are deployed globally, yet produce responses biased towards cultures with abundant training data. Existing cultural localization approaches such as prompting or post-training alignment are black-box, hard to control, and do not reveal whether failures reflect missing knowledge or poor elicitation. In this paper, we address these gaps using mechanistic interpretability to uncover and manipulate cultural representations in LLMs. Leveraging sparse autoencoders, we identify interpretable features [...]
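
As a rough illustration of the identification step the abstract describes, the sketch below contrasts mean SAE feature activations on culture-specific versus generic prompts to surface candidate cultural features. The SAE encoder weights, layer choice, and prompt sets here are placeholders, not the paper's actual setup.

```python
# Minimal sketch: find candidate culture-linked SAE features by comparing
# mean feature activations on culture-specific vs. generic prompts.
# Assumptions (not from the paper): GPT-2 hidden states from a middle layer
# and a randomly initialized stand-in for a trained SAE encoder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

d_model, n_features, layer = model.config.n_embd, 16384, 6
W_enc = torch.randn(d_model, n_features)   # stand-in for trained SAE encoder weights
b_enc = torch.zeros(n_features)

def mean_feature_acts(prompts):
    # Average SAE feature activations over tokens and prompts.
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**ids).hidden_states[layer]    # (1, seq, d_model)
        feats = torch.relu(hidden @ W_enc + b_enc)        # sparse feature activations
        acts.append(feats.mean(dim=(0, 1)))
    return torch.stack(acts).mean(0)

cultural = ["A wedding ceremony in rural Indonesia.", "Street food in Lagos at night."]
generic  = ["A wedding ceremony.", "Street food at night."]

# Features that fire much more on culture-specific text are steering candidates.
diff = mean_feature_acts(cultural) - mean_feature_acts(generic)
top = torch.topk(diff, k=10).indices
print("Candidate culture-linked feature indices:", top.tolist())
```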