AI & ML Open Release

CLT-Forge democratizes mechanistic interpretability by providing an end-to-end library for training Cross-Layer Transcoders and generating feature attribution graphs.

March 24, 2026

Original Paper

CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs

Florent Draye, Abir Harrasse, Vedant Palit, Tung-Yu Wu, Jiarui Liu, Punya Syon Pandey, Roderick Wu, Terry Jingchen Zhang, Zhijing Jin, Bernhard Schölkopf

arXiv · 2603.21014

The Takeaway

Dictionary learning and transcoders are the current frontier for understanding LLM internals; CLT-Forge scales these techniques with compressed activation caching and automated circuit tracing, letting researchers visualize and analyze compact feature representations shared across model layers.

From the abstract

Mechanistic interpretability seeks to understand how Large Language Models (LLMs) represent and process information. Recent approaches based on dictionary learning and transcoders enable representing model computation in terms of sparse, interpretable features and their interactions, giving rise to feature attribution graphs. However, these graphs are often large and redundant, limiting their interpretability in practice. Cross-Layer Transcoders (CLTs) address this issue by sharing features across layers.
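To make the transcoder idea concrete, here is a minimal toy sketch (plain NumPy, not CLT-Forge's actual API): activations are encoded into a wide, sparse feature dictionary via a ReLU encoder, then decoded back to reconstruct the layer's output. All names and sizes (`W_enc`, `W_dec`, `d_model`, `d_features`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration only: a small residual stream
# mapped into an overcomplete feature dictionary.
d_model, d_features = 16, 64

# Hypothetical transcoder parameters (random here; learned in practice).
W_enc = rng.normal(0.0, 0.1, (d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(0.0, 0.1, (d_features, d_model))

def transcode(x):
    """Encode activations into sparse features, then decode a reconstruction."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU induces sparsity
    return f, f @ W_dec

x = rng.normal(size=d_model)                # a single activation vector
features, x_hat = transcode(x)
print("active features:", int((features > 0).sum()), "of", d_features)
```

Because only a fraction of the features fire on any input, the nonzero entries of `features` become the interpretable units whose interactions an attribution graph records; cross-layer transcoders extend this by letting one feature dictionary serve multiple layers.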