Outperforms fine-tuned baselines in code optimization by using semantics-preserving transformations as a generative intermediate representation.
March 17, 2026
Original Paper
SemRep: Generative Code Representation Learning with Code Transformations
arXiv · 2603.13640
The Takeaway
By training models to predict code transformations rather than just raw tokens, SemRep achieves 6.7x better robustness and matches the performance of models 685B parameters larger while using 25% less inference compute. This is a significant step toward making specialized coding agents efficient enough for local deployment.
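The excerpt does not show SemRep's actual training pipeline, but the core idea of "predicting transformations rather than raw tokens" can be illustrated with a toy data-preparation step. The sketch below is a hypothetical example, not the paper's method: it applies one semantics-preserving transformation (canonical variable renaming) and packages the result as a (code, transformation, transformed code) training triple, so a model learns the transformation as a labeled target instead of plain next-token text.

```python
import ast

def rename_variables(source: str) -> str:
    """Apply a semantics-preserving transformation: rename variables
    to canonical names (v0, v1, ...) in first-seen order.
    Toy example only; a real system would respect scoping rules."""
    tree = ast.parse(source)
    mapping: dict[str, str] = {}

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.Name:
            if node.id not in mapping:
                mapping[node.id] = f"v{len(mapping)}"
            node.id = mapping[node.id]
            return node

    return ast.unparse(Renamer().visit(tree))

def make_training_example(source: str) -> dict:
    """Hypothetical training record: the model sees the original code and
    is trained to predict the transformation label and its output,
    rather than just the raw token sequence."""
    return {
        "input": source,
        "transform": "rename_variables",
        "target": rename_variables(source),
    }

example = make_training_example(
    "total = 0\nfor item in items:\n    total = total + item"
)
print(example["target"])
# v0 = 0
# for v1 in v2:
#     v0 = v0 + v1
```

Because the transformation preserves behavior, the input/target pair carries an explicit semantic signal (the two programs are equivalent), which is the kind of supervision the takeaway above contrasts with plain token prediction.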
From the abstract
Code transformation is a foundational capability in the software development process, and its effectiveness relies on constructing a high-quality code representation that characterizes the input code's semantics and guides the transformation. Existing approaches treat code transformation as an end-to-end learning task, leaving the construction of the representation needed for semantic reasoning implicit in model weights, or they rely on rigid compiler-level abstractions. We present SemRep, a framework …