A biology-native transformer architecture that mirrors cellular transcription and translation, enabling interpretable predictions across DNA, RNA, and protein.
March 25, 2026
Original Paper
Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein
arXiv · 2603.23361
The Takeaway
Rather than treating biological data as generic sequences, this model uses a "Virtual Cell Embedder" whose stages follow the spatial compartmentalization of a cell. Its reported ability to discover clinical side effects from unperturbed data points toward biological AI that is mechanistic rather than purely predictive.
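As a rough illustration of the two-stage idea described in the abstract (VCE-N for transcription in the nucleus, VCE-C for translation in the cytosol), the toy sketch below chains two stages so that the protein-level output is computed only from the RNA-level intermediate. It is not the paper's implementation: the function names, layer sizes, random weights, and the use of single linear layers in place of transformer blocks are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_stage(d_in, d_out):
    """Return a toy one-layer stage: a linear map followed by tanh.

    A stand-in for a real transformer block (assumption for illustration).
    """
    W = rng.normal(scale=0.1, size=(d_in, d_out))
    b = np.zeros(d_out)
    return lambda x: np.tanh(x @ W + b)

# Hypothetical feature sizes, chosen only for this sketch.
D_DNA, D_RNA, D_PROT = 16, 8, 4

# "Nucleus" stage: DNA-level features -> RNA-level representation.
vce_n = make_stage(D_DNA, D_RNA)
# "Cytosol" stage: RNA-level representation -> protein-level output.
vce_c = make_stage(D_RNA, D_PROT)

def forward(dna_features):
    """Run both stages in order and return the intermediate and final outputs."""
    rna_repr = vce_n(dna_features)    # transcription-like step
    protein_pred = vce_c(rna_repr)    # translation-like step, sees only the RNA level
    return rna_repr, protein_pred

dna = rng.normal(size=(5, D_DNA))     # 5 toy genes with random features
rna, prot = forward(dna)
print(rna.shape, prot.shape)          # (5, 8) (5, 4)
```

Because the intermediate `rna_repr` is returned alongside the final output, each stage can be inspected against its own biological layer, which is one plausible reading of why a compartmentalized design could aid interpretability.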
From the abstract
Biological AI models increasingly predict complex cellular responses, yet their learned representations remain disconnected from the molecular processes they aim to capture. We present CDT-III, which extends mechanism-oriented AI across the full central dogma: DNA, RNA, and protein. Its two-stage Virtual Cell Embedder architecture mirrors the spatial compartmentalization of the cell: VCE-N models transcription in the nucleus and VCE-C models translation in the cytosol. On five held-out genes, CD