AI models for biology are actually 'smarter' at the beginning than at the end.
April 17, 2026
Original Paper
Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models
arXiv · 2604.14838
The Takeaway
In standard machine learning, the rule of thumb is that the final layer of a neural network contains the most sophisticated and 'ready-to-use' information. However, when testing the massive AI 'foundation models' used to understand single cells, researchers found that the most important biological insights are actually buried in the middle, or even the very first, layers. This flips the script on how we build and use AI in medicine. It means that when scientists use these models to predict how a cell might react to a new drug, they have been looking at the 'finished product' when the real treasure was in the early drafts. By reading out these earlier layers instead, we can capture far more nuanced detail about how life works, detail that the final output usually smooths over.
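The idea of checking every layer rather than just the last one can be sketched in a few lines. The toy below is purely illustrative, not the paper's method: a small random network stands in for a foundation model, and a simple linear probe is fit on each layer's activations to see which depth best predicts a target. Here the target is deliberately built from the first layer's features, so the earliest layer wins, mirroring the paper's point that the 'optimal' layer need not be the final one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a foundation model: a random 4-layer tanh MLP.
# Real single-cell models (e.g. scFoundation) expose hidden states per
# transformer block; everything named here is a hypothetical example.
weights = [rng.standard_normal((16, 16)) / 4 for _ in range(4)]

def forward_with_hidden_states(x):
    """Run the toy model, collecting the activation after every layer."""
    states = []
    h = x
    for w in weights:
        h = np.tanh(h @ w)
        states.append(h)
    return states

def probe_r2(H, y):
    """Fit a least-squares linear probe on embeddings H; return R^2."""
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    pred = H @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

X = rng.standard_normal((200, 16))  # 200 toy "cells", 16 features each

# Construct a target that is an exact linear function of the FIRST
# layer's activations, so an early layer should probe best.
y = np.tanh(X @ weights[0]) @ rng.standard_normal(16)

states = forward_with_hidden_states(X)
scores = [probe_r2(H, y) for H in states]
best_layer = int(np.argmax(scores))
print(f"best layer: {best_layer}, R^2 per layer: "
      f"{[round(s, 3) for s in scores]}")
```

In practice one would replace the toy forward pass with the real model's per-layer hidden states and the linear probe with whatever downstream task metric is of interest (trajectory inference, perturbation prediction); the loop over layers is the whole trick.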
From the abstract
Current single-cell foundation model benchmarks universally extract final layer embeddings, assuming these represent optimal feature spaces. We systematically evaluate layer-wise representations from scFoundation (100M parameters) and Tahoe-X1 (1.3B parameters) across trajectory inference and perturbation response prediction. Our analysis reveals that optimal layers are task-dependent (trajectory peaks at 60% depth, 31% above final layers) and context-dependent (perturbation optima shift 0-96% a