Paradigm Shift
/ Category lead
Shifts AI evaluation from static benchmarks to interactive agentic environments requiring fluid adaptation.
ARC-AGI is the industry standard for measuring generalization. Version 3 raises the bar to agentic reasoning and planning without explicit instructions, where current frontier models still fall far short of humans (1% vs. 100%).