Paradigm Shift

Shifts AI evaluation from static benchmarks to interactive agentic environments requiring fluid adaptation.

ARC-AGI is the industry standard for measuring generalization. ARC-AGI-3 moves the goalposts to agentic reasoning and planning without explicit instructions, a setting where current frontier models still fall far short: 1% versus 100% for humans.

By SeriesFusion Editorial Board · March 27, 2026

First foundation model to unify text, image, audio, and video using native masked diffusion instead of autoregressive serialization.

LLM-guided program evolution has discovered a new data-shuffling rule for SGD that provably and empirically outperforms standard Random Reshuffling.
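
The summary does not describe the discovered shuffling rule, so the sketch below only shows the baseline it is compared against. Under standard Random Reshuffling, SGD draws a fresh permutation of all training indices every epoch, so each example is visited exactly once per epoch. With-replacement sampling is shown for contrast. Both function names are hypothetical.

```python
import random


def random_reshuffling(n, epochs, seed=0):
    """Baseline Random Reshuffling: one fresh permutation of range(n) per epoch."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)  # every example appears exactly once in this epoch
        schedule.append(perm)
    return schedule


def with_replacement(n, epochs, seed=0):
    """Contrast: i.i.d. index sampling, so some examples repeat and others are skipped."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(epochs)]


if __name__ == "__main__":
    for epoch, order in enumerate(random_reshuffling(6, 2, seed=42)):
        print(epoch, order)
```

A learned shuffling rule would replace only the `rng.shuffle(perm)` step. The outer SGD loop, which consumes indices in the scheduled order, stays the same.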

A comprehensive analysis of AI safety vulnerabilities including automated circuit discovery, latent adversarial training, and power-law scaling of jailbreak success.

Identifies a fundamental quality-exploration dilemma in Diffusion Language Models: remasking improves single-sample quality but sharply reduces reasoning diversity.

Introduces training-free and model-free trajectory planning by computing diffusion score functions directly from data libraries via kernel-weighted estimation.
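
A minimal sketch of one standard kernel-weighted score estimate, assuming an isotropic Gaussian kernel with bandwidth `sigma` over a stored library of points. The function name `empirical_score` is hypothetical, and the paper's estimator may weight or normalize differently. For the Gaussian-smoothed empirical density p_sigma(x) = (1/N) * sum_i N(x; x_i, sigma^2 I), the score is a softmax-weighted average of the vectors pointing from the query x to each stored sample, divided by sigma^2. No network is trained.

```python
import math


def empirical_score(x, data, sigma):
    """grad_x log p_sigma(x) = sum_i w_i (x_i - x) / sigma^2,
    with w_i = softmax_i(-||x - x_i||^2 / (2 sigma^2)).

    x:     query point (list of floats)
    data:  library of stored points (list of equal-length lists)
    sigma: kernel bandwidth (assumed Gaussian kernel)
    """
    # Squared distance from the query to every stored sample.
    sq = [sum((xi - di) ** 2 for xi, di in zip(x, d)) for d in data]
    logits = [-s / (2 * sigma ** 2) for s in sq]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]  # kernel weights sum to 1
    return [
        sum(w * (d[k] - x[k]) for w, d in zip(weights, data)) / sigma ** 2
        for k in range(len(x))
    ]


if __name__ == "__main__":
    library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(empirical_score([0.2, 0.2], library, sigma=0.5))
```

The score points toward nearby library samples. Smaller `sigma` concentrates the weights on the nearest sample, and larger `sigma` averages over more of the library.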

Proposes a decision-centric architecture that separates signal estimation from control policy to make LLM system decisions explicit and inspectable.

Truth Anchoring (TAC) provides a post-hoc calibration method to align LLM uncertainty metrics with actual factual correctness.

Identifies 'diversity collapse' in the popular GRPO reinforcement learning method and introduces MUPO to maintain broad reasoning paths.

Replaces manual rubric-tuning for synthetic data with an automated gradient-guided optimization framework based on influence estimation.

Introduces HiLL, a framework that jointly trains a 'hinter' and 'reasoner' to prevent advantage collapse in reinforcement learning for hard tasks.
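
The advantage-collapse problem can be made concrete with a GRPO-style group-relative advantage. Each sampled response's reward is normalized by the mean and standard deviation of its group. On a hard task where every sample fails (or every sample succeeds), all advantages are zero and the policy receives no gradient signal. This is a sketch under the assumption that rewards are normalized with the population standard deviation plus a small `eps`. It is not HiLL's training code, and both function names are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption; variants differ)
    return [(r - mean) / (std + eps) for r in rewards]


def has_learning_signal(rewards):
    """A group yields nonzero advantages only if its rewards are not all identical."""
    return len(set(rewards)) > 1


if __name__ == "__main__":
    hard_task = [0, 0, 0, 0]  # every sampled answer wrong: advantages collapse to 0
    mixed_task = [1, 0, 0, 0]
    print(group_relative_advantages(hard_task))
    print(group_relative_advantages(mixed_task))
```

A hint that raises the success rate on some samples turns an all-zero group like `hard_task` into a mixed group with usable advantages, which is presumably the role the 'hinter' plays.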

LangMARL introduces agent-level credit assignment and policy gradient evolution directly in the natural language space for multi-agent coordination.

Stochastic Attention achieves a global receptive field in O(log n) layers by using randomized routing inspired by the fruit fly connectome.
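
Why O(log n) depth can suffice: if each layer lets a token absorb the receptive field of one partner, and partners are chosen so that the fields merge without much overlap, the field roughly doubles per layer. The sketch below swaps in a deterministic doubling pattern for the paper's randomized routing. At layer l, token i also attends to token (i + 2^l) mod n, so full coverage takes exactly ceil(log2 n) layers. The function name is hypothetical.

```python
import math


def layers_to_global_field(n):
    """Count layers until every token's receptive field spans all n tokens.

    Deterministic stand-in for randomized routing: at layer l, token i
    attends to itself and to token (i + 2**l) % n.
    """
    fields = [{i} for i in range(n)]
    layers = 0
    while any(len(f) < n for f in fields):
        hop = 2 ** layers
        # New field of i = old field of i, union old field of the partner at distance hop.
        fields = [fields[i] | fields[(i + hop) % n] for i in range(n)]
        layers += 1
    return layers


if __name__ == "__main__":
    for n in (8, 100, 1000):
        print(n, layers_to_global_field(n), math.ceil(math.log2(n)))
```

A random partner choice gives the same logarithmic growth only with high probability, because randomly chosen fields can overlap. The deterministic version removes that uncertainty so the log-depth argument is easy to check.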

Routing-Free MoE replaces centralized routing with activation decided by each expert individually, eliminating Softmax gating, Top-K selection, and the load balancing they require.

Policy Improvement Reinforcement Learning (PIRL) shifts the training objective from reward maximization to explicit maximization of policy progress across iterations.

Proposes dense point trajectories as universal 'visual tokens' for behavior that generalize across different species and non-rigid objects.

Achieves 'zero forgetting' in continual learning by stacking frozen domain-specific MoE-LoRA adapters with a meta-router.

Replaces standard relative Softmax attention with 'Multiscreening' to allow absolute query-key relevance, yielding 3.2x faster inference at 100K context.
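
The difference between relative and absolute relevance can be seen in one property. Softmax weights do not change when every query-key score is shifted by the same constant, so some key always receives weight, even when no key is relevant. The sketch uses an independent sigmoid gate per key as a stand-in for absolute relevance. This is an assumption made for illustration and is not the paper's Multiscreening mechanism.

```python
import math


def softmax(scores):
    """Relative weighting: normalized across keys, so weights always sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_gates(scores, threshold=0.0):
    """Absolute weighting (stand-in): each key is judged on its own score."""
    return [1.0 / (1.0 + math.exp(-(s - threshold))) for s in scores]


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.3]
    irrelevant = [s - 10.0 for s in scores]  # every key is now a poor match
    print(softmax(scores), softmax(irrelevant))  # identical distributions
    print(sigmoid_gates(irrelevant))  # all gates close to zero
```

Because the sigmoid gates do not have to sum to 1, a query with no relevant keys can attend to almost nothing. Softmax cannot express that outcome.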

Replaces the heuristic constant momentum (0.9) with a parameter-free, physics-inspired schedule that speeds up convergence by nearly 2x.

Proposes a mathematical framework where 'spectral gaps' in parameter updates control phase transitions like grokking and loss plateaus.

Proposes a neuroscience-grounded memory architecture that makes interactions cheaper and more accurate with experience, rather than relying on expanding context windows.

Introduces DASES, a framework that replaces passive validation with active 'falsification' to ensure scientific models learn actual mechanisms rather than just winning benchmarks.

Switches the training objective from hard Next-Token Prediction to predicting 'concepts' (sets of semantically related tokens).
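
One natural way to score a prediction against a concept instead of a single token is to sum the model's probability over every token in the concept set, then take the negative log. The sketch below assumes that form. The paper's exact objective may differ, and `concept_loss` is a hypothetical name.

```python
import math


def next_token_loss(probs, target):
    """Standard hard next-token cross-entropy: only the exact target counts."""
    return -math.log(probs[target])


def concept_loss(probs, concept):
    """Hypothetical concept-level loss: any token in the concept set counts as correct."""
    return -math.log(sum(probs[t] for t in concept))


if __name__ == "__main__":
    probs = {"big": 0.4, "large": 0.4, "small": 0.2}
    print(next_token_loss(probs, "big"))  # penalized for mass placed on "large"
    print(concept_loss(probs, {"big", "large"}))  # synonyms share the credit
```

Under the hard objective, probability mass on a synonym is counted as an error. The set-based loss rewards the model for landing anywhere in the right semantic neighborhood.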

Proves that LLM agent capability (pass@1) and reliability (consistency) diverge systematically, with frontier models often having the highest 'meltdown' rates.
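
Capability and reliability can diverge because they summarize the same trial matrix differently. The sketch assumes each task is run k times. pass@1 is taken as the mean per-task success rate, and consistency as the fraction of tasks on which all k runs agree. These definitions are illustrative assumptions, and the paper's reliability metric may be defined differently.

```python
def pass_at_1(results):
    """Mean single-attempt success rate; results[t] is a list of bools for task t."""
    return sum(sum(trials) / len(trials) for trials in results) / len(results)


def consistency(results):
    """Fraction of tasks where every repeated run produced the same outcome."""
    return sum(len(set(trials)) == 1 for trials in results) / len(results)


if __name__ == "__main__":
    T, F = True, False
    model_a = [[T, T, T, T], [F, F, F, F]]  # always solves task 1, never task 2
    model_b = [[T, F, T, F], [T, F, F, T]]  # solves each task half the time
    for name, res in (("A", model_a), ("B", model_b)):
        print(name, pass_at_1(res), consistency(res))
```

Both toy models have the same pass@1 of 0.5, but model B is never consistent. A single capability number hides exactly this difference.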

Learns stable, interpretable Koopman generators for nonlinear PDEs from trajectory data alone without any physics supervision.

Shows that VLMs can overcome deep-seated perceptual biases and optical illusions by using image manipulation tools rather than more training data.

A novel neural primitive based on metriplectic dynamics that outperforms Transformers in data efficiency and generalization.

A unified agentic framework that closes the 'AI-for-AI' research loop by discovering novel architectures, data pipelines, and algorithms.

Decouples high-level intent planning from low-level motor control in Vision-Language-Action (VLA) models to prevent the degradation of pre-trained VLM representations.

Demonstrates that independent aggregation (Hybrid Confirmation Tree) consistently outperforms the standard 'AI-as-advisor' paradigm across diverse high-stakes domains.

Shows that deep learning models for medical imaging (MRI) can be trained using synthetic quaternion Julia fractals instead of sensitive human clinical data.
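
Quaternion Julia sets come from the same escape-time iteration as complex Julia sets, q <- q^2 + c, but with quaternion (Hamilton) multiplication. The sketch shows only that core iteration. The `bailout` radius, `max_iter`, and the function names are assumptions, and the summary does not describe how the paper renders fractal volumes into MRI-like training images.

```python
def qmul(a, b):
    """Hamilton product of quaternions a = (w, x, y, z) and b = (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def escape_time(q, c, max_iter=50, bailout=4.0):
    """Iterations of q <- q^2 + c before |q|^2 exceeds bailout (max_iter if never)."""
    for n in range(max_iter):
        if sum(v * v for v in q) > bailout:  # |q| > 2: the orbit is escaping
            return n
        q = qmul(q, q)
        q = tuple(qi + ci for qi, ci in zip(q, c))
    return max_iter


if __name__ == "__main__":
    c = (-0.2, 0.6, 0.2, 0.0)  # hypothetical Julia parameter
    # One 2-D slice (z = 0) of the 4-D set, printed as escape-time values.
    for y in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print([escape_time((x, y, 0.0, 0.0), c) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)])
```

Varying `c` and the slicing plane produces an unlimited supply of structured, non-clinical volumes. This is what makes fractals attractive as privacy-free pretraining data.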

Provides a formal framework for optimizing models whose decisions actively change the distribution of the data they encounter.

Introduces a rigorous algorithm to determine if two different neural networks share the same underlying 'algorithmic interpretation' without needing to manually define the circuits.

Replaces heuristic ReAct-style agent loops with a control-theoretic framework that keeps LLM agents from over-deliberating or making excessive tool calls.

Introduces geometry-aware parallel refinement for diffusion language models, bypassing fixed-block decoding limitations.

Shows that knowledge distillation can be performed by injecting 'experience' into prompts rather than updating model weights.

Gaussian Joint Embeddings provide a probabilistic alternative to deterministic SSL, eliminating the need for architectural asymmetries to prevent collapse.

Identifies a 'stability asymmetry' signature where deceptive models maintain stable internal beliefs while producing fragile, unstable external responses under perturbation.

Challenges the 'filter-first' data paradigm by showing that training on uncurated data with quality-score labels outperforms training on high-quality filtered subsets.

Introduces a 'clone-robust' mechanism (YRWR) to prevent AI model producers from strategically gaming the rankings in crowd-sourced arenas like Chatbot Arena.

Introduces neural topology probing to identify causally influential 'hub neurons' in Vision-Language Models that govern cross-modal behavior.

Proposes a new reinforcement learning policy compression method based on long-horizon state-space coverage instead of immediate action-matching.

Identifies that standard Transformer attention matrices are fundamentally ill-conditioned and proposes a drop-in 'preconditioned' replacement.

Challenges the necessity of discrete action tokenizers in robotics by using a continuous, single-stage flow matching policy.

Introduces a marketplace infrastructure that recasts AI agents as peer participants in a verifiable production network rather than mere tools.

Introduces a vision model testbed that aligns AI visual attention (scanpaths) with human gaze without sacrificing classification accuracy.

Collapses the standard vision backbone-plus-decoder architecture into a single early-fusion Transformer stack for both perception and task modeling.

Couples visual representations directly into the RL optimization process (RLVR) for vision-language models using a structured reward reweighting mechanism.

Proposes 'Amdahl’s Law for AI,' proving that human effort in AI-assisted work is bottlenecked by the fraction of 'novel' tasks rather than agent capability.
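
The textbook form of Amdahl's Law makes the bottleneck explicit. If a fraction f of the work is novel and still needs unassisted human effort, and agents speed up the remaining 1 - f by a factor s, the overall speedup is 1 / (f + (1 - f)/s). This is capped at 1/f no matter how capable the agent becomes. Mapping "novel tasks" to the serial term is an assumption about how the paper frames it, and the function name is hypothetical.

```python
import math


def amdahl_speedup(novel_fraction, agent_speedup):
    """Classic Amdahl bound applied to AI-assisted work (illustrative mapping).

    novel_fraction: share of work that still needs unassisted human effort, in (0, 1].
    agent_speedup:  factor by which agents accelerate the remaining routine share.
    """
    if not 0.0 < novel_fraction <= 1.0:
        raise ValueError("novel_fraction must be in (0, 1]")
    return 1.0 / (novel_fraction + (1.0 - novel_fraction) / agent_speedup)


if __name__ == "__main__":
    for s in (2, 10, 100, math.inf):
        print(s, amdahl_speedup(0.1, s))  # approaches 1 / 0.1 = 10
```

With 10% novel work, even an infinitely fast agent yields at most a 10x gain. Past a point, reducing the novel fraction matters more than improving the agent.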

Shifts protein fitness optimization from continuous embeddings to discrete Quadratic Unconstrained Binary Optimization (QUBO).
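
A QUBO problem asks for the binary vector x that minimizes x^T Q x. The sketch evaluates that objective and solves a toy instance by exhaustive search. The summary does not say how protein variants are encoded as binary variables (for example, one-hot per residue position) or how Q is fitted. The 2x2 matrix below is hypothetical, and brute force stands in for the annealing or quantum solver a real pipeline would use.

```python
from itertools import product


def qubo_energy(x, Q):
    """E(x) = x^T Q x for a binary tuple x and a square matrix Q (list of rows)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Exhaustively search all 2**n binary vectors (only feasible for small n)."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))
    return best, qubo_energy(best, Q)


if __name__ == "__main__":
    # Hypothetical 2-variable instance: diagonal = per-variable fitness gain,
    # off-diagonal = pairwise (epistatic) interaction penalty.
    Q = [[-3, 2], [0, -1]]
    print(brute_force_qubo(Q))
```

Encoding the search space this way turns fitness optimization into a standard combinatorial problem that dedicated QUBO solvers can attack directly.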

Introduces LongCat-Next, a 'Native Multimodal' model that treats vision and audio as first-class discrete tokens rather than language-centric attachments.