Challenges the necessity of discrete action tokenizers in robotics by using a continuous, single-stage flow matching policy.
Paradigm Shift arxiv | Mar 31
Moves autonomous driving from 'predict-then-plan' to an interleaved VLA model where future frames and ego-actions are generated step-by-step.
New Capability arxiv | Mar 31
A non-Turing-complete DSL that compiles high-level LLM routing and agent policies directly into verified infrastructure artifacts like Kubernetes NetworkPolicies.
New Capability arxiv | Mar 31
Introduces a marketplace infrastructure that recasts AI agents from mere tools into peer participants in a verifiable production network.
Paradigm Shift arxiv | Mar 31
Scales Maximum Entropy population synthesis from 20 to 50+ categorical attributes by replacing exact expectation sums with Persistent Contrastive Divergence.
Efficiency Breakthrough arxiv | Mar 31
Reveals that the tight architectural coupling of image generation and understanding in unified models creates a new class of reciprocal safety vulnerabilities.
Breaks Assumption arxiv | Mar 31
Introduces a vision model testbed that aligns AI visual attention (scanpaths) with human gaze without sacrificing classification accuracy.
Paradigm Shift arxiv | Mar 31
Shows that standard task-completion benchmarks fail to distinguish agent capabilities and proposes 'Working Memory Fidelity' as a more predictive metric.
Scaling Insight arxiv | Mar 31
The first self-supervised, domain-agnostic model for LiDAR ground segmentation, eliminating the need for per-sensor manual labeling.
Open Release arxiv | Mar 31
A production-grade framework that converts LLM/RAG evaluation into a deployment decision workflow using Pareto frontiers and CI gates.
New Capability arxiv | Mar 31
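As a rough illustration of the Pareto-frontier-plus-CI-gate idea in the item above, the sketch below reduces a set of evaluation results to the non-dominated configurations and applies a pass/fail gate. The metric names (`quality`, `cost`), the sample numbers, the threshold, and every function name are assumptions made up for illustration; none of them come from the framework itself.

```python
from typing import NamedTuple


class EvalResult(NamedTuple):
    name: str       # hypothetical configuration label
    quality: float  # higher is better (e.g. an answer-accuracy score)
    cost: float     # lower is better (e.g. dollars per 1k queries)


def pareto_frontier(results: list[EvalResult]) -> list[EvalResult]:
    """Return the results that no other result dominates."""
    def dominates(a: EvalResult, b: EvalResult) -> bool:
        # a dominates b if it is no worse on both axes and strictly better on one.
        return (a.quality >= b.quality and a.cost <= b.cost
                and (a.quality > b.quality or a.cost < b.cost))

    return [r for r in results if not any(dominates(o, r) for o in results)]


def ci_gate(candidate: EvalResult, frontier: list[EvalResult], min_quality: float) -> bool:
    """Pass only if the candidate is on the frontier and clears the quality floor."""
    return candidate in frontier and candidate.quality >= min_quality


# Made-up evaluation results for four hypothetical configurations.
results = [
    EvalResult("small-model", quality=0.71, cost=0.10),
    EvalResult("large-model", quality=0.88, cost=0.90),
    EvalResult("large-no-rerank", quality=0.80, cost=0.95),  # dominated by large-model
    EvalResult("mid-model-rag", quality=0.84, cost=0.40),
]
frontier = pareto_frontier(results)
frontier_names = [r.name for r in frontier]
gate_passed = ci_gate(results[3], frontier, min_quality=0.80)
```

In a CI job, a boolean like `gate_passed` would typically decide the exit code, so a configuration that falls off the frontier or below the quality floor blocks the deployment.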
Collapses the standard vision backbone-plus-decoder architecture into a single early-fusion Transformer stack for both perception and task modeling.
Paradigm Shift arxiv | Mar 31
Couples visual representations directly into the RL optimization process (RLVR) for vision-language models using a structured reward reweighting mechanism.
Paradigm Shift arxiv | Mar 31
A unified framework for neural network recombination that achieves state-of-the-art fine-tuning with fewer than 200 parameters.
Efficiency Breakthrough arxiv | Mar 31
Enables Active Learning for tabular data without model retraining by iteratively optimizing the 'labeled context' of foundation models.
New Capability arxiv | Mar 31
Harmful intent in LLMs can be detected geometrically even after safety 'refusal' mechanisms have been surgically removed.
Breaks Assumption arxiv | Mar 31
For LLM-driven optimization, complex meta-heuristics like simulated annealing are unnecessary; simple greedy hill climbing is a superior default.
Breaks Assumption arxiv | Mar 31
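For readers unfamiliar with the baseline the item above favors, here is a minimal sketch of greedy hill climbing: a proposal is kept only when it strictly improves the score, with no temperature schedule and no acceptance of worse moves as in simulated annealing. The toy objective and the random-perturbation proposer are assumptions standing in for a real task and an LLM-generated edit.

```python
import random
from typing import Callable


def greedy_hill_climb(
    start: list[float],
    score: Callable[[list[float]], float],
    propose: Callable[[list[float], random.Random], list[float]],
    steps: int,
    seed: int = 0,
) -> tuple[list[float], float]:
    """Keep a proposal only when it strictly improves the score."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(steps):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # never accept a worse candidate
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(x: list[float]) -> float:
    # Toy objective with its maximum (0.0) at x = [3, 3].
    return -sum((v - 3.0) ** 2 for v in x)


def random_perturbation(x: list[float], rng: random.Random) -> list[float]:
    # Hypothetical stand-in for an LLM-generated edit of the current best candidate.
    return [v + rng.uniform(-0.5, 0.5) for v in x]


start = [0.0, 0.0]
best, best_score = greedy_hill_climb(start, toy_score, random_perturbation, steps=500)
```

Swapping `random_perturbation` for a function that prompts a model with the current best candidate would give the LLM-driven variant; the acceptance rule stays the same.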
Mathematical proof that LayerNorm structurally reduces model complexity compared to RMSNorm due to its mean-centering geometry.
Scaling Insight arxiv | Mar 31
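The mean-centering geometry mentioned in the item above can be seen in a few lines. This sketch, which omits learned gain and bias, only illustrates the textbook definitions of the two normalizations; it does not reproduce the paper's proof or its complexity measure.

```python
import math


def layer_norm(x: list[float], eps: float = 1e-6) -> list[float]:
    """LayerNorm without learned gain/bias: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm without learned gain: divide by the root mean square, no centering."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


x = [2.0, 4.0, 4.0, 10.0]
ln_out = layer_norm(x)
rms_out = rms_norm(x)

# LayerNorm outputs sum to (numerically) zero, so they lie in the
# (d-1)-dimensional hyperplane orthogonal to the all-ones vector.
ln_sum = sum(ln_out)
# RMSNorm keeps the component along the all-ones direction.
rms_sum = sum(rms_out)
```

The centered output has one fewer degree of freedom than the RMS-normalized one, which is the geometric difference the item refers to.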
Proposes 'Amdahl’s Law for AI,' proving that human effort in AI-assisted work is bottlenecked by the fraction of 'novel' tasks rather than agent capability.
Paradigm Shift arxiv | Mar 31
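The classic Amdahl's law gives a feel for the claim in the item above. The sketch below assumes a direct mapping, which may differ from the paper's exact formulation: the 'novel' fraction of work plays the role of the serial part that stays with the human, and the agent accelerates only the remaining routine part. Function and parameter names are illustrative.

```python
def remaining_human_effort(novel_fraction: float, agent_speedup: float) -> float:
    """Fraction of the original effort left when only routine work is accelerated."""
    if not 0.0 <= novel_fraction <= 1.0:
        raise ValueError("novel_fraction must be in [0, 1]")
    if agent_speedup <= 0.0:
        raise ValueError("agent_speedup must be positive")
    routine_fraction = 1.0 - novel_fraction
    return novel_fraction + routine_fraction / agent_speedup


def overall_speedup(novel_fraction: float, agent_speedup: float) -> float:
    """Classic Amdahl speedup: 1 / (serial + parallel / s)."""
    return 1.0 / remaining_human_effort(novel_fraction, agent_speedup)


# With 20% novel work, even a far more capable agent cannot exceed 1 / 0.2 = 5x.
speedup_10x = overall_speedup(0.2, 10.0)      # 1 / (0.2 + 0.08)
speedup_1000x = overall_speedup(0.2, 1000.0)  # 1 / (0.2 + 0.0008)
```

Under this assumed mapping, raising agent capability from 10x to 1000x moves the overall speedup only from about 3.6x to just under 5x, so the novel fraction sets the ceiling.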
Lie Generator Networks enable linear system identification with guaranteed physical stability and dissipation by construction rather than through loss penalties.
New Capability arxiv | Mar 31
GIFT bootstraps image-to-CAD generation by turning inference-time failures into synthetic training data, reducing inference compute by 80%.
Efficiency Breakthrough arxiv | Mar 31
A modular, JAX-based framework and taxonomy for Reinforcement Learning with Diffusion and Flow policies.
Open Release arxiv | Mar 31
Achieves high-quality 3D reconstruction and camera pose estimation from sparse views without any pre-trained priors or ground-truth annotations.
New Capability arxiv | Mar 31
Near-lossless KV cache compression using angular quantization in the Walsh-Hadamard domain at ~3.5 bits per element.
Efficiency Breakthrough arxiv | Mar 31
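To show why the Walsh-Hadamard domain is attractive for low-bit quantization, the sketch below implements an orthonormal fast Walsh-Hadamard transform and applies it to a toy vector with one large outlier. It covers only the transform; the angular quantization step from the item above is not reproduced, and the example vector is made up.

```python
import math


def fwht(x: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs of entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


# Toy vector with a single large outlier.
vec = [0.1, -0.2, 0.05, 8.0, 0.0, 0.15, -0.1, 0.2]
rotated = fwht(vec)
restored = fwht(rotated)  # the orthonormal transform is its own inverse

peak_before = max(abs(v) for v in vec)
peak_after = max(abs(v) for v in rotated)
```

The transform preserves the vector's norm but spreads the outlier's energy across all coordinates, so the peak magnitude drops and a uniform low-bit quantizer wastes less of its range.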
Mechanistic analysis reveals that over-refusal and harmful-intent refusal in LLMs occupy distinct representation subspaces.
Breaks Assumption arxiv | Mar 31
Introduces 'Hidden Ads,' a new class of semantic backdoor attacks that inject promotional content into VLM responses based on natural user behavior.
New Capability arxiv | Mar 31
Shifts protein fitness optimization from continuous embeddings to discrete Quadratic Unconstrained Binary Optimization (QUBO).
Paradigm Shift arxiv | Mar 31
Introduces LongCat-Next, a 'Native Multimodal' model that treats vision and audio as first-class discrete tokens rather than language-centric attachments.
Paradigm Shift arxiv | Mar 31
Achieves zero-shot, prompt-free object removal in diffusion models purely through self-attention manipulation.
New Capability arxiv | Mar 31
VoxAnchor uses mmWave radar to authenticate speech by matching acoustics to physical throat vibrations.
New Capability arxiv | Mar 31
RAGent enables training-free, deployment-time human activity recognition for mmWave radar using agentic reasoning.
New Capability arxiv | Mar 31
Proposes SOL-Nav, which replaces raw visual features in navigation with structured language descriptions for LLM-based agents.
Paradigm Shift arxiv | Mar 31
Bridges the gap between free-form natural language and safety-critical UAV navigation using Signal Temporal Logic (STL) translation and repair.
New Capability arxiv | Mar 31
Sci-Mind introduces an 'Adversarial Cognitive Dialectic' where specialized agents debate to refine mathematical models.
Paradigm Shift arxiv | Mar 31
Achieves a 79,000x reduction in energy per inference for insulin dose calculation using Spiking Neural Networks (SNNs).
Efficiency Breakthrough arxiv | Mar 31
Introduces 'Umwelt Engineering,' the deliberate constraint of an agent's linguistic environment to improve reasoning.
Paradigm Shift arxiv | Mar 31
PRBench reveals that current top-tier coding agents have a 0% success rate in end-to-end physics paper reproduction.
Breaks Assumption arxiv | Mar 31
Introduces Composer, a paradigm that generates input-specific parameter adaptations at inference time to enable dynamic per-input model specialization.
Paradigm Shift arxiv | Mar 31
Kuaishou releases KAT-Coder-V2, an agentic coding model achieving state-of-the-art results on SWE-bench Verified through a 'Specialize-then-Unify' paradigm.
Open Release arxiv | Mar 31
Provides empirical evidence and a mechanistic explanation for why LoRA drastically reduces catastrophic forgetting in sequential fine-tuning compared to full fine-tuning.
Scaling Insight arxiv | Mar 31
TianJi is the first 'AI meteorologist' system capable of autonomously driving complex numerical models to verify physical hypotheses in atmospheric science.
New Capability arxiv | Mar 31
A controlled study proving that the temporal organization (curriculum) of multimodal data is a first-order variable in balancing reasoning vs. OCR capabilities.
Scaling Insight arxiv | Mar 31
SkyNet extends MuZero to partially-observable stochastic games by adding auxiliary belief-aware heads, significantly outperforming baselines in complex card games.
Paradigm Shift arxiv | Mar 31
Heracles uses a state-conditioned diffusion middleware to bridge precise motion tracking with generative recovery for humanoid robots.
New Capability arxiv | Mar 31
Sortify is the first fully autonomous LLM agent deployed in production for closed-loop recommendation ranking optimization.
New Capability arxiv | Mar 31
AutoStan demonstrates a CLI coding agent that autonomously builds and iteratively improves interpretable Bayesian models in Stan.
New Capability arxiv | Mar 31
Identifies emergent social risks in multi-agent systems, such as spontaneous collusion and conformity, that arise even when agents are not explicitly instructed to behave this way.
Breaks Assumption arxiv | Mar 31
Uses spectral decomposition of inverse dynamics to enable real-time planning of long-horizon robotic manipulation tasks (10+ contact modes).
Efficiency Breakthrough arxiv | Mar 31
Introduces SCOUT, a routing framework that intelligently selects which Image-to-3D reconstruction model to use based on input difficulty and cost constraints.
New Capability arxiv | Mar 31
GraySense enables geospatial object tracking using only encrypted network packet sizes without any access to raw video streams.
New Capability arxiv | Mar 31
KVSculpt moves beyond simple eviction/merging to optimize unconstrained KV pairs in continuous space for extreme cache compression.
Efficiency Breakthrough arxiv | Mar 31
A rigorous analysis of the AIMO 3 math competition reveals that raw model capability dominates inference-time prompt optimization by an order of magnitude.
Breaks Assumption arxiv | Mar 31
Wan-R1 successfully applies Group Relative Policy Optimization (GRPO) to flow-based video models to enable verifiable spatial reasoning.
New Capability arxiv | Mar 31
The eigenvalue tail index of a neural network's weight matrices serves as a near-perfect (R^2 = 0.984) diagnostic for label noise in the training data.
Scaling Insight arxiv | Mar 31
Poppy provides a training-free way to refine monocular surface normals using single-shot polarization measurements at test time.
New Capability arxiv | Mar 31
SAGE mitigates multimodal hallucinations by monitoring 'attention sinks' and dynamically modulating self-attention during the decoding process.
Efficiency Breakthrough arxiv | Mar 31
ATLAS-RTC introduces token-level runtime control that detects and corrects LLM drift from structured output contracts during the forward pass.
New Capability arxiv | Mar 31
Guardrails successfully implements and flight-tests Control Barrier Functions on an F-16 fighter jet to enforce safety limits in real-time.
New Capability arxiv | Mar 31
ITQ3_S achieves high-fidelity 3-bit LLM inference by using rotation-domain smoothing to eliminate the catastrophic precision loss caused by outliers.
Efficiency Breakthrough arxiv | Mar 31
The Physics-Guided Transformer (PGT) embeds physical priors (like diffusion and causality) directly into the self-attention mechanism via heat-kernel biases.
Paradigm Shift arxiv | Mar 31
Iterative Motion Imitation enables bicycle robots to perform unassisted front-flips by learning from initially 'impossible' reference motions.
New Capability arxiv | Mar 31
Proteina-Complexa unifies generative flow-based modeling with structure-based 'hallucination' to set a new SOTA in atomistic protein binder design.
New Capability arxiv | Mar 31
ExFusion enables Transformer models to gain the capacity of Mixture-of-Experts during training while remaining a standard dense model for deployment.
Efficiency Breakthrough arxiv | Mar 31
SARL improves reasoning models by rewarding the 'topology' of thoughts rather than just the final answer, enabling effective RL without ground-truth labels.
Paradigm Shift arxiv | Mar 31
Dataset Concentration (DsCo) achieves nearly lossless dataset reduction by aligning distributions via diffusion models, cutting storage and training costs by half.
Efficiency Breakthrough arxiv | Mar 31
Correlated Diffusion replaces independent noise with structured MCMC dynamics, enabling generative modeling on hyper-efficient probabilistic computers.
Paradigm Shift arxiv | Mar 31
This study challenges the common 'best practice' of atomic decomposition for LLM judges, showing that holistic evaluation is often superior at detecting incompleteness.
Breaks Assumption arxiv | Mar 31
An autonomous agent reveals that domain-specific molecular architectures are largely unnecessary; standard transformers with better tuning outperform custom designs.
Breaks Assumption arxiv | Mar 31
Decoupled language models reduce the compute required for OCR domain adaptation by 95% while matching SOTA transformer accuracy.
Efficiency Breakthrough arxiv | Mar 31
This paper clarifies that Diffusion Maps (DMAPs) are not a dimensionality reduction tool but a spectral representation that requires specific combinations to form a chart.
Paradigm Shift arxiv | Mar 31
The first framework for bit-identical deep learning training that produces MD5-verified identical weights across independent runs.
New Capability arxiv | Mar 31
Drift-AR enables single-step (1-NFE) high-fidelity image generation by reinterpreting AR prediction entropy as a physical drifting field.
Efficiency Breakthrough arxiv | Mar 31
Meta-Harness automates the engineering of the 'code' surrounding LLMs, improving RAG and agent performance by optimizing retrieval and context management logic.
New Capability arxiv | Mar 31
ROVED reduces the expensive human feedback required for preference-based RL by up to 90% by leveraging vision-language embeddings and uncertainty filtering.
Efficiency Breakthrough arxiv | Mar 31
PhysNet embeds physical tumor growth dynamics directly into the latent feature space of a CNN, rather than just as a constraint on the output.
Paradigm Shift arxiv | Mar 31
This paper proves that reward hacking is a structural equilibrium of optimized AI agents, not a bug, and provides a computable 'distortion index' to predict it.
Paradigm Shift arxiv | Mar 31
Moves VLM grounding from text-based coordinates to a direct visual token selection mechanism via special pointing tokens.
Paradigm Shift arxiv | Mar 31
Introduces Heddle, a trajectory-centric system that resolves the long-tail latency bottleneck of tool calls in agentic Reinforcement Learning.
Efficiency Breakthrough arxiv | Mar 31
Bypasses expensive formal verification solvers by designing neural networks that are 'verifiable by design' using the fast, trivial Lipschitz bound.
Paradigm Shift arxiv | Mar 31
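The trivial Lipschitz bound named in the item above is the product of per-layer operator norms. The sketch below computes it for a tiny bias-free ReLU network and derives a certified robustness radius from the logit margin. The choice of the l-infinity norm, the network, and all names are assumptions for illustration; the paper's architecture and norm are not given here.

```python
def inf_norm(W: list[list[float]]) -> float:
    """Operator norm induced by the l-infinity vector norm: max absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)


def trivial_lipschitz_bound(weights: list[list[list[float]]]) -> float:
    """Product of per-layer operator norms; ReLU is 1-Lipschitz and adds no factor."""
    bound = 1.0
    for W in weights:
        bound *= inf_norm(W)
    return bound


def forward(weights: list[list[list[float]]], x: list[float]) -> list[float]:
    """Bias-free ReLU MLP; the last layer is linear and returns logits."""
    for i, W in enumerate(weights):
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
        if i < len(weights) - 1:
            x = [max(0.0, v) for v in x]
    return x


def certified_radius(weights: list[list[list[float]]], x: list[float]) -> float:
    """Largest l-infinity perturbation this bound certifies cannot flip the argmax."""
    logits = sorted(forward(weights, x), reverse=True)
    margin = logits[0] - logits[1]
    # Each logit moves by at most L * eps, so the margin shrinks by at most 2 * L * eps.
    return margin / (2.0 * trivial_lipschitz_bound(weights))


weights = [
    [[0.5, -0.5], [0.25, 0.25]],  # hidden layer, norm 1.0
    [[1.0, 0.0], [0.0, 0.5]],     # output layer, norm 1.0
]
x = [1.0, 0.2]
lipschitz = trivial_lipschitz_bound(weights)
radius = certified_radius(weights, x)
```

The bound costs one pass over the weights, which is why it is fast; the trade-off is that it is usually loose unless the network is trained to keep the layer norms small.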
A training-free metacognitive framework that gives LLMs explicit control over expanding, pruning, and repairing reasoning trajectories during inference.
New Capability arxiv | Mar 31
Presents PReD, the first foundation model and 1.3M-sample dataset specifically for electromagnetic signal perception and decision-making.
New Capability arxiv | Mar 31
Replaces traditional fixed-update rules in online learning with a causal Transformer to track switching experts in non-stationary environments.
Paradigm Shift arxiv | Mar 31
Replaces the classic Newton-Raphson power-flow solver with a differentiable GPU-accelerated simulation.
Efficiency Breakthrough arxiv | Mar 31
Transitions reasoning model optimization from coarse sequence-level advantages to fine-grained token dynamics.
New Capability arxiv | Mar 31
Moves beyond next-token prediction to model reasoning as gradient-based energy minimization over latent trajectories.
Paradigm Shift arxiv | Mar 31
Introduces lightweight equilibration to the Muon optimizer, significantly stabilizing and accelerating LLM pretraining.
Efficiency Breakthrough arxiv | Mar 31
Discovers that LLM hidden states undergo geometric 'warping' at digit-count boundaries, mimicking human psychological perception.
Scaling Insight arxiv | Mar 31
Enables instruction-following in low-resource languages by simply merging target-language base models with English instruction-tuned models.
Efficiency Breakthrough arxiv | Mar 31
Enhances Kolmogorov-Arnold Networks (KAN) with fractal interpolation to approximate non-smooth and rough functions.
New Capability arxiv | Mar 31
Exposes a massive robustness gap in Vision-Language-Action (VLA) models, where simply paraphrasing an instruction causes drops in success of up to 50%.
Breaks Assumption arxiv | Mar 31
An evolutionary framework for GPU kernel generation that outperforms frontier models like Claude 4.6 and Gemini 3.0.
Efficiency Breakthrough arxiv | Mar 31
HISA eliminates the quadratic O(L²) bottleneck in sparse attention indexers, enabling efficient long-context scaling for models like DeepSeek-V3.
Efficiency Breakthrough arxiv | Mar 31
Researchers have used LLMs to evolve entirely new Reinforcement Learning update rules from scratch that compete with human-designed baselines like PPO and SAC.
New Capability arxiv | Mar 31
The 'Scaffold Effect' reveals that Vision-Language Models in clinical settings often fabricate reasoning based on prompt framing rather than actual visual data.
Breaks Assumption arxiv | Mar 31
Entropic Claim Resolution (ECR) shifts RAG from retrieving 'relevant' documents to retrieving 'discriminative' evidence that minimizes hypothesis uncertainty.
Paradigm Shift arxiv | Mar 31
IsoQuant leverages SO(4) isoclinic rotations to achieve a 4.5x-4.7x speedup in low-bit KV-cache quantization over existing methods.
Efficiency Breakthrough arxiv | Mar 31
The 'Bidirectional Coherence Paradox' demonstrates that LLM performance and explanation quality can be inversely correlated depending on domain observability.
Paradigm Shift arxiv | Mar 31
COvolve creates an automated curriculum for open-ended learning by co-evolving environments and policies as executable code through a zero-sum game.
Paradigm Shift arxiv | Mar 31
INSID3 achieves state-of-the-art one-shot image segmentation using only frozen DINOv3 features without any training, fine-tuning, or auxiliary models.
Efficiency Breakthrough arxiv | Mar 31
EdgeDiT provides a hardware-aware blueprint for running massive Diffusion Transformers (DiT) on mobile NPUs with a 1.6x reduction in latency.
Efficiency Breakthrough arxiv | Mar 31
LAD runs with 3x lower latency than previous driving language models, generating textual reasoning and motion plans at up to 20 Hz.
Efficiency Breakthrough arxiv | Mar 31