Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.
Paradigm Shift
Quantifies an emergent 'self' in robots as an invariant subnetwork that persists across continual learning of variable tasks.
New Capability
Applies reinforcement learning with a cycle-consistency reward to drastically improve natural language to Lean4 autoformalization.
Efficiency Breakthrough
A 5M-parameter OCR model that rivals billion-parameter vision-language models, proving data-centric curation can beat raw parameter scale.
New Capability
Reformulates molecular discovery as an autonomous MCTS planning problem over executable chemical operations rather than just similarity-based prediction.
Scaling Insight
Identifies a 'critical threshold' in human-AI symbiosis beyond which human capability collapses abruptly and irreversibly due to over-delegation.
Paradigm Shift
Moves automated research from stateless linear pipelines to a persistent Research World Model that maintains a self-correcting knowledge graph of gaps and methods.
Efficiency Breakthrough
Achieves high-fidelity sub-seasonal weather forecasting with a 276M parameter model that matches 1.6B parameter baselines in accuracy and speed.
Open Release
Releases 55 hours of continuous 30fps expert human computer-use videos to address the 'missing ingredient' for desktop automation agents.
Paradigm Shift
Introduces a 'sorry-driven' formal decomposition that allows LLM agents to solve complex proofs by isolating and independently verifying subgoals.
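The decomposition idea can be illustrated with a short Lean sketch (illustrative only, not the paper's system): each `sorry` names an isolated subgoal that a sub-agent can attack and verify independently while the proof skeleton already type-checks.

```lean
-- Sorry-driven decomposition sketch: the skeleton compiles now;
-- each `sorry` is a subgoal handed off for independent verification.
theorem add_mul_self (a b : Nat) :
    (a + b) * (a + b) = a * a + 2 * (a * b) + b * b := by
  have expand : (a + b) * (a + b) = a * a + a * b + (b * a + b * b) := by
    sorry  -- subgoal 1: distribute the product
  have comm : b * a = a * b := by
    sorry  -- subgoal 2: commutativity of multiplication
  sorry    -- subgoal 3: combine the verified pieces by rewriting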
Breaks Assumption
Reveals that self-distillation degrades out-of-distribution reasoning by suppressing 'epistemic verbalization' (the model's expression of uncertainty).
Paradigm Shift
Enforces hard incompressibility constraints in neural operators using spectral Leray projection, ensuring physically admissible fluid simulations.
New Capability
An autonomous agentic pipeline discovered novel white-box adversarial attacks that outperform existing methods by up to 300%.
Efficiency Breakthrough
Agentic Variation Operators (AVO) replace fixed evolutionary heuristics with coding agents to discover GPU kernels that outperform FlashAttention-4 by 10.5%.
New Capability
UI-Voyager achieves an 81.0% success rate on AndroidWorld, exceeding human-level performance in mobile GUI automation.
Paradigm Shift
LensWalk introduces a 'reason-plan-observe' loop that allows agents to dynamically control the temporal sampling and density of the videos they analyze.
Paradigm Shift
The Free-Market Algorithm (FMA) is a zero-parameter metaheuristic that discovers complex pathways in chemistry and economics through emergent supply-and-demand dynamics.
Open Release
VFIG enables high-fidelity conversion of rasterized technical figures into editable, scalable SVGs using a new 66K-pair dataset.
Paradigm Shift
MARCH eliminates 'LLM-as-a-judge' confirmation bias by using information asymmetry to force verification agents to check claims without seeing the original response.
Efficiency Breakthrough
DreamerAD accelerates imagination-based training for autonomous driving by 80x, compressing 100-step diffusion sampling down to a single step.
Efficiency Breakthrough
The Multilevel Euler-Maruyama (ML-EM) method allows diffusion models to perform sampling at the computational cost of a single model evaluation.
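The multilevel estimator is not reproduced here, but the base discretization it builds on is the plain Euler-Maruyama scheme, whose per-step cost versus bias trade-off is exactly what multilevel methods exploit. A minimal sketch on a test process with a known mean:

```python
import numpy as np

def euler_maruyama(x0, drift, diffusion, t_end, n_steps, rng):
    """Plain Euler-Maruyama discretization of dX = f(X) dt + g(X) dW."""
    dt = t_end / n_steps
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + drift(x) * dt + diffusion(x) * dw
    return x

# Ornstein-Uhlenbeck test process: E[X_T] = x0 * exp(-theta * T)
theta, sigma = 1.0, 0.5
rng = np.random.default_rng(1)
paths = euler_maruyama(np.ones(20000), lambda x: -theta * x,
                       lambda x: sigma, t_end=1.0, n_steps=64, rng=rng)
```

Halving `n_steps` halves the cost but grows the discretization bias; the multilevel trick combines coarse and fine levels so most samples are drawn at the cheap level.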
New Capability
Wasserstein Parallel Transport provides a formal framework for counterfactual prediction in evolving probability distributions.
Paradigm Challenge
An AI research agent fact-checks published mathematics papers, pinpointing the specific steps where professional mathematicians' proofs go wrong.
Paradigm Challenge
Finds that only about 10% of the code released with Nature papers runs successfully when independently re-executed.
Practical Magic
Shows that a printed drink coaster acting as an adversarial patch can make a robot hand a person a knife instead of an apple.
Nature Is Weird
Demonstrates that an AI assistant's memory can be covertly manipulated through ordinary emails in the user's inbox, changing how it treats the user without any visible sign.
Practical Magic
Demonstrates wireless links that match wired-cable performance, adding no latency regardless of how many devices the signal traverses.
Breaks Assumption
Effective semantic alignment for low-resource languages can be achieved with only 10,000 noisy synthetic pairs, matching the performance of models trained on 1 million samples.
Paradigm Shift
Mechanistic interpretability reveals that LLMs possess 'affect reception' circuits that detect emotional content even when explicit keywords are removed.
Efficiency Breakthrough
Sparse Feature Attention (SFA) reduces attention costs from quadratic in sequence length and linear in dimension to a fraction based on feature sparsity, enabling 2.5x speedups.
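The cost argument is easy to demonstrate in miniature (this is illustrative, not the paper's SFA kernel): when queries and keys are nonzero only on a small set of feature dimensions, the logits can be computed over just those columns.

```python
import numpy as np

def sparse_attention_logits(Q, K, active):
    """Attention logits restricted to the 'active' feature dimensions.
    Cost is O(n^2 * |active|) instead of O(n^2 * d)."""
    return Q[:, active] @ K[:, active].T

rng = np.random.default_rng(0)
n, d, active = 8, 64, [3, 17, 42]        # only 3 of 64 features carry signal
Q = np.zeros((n, d))
K = np.zeros((n, d))
Q[:, active] = rng.standard_normal((n, len(active)))
K[:, active] = rng.standard_normal((n, len(active)))
dense = Q @ K.T                          # full O(n^2 * d) computation
sparse = sparse_attention_logits(Q, K, active)
```

When the sparsity pattern holds exactly, the restricted product matches the dense one while touching a fraction of the dimension.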
Scaling Insight
Hidden states in LLMs occupy a Riemannian submanifold where tokens are Voronoi regions, revealing a universal 'hourglass' intrinsic dimension profile across all tested models.
Breaks Assumption
Forcing AI agents to use human-comprehensible language causes a 50% efficiency drop compared to their own 'inscrutable' communication protocols.
Efficiency Breakthrough
Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.
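A toy sketch of the sign-preservation idea (this is a stand-in, not the paper's DAQ objective): quantize the fine-tuning delta symmetrically, but never let rounding flip an entry's sign, since the sign of a small delta often carries the post-training signal.

```python
import numpy as np

def quantize_delta(delta, bits=4):
    """Symmetric uniform quantization of a fine-tuning delta that
    refuses to flip any entry's sign (zeros stay zero)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(delta)) / qmax
    if scale == 0:
        return np.zeros_like(delta)
    q = np.round(delta / scale)
    q = np.where((delta > 0) & (q < 1), 1, q)    # tiny positives stay positive
    q = np.where((delta < 0) & (q > -1), -1, q)  # tiny negatives stay negative
    return q * scale
```

Plain round-to-nearest would map tiny deltas to zero or across zero; the two `where` clamps are the minimal fix that keeps every sign intact.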
Efficiency Breakthrough
Hybrid Associative Memory (HAM) layers allow the KV cache to grow dynamically based only on information that an internal RNN cannot predict.
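The gating idea can be sketched with a surprise test against a cheap recurrent predictor (here a simple EMA stands in for the internal RNN; the class and threshold are illustrative assumptions, not the paper's HAM layer):

```python
import numpy as np

class SurpriseGatedCache:
    """Append a token's KV entry only when a cheap recurrent predictor
    fails to anticipate it, so predictable tokens never grow the cache."""
    def __init__(self, dim, threshold=1.0, alpha=0.9):
        self.state = np.zeros(dim)     # EMA stand-in for the internal RNN state
        self.threshold = threshold
        self.alpha = alpha
        self.cache = []

    def step(self, kv):
        surprise = np.linalg.norm(kv - self.state)
        if surprise > self.threshold:  # unpredicted content: worth caching
            self.cache.append(kv)
        self.state = self.alpha * self.state + (1 - self.alpha) * kv
```

Feeding the same vector repeatedly, the predictor locks on and caching stops, so the cache grows sublinearly in sequence length for redundant input.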
New Capability
Small adapters can provide frozen decoder-only LLMs with persistent latent-space memory that survives across separate sessions.
Scaling Insight
The standard 'Chinchilla Approach 2' for fitting scaling laws is systematically biased, potentially leading to millions of dollars in wasted compute at frontier scales.
Paradigm Shift
Gradient boosting exhibits a 'first-mover bias' where correlated features selected early in the tree sequence gain an artificial, self-reinforcing importance in SHAP rankings.
New Capability
Introduces a framework for LLMs to self-improve reasoning in specific domains by autonomously mining and constructing training environments directly from the open web.
Paradigm Shift
Establishes a formal mathematical equivalence between Classifier-Free Guidance (CFG) and alignment-based objectives, allowing for CFG-like quality without inference-time overhead.
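For reference, the inference-time recipe the paper aims to make redundant is the standard CFG combination, which requires two forward passes (conditional and unconditional) per sampling step:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate the conditional prediction
    past the unconditional one by guidance scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

At w = 1 this reduces to the plain conditional prediction; w > 1 amplifies the conditioning signal. Folding the equivalent objective into training removes the second forward pass per step.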
Efficiency Breakthrough
Proposes an agentic architecture that achieves O(1) token complexity relative to dataset size by strictly separating intent parsing from deterministic data execution.
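The separation can be sketched as follows (the plan schema below is hypothetical, invented for illustration): the LLM only emits a short structured plan, and plain code executes it over the data, so prompt size stays O(1) in dataset size.

```python
import json

def execute_plan(plan_json, rows):
    """Deterministically run a structured query plan over the dataset.
    Only the short JSON plan ever transits the LLM."""
    plan = json.loads(plan_json)
    col, op, val = plan["filter"]
    ops = {"eq": lambda a, b: a == b, "gt": lambda a, b: a > b}
    hits = [r for r in rows if ops[op](r[col], val)]
    if plan["agg"] == "count":
        return len(hits)
    return sum(r[plan["target"]] for r in hits)

rows = [{"city": "NYC", "sales": 3},
        {"city": "SF", "sales": 5},
        {"city": "NYC", "sales": 2}]
# An intent parser (the LLM) would emit something like:
plan = '{"filter": ["city", "eq", "NYC"], "agg": "sum", "target": "sales"}'
```

Whether `rows` holds three records or three million, the tokens the model sees are just the user question and this plan.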
Efficiency Breakthrough
Achieves high-fidelity diffusion generation in just 3 steps by distilling layer-wise time embeddings from reference trajectories.
Breaks Assumption
Finds that nominal instruction-tuning with LoRA often fails to improve (and can even degrade) verifiable instruction-following despite improvements on broader benchmarks.
Paradigm Shift
Shifts symbolic regression from discrete genetic search to a continuous, embedding-driven optimization paradigm.
Scaling Insight
Reveals that RLVR-driven reasoning improvements in LLMs are the result of highly sparse changes to a tiny fraction of 'critical' token distributions.
Breaks Assumption
Identifies that the full source code (skill body) of a tool is the primary signal for LLM tool selection, far outweighing the importance of descriptions or metadata.
Paradigm Shift
Replaces standard autoregressive document OCR with a parallel diffusion-based denoising framework.
Efficiency Breakthrough
Introduces a verifier that operates directly on the latent hidden states of Diffusion Transformers, avoiding the need for costly pixel-space decoding during inference-time scaling.
Paradigm Shift
Demonstrates that Hebbian plasticity can induce emergent attractor dynamics in robot controllers, enabling rapid adaptation without backpropagation.
Breaks Assumption
Uncovers that neural operator digital twins are acutely vulnerable to sparse adversarial perturbations on boundary conditions that bypass standard anomaly detection.
New Capability
Leverages unstructured clinical notes during training to boost the performance of models that are deployed using only structured EHR data.
Scaling Insight
Robotic bipedal mass scales with the square of leg length rather than the cubic scaling found in biological systems.