SeriesFusion
Science, curated & edited by AI

AI & Machine Learning

2,371 papers  ·  Page 43 of 48

Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.

New Capability
Proposes URDF-Anything+, an autoregressive framework that generates fully executable articulated 3D models from raw visual observations.
Mar 17
New Capability
Introduces the first system capable of imaging high-speed, non-rigid objects through strong atmospheric turbulence at 16,000 pixels per second.
Mar 17
Paradigm Shift
Enhances mathematical reasoning in LLMs by integrating Group Relative Policy Optimization (GRPO) with a specific reflection reward mechanism.
Mar 17
Efficiency Breakthrough
Reveals that Graph-RAG performance is limited by reasoning failure rather than retrieval, and shows how to make an 8B model match a 70B baseline.
Mar 17
Efficiency Breakthrough
Amortizes iterative diffusion into a one-step trajectory policy for robotics using a novel 'Keyed Drift Field' objective.
Mar 17
Efficiency Breakthrough
Proposes a temporal mixed-precision framework for diffusion models that adaptively assigns bitwidths across different denoising timesteps.
Mar 17
Breaks Assumption
Identifies a structural flaw in the standard Expected Calibration Error (ECE) when applied to soft labels and introduces SMECE to fix it.
Mar 17
Efficiency Breakthrough
Accelerates LLM inference by up to 1.8x using a training-free sparse pattern predictor based on SVD truncation of FFN gate matrices.
Mar 17
Scaling Insight
Challenges the monotonic 'bigger is better' scaling paradigm by proving that institutional fitness peaks at an environment-dependent scale.
Mar 17
Paradigm Shift
Introduces Centered Reward Distillation (CRD) to stabilize diffusion reinforcement learning by removing intractable normalizing constants.
Mar 17
Breaks Assumption
Demonstrates that gated predictive autoencoders can match or outperform JEPA-style architectures by learning to select predictable components.
Mar 17
Efficiency Breakthrough
Unifies KV cache compression and sparse attention into a single 1-bit indexing structure, eliminating the need for external metadata or predictors.
Mar 17
New Capability
Enables online, incremental 3D Gaussian Splatting for thousands of frames by replacing global reprocessing with a causal, streaming update framework.
Mar 17
Efficiency Breakthrough
Detects diffusion-generated images 126x faster than reconstruction-based methods by using Gaussian noise disturbance to exploit the statistical 'ease' of fitting synthetic data.
Mar 17
Breaks Assumption
Shows that extended reasoning in Multimodal LLMs causes 'attention dispersion,' where the model's focus on visual inputs fades as the reasoning chain lengthens.
Mar 17
Efficiency Breakthrough
Enables model adaptation on edge devices and non-differentiable (quantized) models using a purely backpropagation-free optimization framework.
Mar 17
Breaks Assumption
Discovers that frozen video diffusion models already encode physical plausibility in their features, allowing for cost-effective inference-time physics filtering.
Mar 17
New Capability
Introduces a decentralized, multi-agent framework for scientific discovery that uses an 'ArtifactReactor' for plannerless coordination and full computational lineage.
Mar 17
Scaling Insight
Proposes spectral clipping to stabilize LLM training by addressing 'spectral spikes' in stochastic gradient noise that adaptive optimizers like AdamW fail to handle.
Mar 17
Efficiency Breakthrough
Achieves real-time, low-latency talking avatar generation at 34ms per frame using a one-step streaming diffusion framework.
Mar 17
Scaling Insight
Introduces Matrix-to-Matrix RNNs (M²RNN) with matrix-valued hidden states that outperform hybrid Transformers while using 3x smaller state sizes.
Mar 17
Paradigm Shift
Proposes the 'Theory Compiler,' a system that automatically translates formal domain specifications into neural architectures with built-in physical or logical constraints.
Mar 17
New Capability
Introduces 'Visual Chronometer' to estimate physical frame rates directly from visual dynamics, addressing the 'chronometric hallucinations' common in generative video models.
Mar 17
New Capability
Segment Anything Reasoner (StAR) introduces parallel test-time scaling to visual segmentation tasks, eliciting latent reasoning capabilities from base models.
Mar 17
Breaks Assumption
Argues that probability gradients are superior to standard log-probability gradients for RL training, proposing a new optimization method (DGPO) to solve divergence in soft clipping.
Mar 17
Paradigm Shift
Presents DataEvolve, a framework that enables AI to autonomously evolve and iteratively optimize pretraining data curation strategies.
Mar 17
Efficiency Breakthrough
Introduces ZoomUI, a training-free method for GUI grounding that uses inference-time scaling to anchor natural language instructions to interface elements.
Mar 17
Efficiency Breakthrough
FLORE achieves 1000x error reduction in linear sketching while being 100x faster than previous learning-based solutions.
Mar 17
New Capability
V-JEPA 2.1 unlocks dense, spatially structured features in video self-supervised learning, yielding large gains in robotic manipulation and navigation.
Mar 17
Paradigm Shift
Provides a new identifiability theorem for causal representation learning that allows physical system parameters to be recovered from raw data without predefined libraries.
Mar 17
Scaling Insight
The Infinite Problem Generator (IPG) uses executable code to synthesize and verify 100% accurate physics reasoning data, overcoming LLM hallucination in data scaling.
Mar 17
Breaks Assumption
Simple regularization and data-hybrid training are shown to be sufficient to prevent catastrophic forgetting in MLLMs, challenging the need for complex anti-forgetting architectures.
Mar 17
Efficiency Breakthrough
SleepGate introduces a biologically inspired 'sleep cycle' for the KV cache to resolve proactive interference in long-context LLMs.
Mar 17
New Capability
One-Policy-Fits-All (OPFA) learns a single manipulation policy across 11 different embodiments, including grippers and dexterous hands, using geometry-aware action latents.
Mar 17
New Capability
Interp3R is the first method to estimate depth and camera poses at arbitrary time instants by interpolating pointmaps using asynchronous event data.
Mar 17
Breaks Assumption
Distilled VAE encoders are found to perform significantly better on higher, unseen resolutions than on their native training resolution.
Mar 17
Efficiency Breakthrough
ASAP reduces LVLM FLOPs by ~80% with virtually no loss in performance using a training-free KV cache pruning recipe.
Mar 17
New Capability
MorFiC achieves zero-shot locomotion transfer across quadrupeds of different sizes and masses with up to 5x speed gains over standard baselines.
Mar 17
Paradigm Shift
Top-b sampling introduces entropy-aware adaptive bandwidth for LLM decoding, effectively approximating a self-regulating control system for generation.
Mar 17
Paradigm Shift
SuperLocalMemory V3 establishes information-geometric foundations for agent memory, enabling high-accuracy retrieval without cloud-based LLM dependency.
Mar 17
Efficiency Breakthrough
FlashHead is a drop-in replacement for the LM classification head that provides 1.75x inference speedup by treating vocabulary selection as a retrieval problem.
Mar 17
Paradigm Shift
Introduces 'Delight' to policy gradients, weighting updates by the product of advantage and action surprisal to fix pathologies in RL training.
Mar 17
Scaling Insight
Determines the optimal compute distribution for retrieval agents, showing that re-ranking depth is far more critical than query expansion strength.
Mar 17
Paradigm Shift
Proposes the Spectrum Matching Hypothesis to explain why some VAE latents are 'undiffusable' and introduces techniques to align power spectral densities for superior image generation.
Mar 17
New Capability
Identifies interpretable 'atoms' of model behavior by decomposing training gradients, enabling unsupervised discovery and steering of complex behaviors such as refusal or arithmetic.
Mar 17
Paradigm Shift
Introduces RenderMem, a spatial memory system that treats rendering as a query interface for embodied agents to reason about 3D geometry and occlusion.
Mar 17
Breaks Assumption
Reveals that larger language models are significantly better at concealing knowledge during audits, with detection traces vanishing beyond 70 billion parameters.
Mar 17
New Capability
Achieves pose-free 3D Gaussian Splatting using only event streams, enabling reconstruction in extreme lighting and high-speed motion scenarios.
Mar 17
Efficiency Breakthrough
Reformulates diffusion sampling as a graph-theoretic planning problem that dynamically allocates compute to the most difficult denoising stages.
Mar 17
Breaks Assumption
Formalizes the 'Visual Confused Deputy' attack, where agents are tricked into authorizing privileged actions via subtle on-screen visual manipulations.
Mar 17
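
Background note on the Expected Calibration Error entry listed above: the blurb refers to the standard ECE metric without defining it. A minimal sketch of the usual binned ECE for hard (0/1) labels follows, assuming equal-width confidence bins. The function name, the default of 10 bins, and the bin-edge convention are illustrative choices, and this is not the SMECE variant the paper introduces.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE for hard labels (illustrative sketch).

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    n_bins:      number of equal-width bins (10 is an assumed default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin accumulators: [sample count, sum of confidences, sum of hits].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin k covers [k/n_bins, (k+1)/n_bins); confidence 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += hit

    # ECE = sum over non-empty bins of (bin share) * |accuracy - mean confidence|.
    ece = 0.0
    for count, conf_sum, hit_sum in bins:
        if count:
            ece += (count / n) * abs(hit_sum / count - conf_sum / count)
    return ece
```

For example, two predictions at confidence 0.9 that are both right and two at 0.6 of which one is right give a gap of 0.1 in each occupied bin, so the function returns approximately 0.1.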