Machine Learning

147 papers

ActTail achieves 80% activation sparsity in LLMs with significantly lower perplexity degradation than uniform methods by using Heavy-Tailed Self-Regularization theory.

Efficiency Breakthrough arxiv | Mar 16

This paper proposes a method to align and personalize LLMs directly from raw user interactions using self-distillation, bypassing the need for explicit human labels or RLHF.

Paradigm Shift arxiv | Mar 16

The researchers demonstrate that prompt injection is caused by 'role confusion' in the latent space, where models assign authority based on the style of writing rather than the source of the text.

Breaks Assumption arxiv | Mar 16

This theoretical work refutes the 'Garbage In, Garbage Out' mantra for modern ML, proving that high-dimensional model capacity can asymptotically overcome predictor error and structural uncertainty.

Breaks Assumption arxiv | Mar 16

Introduces the Budget-Sensitive Discovery Score (BSDS), a formally verified metric machine-checked in Lean 4 for evaluating AI-guided scientific candidate selection.

Paradigm Shift arxiv | Mar 16

ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.

Efficiency Breakthrough arxiv | Mar 16

This study proves that reasoning traces (Chain-of-Thought) causally shape model behavior and generalization, even when the final answer is held constant.

Breaks Assumption arxiv | Mar 16

SpectralGuard identifies a 'memory collapse' vulnerability in State Space Models (like Mamba) where adversarial inputs can drive the transition operator's spectral radius to zero.

Breaks Assumption arxiv | Mar 16
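The vulnerability is easy to visualize in a toy diagonal state-space recurrence: the decay coefficient plays the role of the transition operator's spectral radius, and a single token that drives it to zero erases all prior state. A minimal numpy sketch (illustrative only, not the paper's attack):

```python
import numpy as np

def ssm_scan(decays, inputs):
    """Toy diagonal SSM recurrence: h_t = a_t * h_{t-1} + x_t.

    In this 1-D case |a_t| is exactly the spectral radius of the
    (input-dependent) transition operator at step t.
    """
    h = 0.0
    for a, x in zip(decays, inputs):
        h = a * h + x
    return h

rng = np.random.default_rng(0)
xs = rng.normal(size=10)

# Benign sequence: decay near 1, state keeps a trace of early inputs.
benign = ssm_scan(np.full(10, 0.95), xs)

# Adversarial sequence: one token drives the decay (spectral radius) to 0,
# wiping everything that came before it.
decays = np.full(10, 0.95)
decays[5] = 0.0
attacked = ssm_scan(decays, xs)

# After the collapse the state depends only on inputs 5..9.
expected_tail = ssm_scan(np.full(5, 0.95), xs[5:])
print(benign, attacked, expected_tail)
```

The attacked run is numerically identical to a scan that never saw the first five inputs, which is the "memory collapse" in miniature.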

Surg-R1 is a specialized surgical reasoning model released alongside the largest surgical Chain-of-Thought dataset (320,000 pairs).

Open Release arxiv | Mar 16

This paper establishes a systematic protocol for 'stitching' heterogeneous Vision Foundation Models (e.g., CLIP and DINOv2) to share early layers while retaining specialized capabilities.

Paradigm Shift arxiv | Mar 16

Achieves 100x speedup in robotic action generation by distilling iterative flow/diffusion models into a one-step policy without a pre-trained teacher.

Efficiency Breakthrough arxiv | Mar 16

Introduces Modal Logical Neural Networks (MLNNs) as a differentiable logic layer that bridges deep learning with symbolic Kripke semantics for regulated AI.

Paradigm Shift arxiv | Mar 16

Demonstrates a robot that improves its own locomotion by identifying and physically 'self-destructing' redundant or inhibiting limbs during its lifetime.

Paradigm Shift arxiv | Mar 16

Enables training-free infinite video generation (hour-scale) by using evolving memory tokens to solve identity drift and motion stagnation.

New Capability arxiv | Mar 16

Reveals that standard global correlation metrics for LLM judges fail to predict success in 'best-of-n' selection tasks due to within-prompt signal loss.

Breaks Assumption arxiv | Mar 16
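The failure mode is easy to reproduce synthetically: a judge whose score tracks per-prompt difficulty but carries no within-prompt signal shows a strong global correlation while best-of-n selection stays at chance. A minimal sketch with made-up data (not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_cands = 400, 5

# Per-prompt difficulty dominates both true quality and the judge's score,
# so the *global* correlation looks strong...
difficulty = rng.normal(size=(n_prompts, 1))
quality = difficulty + rng.normal(size=(n_prompts, n_cands))      # true quality
judge = difficulty + 0.5 * rng.normal(size=(n_prompts, n_cands))  # no within-prompt signal

global_corr = np.corrcoef(judge.ravel(), quality.ravel())[0, 1]

# ...but within a prompt the judge is blind, so best-of-n selection
# is no better than picking at random (1/n = 0.2 here).
picked_best = np.mean(judge.argmax(axis=1) == quality.argmax(axis=1))

print(f"global correlation: {global_corr:.2f}")          # strong
print(f"best-of-{n_cands} hit rate: {picked_best:.2f}")  # ~chance
```

The global Pearson correlation here sits around 0.6 while the best-of-5 hit rate hovers near the 20% chance level, which is exactly the within-prompt signal loss the entry describes.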

Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.

Efficiency Breakthrough arxiv | Mar 16

Discovers that as LLMs scale, their complex non-linear depth dynamics converge into accurate, low-order linear surrogates.

Scaling Insight arxiv | Mar 16

Derives an exact, unbiased policy gradient for Reinforcement Learning on Diffusion LLMs, bypassing the need for sequence-level likelihood approximations.

Paradigm Shift arxiv | Mar 16

Shows that tool-augmented agents suffer from 'recommendation drift' where they provide unsafe advice under tool corruption while maintaining high ranking scores.

Breaks Assumption arxiv | Mar 16

Accelerates Diffusion Transformers (DiTs) by 2x using a training-free framework that selectively reduces computation in non-aesthetic image regions.

Efficiency Breakthrough arxiv | Mar 16

Challenges the standard practice of 'deep' multi-epoch PPO training by showing that consensus aggregation of 'wider' parallel runs is 8x more sample-efficient than running additional epochs.

Breaks Assumption arxiv | Mar 16

Releases Feynman, an agentic pipeline and 100k-sample dataset for generating high-quality, knowledge-rich diagrams with grounded captions.

Open Release arxiv | Mar 16

Introduces the largest-ever multi-modal CAD dataset with 10 million annotations for 1 million models to enable geometric deep learning on BRep data.

Open Release arxiv | Mar 16

Unlocks Maximum Entropy RL for high-dimensional humanoid control, matching or doubling the performance of dominant deterministic baselines.

New Capability arxiv | Mar 16

Introduces a training-free framework that allows LLM agents to dynamically scale their reasoning depth based on a pre-defined token/tool budget.

Efficiency Breakthrough arxiv | Mar 16

Achieves a 98x speedup in LLM routing on AMD hardware using Flash Attention and prompt compression, enabling high-context classification without a dedicated GPU.

Efficiency Breakthrough arxiv | Mar 16

Proposes modeling the world in the feature space of frozen geometry foundation models instead of pixels, achieving 5x faster depth forecasting.

Paradigm Shift arxiv | Mar 16

A retrosynthesis model that explicitly learns strategic bond-disconnection reasoning via reinforcement learning with a round-trip accuracy reward.

New Capability arxiv | Mar 16

Longitudinal evidence reveals that successive ChatGPT versions are converging in output diversity, suggesting potential model collapse from synthetic data saturation.

Scaling Insight arxiv | Mar 16

A new system enables humanoid robots to play competitive tennis rallies with humans by learning from imperfect, fragmented motion data.

New Capability arxiv | Mar 16

Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.

Scaling Insight arxiv | Mar 16

Modality-level disaggregation enables cost-optimal MLLM serving across heterogeneous GPUs over commodity PCIe, bypassing the need for expensive NVLink interconnects.

Efficiency Breakthrough arxiv | Mar 16

Probing of Vision-Language-Action (VLA) models reveals that the action decoder largely ignores the reasoning logic in Chain-of-Thought, relying almost exclusively on object names.

Breaks Assumption arxiv | Mar 16

SciDesignBench provides a massive simulator-grounded environment for scientific inverse design, revealing that current LLMs struggle significantly with iterative refinement.

New Capability arxiv | Mar 16

A hardware-algorithm co-design for Spiking Neural Networks achieves up to 69x energy efficiency gains using an SRAM-based Compute-in-Memory accelerator.

Efficiency Breakthrough arxiv | Mar 16

The TaoBench benchmark shows that state-of-the-art math LLMs fail on equivalent logic problems when they are posed outside the standard 'MathLib' framework.

Breaks Assumption arxiv | Mar 16

A self-supervised robotic system detects novel objects by training bespoke detectors on-the-fly from human video demonstrations, bypassing language-based prompts.

New Capability arxiv | Mar 16

AIM enables post-training modulation of large models to change utility levels or focus features without any retraining or additional data.

New Capability arxiv | Mar 16

Achieves 4x visual token compression and 80% lower training cost while unifying multimodal comprehension and generation.

Efficiency Breakthrough arxiv | Mar 16

First training-free method for debiasing reward models using Sparse Autoencoder (SAE) interventions.

New Capability arxiv | Mar 16

Breaks the long-standing accuracy-robustness trade-off in VLMs by localizing adversarial robustness to shallow layers.

Breaks Assumption arxiv | Mar 16

A flow-based navigation policy that achieves zero-shot sim-to-real transfer across wheeled, quadrupedal, and humanoid platforms.

New Capability arxiv | Mar 16

A small-scale molecular reasoning model that outperforms ultra-large foundation models via structured chain-of-thought and RL.

Paradigm Shift arxiv | Mar 16

Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.

Efficiency Breakthrough arxiv | Mar 16

Distills a 2B Vision-Language Retriever into a 70M text-only encoder for visual document retrieval with 50x lower latency.

Efficiency Breakthrough arxiv | Mar 16

Reveals that 'reasoning' gains in fine-tuned LLMs may be artifacts of task familiarity rather than improved capability.

Breaks Assumption arxiv | Mar 16

MotionAnymesh automatically transforms static 3D meshes into simulation-ready, articulated digital twins for robotics using vision-language models grounded in physical priors.

New Capability arxiv | Mar 16

ThinkStream introduces a 'Watch-Think-Speak' paradigm for video reasoning that allows models to incrementally update understanding and decide when to respond in real-time.

Paradigm Shift arxiv | Mar 16

This paper presents an exact federated unlearning protocol for foundation models that is pointwise identical to centralized retraining but uses fixed-size messages.

Breaks Assumption arxiv | Mar 16

CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.

Efficiency Breakthrough arxiv | Mar 16

This study proves that even with a 'perfect' noise transition matrix, statistically consistent noise-correction methods still suffer from performance collapse.

Breaks Assumption arxiv | Mar 16

Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.

Efficiency Breakthrough arxiv | Mar 16

Multimodal OCR (MOCR) treats charts, diagrams, and tables as code-level targets (e.g., TikZ, SVG) rather than just cropping them as pixels.

New Capability arxiv | Mar 16

A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.

Breaks Assumption arxiv | Mar 16

Connects DDIM reverse chains to fractal geometry, providing a mathematical explanation for why diffusion models switch from global context to local detail.

Paradigm Shift arxiv | Mar 16

Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.

Breaks Assumption arxiv | Mar 16

Adapts pretrained video models to perform SOTA image restoration using less than 2% of the training data required by specialized architectures.

Efficiency Breakthrough arxiv | Mar 16

Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.

Efficiency Breakthrough arxiv | Mar 16

Proposes Causal Process Reward (CPR) to fix 'cherry-picking' in MLLM reasoning by coupling answer correctness with step-level logical alignment.

Paradigm Shift arxiv | Mar 16

Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.

Efficiency Breakthrough arxiv | Mar 16

Enables RMSNorm to reuse MXFP8 block scales, reducing the reduction operation size by 32x with a 2.4x kernel speedup.

Efficiency Breakthrough arxiv | Mar 16

Finds that privacy vulnerability and utility are both concentrated in a tiny fraction of 'critical weights' based on their location rather than value.

Breaks Assumption arxiv | Mar 16

STEVO-Bench reveals that current 'video world models' fail to simulate physical processes when the camera looks away or lights go out.

Breaks Assumption arxiv | Mar 16

Optimizes diffusion models via Direct Preference Optimization (DPO) to generate human motion that is inherently executable by real humanoid robots.

New Capability arxiv | Mar 16

Reimagines 3D molecules as continuous vector fields rather than discrete graphs, decoupling structure learning from atom types.

Paradigm Shift arxiv | Mar 16

Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.

Scaling Insight arxiv | Mar 16

OpenSanctions Pairs releases a massive benchmark for entity matching, showing that local LLMs can now match production rule-based systems in high-stakes compliance tasks.

Open Release arxiv | Mar 13

Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.

Scaling Insight arxiv | Mar 13

This paper introduces a graph tokenization framework that allows standard Transformers like BERT to beat specialized Graph Neural Networks without any architectural changes.

Paradigm Shift arxiv | Mar 13

The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.

Efficiency Breakthrough arxiv | Mar 13

Routing signatures reveal that MoE experts are highly task-specific, allowing a simple linear classifier to identify task categories with 92.5% accuracy based only on routing patterns.

Breaks Assumption arxiv | Mar 13
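The core claim — that expert-usage histograms are task-specific enough for a simple linear classifier — can be sketched with synthetic routing data. Here a nearest-centroid rule (a linear classifier) recovers the task from nothing but the routing signature; the profiles and counts are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_experts, n_tasks, n_samples = 16, 4, 200

# Each task routes tokens according to its own (synthetic) expert preference.
task_profiles = rng.dirichlet(np.full(n_experts, 0.3), size=n_tasks)

def routing_signature(task):
    """Expert-usage histogram of 256 routed tokens for one sample."""
    counts = rng.multinomial(256, task_profiles[task])
    return counts / counts.sum()

X = np.array([routing_signature(t) for t in range(n_tasks) for _ in range(n_samples)])
y = np.repeat(np.arange(n_tasks), n_samples)

# Nearest-centroid in histogram space is a linear decision rule.
centroids = np.array([X[y == t].mean(axis=0) for t in range(n_tasks)])
pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
accuracy = (pred == y).mean()
print(f"task identification accuracy: {accuracy:.3f}")
```

With even mildly distinct expert preferences, the histograms separate cleanly and the linear rule is near-perfect, mirroring the 92.5% figure reported for real routing patterns.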

A new method for training axis-aligned decision trees using gradient descent and backpropagation, allowing trees to be integrated into end-to-end neural networks.

New Capability arxiv | Mar 13
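The key trick behind gradient-trained axis-aligned trees is replacing the hard split with a sigmoid gate so the threshold becomes differentiable. A minimal one-split sketch under that assumption (not the paper's full method):

```python
import numpy as np

# 1-D data labelled by a hard threshold at 0.3; the tree must recover it.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=512)
y = (x > 0.3).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

t, tau, lr = 0.8, 0.05, 0.02   # threshold, temperature, learning rate
for _ in range(3000):
    p = sigmoid((x - t) / tau)                        # soft routing to the right leaf
    grad_p = 2 * (p - y)                              # d(MSE)/dp
    grad_t = np.mean(grad_p * p * (1 - p) * (-1.0 / tau))  # chain rule through the gate
    t -= lr * grad_t

print(f"learned threshold: {t:.3f}")   # converges near the true split at 0.3
```

Because the split is a smooth function of the threshold, the same gate can sit inside any end-to-end network and receive gradients from layers above it; annealing the temperature recovers a hard axis-aligned split.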

REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.

Efficiency Breakthrough arxiv | Mar 13

PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.

Efficiency Breakthrough arxiv | Mar 13

Continual Representation Learning (CoRe) moves PEFT from weight-level updates to representation-space interventions, solving catastrophic forgetting in dynamic environments.

Paradigm Shift arxiv | Mar 13

Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.

Scaling Insight arxiv | Mar 13

SoLA introduces the first reversible model editing framework that allows precise revocation of specific knowledge updates.

New Capability arxiv | Mar 13

LLM-based user simulators create an 'easy mode' for agents that fails to capture real human frustration, ambiguity, and feedback nuances.

Breaks Assumption arxiv | Mar 13

Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.

Breaks Assumption arxiv | Mar 13

InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.

Efficiency Breakthrough arxiv | Mar 13

Theoretical analysis proves that Langevin dynamics is fundamentally non-robust to score function errors, justifying the shift to Diffusion Models.

Paradigm Shift arxiv | Mar 13

HAPO resolves the advantage collapse problem in sparse-reward RL for reasoning models using a Thompson-sampled hindsight mechanism.

Paradigm Shift arxiv | Mar 13

Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.

Scaling Insight arxiv | Mar 13

RewardHackingAgents establishes a benchmark for evaluating whether ML-engineering agents are actually solving tasks or just tampering with the evaluation code.

New Capability arxiv | Mar 13

TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.

Efficiency Breakthrough arxiv | Mar 13

MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.

Breaks Assumption arxiv | Mar 13

An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.

Breaks Assumption arxiv | Mar 13

This paper introduces Finsler geometry to manifold learning, allowing for the capture of asymmetric data relationships like density hierarchies that Riemannian methods ignore.

Paradigm Shift arxiv | Mar 13

Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.

Breaks Assumption arxiv | Mar 13

DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.

Efficiency Breakthrough arxiv | Mar 13

Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.

Breaks Assumption arxiv | Mar 13
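The mechanism is visible in three lines of arithmetic: softmax always distributes exactly one unit of attention mass, so a query that wants to ignore every content token needs somewhere to dump it. A toy illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# However negative the content scores, softmax still hands out a full
# unit of attention mass -- "attending to nothing" is impossible.
content_scores = np.array([-5.0, -6.0, -4.0])
print(softmax(content_scores).sum())   # always exactly 1.0

# A high-score 'sink' position gives that mass a null state to land on,
# letting the content weights approach zero.
with_sink = softmax(np.concatenate([[8.0], content_scores]))
print(with_sink)   # nearly all mass on the sink
```

This is why sink tokens emerge under softmax normalization: they are the only way for the model to express "ignore the input" within a distribution constrained to sum to one.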

LongFlow provides an 11x throughput boost for reasoning models by specifically optimizing KV cache for long-output (vs long-input) scenarios.

Efficiency Breakthrough arxiv | Mar 13

Manifold-Optimal Guidance reformulates Classifier-Free Guidance (CFG) as a Riemannian control problem, eliminating the artifacts and saturation typical of high guidance scales.

Paradigm Shift arxiv | Mar 13

Tiny Aya is a 3.35B parameter multilingual model that achieves state-of-the-art results across 70 languages, challenging the need for massive scale in global AI.

Open Release arxiv | Mar 13

An empirical study reveals that models under 7B parameters have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.

Breaks Assumption arxiv | Mar 13

Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.

Efficiency Breakthrough arxiv | Mar 13

Expert Threshold Routing (ET) replaces standard top-k token-choice with an independent thresholding mechanism, achieving 1.6x faster training convergence.

Paradigm Shift arxiv | Mar 13

RoboClaw introduces 'Entangled Action Pairs' to allow robots to autonomously collect data by learning to reset their own environment.

New Capability arxiv | Mar 13

The discovery of 'Helicoid Dynamics' identifies a critical safety failure where frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.

Breaks Assumption arxiv | Mar 13

Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache budget.

Efficiency Breakthrough arxiv | Mar 13
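The entry does not describe the mechanism, but one common family of KV-cache eviction heuristics keeps only the keys that receive the most attention mass. A toy sketch of that general idea (illustrative only, not this paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_keys = 64, 1000

keys = rng.normal(size=(n_keys, d))
values = rng.normal(size=(n_keys, d))
# Plant a 'needle': one key strongly aligned with the query.
query = 2.0 * keys[123] + 0.1 * rng.normal(size=d)

def attend(q, K, V):
    w = np.exp(q @ K.T / np.sqrt(d))
    w /= w.sum()
    return w @ V, w

full_out, w = attend(query, keys, values)

# Keep only the 3% of keys with the highest attention mass.
keep = np.argsort(w)[-int(0.03 * n_keys):]
pruned_out, _ = attend(query, keys[keep], values[keep])

cos = full_out @ pruned_out / (np.linalg.norm(full_out) * np.linalg.norm(pruned_out))
print(f"cosine similarity, full vs 3% cache: {cos:.3f}")
```

Because attention mass concentrates on a few keys, the needle survives eviction and the pruned output stays almost identical to the full-cache output, which is the regime a 3%-budget NIAH result exploits.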

Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.

Scaling Insight arxiv | Mar 13
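Partial RoPE is mechanically simple: rotate only a leading slice of the hidden dimensions and pass the rest through untouched, so positional caches only need cos/sin tables for that slice. A minimal numpy sketch (the fraction and pairing convention are illustrative):

```python
import numpy as np

def partial_rope(x, pos, rotary_frac=0.1, base=10000.0):
    """Apply rotary position embedding to the first `rotary_frac` of
    hidden dimensions only, leaving the rest position-agnostic."""
    d = x.shape[-1]
    d_rot = int(d * rotary_frac)
    d_rot -= d_rot % 2                      # rotate dimensions in pairs
    half = d_rot // 2
    freqs = base ** (-np.arange(half) / half)
    angle = pos * freqs
    cos, sin = np.cos(angle), np.sin(angle)
    x1, x2 = x[..., :half], x[..., half:d_rot]
    rotated = np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x[..., d_rot:]], axis=-1)

x = np.random.default_rng(0).normal(size=128)
out = partial_rope(x, pos=7)

# Rotation preserves the norm of the rotated slice; the other 90% of
# dimensions are untouched, so no positional cache is needed for them.
print(np.allclose(out[12:], x[12:]))   # True
```

With a 128-dim vector and a 10% fraction, only 12 dimensions carry position; the claimed ~10x cache saving follows directly from caching cos/sin for 12 dims instead of all 128.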

Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.

Efficiency Breakthrough arxiv | Mar 13

Shows that simple sequential fine-tuning with LoRA outperforms complex algorithms for continual reinforcement learning in VLA models.

Breaks Assumption arxiv | Mar 13

Proves that policy gradient algorithms naturally collapse entropy and provides a mathematical fix to preserve exploration and diversity.

Breaks Assumption arxiv | Mar 13
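Entropy collapse under vanilla policy gradients, and its repair via an entropy bonus, can be shown exactly on a bandit by following the expected gradient rather than samples. A minimal sketch (the 0.1 bonus coefficient is an arbitrary illustrative choice, not the paper's fix):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

rewards = np.array([1.0, 0.9, 0.8])

def train(entropy_coef, steps=5000, lr=0.5):
    """Exact expected policy gradient on a 3-armed bandit."""
    logits = np.zeros(3)
    for _ in range(steps):
        p = softmax(logits)
        baseline = p @ rewards
        grad = p * (rewards - baseline)        # d E[r] / d logits
        if entropy_coef:
            # d H / d logits = p * (-log p - H)
            grad += entropy_coef * p * (-np.log(p + 1e-12) - entropy(p))
        logits += lr * grad
    return softmax(logits)

collapsed = train(entropy_coef=0.0)
regularized = train(entropy_coef=0.1)
print(entropy(collapsed), entropy(regularized))
```

Without the bonus the policy concentrates on the best arm and its entropy decays toward zero; with coefficient c the stationary policy is softmax(rewards / c), which keeps all arms alive, illustrating why some explicit entropy-preserving term is needed.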

Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.

Efficiency Breakthrough arxiv | Mar 13

Introduces the Compression-Consistency Principle, arguing that LLMs prefer truth only when false alternatives are structurally harder to compress.

Paradigm Shift arxiv | Mar 13

Replaces unstructured LLM debates with 'Deliberative Collective Intelligence,' producing formal decision packets with minority reports and accountability trails.

New Capability arxiv | Mar 13

Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.

Scaling Insight arxiv | Mar 13

Enables agents to autonomously discover the group structure of their environments to learn disentangled representations without human priors.

Paradigm Shift arxiv | Mar 13

Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.

Efficiency Breakthrough arxiv | Mar 13

Automates the entire robotic data generation loop, including a self-resetting mechanism that restores unstructured workspaces without human intervention.

New Capability arxiv | Mar 13

Bridges the gap between parametric CAD and direct B-Rep synthesis using LLMs and primitive grounding.

New Capability arxiv | Mar 13

Eliminates lookahead bias in financial backtesting through a series of yearly-partitioned pretrained LLMs.

Paradigm Shift arxiv | Mar 13

Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.

Efficiency Breakthrough arxiv | Mar 13

Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.

Efficiency Breakthrough arxiv | Mar 13

Enables concurrent perception and reasoning for continuous video streams in Multimodal Large Language Models.

New Capability arxiv | Mar 13

Fits promptable visual segmentation (SAM) into a 1.3M parameter model for real-time in-sensor execution.

Efficiency Breakthrough arxiv | Mar 13

First framework for interpreting 4D molecular trajectories into natural language explanations.

New Capability arxiv | Mar 13

Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.

Scaling Insight arxiv | Mar 13

Solves GNN over-squashing by using global effective resistance to identify and rewire structural bottlenecks.

Paradigm Shift arxiv | Mar 13
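Effective resistance comes straight from the Laplacian pseudoinverse, and on a path graph (the textbook over-squashing topology) the highest-resistance pair is exactly the two endpoints. A small sketch of the identify-and-rewire step (a toy graph, not the paper's pipeline):

```python
import numpy as np

def effective_resistance(adj):
    """Pairwise effective resistances via the Laplacian pseudoinverse:
    R_ij = L+_ii + L+_jj - 2 L+_ij."""
    L = np.diag(adj.sum(axis=1)) - adj
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# Path graph 0-1-2-3-4-5: a classic over-squashing bottleneck.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

R = effective_resistance(adj)
np.fill_diagonal(R, -np.inf)          # ignore self-pairs
i, j = np.unravel_index(np.argmax(R), R.shape)
print(f"bottleneck pair: ({i}, {j}), resistance {R[i, j]:.1f}")

# Rewiring: add a shortcut edge across the worst bottleneck.
adj[i, j] = adj[j, i] = 1.0
R2 = effective_resistance(adj)
print(f"after rewiring: {R2[i, j]:.2f}")
```

The endpoint pair has resistance 5 (five unit edges in series); adding the shortcut drops it to 5/6 (direct edge in parallel with the path), which is the kind of global bottleneck relief the method targets.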

Cross-domain sensor model that handles variable signal lengths and resolutions without retraining.

New Capability arxiv | Mar 13

Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.

Efficiency Breakthrough arxiv | Mar 13

Introduces the first billion-scale SAR vision foundation model and a massive unified benchmark for all-weather geospatial semantic segmentation.

Open Release arxiv | Mar 13

Demonstrates that simply using XML tags during translation outperforms complex pipelines for cross-lingual label projection while actually improving translation quality.

Breaks Assumption arxiv | Mar 13

Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.

Efficiency Breakthrough arxiv | Mar 13

Enables multimodal agents to continually improve from experience and skills without any parameter updates through a dual-stream visual grounding framework.

New Capability arxiv | Mar 13

A 3D vision-language pipeline that grounds medical diagnosis in longitudinal brain MRI via regional volumetric assessments to eliminate VLM hallucinations.

New Capability arxiv | Mar 13

Integrates Neural ODEs with NeRFs to enable continuous-time scene dynamics that can extrapolate far beyond the original training sequence.

New Capability arxiv | Mar 13

Proposes a unified image tokenizer that reconciles the conflicting requirements of visual understanding and generation using a residual evolution process.

Paradigm Shift arxiv | Mar 13

Identifies and solves the 'information self-locking' failure mode where RL-trained agents stop asking informative questions in active reasoning tasks.

Breaks Assumption arxiv | Mar 13

A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.

Efficiency Breakthrough arxiv | Mar 13

Shows that LLM self-correction fails primarily due to 'session context' and can be significantly improved by moving the review to a fresh, independent session.

Breaks Assumption arxiv | Mar 13

Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving up to 22,000x speedups for less than $10.

Efficiency Breakthrough arxiv | Mar 13

Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.

Scaling Insight arxiv | Mar 13

Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.

Efficiency Breakthrough arxiv | Mar 13

Accelerates sparse attention by 75% by reusing lightning indexer decisions across layers, tackling the hidden bottleneck in production-grade LLMs.

Efficiency Breakthrough arxiv | Mar 13

Discovers that task-specific experts are so dense around pretrained weights that random parameter perturbations can compete with complex RL methods like PPO.

Breaks Assumption arxiv | Mar 13

Reveals that 'Reasoning LLMs-as-Judges' can lead to policies that generate highly effective adversarial outputs to deceive other judges and inflate benchmarks.

Breaks Assumption arxiv | Mar 13

Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.

Paradigm Shift arxiv | Mar 13

Integrates Chain-of-Thought reasoning directly into the Diffusion Transformer denoising process to solve complex spatial and logical tasks.

New Capability arxiv | Mar 13

Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.

Efficiency Breakthrough arxiv | Mar 13

Uncovers an emergent Hue-Saturation-Lightness (HSL) subspace in FLUX.1's VAE latent space, allowing for precise, training-free color control.

Breaks Assumption arxiv | Mar 13

Enables VideoLLMs to perform complex logical reasoning simultaneously with video playback without incurring the latency of standard test-time scaling.

New Capability arxiv | Mar 13

An open foundation model for humanoid robots that achieves high performance using only 30 hours of real-world robot data by pre-training on egocentric human videos.

Open Release arxiv | Mar 13

A unified streaming visual backbone that performs perception, 3D reconstruction, and robotic action simultaneously from a continuous video stream.

New Capability arxiv | Mar 13

Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.

Efficiency Breakthrough arxiv | Mar 13

Demonstrates that the stochasticity in standard regularized model training (like cross-validation) can serve as a 'free' and effective exploration strategy for contextual bandits.

Paradigm Shift arxiv | Mar 13