EFFICIENCY_BREAKTHROUGH

375 papers · Page 1 of 4

The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.

AI & ML arxiv | Mar 13

REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.

AI & ML arxiv | Mar 13

PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.

AI & ML arxiv | Mar 13
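The idea of concentrating distillation loss in the 'Zone of Proximal Development' can be illustrated with a small sketch. The Gaussian bump kernel, its parameters, and the use of the student's confidence in the teacher's top token as the difficulty signal are all illustrative assumptions here, not PACED's actual kernel:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def zpd_weighted_kd_loss(student_logits, teacher_logits, temperature=2.0,
                         center=0.5, width=0.2):
    """Per-token distillation loss reweighted toward a 'Zone of Proximal
    Development': tokens the student already gets right (easy) or is far
    from (too hard) are down-weighted; the bump kernel peaks at moderate
    student confidence in the teacher's top token."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)

    # Student probability on the teacher's argmax token, per position.
    top = teacher_logits.argmax(axis=-1)
    s_top = np.take_along_axis(s, top[..., None], axis=-1)[..., 0]

    # Gaussian bump kernel over student confidence (hypothetical form).
    w = np.exp(-((s_top - center) ** 2) / (2 * width ** 2))

    # Token-level KL(teacher || student), weighted and averaged.
    kl = (t * (np.log(t + 1e-12) - np.log(s + 1e-12))).sum(axis=-1)
    return float((w * kl).sum() / max(w.sum(), 1e-8))
```

The weighting leaves gradient flow untouched for mid-difficulty tokens while suppressing positions whose signal-to-noise ratio is poor.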

InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.

AI & ML arxiv | Mar 13

TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.

AI & ML arxiv | Mar 13
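Content-aware patching of this kind can be sketched as a greedy segmentation that grows patches over flat stretches and cuts them short where the signal is busy. The variance criterion, thresholds, and size limits below are illustrative stand-ins, not TimeSqueeze's actual mechanism:

```python
import numpy as np

def adaptive_patches(series, min_size=4, max_size=32, threshold=0.5):
    """Greedy content-aware patching of a 1-D series (illustrative sketch).

    Grow each patch until either its normalized variance exceeds a
    threshold (information-dense region gets a small patch) or it reaches
    max_size (flat region gets a large patch). Returns contiguous
    (start, end) index pairs covering the series."""
    scale = series.std() + 1e-8
    patches, i, n = [], 0, len(series)
    while i < n:
        end = min(i + min_size, n)
        # Extend the patch while the local signal stays flat.
        while end < min(i + max_size, n):
            if series[i:end + 1].std() / scale > threshold:
                break
            end += 1
        patches.append((i, end))
        i = end
    return patches
```

Fewer patches over low-information regions means fewer tokens fed to the backbone, which is where the data-efficiency gain would come from.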

DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.

AI & ML arxiv | Mar 13

LongFlow provides an 11x throughput boost for reasoning models by specifically optimizing KV cache for long-output (vs long-input) scenarios.

AI & ML arxiv | Mar 13

Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.

AI & ML arxiv | Mar 13

Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache budget.

AI & ML arxiv | Mar 13
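A generic way to meet such a tight KV budget (not necessarily this paper's method) is heavy-hitter selection: rank past keys by the attention mass they have received, always keep a recent window, and evict everything else. A minimal sketch:

```python
import numpy as np

def select_kv_budget(attn_weights, budget_frac=0.03, recent=8):
    """Heavy-hitter style KV-cache selection (generic sketch).

    attn_weights: (n_queries, n_keys) attention matrix. Keys are ranked
    by cumulative attention mass received; a sliding recent window is
    always retained, and only budget_frac of positions survive."""
    n_q, n_k = attn_weights.shape
    scores = attn_weights.sum(axis=0)            # attention mass per key
    budget = max(int(budget_frac * n_k), recent)
    keep = set(range(n_k - recent, n_k))         # sliding recent window
    for idx in np.argsort(-scores):              # heaviest hitters first
        if len(keep) >= budget:
            break
        keep.add(int(idx))
    return sorted(keep)
```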

Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.

AI & ML arxiv | Mar 13

Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.

AI & ML arxiv | Mar 13

Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.

AI & ML arxiv | Mar 13

Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.

AI & ML arxiv | Mar 13

Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.

AI & ML arxiv | Mar 13

Fits promptable visual segmentation (SAM) into a 1.3M parameter model for real-time in-sensor execution.

AI & ML arxiv | Mar 13

Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.

AI & ML arxiv | Mar 13

Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.

AI & ML arxiv | Mar 13

A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.

AI & ML arxiv | Mar 13

Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving up to 22,000x speedups for less than $10.

AI & ML arxiv | Mar 13

Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.

AI & ML arxiv | Mar 13

Accelerates sparse attention by 75% by reusing lightning indexer decisions across layers, tackling the hidden bottleneck in production-grade LLMs.

AI & ML arxiv | Mar 13

Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.

AI & ML arxiv | Mar 13

Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.

AI & ML arxiv | Mar 13
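Complexity-proportional token allocation can be sketched as follows, using mean absolute temporal difference as a stand-in complexity measure; the paper's actual measure and allocator are not specified here:

```python
import numpy as np

def allocate_tokens(frames, total_tokens, min_tokens=1):
    """Split a fixed token budget across frames in proportion to scene
    complexity, proxied by mean absolute frame-to-frame difference
    (an illustrative stand-in, not the paper's measure)."""
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    # The first frame has no predecessor; give it the average complexity.
    complexity = np.concatenate(
        [[diffs.mean() if len(diffs) else 1.0], diffs]) + 1e-8
    raw = complexity / complexity.sum() * (total_tokens - min_tokens * len(frames))
    alloc = np.floor(raw).astype(int) + min_tokens
    # Hand out the rounding remainder to the largest fractional shares.
    for i in np.argsort(np.floor(raw) - raw)[: total_tokens - alloc.sum()]:
        alloc[i] += 1
    return alloc
```

Static stretches collapse to the per-frame minimum while high-motion frames absorb the freed budget, which is the mechanism behind the reported 24% token reduction.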

ActTail achieves 80% activation sparsity in LLMs with significantly lower perplexity degradation than uniform methods by using Heavy-Tailed Self-Regularization theory.

AI & ML arxiv | Mar 16

ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.

AI & ML arxiv | Mar 16

Achieves 100x speedup in robotic action generation by distilling iterative flow/diffusion models into a one-step policy without a pre-trained teacher.

AI & ML arxiv | Mar 16

Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.

AI & ML arxiv | Mar 16

Accelerates Diffusion Transformers (DiTs) by 2x using a training-free framework that selectively reduces computation in non-aesthetic image regions.

AI & ML arxiv | Mar 16

Introduces a training-free framework that allows LLM agents to dynamically scale their reasoning depth based on a pre-defined token/tool budget.

AI & ML arxiv | Mar 16

Achieves a 98x speedup in LLM routing on AMD hardware using Flash Attention and prompt compression, enabling high-context classification without a dedicated GPU.

AI & ML arxiv | Mar 16

Modality-level disaggregation enables cost-optimal MLLM serving across heterogeneous GPUs over commodity PCIe, bypassing the need for expensive NVLink interconnects.

AI & ML arxiv | Mar 16

A hardware-algorithm co-design for Spiking Neural Networks achieves up to 69x energy efficiency gains using an SRAM-based Compute-in-Memory accelerator.

AI & ML arxiv | Mar 16

Achieves 4x visual token compression and 80% lower training cost while unifying multimodal comprehension and generation.

AI & ML arxiv | Mar 16

Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.

AI & ML arxiv | Mar 16
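The routing pattern itself is simple; a minimal sketch, in which every callable (the two models, the difficulty predictor) is a placeholder rather than the paper's API:

```python
def route_query(features, small_model, large_model, difficulty_fn,
                threshold=0.6):
    """Cost-aware routing sketch: answer with the cheap model unless a
    difficulty predictor flags the query as likely to need the large one.
    All callables are hypothetical placeholders."""
    if difficulty_fn(features) < threshold:
        return small_model(features), "small"
    return large_model(features), "large"
```

The cost saving comes entirely from how often the predictor can safely route to the small model; the reported 78% implies most Computer Use Agent steps are easy cases.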

Distills a 2B Vision-Language Retriever into a 70M text-only encoder for visual document retrieval with 50x lower latency.

AI & ML arxiv | Mar 16

CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.

AI & ML arxiv | Mar 16

Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.

AI & ML arxiv | Mar 16

Induces pretrained video models to perform SOTA image restoration using less than 2% of the training data required by specialized architectures.

AI & ML arxiv | Mar 16

Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.

AI & ML arxiv | Mar 16

Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.

AI & ML arxiv | Mar 16

Enables RMSNorm to reuse MXFP8 block scales, shrinking the reduction operation by 32x and delivering a 2.4x kernel speedup.

AI & ML arxiv | Mar 16

Truncated-Reasoning Self-Distillation (TRSD) allows models to maintain accuracy even when their chain-of-thought traces are heavily shortened.

AI & ML arxiv | Mar 17

The ICaRus architecture allows multiple different models to share a single, frozen KV cache for the same prompt.

AI & ML arxiv | Mar 17

Parallel associative scans achieve a 44x speedup in training continuous-time Spiking Neural Networks (SNNs).

AI & ML arxiv | Mar 17

RelayCaching eliminates redundant prefill computation in multi-agent systems by reusing the decoding-phase KV cache from previous agents.

AI & ML arxiv | Mar 17

Pretrained Transformers exhibit a pervasive inter-head linear structure where many attention heads can be reconstructed from a small set of peer heads.

AI & ML arxiv | Mar 17

FineRMoE extends MoE granularity to both intermediate and output dimensions, achieving a 136x increase in decoding throughput.

AI & ML arxiv | Mar 17

Distribution-Conditioned Diffusion Decoding enables high-fidelity image generation from pre-trained VLMs without expensive full-model retraining.

AI & ML arxiv | Mar 17

Qianfan-OCR introduces 'Layout-as-Thought,' enabling a 4B model to outperform 235B models on complex document parsing and layout analysis.

AI & ML arxiv | Mar 17

Achieves significant tool-selection accuracy gains in LLM semantic routers with zero added serving-time latency or cost.

AI & ML arxiv | Mar 17

A training-free acceleration method for diffusion language models that achieves a 4x speedup in image generation.

AI & ML arxiv | Mar 17

Implements bio-inspired 'mental-state dynamics' to achieve O(N) complexity in Vision Transformers.

AI & ML arxiv | Mar 17

Reduces the number of real-world robot rollouts needed for policy comparison by up to 70% using safe, anytime-valid inference.

AI & ML arxiv | Mar 17

Outperforms fine-tuned baselines in code optimization by using semantics-preserving transformations as a generative intermediate representation.

AI & ML arxiv | Mar 17

A 140M-parameter networking foundation model (PLUME) that outperforms frontier LLMs on protocol analysis by learning from native packet structures.

AI & ML arxiv | Mar 17

Replaces the quadratic cost of self-attention in Diffusion Transformers with a convection-diffusion PDE solved in the Fourier domain.

AI & ML arxiv | Mar 17

Implicit Maximum Likelihood Estimation (IMLE) achieves multimodal trajectory planning performance comparable to diffusion models while being 100x faster.

AI & ML arxiv | Mar 17

Greedy Information Projection (GIP) provides a fast, geometrically-principled method for selecting training data that balances quality and diversity, achieving full-data performance with a fraction of the examples.

AI & ML arxiv | Mar 17

Traditional Spiking Neural Network (SNN) sparsity is a performance 'illusion' on GPUs; temporal aggregation is required for actual 13x speedups.

AI & ML arxiv | Mar 17

Enables training of CNNs from scratch in true 4-bit precision on commodity CPUs with virtually no loss in accuracy.

AI & ML arxiv | Mar 17

Introduces the FLUX preprocessing pipeline, which reduces LLM training compute by 34% by maximizing high-quality token retention.

AI & ML arxiv | Mar 17

Reduces the RAM requirement for speech neuroprosthesis CTC decoding from 320 GB to 10 GB without sacrificing accuracy.

AI & ML arxiv | Mar 17

Reveals that Graph-RAG performance is limited by reasoning failure rather than retrieval, and shows how to make an 8B model match a 70B baseline.

AI & ML arxiv | Mar 17

Amortizes iterative diffusion into a one-step trajectory policy for robotics using a novel 'Keyed Drift Field' objective.

AI & ML arxiv | Mar 17

Proposes a temporal mixed-precision framework for diffusion models that adaptively assigns bitwidths across different denoising timesteps.

AI & ML arxiv | Mar 17

Accelerates LLM inference by up to 1.8x using a training-free sparse pattern predictor based on SVD truncation of FFN gate matrices.

AI & ML arxiv | Mar 17
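Predicting an FFN sparsity pattern from an SVD-truncated gate matrix can be sketched directly. The rank, margin, and dimensions below are illustrative, not the paper's configuration:

```python
import numpy as np

def make_sparsity_predictor(W_gate, rank=16):
    """Low-rank sparsity predictor sketch: truncate the SVD of the FFN
    gate matrix and use the cheap rank-r product to guess which
    pre-activations will be positive, so only those neurons' full
    weights need to be touched."""
    U, S, Vt = np.linalg.svd(W_gate, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (d_ff, rank)
    B = Vt[:rank]                # (rank, d_model)

    def predict_active(x, margin=0.0):
        approx = A @ (B @ x)     # O(rank * (d_ff + d_model)) instead of full matmul
        return approx > margin   # boolean mask of likely-active neurons
    return predict_active
```

At full rank the predictor recovers the exact activation pattern; shrinking the rank trades a little recall for a much cheaper prediction step, with the margin usable as a recall knob.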

Unifies KV cache compression and sparse attention into a single 1-bit indexing structure, eliminating the need for external metadata or predictors.

AI & ML arxiv | Mar 17

Detects diffusion-generated images 126x faster than reconstruction-based methods by using Gaussian noise disturbance to exploit the statistical 'ease' of fitting synthetic data.

AI & ML arxiv | Mar 17

Enables model adaptation on edge devices and non-differentiable (quantized) models using a purely backpropagation-free optimization framework.

AI & ML arxiv | Mar 17

Achieves real-time, low-latency talking avatar generation at 34ms per frame using a one-step streaming diffusion framework.

AI & ML arxiv | Mar 17

Introduces ZoomUI, a training-free method for GUI grounding that uses inference-time scaling to anchor natural language instructions to interface elements.

AI & ML arxiv | Mar 17

FLORE achieves 1000x error reduction in linear sketching while being 100x faster than previous learning-based solutions.

AI & ML arxiv | Mar 17

SleepGate introduces a biologically inspired 'sleep cycle' for the KV cache to resolve proactive interference in long-context LLMs.

AI & ML arxiv | Mar 17

ASAP reduces LVLM computational FLOPs by ~80% with virtually no loss in performance using a training-free KV-Cache pruning recipe.

AI & ML arxiv | Mar 17

FlashHead is a drop-in replacement for the LM classification head that provides 1.75x inference speedup by treating vocabulary selection as a retrieval problem.

AI & ML arxiv | Mar 17
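Retrieval-style vocabulary selection can be sketched with a coarse cluster index: probe the few clusters whose centroids best match the hidden state and compute exact logits only over that shortlist. The clustering scheme here is an assumption for illustration, not FlashHead's actual index:

```python
import numpy as np

def retrieval_lm_head(hidden, W_vocab, centroids, assignments, n_probe=4):
    """Vocabulary selection as retrieval (sketch): score the cluster
    centroids, keep the n_probe best clusters, and materialize logits
    only for tokens assigned to those clusters instead of all |V|."""
    cluster_scores = centroids @ hidden
    probe = np.argsort(-cluster_scores)[:n_probe]
    shortlist = np.flatnonzero(np.isin(assignments, probe))
    logits = W_vocab[shortlist] @ hidden       # shortlist-sized matmul only
    return int(shortlist[np.argmax(logits)])
```

With all clusters probed this reduces to the exact argmax; the speedup comes from probing a small fraction of clusters while keeping the true top token in the shortlist with high probability.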

Reformulates diffusion sampling as a graph-theoretic planning problem that dynamically allocates compute to the most difficult denoising stages.

AI & ML arxiv | Mar 17

Generates novel, structurally plausible protein sequences from small alignments using a training-free stochastic attention mechanism on a standard laptop.

AI & ML arxiv | Mar 17

Adaptive computation for multimodal LLMs sharply reduces wasted compute on easy inputs while concentrating effort on hard ones.

AI & ML arxiv | Mar 17

HO-SFL enables backprop-free fine-tuning on edge devices without the convergence penalty typical of zeroth-order methods.

AI & ML arxiv | Mar 17

RAZOR provides a lightweight, targeted unlearning framework for Transformers and Diffusion models without retraining.

AI & ML arxiv | Mar 17

Introduces an asynchronous Mixture-of-Transformers architecture for autonomous driving that decouples slow reasoning from fast action execution.

AI & ML arxiv | Mar 17

Achieves over 80% of full-resolution VLM performance while using only 1% of the original pixel budget through bio-inspired foveated sampling.

AI & ML arxiv | Mar 17

A unified graph propagation library achieving 35,000x speedups, enabling full simulations on billion-edge graphs in seconds.

AI & ML arxiv | Mar 17

AdaAnchor enables LLMs to perform multi-step reasoning entirely in latent space with an adaptive halting mechanism to optimize compute.

AI & ML arxiv | Mar 17

AnoleVLA replaces the standard Transformer backbone in robotic Vision-Language-Action models with Deep State Space Models for a 3x speedup.

AI & ML arxiv | Mar 17

Writer-R1-4B outperforms 100B+ parameter models in creative writing by utilizing memory-augmented self-reflection and fine-grained criteria generation.

AI & ML arxiv | Mar 17

Ultra-low-bitrate image compression achieves 50% bitrate savings by treating decoding as a 'next-frame' video prediction task using diffusion priors.

AI & ML arxiv | Mar 17

HapticVLA achieves tactile-aware robotic manipulation at 86.7% success rate without requiring any physical tactile sensors at inference time.

AI & ML arxiv | Mar 17

IConE enables stable self-supervised learning even at batch size 1, overcoming the memory bottlenecks of high-dimensional scientific and medical data.

AI & ML arxiv | Mar 17

FlashU is the first framework to accelerate unified multimodal models by exploiting the distinct neuron sets used for generation vs. understanding.

AI & ML arxiv | Mar 17

MeMix is a training-free, plug-and-play module that reduces 3D reconstruction error by up to 40% in long sequences by mitigating state drift.

AI & ML arxiv | Mar 17

PrismMirror is the first monocular human frontal view synthesis model to achieve real-time inference (24 FPS) without external geometric models.

AI & ML arxiv | Mar 17

A 4B parameter model matches a 120B parameter model in program verification through a rigorous data curation pipeline.

AI & ML arxiv | Mar 17

Bridges the gap between generative (MAE) and predictive (I-JEPA) self-supervised learning, achieving a 10% performance boost.

AI & ML arxiv | Mar 17

Accelerates state-of-the-art 3D human mesh recovery by over 10x, enabling real-time vision-only humanoid teleoperation.

AI & ML arxiv | Mar 17

Introduces Mixture-of-Depths Attention (MoDA) to solve signal degradation in deep LLMs with hardware-efficient implementation.

AI & ML arxiv | Mar 17

Achieves 1,000x speedups in Bayesian inverse problems by replacing repeated MCMC sampling with one-step preconditioned generative transport.

AI & ML arxiv | Mar 17

RSM achieves 20x faster training for recursive reasoning models and enables test-time scaling for up to 20,000 refinement steps.

AI & ML arxiv | Mar 18

Reduces high-quality 3D head avatar creation time from over 24 hours to 0.5 seconds per frame.

AI & ML arxiv | Mar 18

Fuses categorical sampling into the LM-head matmul to eliminate logit materialization and speed up LLM decoding by up to 19%.

AI & ML arxiv | Mar 18
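One classical way to sample a token without materializing the full logit vector is the Gumbel-max trick computed blockwise over the vocabulary; this sketch illustrates that general idea, not the paper's fused kernel:

```python
import numpy as np

def gumbel_stream_sample(hidden, W_vocab, rng, block=1024):
    """Streaming categorical sampling via the Gumbel-max trick:
    argmax_i (logit_i + g_i) with g_i ~ Gumbel(0, 1) is an exact sample
    from softmax(logits). Processing the vocabulary in blocks keeps only
    a running maximum, so the |V|-sized logit vector never exists at once."""
    best_val, best_idx = -np.inf, -1
    V = W_vocab.shape[0]
    for start in range(0, V, block):
        logits = W_vocab[start:start + block] @ hidden   # one block of logits
        g = rng.gumbel(size=logits.shape)
        i = int(np.argmax(logits + g))
        if logits[i] + g[i] > best_val:
            best_val, best_idx = logits[i] + g[i], start + i
    return best_idx
```

A fused kernel would perform the perturb-and-reduce inside the matmul itself; the arithmetic saved is the softmax normalization plus a full write and re-read of the vocabulary-sized logit tensor.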