EFFICIENCY_BREAKTHROUGH
375 papers · Page 1 of 4
The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.
AI & ML arxiv | Mar 13
REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.
AI & ML arxiv | Mar 13
PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.
AI & ML arxiv | Mar 13
InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.
AI & ML arxiv | Mar 13
TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.
AI & ML arxiv | Mar 13
DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.
AI & ML arxiv | Mar 13
LongFlow provides an 11x throughput boost for reasoning models by specifically optimizing KV cache for long-output (vs long-input) scenarios.
AI & ML arxiv | Mar 13
Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.
AI & ML arxiv | Mar 13
Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache budget.
AI & ML arxiv | Mar 13
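The entry above doesn't spell out its selection policy, but heavy-hitter-style eviction illustrates how a 3% KV budget can retain needle-retrieval accuracy: score each cached key by the attention mass it has accumulated and keep only the top scorers plus a recent window. A toy sketch (assumption: H2O-style cumulative-attention scoring, not necessarily this paper's method):

```python
import numpy as np

def keep_mask(attn, budget, recent=4):
    """Score-based KV eviction (heavy-hitter style, illustrative).

    attn: (queries, keys) attention weights accumulated so far.
    Keeps the `recent` newest keys plus the highest cumulative-score
    keys until `budget` positions remain.
    """
    T = attn.shape[1]
    score = attn.sum(axis=0)                  # total attention each key received
    keep = set(range(max(0, T - recent), T))  # always keep the recent window
    for k in np.argsort(score)[::-1]:         # then fill with heavy hitters
        if len(keep) >= budget:
            break
        keep.add(int(k))
    return np.array([t in keep for t in range(T)])
```

With `budget` set to 3% of the sequence length, the mask above is what would gate which key/value pairs stay resident.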
Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.
AI & ML arxiv | Mar 13
Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.
AI & ML arxiv | Mar 13
Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.
AI & ML arxiv | Mar 13
Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.
AI & ML arxiv | Mar 13
Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.
AI & ML arxiv | Mar 13
Fits promptable visual segmentation (SAM) into a 1.3M parameter model for real-time in-sensor execution.
AI & ML arxiv | Mar 13
Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.
AI & ML arxiv | Mar 13
Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.
AI & ML arxiv | Mar 13
A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.
AI & ML arxiv | Mar 13
Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving up to 22,000x speedups for less than $10.
AI & ML arxiv | Mar 13
Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.
AI & ML arxiv | Mar 13
Accelerates sparse attention by 75% by reusing lightning indexer decisions across layers, tackling the hidden bottleneck in production-grade LLMs.
AI & ML arxiv | Mar 13
Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.
AI & ML arxiv | Mar 13
Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.
AI & ML arxiv | Mar 13
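A simple way to picture complexity-based token budgeting: give each frame a share of the total token budget proportional to some cheap complexity proxy. Toy sketch (pixel variance as the proxy is my assumption; the paper's complexity measure may differ):

```python
import numpy as np

def allocate_tokens(frames, total_tokens, min_tokens=1):
    """Content-aware token budgeting (illustrative).

    Allocates each frame a share of `total_tokens` proportional to its
    pixel variance, with a per-frame floor so flat frames still get
    at least `min_tokens`.
    """
    complexity = np.array([f.var() for f in frames])
    share = complexity / complexity.sum()
    return np.maximum(min_tokens, np.round(share * total_tokens)).astype(int)
```

Note the floor plus rounding means the allocations need not sum exactly to the budget; a real tokenizer would renormalize the remainder.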
ActTail achieves 80% activation sparsity in LLMs with significantly lower perplexity degradation than uniform methods by using Heavy-Tailed Self-Regularization theory.
AI & ML arxiv | Mar 16
ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.
AI & ML arxiv | Mar 16
Achieves 100x speedup in robotic action generation by distilling iterative flow/diffusion models into a one-step policy without a pre-trained teacher.
AI & ML arxiv | Mar 16
Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.
AI & ML arxiv | Mar 16
Accelerates Diffusion Transformers (DiTs) by 2x using a training-free framework that selectively reduces computation in non-aesthetic image regions.
AI & ML arxiv | Mar 16
Introduces a training-free framework that allows LLM agents to dynamically scale their reasoning depth based on a pre-defined token/tool budget.
AI & ML arxiv | Mar 16
Achieves a 98x speedup in LLM routing on AMD hardware using Flash Attention and prompt compression, enabling high-context classification without a dedicated GPU.
AI & ML arxiv | Mar 16
Modality-level disaggregation enables cost-optimal MLLM serving across heterogeneous GPUs over commodity PCIe, bypassing the need for expensive NVLink interconnects.
AI & ML arxiv | Mar 16
A hardware-algorithm co-design for Spiking Neural Networks achieves up to 69x energy efficiency gains using an SRAM-based Compute-in-Memory accelerator.
AI & ML arxiv | Mar 16
Achieves 4x visual token compression and 80% lower training cost while unifying multimodal comprehension and generation.
AI & ML arxiv | Mar 16
Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.
AI & ML arxiv | Mar 16
Distills a 2B Vision-Language Retriever into a 70M text-only encoder for visual document retrieval with 50x lower latency.
AI & ML arxiv | Mar 16
CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.
AI & ML arxiv | Mar 16
Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.
AI & ML arxiv | Mar 16
Induces pretrained video models to perform SOTA image restoration using less than 2% of the training data required by specialized architectures.
AI & ML arxiv | Mar 16
Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.
AI & ML arxiv | Mar 16
Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.
AI & ML arxiv | Mar 16
Enables RMSNorm to reuse MXFP8 block scales, reducing the reduction operation size by 32x with a 2.4x kernel speedup.
AI & ML arxiv | Mar 16
Truncated-Reasoning Self-Distillation (TRSD) allows models to maintain accuracy even when their chain-of-thought traces are heavily shortened.
AI & ML arxiv | Mar 17
The ICaRus architecture allows multiple different models to share a single, frozen KV cache for the same prompt.
AI & ML arxiv | Mar 17
Using parallel associative scans achieves a 44x speedup in training continuous-time Spiking Neural Networks (SNNs).
AI & ML arxiv | Mar 17
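The trick behind speedups like this: a linear neuron update h_t = a_t·h_{t-1} + b_t is a linear recurrence, and the (a, b) pairs compose associatively, so the sequential time loop can be replaced by a logarithmic-depth prefix scan. A minimal NumPy sketch of the Hillis-Steele doubling scan (illustrative of the technique, not the paper's implementation):

```python
import numpy as np

def parallel_scan(a, b):
    """Inclusive scan of the linear recurrence h_t = a_t*h_{t-1} + b_t.

    Composing two maps h -> a*h + b is associative, so a doubling scan
    computes all prefixes in O(log T) steps; on a GPU each step is one
    elementwise kernel over the whole sequence.
    """
    a, b = a.astype(float).copy(), b.astype(float).copy()
    shift = 1
    while shift < len(a):
        # combine each element with the prefix `shift` steps back
        a_new = a[shift:] * a[:-shift]
        b_new = a[shift:] * b[:-shift] + b[shift:]
        a[shift:], b[shift:] = a_new, b_new
        shift *= 2
    return a, b  # after the scan: h_t = a[t]*h0 + b[t]
```

The sequential loop and the scan produce identical trajectories; the win is that the scan exposes all timesteps to the hardware at once.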
RelayCaching eliminates redundant prefill computation in multi-agent systems by reusing the decoding-phase KV cache from previous agents.
AI & ML arxiv | Mar 17
Pretrained Transformers exhibit a pervasive inter-head linear structure where many attention heads can be reconstructed from a small set of peer heads.
AI & ML arxiv | Mar 17
FineRMoE extends MoE granularity to both intermediate and output dimensions, achieving a 136x increase in decoding throughput.
AI & ML arxiv | Mar 17
Distribution-Conditioned Diffusion Decoding enables high-fidelity image generation from pre-trained VLMs without expensive full-model retraining.
AI & ML arxiv | Mar 17
Qianfan-OCR introduces 'Layout-as-Thought,' enabling a 4B model to outperform 235B models on complex document parsing and layout analysis.
AI & ML arxiv | Mar 17
Achieves significant tool-selection accuracy gains in LLM semantic routers with zero added serving-time latency or cost.
AI & ML arxiv | Mar 17
A training-free acceleration method for diffusion language models that achieves a 4x speedup in image generation.
AI & ML arxiv | Mar 17
Implements bio-inspired 'mental-state dynamics' to achieve O(N) complexity in Vision Transformers.
AI & ML arxiv | Mar 17
Reduces the number of real-world robot rollouts needed for policy comparison by up to 70% using safe, anytime-valid inference.
AI & ML arxiv | Mar 17
Outperforms fine-tuned baselines in code optimization by using semantics-preserving transformations as a generative intermediate representation.
AI & ML arxiv | Mar 17
A 140M-parameter networking foundation model (PLUME) that outperforms frontier LLMs on protocol analysis by learning from native packet structures.
AI & ML arxiv | Mar 17
Replaces the quadratic cost of self-attention in Diffusion Transformers with a convection-diffusion PDE solved in the Fourier domain.
AI & ML arxiv | Mar 17
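The identity that makes this work: a linear convection-diffusion operator is diagonal in the Fourier basis, so one mixing step over N tokens costs O(N log N) instead of attention's O(N²). A sketch of one exact spectral step on a periodic 1-D signal (illustrative of the PDE view, not the paper's model):

```python
import numpy as np

def convect_diffuse(u, nu=0.1, c=1.0, dt=0.05):
    """One exact step of u_t = nu*u_xx - c*u_x on a periodic grid, via FFT.

    Each Fourier mode is scaled independently by exp((-nu*k^2 - i*c*k)*dt),
    so the 'token mixing' is a pointwise multiply between two FFTs.
    """
    N = len(u)
    k = 2 * np.pi * np.fft.fftfreq(N)   # angular wavenumbers, unit grid spacing
    u_hat = np.fft.fft(u)
    u_hat *= np.exp((-nu * k**2 - 1j * c * k) * dt)
    return np.fft.ifft(u_hat).real
```

Diffusion damps high frequencies (smoothing the signal) while convection only shifts phase, which is why the step preserves the mean but shrinks the variance.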
Implicit Maximum Likelihood Estimation (IMLE) achieves multimodal trajectory planning performance comparable to diffusion models while being 100x faster.
AI & ML arxiv | Mar 17
Greedy Information Projection (GIP) provides a fast, geometrically-principled method for selecting training data that balances quality and diversity, achieving full-data performance with a fraction of the examples.
AI & ML arxiv | Mar 17
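The entry doesn't spell out GIP's objective, but a common geometric baseline with the same quality/diversity flavor is greedy farthest-point selection weighted by a quality score. Illustrative sketch (my construction, not the paper's algorithm):

```python
import numpy as np

def greedy_select(X, quality, m):
    """Greedy quality/diversity data selection (illustrative, not GIP).

    Starts from the highest-quality point, then repeatedly adds the point
    maximizing quality[i] * (distance to the nearest already-selected point),
    so picks are both good and spread out.
    """
    chosen = [int(np.argmax(quality))]
    d = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(m - 1):
        i = int(np.argmax(quality * d))
        chosen.append(i)
        d = np.minimum(d, np.linalg.norm(X - X[i], axis=1))
    return chosen
```

Each pass is O(n·dim), so the whole selection is O(n·m·dim) with no training loop, which is the kind of speed such geometric selectors trade on.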
Traditional Spiking Neural Network (SNN) sparsity is a performance 'illusion' on GPUs; temporal aggregation is required for actual 13x speedups.
AI & ML arxiv | Mar 17
Enables training of CNNs from scratch in true 4-bit precision on commodity CPUs with virtually no loss in accuracy.
AI & ML arxiv | Mar 17
Introduces the FLUX preprocessing pipeline, which reduces LLM training compute by 34% by maximizing high-quality token retention.
AI & ML arxiv | Mar 17
Reduces the RAM requirement for speech neuroprosthesis CTC decoding from 320 GB to 10 GB without sacrificing accuracy.
AI & ML arxiv | Mar 17
Reveals that Graph-RAG performance is limited by reasoning failure rather than retrieval, and shows how to make an 8B model match a 70B baseline.
AI & ML arxiv | Mar 17
Amortizes iterative diffusion into a one-step trajectory policy for robotics using a novel 'Keyed Drift Field' objective.
AI & ML arxiv | Mar 17
Proposes a temporal mixed-precision framework for diffusion models that adaptively assigns bitwidths across different denoising timesteps.
AI & ML arxiv | Mar 17
Accelerates LLM inference by up to 1.8x using a training-free sparse pattern predictor based on SVD truncation of FFN gate matrices.
AI & ML arxiv | Mar 17
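The idea: a truncated SVD of the FFN gate matrix gives a cheap low-rank proxy for the gate pre-activations, and the proxy's top entries predict which neurons will matter, so the full gate matmul can be skipped. Minimal sketch (plain top-k scoring; function names are mine):

```python
import numpy as np

def make_predictor(W_gate, rank):
    """Low-rank predictor of which FFN neurons will fire (illustrative).

    Truncated SVD of the (d, n) gate matrix yields a proxy score
    (h @ A) @ B costing O(d*r + r*n) instead of the full O(d*n),
    whose largest entries approximate the truly largest activations.
    """
    U, S, Vt = np.linalg.svd(W_gate, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (d, r)
    B = Vt[:rank]                # (r, n)
    def predict(h, k):
        approx = (h @ A) @ B
        return np.argsort(approx)[-k:]   # indices of predicted-active neurons
    return predict
```

The better the spectrum of `W_gate` concentrates in the top `rank` singular values, the closer the predicted set is to the true top-k.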
Unifies KV cache compression and sparse attention into a single 1-bit indexing structure, eliminating the need for external metadata or predictors.
AI & ML arxiv | Mar 17
Detects diffusion-generated images 126x faster than reconstruction-based methods by using Gaussian noise disturbance to exploit the statistical 'ease' of fitting synthetic data.
AI & ML arxiv | Mar 17
Enables model adaptation on edge devices and non-differentiable (quantized) models using a purely backpropagation-free optimization framework.
AI & ML arxiv | Mar 17
Achieves real-time, low-latency talking avatar generation at 34ms per frame using a one-step streaming diffusion framework.
AI & ML arxiv | Mar 17
Introduces ZoomUI, a training-free method for GUI grounding that uses inference-time scaling to anchor natural language instructions to interface elements.
AI & ML arxiv | Mar 17
FLORE achieves 1000x error reduction in linear sketching while being 100x faster than previous learning-based solutions.
AI & ML arxiv | Mar 17
SleepGate introduces a biologically inspired 'sleep cycle' for the KV cache to resolve proactive interference in long-context LLMs.
AI & ML arxiv | Mar 17
ASAP reduces LVLM computational FLOPs by ~80% with virtually no loss in performance using a training-free KV-Cache pruning recipe.
AI & ML arxiv | Mar 17
FlashHead is a drop-in replacement for the LM classification head that provides 1.75x inference speedup by treating vocabulary selection as a retrieval problem.
AI & ML arxiv | Mar 17
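The retrieval framing is natural because greedy decoding's argmax over logits is exactly a maximum-inner-product search over the rows of the LM-head weight matrix. A toy coarse-to-fine sketch (cluster the vocabulary, probe a few clusters, rescore exactly); `build_index` and `retrieve_token` are my names, not FlashHead's API:

```python
import numpy as np

def build_index(W, n_clusters=64, iters=10, seed=0):
    """Coarse k-means over the (vocab, d) embedding rows (illustrative index)."""
    rng = np.random.default_rng(seed)
    C = W[rng.choice(len(W), n_clusters, replace=False)].copy()
    for _ in range(iters):
        d2 = ((W[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for k in range(n_clusters):
            mask = assign == k
            if mask.any():
                C[k] = W[mask].mean(axis=0)
    return C, assign

def retrieve_token(h, W, C, assign, n_probe=8):
    """Approximate argmax_v <h, W[v]>: probe the best clusters, rescore exactly."""
    probe = np.argsort(C @ h)[-n_probe:]
    cand = np.flatnonzero(np.isin(assign, probe))
    return int(cand[np.argmax(W[cand] @ h)])
```

With few probes only a fraction of vocabulary rows are scored exactly, which is where a retrieval-style head saves its time over the dense matmul.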
Reformulates diffusion sampling as a graph-theoretic planning problem that dynamically allocates compute to the most difficult denoising stages.
AI & ML arxiv | Mar 17
Generates novel, structurally plausible protein sequences from small alignments using a training-free stochastic attention mechanism on a standard laptop.
AI & ML arxiv | Mar 17
Adaptive computation for multimodal LLMs cuts wasted compute on easy inputs while concentrating it on hard ones.
AI & ML arxiv | Mar 17
HO-SFL enables backprop-free fine-tuning on edge devices without the convergence penalty typical of zeroth-order methods.
AI & ML arxiv | Mar 17
RAZOR provides a lightweight, targeted unlearning framework for Transformers and Diffusion models without retraining.
AI & ML arxiv | Mar 17
Introduces an asynchronous Mixture-of-Transformers architecture for autonomous driving that decouples slow reasoning from fast action execution.
AI & ML arxiv | Mar 17
Achieves over 80% of full-resolution VLM performance while using only 1% of the original pixel budget through bio-inspired foveated sampling.
AI & ML arxiv | Mar 17
A unified graph propagation library achieving 35,000x speedups, enabling full simulations on billion-edge graphs in seconds.
AI & ML arxiv | Mar 17
AdaAnchor enables LLMs to perform multi-step reasoning entirely in latent space with an adaptive halting mechanism to optimize compute.
AI & ML arxiv | Mar 17
AnoleVLA replaces the standard Transformer backbone in robotic Vision-Language-Action models with Deep State Space Models for a 3x speedup.
AI & ML arxiv | Mar 17
Writer-R1-4B outperforms 100B+ parameter models in creative writing by utilizing memory-augmented self-reflection and fine-grained criteria generation.
AI & ML arxiv | Mar 17
Ultra-low-bitrate image compression achieves 50% bitrate savings by treating decoding as a 'next-frame' video prediction task using diffusion priors.
AI & ML arxiv | Mar 17
HapticVLA achieves tactile-aware robotic manipulation with an 86.7% success rate without requiring any physical tactile sensors at inference time.
AI & ML arxiv | Mar 17
IConE enables stable self-supervised learning even at batch size 1, overcoming the memory bottlenecks of high-dimensional scientific and medical data.
AI & ML arxiv | Mar 17
FlashU is the first framework to accelerate unified multimodal models by exploiting the distinct neuron sets used for generation vs. understanding.
AI & ML arxiv | Mar 17
MeMix is a training-free, plug-and-play module that reduces 3D reconstruction error by up to 40% in long sequences by mitigating state drift.
AI & ML arxiv | Mar 17
PrismMirror is the first monocular human frontal view synthesis model to achieve real-time inference (24 FPS) without external geometric models.
AI & ML arxiv | Mar 17
A 4B parameter model matches a 120B parameter model in program verification through a rigorous data curation pipeline.
AI & ML arxiv | Mar 17
Bridges the gap between generative (MAE) and predictive (I-JEPA) self-supervised learning, achieving a 10% performance boost.
AI & ML arxiv | Mar 17
Accelerates state-of-the-art 3D human mesh recovery by over 10x, enabling real-time vision-only humanoid teleoperation.
AI & ML arxiv | Mar 17
Introduces Mixture-of-Depths Attention (MoDA) to solve signal degradation in deep LLMs with hardware-efficient implementation.
AI & ML arxiv | Mar 17
Achieves 1,000x speedups in Bayesian inverse problems by replacing repeated MCMC sampling with one-step preconditioned generative transport.
AI & ML arxiv | Mar 17
RSM achieves 20x faster training for recursive reasoning models and enables test-time scaling for up to 20,000 refinement steps.
AI & ML arxiv | Mar 18
Reduces high-quality 3D head avatar creation time from over 24 hours to 0.5 seconds per frame.
AI & ML arxiv | Mar 18
Fuses categorical sampling into the LM-head matmul to eliminate logit materialization and speed up LLM decoding by up to 19%.
AI & ML arxiv | Mar 18
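The statistical identity that enables this kind of fusion is the Gumbel-max trick: sampling from softmax(logits) is equivalent to argmax(logits + Gumbel noise), and an argmax is a reduction that can live inside a matmul epilogue. This NumPy sketch shows only the identity, not the fused kernel itself:

```python
import numpy as np

def gumbel_max_sample(h, W, rng):
    """Draw token ~ softmax(W @ h) via the Gumbel-max trick.

    argmax(logits + Gumbel(0,1) noise) is distributed exactly as a
    softmax sample, so the sampling reduction can be folded into the
    LM-head matmul without materializing or normalizing full logits.
    """
    logits = W @ h                      # in a fused kernel this stays in registers/tiles
    g = rng.gumbel(size=logits.shape)   # i.i.d. Gumbel(0, 1) noise
    return int(np.argmax(logits + g))
```

Because no softmax normalization is needed, the kernel never has to write the vocabulary-sized logit vector to memory, which is where the decoding speedup comes from.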