EFFICIENCY_BREAKTHROUGH

375 papers · Page 2 of 4

Achieves microsecond-level kinodynamic motion planning for high-DOF robots by using differential flatness to solve boundary value problems analytically.

AI & ML arxiv | Mar 18

Demonstrates that masked diffusion language models can be 21.8x more compute-efficient than traditional autoregressive models when scaled correctly.

AI & ML arxiv | Mar 18

Introduces Helium, a serving framework that treats agentic workflows as data query plans to optimize redundant LLM calls and KV caches.

AI & ML arxiv | Mar 18

Presents ZipCal, a model-agnostic calibration data selection strategy for pruning and quantization that is 240x faster than model-based methods.

AI & ML arxiv | Mar 18

VQKV uses Vector Quantization to achieve over 80% KV cache compression with almost zero loss in model performance.

AI & ML arxiv | Mar 18
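
The VQKV entry above doesn't spell out its algorithm, but the general idea of vector-quantizing a KV cache can be sketched: cluster cached vectors into a small codebook and store per-vector indices instead of full floats. The function names and parameters below are invented for illustration and are not VQKV's actual method.

```python
import numpy as np

def build_codebook(vectors, k=16, iters=10, seed=0):
    """Plain k-means codebook over KV vectors (illustrative, not VQKV itself)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest code.
        d = np.linalg.norm(vectors[:, None] - codebook[None], axis=-1)
        assign = d.argmin(axis=1)
        # Recompute each code as the mean of its assigned vectors.
        for c in range(k):
            members = vectors[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook

def quantize_kv(kv, codebook):
    """Replace each KV vector with the index of its nearest code."""
    d = np.linalg.norm(kv[:, None] - codebook[None], axis=-1)
    return d.argmin(axis=1)

# Toy cache: 512 cached key vectors of dimension 64.
kv = np.random.default_rng(1).normal(size=(512, 64))
codebook = build_codebook(kv, k=16)
codes = quantize_kv(kv, codebook)   # one small integer index per vector
recon = codebook[codes]             # dequantized cache used at attention time
# Storage drops from 512*64 floats to 16*64 floats plus 512 indices.
```

The compression claim in the entry would come from the index array being far smaller than the original float cache; the "almost zero loss" part depends on codebook quality, which this toy does not demonstrate.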

FEAT is a linear-complexity foundation model designed specifically for extremely large-scale structured (tabular) data.

AI & ML arxiv | Mar 18

Enables stable 4-bit microscaling (MXFP4) quantization for Multi-modal LLMs, which previously suffered from performance collapse.

AI & ML arxiv | Mar 18

Low-precision optimizer states cause 'state staleness' where updates round back to stored values, but scheduled resets can fully recover performance loss.

AI & ML arxiv | Mar 18

GIST achieves O(N) complexity for Graph Transformers while maintaining gauge invariance, enabling scaling to meshes with 750K nodes.

AI & ML arxiv | Mar 18

Pretrained 3D generative models can be repurposed for high-quality part segmentation using less than 1% of the typical labeled data.

AI & ML arxiv | Mar 18

HoloByte is a tokenizer-free framework that projects byte sequences into a continuous hyperspherical manifold to bypass the morphological limits of discrete tokens.

AI & ML arxiv | Mar 19

AwaRes enables low-resolution Vision-Language Models to retrieve only the high-resolution image crops needed for a specific query via tool-calling.

AI & ML arxiv | Mar 19

Provides a systematic profiling of VLM inference bottlenecks and releases 'recipes' that cut time-to-first-token by up to 93%.

AI & ML arxiv | Mar 19

A backbone-agnostic denoising objective that allows small GNNs to outperform large models pretrained on much larger supervised datasets in physical sciences.

AI & ML arxiv | Mar 19

A dynamic data pruning framework that cuts dense retriever training time by 50% while actually improving retrieval accuracy.

AI & ML arxiv | Mar 19

Achieves up to a 1,000x gain in RLHF data efficiency by using information-directed exploration and epistemic neural networks.

AI & ML arxiv | Mar 19

Introduces a reward framework that reduces LLM reasoning verbosity by optimizing for 'Information Density' via entropy reduction per step.

AI & ML arxiv | Mar 19
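
One plausible reading of an "information density" reward, sketched below, is entropy removed from the answer distribution per token spent on a reasoning step; the function names and the exact formula are assumptions, not the paper's definition.

```python
import math

def step_entropy(probs):
    """Shannon entropy (nats) of a distribution over candidate answers."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def information_density_reward(entropies, step_lengths):
    """Per-step reward: entropy reduced per token spent (a hypothetical
    instantiation of an 'information density' objective)."""
    rewards = []
    for i in range(1, len(entropies)):
        gain = entropies[i - 1] - entropies[i]      # uncertainty removed
        rewards.append(gain / max(step_lengths[i - 1], 1))
    return rewards

# Toy trace: answer uncertainty shrinking over three reasoning steps.
dists = ([0.25] * 4, [0.1, 0.1, 0.4, 0.4], [0.02, 0.02, 0.02, 0.94])
ents = [step_entropy(p) for p in dists]
rewards = information_density_reward(ents, step_lengths=[30, 12])
# A short step that removes a lot of entropy scores higher than a long rambling one.
```

Under this scoring, verbose steps that barely reduce uncertainty earn little reward, which is the mechanism the entry credits for cutting verbosity.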

Generates 9 million grid points of 3D spatiotemporal physical fields in seconds, a 10,000x speedup over traditional physics simulations.

AI & ML arxiv | Mar 19

Replaces quadratic self-attention with O(N log N) phase-native coupling for time-series, enabling massive context windows.

AI & ML arxiv | Mar 19

Achieves an 80% reduction in Chain-of-Thought (CoT) tokens while slightly increasing reasoning accuracy.

AI & ML arxiv | Mar 19

Extends LLM context from 32K to 128K by teaching models to selectively skip global attention for ~80% of tokens.

AI & ML arxiv | Mar 19
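
The entry above doesn't describe how the ~80% of tokens skip global attention, but the generic shape of such a scheme is a hybrid mask: every token keeps a local causal window, and only selected tokens retain global reach. The helper below is a made-up illustration of that mask structure, not the paper's mechanism.

```python
import numpy as np

def hybrid_attention_mask(n, window=4, global_idx=(0,)):
    """Boolean causal mask: each query sees a local window; only chosen
    token positions keep full global attention (a generic sketch)."""
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        lo = max(0, q - window + 1)
        mask[q, lo:q + 1] = True       # local causal window
        for g in global_idx:
            if g <= q:
                mask[q, g] = True      # retained global tokens
    return mask

m = hybrid_attention_mask(8)
# Row 7 attends to positions 4..7 locally plus global position 0.
```

With most positions excluded from global attention, per-token cost stops growing with full context length, which is what makes the 32K-to-128K extension affordable.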

Knowledge-Aware Active Learning (KA2L) uses latent space probing to identify what an LLM doesn't know and generates targeted synthetic questions.

AI & ML arxiv | Mar 19

S-VGGT introduces structure-aware subscene decomposition to break the quadratic scaling bottleneck of 3D foundation models.

AI & ML arxiv | Mar 19

DSS-GAN is the first generative adversarial network to use a Mamba (State Space Model) backbone for high-quality image synthesis.

AI & ML arxiv | Mar 19

Synthetic videos of simple geometric shapes are more effective than massive real-world datasets for teaching video-language models fundamental temporal reasoning.

AI & ML arxiv | Mar 19

Anomaly detection can be performed directly using a primary model's internal neuron output ranges, eliminating the need for expensive external AD models.

AI & ML arxiv | Mar 19

Truncated backpropagation for video decoding reduces the memory cost of fine-tuning video diffusion models from linear to constant.

AI & ML arxiv | Mar 19

ProbeFlow achieves 14.8x faster action decoding in Vision-Language-Action (VLA) models without any retraining.

AI & ML arxiv | Mar 19

Parallel multi-token prediction can be achieved in standard LLMs without training auxiliary models or modifying weights.

AI & ML arxiv | Mar 19

CARE provides a recipe for converting standard GQA models into high-efficiency Multi-head Latent Attention (MLA) architectures.

AI & ML arxiv | Mar 19

VideoAtlas enables navigation and reasoning over long-form video using compute that scales only logarithmically with video length.

AI & ML arxiv | Mar 19

MUD provides a faster, lower-overhead alternative to Muon for transformer training, achieving up to 2.6x higher throughput.

AI & ML arxiv | Mar 19

LoST introduces a semantic-first 3D tokenizer that reduces the token count for 3D shape generation by up to 99.9%.

AI & ML arxiv | Mar 19

MineDraft achieves a 75% throughput increase in speculative decoding by overlapping the drafting and verification stages.

AI & ML arxiv | Mar 20

Q-Drift corrects quantization-induced noise in diffusion models using a plug-and-play sampler adjustment that requires only 5 calibration runs.

AI & ML arxiv | Mar 20

Achieves depth-independent training memory bounded to approximately twice the inference footprint.

AI & ML arxiv | Mar 20

A decoder-free world model that trains 1.59x faster than DreamerV3 while outperforming it on tasks with small, task-relevant objects.

AI & ML arxiv | Mar 20

Fixes the 'squeezing effect' in Direct Preference Optimization (DPO) using an efficient logit-space Sharpness-Aware Minimization.

AI & ML arxiv | Mar 20

PreSCAN predicts NeRF reconstruction quality in under 30 seconds, achieving a 1000x speedup over Neural Architecture Search.

AI & ML arxiv | Mar 20

TopoChunker maps documents to a Structured Intermediate Representation (SIR) to preserve hierarchical context during RAG chunking.

AI & ML arxiv | Mar 20

AFBS-BO automates the discovery of layer-specific sparse attention hyperparameters, making long-context acceleration 'plug-and-play.'

AI & ML arxiv | Mar 20

Discounted Beta-Bernoulli (DBB) reward estimation solves the variance collapse and sample inefficiency inherent in point-estimation RLVR methods for LLM reasoning.

AI & ML arxiv | Mar 20

EntropyCache achieves up to 26x speedup for Diffusion Language Models by using decoded token entropy as a proxy for KV cache staleness.

AI & ML arxiv | Mar 20
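
Using decoded-token entropy as a staleness proxy, as the EntropyCache entry describes, can be caricatured as: refresh the cache only when recent decoding entropy is high. The threshold and function names below are invented for illustration.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one decoding step's distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_refresh_cache(recent_token_dists, threshold=1.0):
    """Heuristic sketch: treat high average decoding entropy as a sign the
    cached states are stale and need recomputation (threshold is made up)."""
    avg = sum(token_entropy(p) for p in recent_token_dists) / len(recent_token_dists)
    return avg > threshold

confident = [[0.9, 0.05, 0.05]] * 4    # low entropy: keep reusing the cache
uncertain = [[0.34, 0.33, 0.33]] * 4   # high entropy: trigger a refresh
```

The speedup in the entry would come from skipping recomputation on the (common) confident steps while paying full cost only when the proxy flags staleness.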

AIMER provides a calibration-free criterion for expert pruning in MoE models that matches state-of-the-art performance in seconds.

AI & ML arxiv | Mar 20

DDPO addresses the 'overthinking' and 'overconfidence' issues in Large Reasoning Models (LRMs) by optimizing answer length based on task difficulty.

AI & ML arxiv | Mar 20

Enables high-fidelity 3D satellite surface reconstruction in a single forward pass without per-scene optimization.

AI & ML arxiv | Mar 20

Matches the performance of the complex SFT+GRPO reasoning pipeline for Vision-Language Models in 1/7th of the training time.

AI & ML arxiv | Mar 20

Provides a mathematically grounded, efficient offline policy optimization method for Diffusion LLMs by estimating trajectory probabilities with a single forward pass.

AI & ML arxiv | Mar 20

Uses a lightweight GRPO-trained policy to select optimal video frames, reducing processing time by 93% while actually improving Video QA accuracy.

AI & ML arxiv | Mar 20

Bootstraps reasoning-heavy RL by stochastically injecting few-shot demonstrations into training prompts via a curriculum.

AI & ML arxiv | Mar 20

Aligns diffusion models with human preferences using only 100 samples, outperforming SOTA methods that use thousands.

AI & ML arxiv | Mar 20

Any-order autoregressive models can outperform diffusion-based classifiers while being 25x more efficient.

AI & ML arxiv | Mar 20

A GPU-accelerated metaheuristic framework that solves combinatorial optimization problems orders of magnitude faster than traditional MIP solvers.

AI & ML arxiv | Mar 20

Reduces reaction latency in flow-based VLA models by 10x, enabling real-time responsiveness on consumer GPUs.

AI & ML arxiv | Mar 20

A 30B MoE model with only 3B active parameters achieves Gold Medal-level performance in International Math and Informatics Olympiads.

AI & ML arxiv | Mar 20

Achieves state-of-the-art LLM distillation using 10-25% of the data required by standard fine-tuning.

AI & ML arxiv | Mar 23

Accelerates MoE inference by speculating future experts to overlap CPU-GPU memory transfers with computation.

AI & ML arxiv | Mar 23

Achieves 97% of oracle reward performance using only 20% of the training labels for complex LLM reasoning.

AI & ML arxiv | Mar 23

The first Joint Embedding Predictive Architecture (JEPA) to train stably end-to-end from raw pixels with massive planning speedups.

AI & ML arxiv | Mar 23

DAPA speeds up GELU computation by 16x and reduces hardware DSP utilization by 16x for on-device Transformer deployment.

AI & ML arxiv | Mar 23

Spectral Tempering achieves near-oracle embedding compression for dense retrieval without requiring any labeled data or grid searching.

AI & ML arxiv | Mar 23

Empirically demonstrates that most Transformer layers are redundant, enabling a 54% training cost reduction through non-uniform budget allocation.

AI & ML arxiv | Mar 23

Warm-Start Flow Matching provides a guaranteed speedup for image/text generation by using lightweight models as initial priors.

AI & ML arxiv | Mar 23

Adaptive Layerwise Perturbation (ALP) solves the training-inference mismatch and importance ratio blowup in LLM reinforcement learning.

AI & ML arxiv | Mar 23

EvidenceRL uses reinforcement learning (GRPO) to explicitly optimize for evidence adherence, reducing hallucinations in high-stakes RAG pipelines.

AI & ML arxiv | Mar 23

Accelerates diffusion-based image decoders by an order of magnitude using multi-scale sampling and one-step distillation.

AI & ML arxiv | Mar 23

Reduces covariance tracking error by 30x by reformulating the problem as rigid-body motion on Lie groups.

AI & ML arxiv | Mar 23

Achieves a 19x reduction in inference cost and 16x in latency for agentic workflows by evolving hybrid LLM-and-code pipelines.

AI & ML arxiv | Mar 23

Reduces long-context inference latency by 26.4x using a training-free, structure-aware prompt compression framework.

AI & ML arxiv | Mar 23

Introduces the first reinforcement learning framework to compress implicit reasoning steps in looped language models.

AI & ML arxiv | Mar 23

Achieves O(1) time complexity for dense component attribution in SwiGLU Transformers using a single forward-backward pass.

AI & ML arxiv | Mar 23

A training-free method to fix intra-modal misalignment in CLIP by decomposing projectors into an isotropic aligned subspace.

AI & ML arxiv | Mar 23

NASimJax provides a 100x throughput increase for autonomous penetration testing simulators by reimplementing the environment in JAX.

AI & ML arxiv | Mar 23

SAGE achieves state-of-the-art translation for low-resource languages while reducing training data requirements by 97.1% via RL-guided curation.

AI & ML arxiv | Mar 23

Memori reduces agent token costs by 20x by replacing raw conversation history with a persistent layer of semantic triples and summaries.

AI & ML arxiv | Mar 23
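
Replacing raw history with semantic triples, as the Memori entry describes, can be illustrated with a toy memory store; the class and schema below are an assumption about the general pattern, not Memori's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class TripleMemory:
    """Toy persistent memory of (subject, relation, object) triples standing
    in for raw conversation history (illustrative only)."""
    triples: set = field(default_factory=set)

    def remember(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def recall(self, subject):
        """Return only the facts about one subject, sorted for stable prompts."""
        return sorted(t for t in self.triples if t[0] == subject)

mem = TripleMemory()
mem.remember("user", "prefers", "dark mode")
mem.remember("user", "timezone", "UTC+2")
mem.remember("project", "deadline", "Friday")
facts = mem.recall("user")   # only user-related facts re-enter the prompt
```

The 20x token saving in the entry would follow from re-injecting a handful of compact triples per turn instead of the full transcript.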

2K Retrofit enables 2K-resolution inference for any 3D geometric foundation model without modifying or retraining the backbone.

AI & ML arxiv | Mar 23

A k-means variant that is up to 7x faster than FAISS and scikit-learn on CPUs and 4x faster than cuVS on GPUs.

AI & ML arxiv | Mar 23

Reduces the computational cost of Neural Architecture Search for ensembles from O(M) to O(1).

AI & ML arxiv | Mar 23

Quantifies LLM uncertainty in a single generation pass without auxiliary models or repeated sampling.

AI & ML arxiv | Mar 23
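
A simple single-pass uncertainty proxy, in the spirit of the entry above (though not necessarily this paper's estimator), is the mean per-token predictive entropy collected during one decoding pass:

```python
import math

def sequence_uncertainty(step_dists):
    """Mean per-token predictive entropy from a single decoding pass —
    one simple single-pass uncertainty proxy (illustrative)."""
    ents = [-sum(p * math.log(p) for p in d if p > 0) for d in step_dists]
    return sum(ents) / len(ents)

# Two toy generations: one confident, one hedging between alternatives.
confident_gen = [[0.95, 0.03, 0.02], [0.9, 0.08, 0.02]]
hedging_gen = [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2]]
```

Because the distributions are already produced while decoding, this costs nothing beyond the generation itself, which is what "no auxiliary models or repeated sampling" buys.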

Introduces a long-horizon video agent that uses 93% fewer frames than GPT-5/standalone LMMs while achieving higher accuracy.

AI & ML arxiv | Mar 23

Provides a robust method for distilling discrete diffusion models that maintains quality and diversity even with very few sampling steps.

AI & ML arxiv | Mar 23

Achieves over 10x faster sampling for diffusion language models by shifting the process into continuous semantic space.

AI & ML arxiv | Mar 24

Integrates fast scalar rewards with slow generative CoT reasoning to reduce reward model token consumption by 20%.

AI & ML arxiv | Mar 24

Enables precise prompt routing by predicting the expected reward of a model before any response is generated.

AI & ML arxiv | Mar 24

Reduces Tree of Thought (ToT) computational overhead by up to 75% using plug-and-play predictors for pruning.

AI & ML arxiv | Mar 24
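
Predictor-based ToT pruning, in the generic form the entry suggests, amounts to ranking frontier thoughts with a cheap score and expanding only the top few; the scores and names below are invented for illustration, not the paper's predictors.

```python
def prune_frontier(nodes, score_fn, keep=2):
    """Keep only the top-`keep` thoughts by a cheap predicted score before
    paying for full LLM expansion (a generic sketch of predictor pruning)."""
    return sorted(nodes, key=score_fn, reverse=True)[:keep]

frontier = ["partial proof A", "dead-end guess", "partial proof B", "restatement"]
cheap_score = {"partial proof A": 0.9, "dead-end guess": 0.1,
               "partial proof B": 0.7, "restatement": 0.2}.get
kept = prune_frontier(frontier, cheap_score)   # expand only the promising two
```

Dropping half the frontier before expansion halves the expensive LLM calls per tree level, which compounds into the large overhead reductions the entry reports.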

STAC achieves a 10x memory reduction and 4x speedup for real-time streaming 3D reconstruction using spatio-temporal cache compression.

AI & ML arxiv | Mar 24

DiffMark enables multi-bit watermarking that is transferable across different frozen diffusion models with a 45x speedup over current methods.

AI & ML arxiv | Mar 24

VGS-Decoding is a training-free method to mitigate medical VLM hallucinations by reweighting token probabilities based on their visual dependency.

AI & ML arxiv | Mar 24

GEM is the first native graph-based index for multi-vector (ColBERT-style) retrieval, achieving up to 16x speedups over existing single-vector index adaptations.

AI & ML arxiv | Mar 24

AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.

AI & ML arxiv | Mar 24

Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.

AI & ML arxiv | Mar 24

3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.

AI & ML arxiv | Mar 24

Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.

AI & ML arxiv | Mar 24

Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.

AI & ML arxiv | Mar 24

GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.

AI & ML arxiv | Mar 24

MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.

AI & ML arxiv | Mar 24

A low-resource SOP using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.

AI & ML arxiv | Mar 24

A routing framework that uses internal prefill activations to select the optimal LLM for a task, capturing 45% of the oracle accuracy gap with 74% cost savings.

AI & ML arxiv | Mar 24

A training-free visual token pruning framework for Large Vision-Language Models that preserves geometric structure through subspace reconstruction.

AI & ML arxiv | Mar 24

Free Sinewich enables parameter-efficient multi-task learning using frequency-based weight modulation with near-zero overhead.

AI & ML arxiv | Mar 24