EFFICIENCY_BREAKTHROUGH

375 papers · Page 4 of 4

A unified framework for neural network recombination that achieves state-of-the-art fine-tuning with fewer than 200 parameters.

AI & ML arxiv | Mar 31

GIFT bootstraps image-to-CAD generation by turning inference-time failures into synthetic training data, reducing inference compute by 80%.

AI & ML arxiv | Mar 31

Near-lossless KV cache compression using angular quantization in the Walsh-Hadamard domain at ~3.5 bits per element.

AI & ML arxiv | Mar 31
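
The rotate-then-quantize idea behind this entry can be sketched with plain NumPy: a Walsh-Hadamard rotation spreads an outlier's energy across all coordinates, so a low-bit grid fits the bulk of the values far better. This is a minimal sketch of the general trick (a 4-bit uniform grid here; the paper's angular ~3.5-bit scheme is not reproduced):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, orthonormal scaling (length = power of two)."""
    x = x.copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)

def quantize(x, bits=4):
    """Uniform symmetric quantization to 2**bits levels."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
v = rng.normal(size=64)
v[3] = 50.0  # a large outlier, typical of LLM activations

# Direct quantization: the outlier blows up the scale, crushing small values.
err_direct = np.mean((v - quantize(v)) ** 2)

# Rotate first: the outlier's energy is spread evenly, so the grid fits better.
v_hat = fwht(quantize(fwht(v)))  # the orthonormal FWHT is its own inverse
err_rotated = np.mean((v - v_hat) ** 2)

print(err_rotated < err_direct)  # True: rotation reduces quantization error
```
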

Achieves a 79,000x reduction in energy per inference for insulin dose calculation using Spiking Neural Networks (SNNs).

AI & ML arxiv | Mar 31

Uses spectral decomposition of inverse dynamics to enable real-time planning of long-horizon robotic manipulation tasks (10+ contact modes).

AI & ML arxiv | Mar 31

KVSculpt moves beyond simple eviction and merging by optimizing KV pairs as unconstrained variables in continuous space for extreme cache compression.

AI & ML arxiv | Mar 31

SAGE mitigates multimodal hallucinations by monitoring 'attention sinks' and dynamically modulating self-attention during the decoding process.

AI & ML arxiv | Mar 31

ITQ3_S achieves high-fidelity 3-bit LLM inference by using rotation-domain smoothing to eliminate the catastrophic precision loss caused by outliers.

AI & ML arxiv | Mar 31

ExFusion enables Transformer models to gain the capacity of Mixture-of-Experts during training while remaining a standard dense model for deployment.

AI & ML arxiv | Mar 31

Dataset Concentration (DsCo) achieves nearly lossless dataset reduction by aligning distributions via diffusion models, cutting storage and training costs by half.

AI & ML arxiv | Mar 31

Decoupled language models reduce the compute required for OCR domain adaptation by 95% while matching SOTA transformer accuracy.

AI & ML arxiv | Mar 31

Drift-AR enables single-step (1-NFE) high-fidelity image generation by reinterpreting AR prediction entropy as a physical drifting field.

AI & ML arxiv | Mar 31

ROVED reduces the expensive human feedback required for preference-based RL by up to 90% by leveraging vision-language embeddings and uncertainty filtering.

AI & ML arxiv | Mar 31

Introduces Heddle, a trajectory-centric system that resolves the long-tail latency bottleneck of tool calls in agentic Reinforcement Learning.

AI & ML arxiv | Mar 31

Replaces the classic Newton-Raphson power-flow solver with a differentiable GPU-accelerated simulation.

AI & ML arxiv | Mar 31

Introduces lightweight equilibration to the Muon optimizer, significantly stabilizing and accelerating LLM pretraining.

AI & ML arxiv | Mar 31

Enables instruction-following in low-resource languages by simply merging target language base models with English-instructed models.

AI & ML arxiv | Mar 31
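
The merge described here amounts to task-vector arithmetic: subtract the English base weights from the English instruction-tuned weights and add that delta to the target-language base model. A toy NumPy sketch with hypothetical per-tensor weight dictionaries (the paper's exact recipe may weight or filter the delta differently):

```python
import numpy as np

# Hypothetical weights for three checkpoints sharing one architecture:
# an English base model, its instruction-tuned version, and a base model
# continued-pretrained on the target language.
rng = np.random.default_rng(0)
base_en     = {"w": rng.normal(size=(4, 4))}
instruct_en = {"w": base_en["w"] + 0.1 * rng.normal(size=(4, 4))}
base_tgt    = {"w": base_en["w"] + 0.2 * rng.normal(size=(4, 4))}

def merge(base_en, instruct_en, base_tgt, alpha=1.0):
    """Add the 'instruction-following' delta (instruct_en - base_en)
    onto the target-language base model, tensor by tensor."""
    return {k: base_tgt[k] + alpha * (instruct_en[k] - base_en[k])
            for k in base_tgt}

merged = merge(base_en, instruct_en, base_tgt)
# With alpha=1, the merge transfers exactly the instruction delta.
delta = instruct_en["w"] - base_en["w"]
print(np.allclose(merged["w"] - base_tgt["w"], delta))  # True
```
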

An evolutionary framework for GPU kernel generation that outperforms frontier models like Claude 4.6 and Gemini 3.0.

AI & ML arxiv | Mar 31

HISA eliminates the quadratic O(L²) bottleneck in sparse attention indexers, enabling efficient long-context scaling for models like DeepSeek-V3.

AI & ML arxiv | Mar 31

IsoQuant leverages SO(4) isoclinic rotations to achieve a 4.5x-4.7x speedup in low-bit KV-cache quantization over existing methods.

AI & ML arxiv | Mar 31

INSID3 achieves state-of-the-art one-shot image segmentation using only frozen DINOv3 features without any training, fine-tuning, or auxiliary models.

AI & ML arxiv | Mar 31

EdgeDiT provides a hardware-aware blueprint for running massive Diffusion Transformers (DiT) on mobile NPUs with a 1.6x reduction in latency.

AI & ML arxiv | Mar 31

LAD achieves 3x lower latency than previous driving language models by generating textual reasoning and motion plans at up to 20 Hz.

AI & ML arxiv | Mar 31

Hydra unifies ColBERT-style retrieval and autoregressive generation into a single Vision-Language Model using a single LoRA adapter.

AI & ML arxiv | Mar 31

StreamingVLA eliminates execution halting in robots by asynchronously parallelizing observation, generation, and execution.

AI & ML arxiv | Mar 31

ResAdapt learns a per-frame visual budget allocator that optimizes input resolution before encoding.

AI & ML arxiv | Mar 31

RNNs can be trained online without Jacobian propagation, matching BPTT performance at 1000x less memory.

AI & ML arxiv | Mar 31

IF4 introduces an adaptive 4-bit data type that switches between Float and Integer representations to minimize quantization error.

AI & ML arxiv | Mar 31
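
A generic sketch of the adaptive idea: per block, quantize onto both a uniform int4 grid and an e2m1-style fp4 grid, then keep whichever reconstructs with lower error. The grids and selection rule below are illustrative assumptions, not the actual IF4 specification:

```python
import numpy as np

# Two candidate 4-bit grids (each spans 15-16 representable values):
INT4 = np.arange(-7, 8).astype(float)                  # uniform integer grid
FP4_POS = np.array([0, .5, 1, 1.5, 2, 3, 4, 6])
FP4 = np.unique(np.concatenate([-FP4_POS, FP4_POS]))   # e2m1-style float grid

def quantize_to_grid(x, grid):
    """Scale the block onto the grid, snap each value to the nearest point."""
    scale = np.abs(x).max() / np.abs(grid).max()
    idx = np.abs(x[:, None] / scale - grid[None, :]).argmin(axis=1)
    return grid[idx] * scale

def adaptive_quantize(block):
    """Pick whichever grid gives lower reconstruction MSE for this block."""
    candidates = [quantize_to_grid(block, g) for g in (INT4, FP4)]
    errs = [np.mean((block - c) ** 2) for c in candidates]
    best = int(np.argmin(errs))
    return candidates[best], ("int4", "fp4")[best]

rng = np.random.default_rng(0)
uniform_block = rng.uniform(-1, 1, size=256)   # flat distribution
heavy_block = rng.standard_t(df=3, size=256)   # heavy-tailed, outlier-prone

_, fmt_u = adaptive_quantize(uniform_block)
_, fmt_g = adaptive_quantize(heavy_block)
print(fmt_u, fmt_g)  # format choice tracks each block's distribution
```

By construction the adaptive choice is never worse than committing to either grid alone.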

Decouples data mixture ratio selection from continual pre-training by optimizing distribution vectors post-hoc with 15-35x lower compute cost.

AI & ML arxiv | Apr 1

Combines differentiable optimization with exact ILP solvers to achieve a 10x performance gain in solving NP-hard combinatorial scheduling problems.

AI & ML arxiv | Apr 1

A fabricated 16nm SoC that performs real-time 3D occupancy mapping under 6 mW, reducing query energy by over 80%.

AI & ML arxiv | Apr 1

Generates complete, simulatable analog circuits in milliseconds, outperforming search-based methods by over 600x.

AI & ML arxiv | Apr 1

Introduces PolarQuant, a quantization method that uses Hadamard rotation to make LLM weights near-lossless at 5-bit without calibration data.

AI & ML arxiv | Apr 1

Scales curvature-aware bilevel optimization to BERT-sized models using KFAC, significantly outperforming standard gradient unrolling.

AI & ML arxiv | Apr 1

Enables infinite-length video understanding on a single consumer GPU (RTX 3090) through a training-free visual memory mechanism.

AI & ML arxiv | Apr 1

Obtains epistemic and aleatoric uncertainty estimates from a single forward-backward pass of an unmodified pretrained LLM.

AI & ML arxiv | Apr 1

A vector-wise sparse attention mechanism that accelerates long-context video inference by 2.6x with zero loss in accuracy.

AI & ML arxiv | Apr 1

A unified quantization and runtime framework for deploying multiple LoRA-adapted generative models on edge devices simultaneously.

AI & ML arxiv | Apr 1

A 1D continuous image tokenizer that uses semantic masking to achieve a 64x reduction in token usage without sacrificing generation fidelity.

AI & ML arxiv | Apr 1

A compiler approach to agent logs that reduces token consumption by 50-66% while improving context learning performance.

AI & ML arxiv | Apr 1

A stabilization mechanism for adapting LLMs to time-series tasks that reduces memory footprint by up to 1,776x.

AI & ML arxiv | Apr 1

Applies Shapley values from cooperative game theory to solve the 'free-rider' problem in GRPO-based reinforcement learning post-training.

AI & ML arxiv | Apr 1
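
The Shapley fix can be illustrated exactly on a small rollout group: instead of every rollout sharing one group reward equally (the free-rider baseline), each rollout receives its average marginal contribution over all orderings. A self-contained sketch with a hypothetical any-correct group reward (the paper's actual characteristic function may differ):

```python
from itertools import permutations
from math import factorial

def shapley(players, v):
    """Exact Shapley values: average marginal contribution of each player
    over all orderings of the group (fine for small rollout groups)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        seen = set()
        for p in order:
            phi[p] += v(seen | {p}) - v(seen)
            seen.add(p)
    return {p: phi[p] / factorial(n) for p in phi}

# Four sampled rollouts for one prompt; only rollouts 0 and 1 are correct.
correct = {0: True, 1: True, 2: False, 3: False}

# Group-level reward: 1 if the group contains at least one correct answer.
def v(subset):
    return 1.0 if any(correct[p] for p in subset) else 0.0

phi = shapley(list(correct), v)
# Equal sharing would give every rollout 0.25; Shapley splits the credit
# between the two rollouts that earned it and gives free riders zero.
print(phi)  # {0: 0.5, 1: 0.5, 2: 0.0, 3: 0.0}
```
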

Produces high-fidelity SHAP explanations for tabular data 1000x faster than traditional methods by integrating them directly into the model architecture.

AI & ML arxiv | Apr 1

Proposes a unified tensor-factorization view of attention that encompasses MHA, GQA, and MLA while reducing parameter counts by an order of magnitude.

AI & ML arxiv | Apr 1

Recovers short-text performance in context-extended LLMs using 60x less data than current state-of-the-art distillation methods.

AI & ML arxiv | Apr 2

Introduces entropy-guided adaptive decoding that gives small models reasoning performance comparable to frontier models at a fraction of the cost.

AI & ML arxiv | Apr 2
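
A minimal sketch of entropy-gated decoding: measure the entropy of each step's token distribution and decode confident steps greedily while sampling (or branching) uncertain ones. The threshold and fallback policy here are hypothetical, not taken from the paper:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def adaptive_step(logits, threshold=1.0, rng=None):
    """Greedy-decode low-entropy steps; sample high-entropy ones."""
    p = softmax(logits)
    if entropy(p) < threshold:
        return int(np.argmax(p)), "greedy"
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(p), p=p)), "sampled"

confident = np.array([10.0, 0.0, 0.0, 0.0])    # near one-hot: low entropy
uncertain = np.array([1.0, 0.9, 1.1, 1.05])    # near-uniform: high entropy

print(adaptive_step(confident)[1])  # greedy
print(adaptive_step(uncertain)[1])  # sampled
```
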

Proposes a 'no-backprop' stochastic process memory for edge agents that solves the retention-forgetting tradeoff with fixed compute.

AI & ML arxiv | Apr 2

MAC-Attention achieves 14x attention-phase speedups and reduces KV cache accesses by 99% for long-context LLMs by reusing computation from semantically similar queries.

AI & ML arxiv | Apr 2

A modified 110M-parameter ColBERT model can identify fine-grained evidence spans as accurately as a 27B-parameter LLM, at a fraction of the cost.


AI & ML arxiv | Apr 2

A lightweight framework for triaging agentic trajectories post-deployment without the cost of human review or auxiliary LLM calls.

AI & ML arxiv | Apr 2

A cross-graph tuning-free prompting framework for GNNs that achieves massive gains on unseen graphs without retraining.

AI & ML arxiv | Apr 2

Self-Routing removes the need for learned routers in Mixture-of-Experts (MoE) by using hidden states directly for expert assignment.

AI & ML arxiv | Apr 2

Improves Qwen2.5-7B performance on AIME2024 by 137% through test-time iterative rethinking and majority-voted pseudo-labels.

AI & ML arxiv | Apr 2

Automates mathematical optimization modeling using reinforcement learning with solver-derived rewards instead of human process supervision.

AI & ML arxiv | Apr 2

Optimizes LLM inference scheduling by treating output length as a heavy-tailed distribution rather than a point estimate.

AI & ML arxiv | Apr 2
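
The heavy-tail point can be made concrete with a Pareto length model: conditional on a request having already produced t tokens, its expected remaining length grows linearly in t, the opposite of what a point-estimate scheduler assumes. A small sketch (the Pareto parameters are illustrative, not fitted):

```python
import numpy as np

def expected_remaining_pareto(t, alpha=1.5, xm=10.0):
    """E[L - t | L > t] for a Pareto(alpha, xm) output-length distribution.
    Since E[L | L > t] = alpha * t / (alpha - 1) for t >= xm, the expected
    remaining length t / (alpha - 1) GROWS with elapsed length t."""
    t = np.maximum(t, xm)
    return t / (alpha - 1.0)

# A tail-aware scheduler therefore deprioritizes long-running requests:
elapsed = np.array([10.0, 100.0, 1000.0])
remaining = expected_remaining_pareto(elapsed)  # 20, 200, 2000 tokens
order = np.argsort(remaining)  # shortest expected remaining work first
print(remaining, order)
```
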

Introduces negative early exit and adaptive boosting to make Monte Carlo Tree Search (MCTS) practical for real-time LLM inference.

AI & ML arxiv | Apr 2

Achieves a major breakthrough in dataset distillation, reaching 60% accuracy on ImageNet-1K using only a handful of synthetic images.

AI & ML arxiv | Apr 2

Enables 'Elastic Inference' where a single trained model can be converted to multiple lower-precision formats on-the-fly without retraining.

AI & ML arxiv | Apr 2

Scales imitation learning data efficiency by generating synthetic 'multi-view' demonstrations from a single expert trajectory.

AI & ML arxiv | Apr 2

Proposes Physical Imitation Learning (PIL) to offload up to 87% of a control policy's mechanical power to passive robotic joints.

AI & ML arxiv | Apr 2

CircuitProbe identifies reasoning circuits in Transformers 1000x faster than brute-force methods and predicts the efficacy of layer duplication.

AI & ML arxiv | Apr 2

Spectral Compact Training (SCT) enables training 70B-parameter architectures on consumer hardware like the Steam Deck (8GB RAM) via permanent SVD factors.

AI & ML arxiv | Apr 2

Achieves O(1) complexity for multimillion-class classification by leveraging predefined vector systems in the latent space.

AI & ML arxiv | Apr 2

Molecular Memory allows MoE systems to recover previously learned domain expertise 9-11x faster by utilizing cost-penalized fitness metrics that preserve dormant experts.

AI & ML arxiv | Apr 2

OBD-LLM uses second-order Hessian information to achieve 20-40% better low-rank decomposition accuracy than the current state-of-the-art SVD-LLM.

AI & ML arxiv | Apr 2

PixelPrune identifies and removes pixel-level redundancy before the Vision Transformer encoder, delivering up to 4.2x inference speedup for high-resolution VLM tasks.

AI & ML arxiv | Apr 2

EmbedPart achieves a 100x speedup over Metis for graph partitioning by clustering node embeddings rather than operating on raw graph structures.

AI & ML arxiv | Apr 2

A lightweight probing method predicts LLM downstream task performance from internal representations during training, reducing evaluation latency from one hour to three minutes.

AI & ML arxiv | Apr 2

Canonical Correlation Analysis (CCA) can reduce image representation dimensionality by 75% while actually improving downstream performance through cross-model agreement.

AI & ML arxiv | Apr 2
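
The CCA reduction can be sketched in plain NumPy: whiten each model's embeddings, then SVD the cross-correlation of the whitened views to keep only the directions on which the two models agree. Synthetic data with an 8-dimensional shared factor stands in for real embeddings here; this is a generic CCA sketch, not the paper's pipeline:

```python
import numpy as np

def cca_directions(X, Y, k):
    """Top-k canonical directions between two embedding sets (n x d each)."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    # Whiten each view via economy SVD: X = Ux Sx Vtx, whitened X = Ux.
    Ux, Sx, Vtx = np.linalg.svd(X, full_matrices=False)
    Uy, Sy, Vty = np.linalg.svd(Y, full_matrices=False)
    # SVD of the cross-correlation of the whitened views.
    U, corrs, Vt = np.linalg.svd(Ux.T @ Uy)
    # Map the canonical directions back to the original embedding spaces.
    Wx = Vtx.T @ np.diag(1 / Sx) @ U[:, :k]
    Wy = Vty.T @ np.diag(1 / Sy) @ Vt.T[:, :k]
    return Wx, Wy, corrs[:k]

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 8))  # structure both models capture
X = shared @ rng.normal(size=(8, 32)) + 0.1 * rng.normal(size=(200, 32))
Y = shared @ rng.normal(size=(8, 32)) + 0.1 * rng.normal(size=(200, 32))

Wx, Wy, corrs = cca_directions(X, Y, k=8)
# Projecting 32 -> 8 dims (a 75% reduction) keeps the high-agreement
# directions: the shared factor's canonical correlations are near 1.
print(corrs.min() > 0.9)  # True
```
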

Decouples weather forecasting from spatial resolution by using Flow Matching to super-resolve coarse trajectories as a post-processing step.

AI & ML arxiv | Apr 2

Introduces S0 tuning for hybrid RNN-attention models, outperforming LoRA by 10.8% with zero inference overhead.

AI & ML arxiv | Apr 2

Reduces the compute cost of LLM test-time scaling by up to 67% using conformal prediction to calibrate reasoning paths.

AI & ML arxiv | Apr 2
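
A split-conformal sketch of the calibration step: from nonconformity scores on a held-out set of correctly-answered questions, the adjusted (1-α) quantile gives a cutoff with distribution-free coverage, and paths scoring below it can exit early instead of spending the full test-time budget. The score definition and α below are illustrative assumptions:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal cutoff: the ceil((n+1)(1-alpha))/n quantile of the
    calibration scores yields ~(1-alpha) coverage on new examples."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0))

# Hypothetical nonconformity scores (e.g. 1 - model confidence) measured
# on held-out questions the model answered correctly.
rng = np.random.default_rng(0)
cal_scores = rng.uniform(0.0, 0.5, size=500)

tau = conformal_threshold(cal_scores, alpha=0.1)

def early_exit(path_score):
    """Stop sampling further reasoning paths once one is conformally safe."""
    return path_score <= tau

print(early_exit(0.1), early_exit(0.9))  # True False
```
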

Combines the YOCO architecture with recursive computation to scale representational depth without inflating the KV cache.

AI & ML arxiv | Apr 2

Solves the long-standing trade-off in low-rank matrix recovery by achieving both optimal sample complexity and fast convergence.

AI & ML arxiv | Apr 2

Enables Gaussian Processes to scale on modern parallel hardware by removing the need for Cholesky decompositions.

AI & ML arxiv | Apr 2