EFFICIENCY_BREAKTHROUGH
375 papers · Page 4 of 4
A unified framework for neural network recombination that achieves state-of-the-art fine-tuning with fewer than 200 parameters.
AI & ML arxiv | Mar 31
GIFT bootstraps image-to-CAD generation by turning inference-time failures into synthetic training data, reducing inference compute by 80%.
AI & ML arxiv | Mar 31
Near-lossless KV cache compression using angular quantization in the Walsh-Hadamard domain at ~3.5 bits per element.
AI & ML arxiv | Mar 31
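The item above names the mechanism but not the details. As a minimal sketch of the core idea (not the paper's actual method, and using invented numbers): an orthonormal Walsh-Hadamard rotation spreads a single outlier across all coordinates, so a coarse uniform quantizer no longer flattens the small entries.

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of 2).
    Being orthonormal and symmetric, it is its own inverse."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]

def quantize(x, bits=4):
    """Symmetric uniform quantization with a per-vector scale."""
    amax = max(abs(v) for v in x) or 1.0
    levels = 2 ** (bits - 1) - 1
    return [round(v / amax * levels) * amax / levels for v in x]

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# One large outlier among small-but-meaningful values: quantizing directly
# zeroes out every small entry, while quantizing in the Hadamard domain
# (where the outlier's energy is spread out) preserves them.
v = [100.0] + [4.0 if i % 2 else -4.0 for i in range(1, 16)]
direct_err = mse(v, quantize(v))
dequant = fwht(quantize(fwht(v)))   # rotate, quantize, rotate back
rotated_err = mse(v, dequant)
```

Here the direct 4-bit error is dominated by the small entries collapsing to zero; the rotated version keeps the reconstruction error orders of magnitude lower.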
Achieves a 79,000x reduction in energy per inference for insulin dose calculation using Spiking Neural Networks (SNNs).
AI & ML arxiv | Mar 31
Uses spectral decomposition of inverse dynamics to enable real-time planning of long-horizon robotic manipulation tasks (10+ contact modes).
AI & ML arxiv | Mar 31
KVSculpt moves beyond simple eviction/merging to optimize unconstrained KV pairs in continuous space for extreme cache compression.
AI & ML arxiv | Mar 31
SAGE mitigates multimodal hallucinations by monitoring 'attention sinks' and dynamically modulating self-attention during the decoding process.
AI & ML arxiv | Mar 31
ITQ3_S achieves high-fidelity 3-bit LLM inference by using rotation-domain smoothing to eliminate the catastrophic precision loss caused by outliers.
AI & ML arxiv | Mar 31
ExFusion enables Transformer models to gain the capacity of Mixture-of-Experts during training while remaining a standard dense model for deployment.
AI & ML arxiv | Mar 31
Dataset Concentration (DsCo) achieves nearly lossless dataset reduction by aligning distributions via diffusion models, cutting storage and training costs by half.
AI & ML arxiv | Mar 31
Decoupled language models reduce the compute required for OCR domain adaptation by 95% while matching SOTA transformer accuracy.
AI & ML arxiv | Mar 31
Drift-AR enables single-step (1-NFE) high-fidelity image generation by reinterpreting AR prediction entropy as a physical drifting field.
AI & ML arxiv | Mar 31
ROVED reduces the expensive human feedback required for preference-based RL by up to 90% by leveraging vision-language embeddings and uncertainty filtering.
AI & ML arxiv | Mar 31
Introduces Heddle, a trajectory-centric system that resolves the long-tail latency bottleneck of tool calls in agentic Reinforcement Learning.
AI & ML arxiv | Mar 31
Replaces the classic Newton-Raphson power-flow solver with a differentiable GPU-accelerated simulation.
AI & ML arxiv | Mar 31
Introduces lightweight equilibration to the Muon optimizer, significantly stabilizing and accelerating LLM pretraining.
AI & ML arxiv | Mar 31
Enables instruction-following in low-resource languages by simply merging target language base models with English-instructed models.
AI & ML arxiv | Mar 31
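"Simply merging" here plausibly means task-vector arithmetic; the sketch below assumes that reading (the paper's exact recipe may differ), with toy state dicts standing in for real model weights.

```python
def merge_for_instruction_following(base_en, instruct_en, base_target, alpha=1.0):
    """Task-vector merge: add the instruction-tuning delta learned in
    English to a base model of the target language.
    All arguments are {param_name: list_of_floats} state dicts."""
    merged = {}
    for name, w_target in base_target.items():
        delta = [wi - wb for wi, wb in zip(instruct_en[name], base_en[name])]
        merged[name] = [wt + alpha * d for wt, d in zip(w_target, delta)]
    return merged

# Toy 2-parameter "models": the instruction delta is +0.5 everywhere,
# so it transfers unchanged onto the target-language base weights.
base_en     = {"w": [1.0, 2.0]}
instruct_en = {"w": [1.5, 2.5]}
base_target = {"w": [3.0, 4.0]}
merged = merge_for_instruction_following(base_en, instruct_en, base_target)
```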
An evolutionary framework for GPU kernel generation that outperforms frontier models like Claude 4.6 and Gemini 3.0.
AI & ML arxiv | Mar 31
HISA eliminates the quadratic O(L²) bottleneck in sparse attention indexers, enabling efficient long-context scaling for models like DeepSeek-V3.
AI & ML arxiv | Mar 31
IsoQuant leverages SO(4) isoclinic rotations to achieve a 4.5x-4.7x speedup in low-bit KV-cache quantization over existing methods.
AI & ML arxiv | Mar 31
INSID3 achieves state-of-the-art one-shot image segmentation using only frozen DINOv3 features without any training, fine-tuning, or auxiliary models.
AI & ML arxiv | Mar 31
EdgeDiT provides a hardware-aware blueprint for running massive Diffusion Transformers (DiT) on mobile NPUs with a 1.6x reduction in latency.
AI & ML arxiv | Mar 31
LAD achieves 3x lower latency than previous driving language models by generating textual reasoning and motion plans at up to 20 Hz.
AI & ML arxiv | Mar 31
Hydra unifies ColBERT-style retrieval and autoregressive generation into a single Vision-Language Model using a single LoRA adapter.
AI & ML arxiv | Mar 31
StreamingVLA eliminates execution halting in robots by asynchronously parallelizing observation, generation, and execution.
AI & ML arxiv | Mar 31
ResAdapt learns a per-frame visual budget allocator that optimizes input resolution before encoding.
AI & ML arxiv | Mar 31
RNNs can be trained online without Jacobian propagation, matching BPTT performance with 1000x less memory.
AI & ML arxiv | Mar 31
IF4 introduces an adaptive 4-bit data type that switches between Float and Integer representations to minimize quantization error.
AI & ML arxiv | Mar 31
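A minimal sketch of the format-switching idea (grids and block sizes are illustrative, not IF4's actual spec): per block, reconstruct under both a uniform int4 grid and a non-uniform fp4-style (E2M1) grid, and keep whichever has lower error.

```python
INT4_GRID = [float(v) for v in range(-7, 8)]            # symmetric int4
FP4_MAGS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]    # E2M1-style magnitudes
FP4_GRID = sorted({s * m for m in FP4_MAGS for s in (-1.0, 1.0)})

def quantize_to_grid(x, grid):
    """Scale the block to the grid's range, then snap each value to the
    nearest grid point."""
    amax = max(abs(v) for v in x) or 1.0
    scale = amax / max(abs(g) for g in grid)
    return [min(grid, key=lambda g: abs(v / scale - g)) * scale for v in x]

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

def adaptive_quantize(block):
    """Pick int4 or fp4 per block, whichever reconstructs better."""
    q_int = quantize_to_grid(block, INT4_GRID)
    q_fp = quantize_to_grid(block, FP4_GRID)
    if mse(block, q_int) <= mse(block, q_fp):
        return "int4", q_int
    return "fp4", q_fp

# Evenly spread values favour the uniform int4 grid...
fmt_uniform, _ = adaptive_quantize([-7.0, -5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0])
# ...while an outlier plus small values favours fp4's fine low-end steps.
fmt_outlier, _ = adaptive_quantize([1.0, -1.0, 1.0, -1.0, 12.0, -12.0])
```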
Decouples data mixture ratio selection from continual pre-training by optimizing distribution vectors post-hoc with 15-35x lower compute cost.
AI & ML arxiv | Apr 1
Combines differentiable optimization with exact ILP solvers to achieve a 10x performance gain in solving NP-hard combinatorial scheduling problems.
AI & ML arxiv | Apr 1
A fabricated 16nm SoC that performs real-time 3D occupancy mapping under 6 mW, reducing query energy by over 80%.
AI & ML arxiv | Apr 1
Generates complete, simulatable analog circuits in milliseconds, outperforming search-based methods by over 600x.
AI & ML arxiv | Apr 1
Introduces PolarQuant, a quantization method that uses Hadamard rotation to make LLM weights near-lossless at 5-bit without calibration data.
AI & ML arxiv | Apr 1
Scales curvature-aware bilevel optimization to BERT-sized models using KFAC, significantly outperforming standard gradient unrolling.
AI & ML arxiv | Apr 1
Enables infinite-length video understanding on a single consumer GPU (RTX 3090) through a training-free visual memory mechanism.
AI & ML arxiv | Apr 1
Obtains epistemic and aleatoric uncertainty from a single forward-backward pass of an unmodified pretrained LLM.
AI & ML arxiv | Apr 1
A vector-wise sparse attention mechanism that accelerates long-context video inference by 2.6x with zero loss in accuracy.
AI & ML arxiv | Apr 1
A unified quantization and runtime framework for deploying multiple LoRA-adapted generative models on edge devices simultaneously.
AI & ML arxiv | Apr 1
A 1D continuous image tokenizer that uses semantic masking to achieve a 64x reduction in token usage without sacrificing generation fidelity.
AI & ML arxiv | Apr 1
A compiler approach to agent logs that reduces token consumption by 50-66% while improving context learning performance.
AI & ML arxiv | Apr 1
A stabilization mechanism for adapting LLMs to time-series tasks that reduces memory footprint by up to 1,776x.
AI & ML arxiv | Apr 1
Applies Shapley values from cooperative game theory to solve the 'free-rider' problem in GRPO-based reinforcement learning post-training.
AI & ML arxiv | Apr 1
Produces high-fidelity SHAP explanations for tabular data 1000x faster than traditional methods by integrating them directly into the model architecture.
AI & ML arxiv | Apr 1
Proposes a unified tensor-factorization view of attention that encompasses MHA, GQA, and MLA while reducing parameter counts by an order of magnitude.
AI & ML arxiv | Apr 1
Recovers short-text performance in context-extended LLMs using 60x less data than current state-of-the-art distillation methods.
AI & ML arxiv | Apr 2
Introduces entropy-guided adaptive decoding that gives small models reasoning performance comparable to frontier models at a fraction of the cost.
AI & ML arxiv | Apr 2
Proposes a 'no-backprop' stochastic process memory for edge agents that solves the retention-forgetting tradeoff with fixed compute.
AI & ML arxiv | Apr 2
MAC-Attention achieves 14x attention-phase speedups and reduces KV cache accesses by 99% for long-context LLMs by reusing computation from semantically similar queries.
AI & ML arxiv | Apr 2
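One plausible (invented, not MAC-Attention's actual design) instantiation of "reusing computation from semantically similar queries" is a similarity-keyed cache in front of the attention kernel:

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

class AttentionReuseCache:
    """Reuse attention outputs for semantically similar queries,
    skipping the KV-cache scan entirely on a hit."""
    def __init__(self, threshold=0.98):
        self.threshold = threshold
        self.entries = []          # list of (query, output)
        self.hits = 0
        self.misses = 0

    def attend(self, query, compute_attention):
        for cached_q, cached_out in self.entries:
            if cosine(query, cached_q) >= self.threshold:
                self.hits += 1
                return cached_out
        out = compute_attention(query)
        self.misses += 1
        self.entries.append((query, out))
        return out

calls = []
def expensive_attention(q):
    calls.append(q)                # stands in for a full KV-cache scan
    return [2 * v for v in q]

cache = AttentionReuseCache()
cache.attend([1.0, 0.0], expensive_attention)      # miss: computed
cache.attend([0.999, 0.01], expensive_attention)   # near-duplicate: reused
cache.attend([0.0, 1.0], expensive_attention)      # orthogonal: computed
```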
A modified 110M parameter ColBERT model can identify fine-grained evidence spans as accurately as a 27B parameter LLM, but at a fraction of the cost.
AI & ML arxiv | Apr 2
A lightweight framework for triaging agentic trajectories post-deployment without the cost of human review or auxiliary LLM calls.
AI & ML arxiv | Apr 2
A cross-graph tuning-free prompting framework for GNNs that achieves substantial gains on unseen graphs without retraining.
AI & ML arxiv | Apr 2
Self-Routing removes the need for learned routers in Mixture-of-Experts (MoE) by using hidden states directly for expert assignment.
AI & ML arxiv | Apr 2
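"Using hidden states directly" admits several readings; one simple (assumed, not necessarily Self-Routing's) version routes each token to the expert whose representative vector is most similar to the token's hidden state, with no learned router parameters:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class SelfRoutingMoE:
    """Router-free MoE sketch: each expert has a representative vector
    (e.g. a centroid of hidden states it has served), and tokens go to
    the most similar one -- no trained gating network."""
    def __init__(self, centroids, experts):
        self.centroids = centroids   # one vector per expert
        self.experts = experts       # one callable per expert

    def route(self, h):
        sims = [dot(h, c) for c in self.centroids]
        return max(range(len(sims)), key=sims.__getitem__)

    def forward(self, h):
        idx = self.route(h)
        return idx, self.experts[idx](h)

moe = SelfRoutingMoE(
    centroids=[[1.0, 0.0], [0.0, 1.0]],
    experts=[lambda h: [v + 1 for v in h], lambda h: [v - 1 for v in h]],
)
idx_a, out_a = moe.forward([0.9, 0.1])   # closer to expert 0's centroid
idx_b, out_b = moe.forward([0.2, 0.8])   # closer to expert 1's centroid
```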
Improves Qwen2.5-7B performance on AIME2024 by 137% through test-time iterative rethinking and majority-voted pseudo-labels.
AI & ML arxiv | Apr 2
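The voting half of that recipe is essentially self-consistency; a toy sketch (the stub "model" and round counts are invented) of iterating majority-voted pseudo-labels:

```python
from collections import Counter

def majority_vote(answers):
    """Most frequent final answer becomes the pseudo-label; ties break
    by first occurrence (Counter preserves insertion order)."""
    return Counter(answers).most_common(1)[0][0]

def iterative_rethink(sample_fn, rounds=3, samples_per_round=5):
    """Each round, sample candidate answers conditioned on the current
    pseudo-label, then update the pseudo-label by majority vote."""
    label = None
    for _ in range(rounds):
        answers = [sample_fn(label) for _ in range(samples_per_round)]
        label = majority_vote(answers)
    return label

# Stub 'model': noisy on the first pass, consistent once given a label.
def fake_model(hint):
    if hint is None:
        return next(fake_model.cold)    # first round: mixed answers
    return hint                         # later rounds: sticks with the label
fake_model.cold = iter(["42", "41", "42", "42", "17"])

final = iterative_rethink(fake_model, rounds=2, samples_per_round=5)
```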
Automates mathematical optimization modeling using reinforcement learning with solver-derived rewards instead of human process supervision.
AI & ML arxiv | Apr 2
Optimizes LLM inference scheduling by treating output length as a heavy-tailed distribution rather than a point estimate.
AI & ML arxiv | Apr 2
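The key property of heavy tails is easy to show concretely: conditional on a request having already produced many tokens, the *expected remaining* work grows rather than shrinks, so point-estimate shortest-job-first misranks old requests. A sketch with an invented empirical length sample:

```python
def expected_remaining(lengths, t):
    """E[L - t | L > t] over an empirical sample of output lengths."""
    tail = [L for L in lengths if L > t]
    return sum(L - t for L in tail) / len(tail)

# Pareto-like empirical output lengths: many short, a few huge.
lengths = [10] * 80 + [100] * 15 + [2000] * 5

rem_at_0 = expected_remaining(lengths, 0)    # fresh request
rem_at_50 = expected_remaining(lengths, 50)  # already generated 50 tokens
# For heavy tails, having generated 50 tokens is evidence of a *long*
# request, not of being nearly done: rem_at_50 > rem_at_0.

def pick_next(ages):
    """Schedule the request with the least expected remaining work."""
    return min(range(len(ages)),
               key=lambda i: expected_remaining(lengths, ages[i]))

nxt = pick_next([0, 50, 120])   # the fresh request is the best bet
```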
Introduces negative early exit and adaptive boosting to make Monte Carlo Tree Search (MCTS) practical for real-time LLM inference.
AI & ML arxiv | Apr 2
Achieves a major breakthrough in dataset distillation, reaching 60% accuracy on ImageNet-1K using only a handful of synthetic images.
AI & ML arxiv | Apr 2
Enables 'Elastic Inference' where a single trained model can be converted to multiple lower-precision formats on-the-fly without retraining.
AI & ML arxiv | Apr 2
Scales imitation learning data efficiency by generating synthetic 'multi-view' demonstrations from a single expert trajectory.
AI & ML arxiv | Apr 2
Proposes Physical Imitation Learning (PIL) to offload up to 87% of a control policy's mechanical power to passive robotic joints.
AI & ML arxiv | Apr 2
CircuitProbe identifies reasoning circuits in Transformers 1000x faster than brute-force methods and predicts the efficacy of layer duplication.
AI & ML arxiv | Apr 2
Spectral Compact Training (SCT) enables training 70B-parameter architectures on consumer hardware like the Steam Deck (8GB RAM) via permanent SVD factors.
AI & ML arxiv | Apr 2
Achieves O(1) complexity for multimillion-class classification by leveraging predefined vector systems in the latent space.
AI & ML arxiv | Apr 2
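One way such "predefined vector systems" can yield class-count-independent decoding (an assumed instantiation, not necessarily the paper's): fix a binary sign code per class, train the model to regress onto its class's code, and decode with a single hash lookup on the sign pattern.

```python
import itertools

def sign_code(vec):
    return tuple(1 if v >= 0 else -1 for v in vec)

# Predefine one code per class: d latent dims index up to 2^d classes,
# and decoding is one dict lookup -- no softmax over the class count.
dim = 4
codes = list(itertools.product((-1, 1), repeat=dim))
code_to_class = {code: cls for cls, code in enumerate(codes)}

def classify(latent):
    """O(1) in the number of classes: hash the sign pattern."""
    return code_to_class[sign_code(latent)]

# A noisy latent still decodes to its class as long as no sign flips.
cls = classify([0.9, -1.2, 0.7, -0.4])
```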
Molecular Memory allows MoE systems to recover previously learned domain expertise 9-11x faster by utilizing cost-penalized fitness metrics that preserve dormant experts.
AI & ML arxiv | Apr 2
OBD-LLM uses second-order Hessian information to achieve 20-40% better low-rank decomposition accuracy than the current state-of-the-art SVD-LLM.
AI & ML arxiv | Apr 2
PixelPrune identifies and removes pixel-level redundancy before the Vision Transformer encoder, delivering up to 4.2x inference speedup for high-resolution VLM tasks.
AI & ML arxiv | Apr 2
EmbedPart achieves a 100x speedup over Metis for graph partitioning by clustering node embeddings rather than operating on raw graph structures.
AI & ML arxiv | Apr 2
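The mechanism is clustering in embedding space instead of operating on graph structure; a toy sketch with plain k-means (deterministic init and tiny data for illustration, not EmbedPart's actual pipeline):

```python
def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic spread-out init (k >= 2)."""
    centers = [points[(i * len(points)) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return [min(range(k), key=lambda c: sum(
        (a - b) ** 2 for a, b in zip(p, centers[c]))) for p in points]

# Two well-separated communities in embedding space; partitioning the
# embeddings cuts only the single bridge edge (2, 3).
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = kmeans(embeddings, k=2)
cut = sum(1 for u, v in edges if parts[u] != parts[v])
```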
A lightweight probing method predicts LLM downstream task performance from internal representations during training, reducing evaluation latency from one hour to three minutes.
AI & ML arxiv | Apr 2
Canonical Correlation Analysis (CCA) can reduce image representation dimensionality by 75% while actually improving downstream performance through cross-model agreement.
AI & ML arxiv | Apr 2
Decouples weather forecasting from spatial resolution by using Flow Matching to super-resolve coarse trajectories as a post-processing step.
AI & ML arxiv | Apr 2
Introduces S0 tuning for hybrid RNN-attention models, outperforming LoRA by 10.8% with zero inference overhead.
AI & ML arxiv | Apr 2
Reduces the compute cost of LLM test-time scaling by up to 67% using conformal prediction to calibrate reasoning paths.
AI & ML arxiv | Apr 2
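The calibration half of that idea follows standard split conformal prediction; a sketch (the early-stopping policy and all scores are illustrative assumptions) of pruning extra reasoning paths once one clears a calibrated threshold:

```python
import math

def conformal_threshold(calibration_scores, alpha=0.1):
    """Split-conformal quantile: with probability >= 1 - alpha, a fresh
    score from the same distribution falls at or below this value."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return sorted(calibration_scores)[min(rank, n) - 1]

def scale_with_early_stop(path_scores, threshold, max_paths=8):
    """Sample reasoning paths one by one; stop as soon as a path's
    nonconformity score clears the calibrated threshold."""
    used = 0
    for score in path_scores[:max_paths]:
        used += 1
        if score <= threshold:
            break
    return used

# Calibration: nonconformity scores of known-good reasoning paths.
cal = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.35, 0.40, 0.50]
tau = conformal_threshold(cal, alpha=0.2)
# The third sampled path clears tau, so paths 4+ are never generated.
n_paths = scale_with_early_stop([0.9, 0.6, 0.18, 0.7], tau)
```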
Combines the YOCO architecture with recursive computation to scale representational depth without inflating the KV cache.
AI & ML arxiv | Apr 2
Solves the long-standing trade-off in low-rank matrix recovery by achieving both optimal sample complexity and fast convergence.
AI & ML arxiv | Apr 2
Enables Gaussian Processes to scale on modern parallel hardware by removing the need for Cholesky decompositions.
AI & ML arxiv | Apr 2
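The standard Cholesky-free route is to solve the kernel system with conjugate gradients, which needs only matrix-vector products (and so parallelizes well). A minimal sketch, assuming an RBF kernel and tiny data for illustration:

```python
import math

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A using only
    matrix-vector products -- no Cholesky factorization."""
    x = [0.0] * len(b)
    r = list(b)                     # residual b - A x (x starts at 0)
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if math.sqrt(rs_new) < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# RBF kernel matrix (plus diagonal jitter) for 3 training points.
pts = [0.0, 1.0, 2.0]
K = [[math.exp(-0.5 * (a - b) ** 2) + (1e-6 if a == b else 0.0)
      for b in pts] for a in pts]
y = [1.0, 2.0, 0.5]
alpha_vec = conjugate_gradient(K, y)        # GP weights: K^{-1} y
residual = max(abs(r - yi) for r, yi in zip(matvec(K, alpha_vec), y))
```

On an n x n SPD system, CG converges in at most n iterations in exact arithmetic, so the residual here is tiny.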