SeriesFusion
Science, curated & edited by AI

Efficiency Breakthrough

375 papers  ·  Page 8 of 8
AI
Enables RMSNorm to reuse MXFP8 block scales, reducing the reduction operation size by 32x with a 2.4x kernel speedup.
Mar 16
AI
The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.
Mar 13
AI
REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.
Mar 13
AI
PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.
Mar 13
AI
InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.
Mar 13
AI
TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.
Mar 13
AI
DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.
Mar 13
AI
LongFlow provides an 11x throughput boost for reasoning models by specifically optimizing KV cache for long-output (vs long-input) scenarios.
Mar 13
AI
Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.
Mar 13
AI
Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache budget.
Mar 13
AI
Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.
Mar 13
AI
Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.
Mar 13
AI
Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.
Mar 13
AI
Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.
Mar 13
AI
Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.
Mar 13
AI
Fits promptable visual segmentation (SAM) into a 1.3M parameter model for real-time in-sensor execution.
Mar 13
AI
Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.
Mar 13
AI
Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.
Mar 13
AI
A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.
Mar 13
AI
Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving up to 22,000x speedups for less than $10.
Mar 13
AI
Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.
Mar 13
AI
Accelerates sparse attention by 75% by reusing lightning indexer decisions across layers, tackling the hidden bottleneck in production-grade LLMs.
Mar 13
AI
Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.
Mar 13
AI
Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.
Mar 13