Introduces the first billion-scale SAR vision foundation model and a massive unified benchmark for all-weather geospatial semantic segmentation.
Open Release arxiv | Mar 13
Demonstrates that simply using XML tags during translation outperforms complex pipelines for cross-lingual label projection while actually improving translation quality.
Breaks Assumption arxiv | Mar 13
Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.
Efficiency Breakthrough arxiv | Mar 13
Enables multimodal agents to keep improving from experience and skills, without any parameter updates, via a dual-stream visual grounding framework.
New Capability arxiv | Mar 13
A 3D vision-language pipeline that grounds medical diagnosis in longitudinal brain MRI via regional volumetric assessments to eliminate VLM hallucinations.
New Capability arxiv | Mar 13
Integrates Neural ODEs with NeRFs to enable continuous-time scene dynamics that can extrapolate far beyond the original training sequence.
New Capability arxiv | Mar 13
Proposes a unified image tokenizer that reconciles the conflicting requirements of visual understanding and generation using a residual evolution process.
Paradigm Shift arxiv | Mar 13
Identifies and solves the 'information self-locking' failure mode where RL-trained agents stop asking informative questions in active reasoning tasks.
Breaks Assumption arxiv | Mar 13
A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.
Efficiency Breakthrough arxiv | Mar 13
Shows that LLM self-correction fails primarily due to 'session context' and can be significantly improved by moving the review to a fresh, independent session.
Breaks Assumption arxiv | Mar 13
Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving up to 22,000x speedups for less than $10.
Efficiency Breakthrough arxiv | Mar 13
Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.
Scaling Insight arxiv | Mar 13
Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.
Efficiency Breakthrough arxiv | Mar 13
Accelerates sparse attention by 75% through reuse of lightning indexer decisions across layers, tackling a hidden bottleneck in production-grade LLMs.
Efficiency Breakthrough arxiv | Mar 13
Discovers that task-specific experts are so dense around pretrained weights that random parameter perturbations can compete with complex RL methods like PPO.
Breaks Assumption arxiv | Mar 13
Reveals that 'Reasoning LLMs-as-Judges' can lead to policies that generate highly effective adversarial outputs to deceive other judges and inflate benchmarks.
Breaks Assumption arxiv | Mar 13
Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.
Paradigm Shift arxiv | Mar 13
Integrates Chain-of-Thought reasoning directly into the Diffusion Transformer denoising process to solve complex spatial and logical tasks.
New Capability arxiv | Mar 13
Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.
Efficiency Breakthrough arxiv | Mar 13
Uncovers an emergent Hue-Saturation-Lightness (HSL) subspace in FLUX.1's VAE latent space, allowing for precise, training-free color control.
Breaks Assumption arxiv | Mar 13
Enables VideoLLMs to perform complex logical reasoning simultaneously with video playback without incurring the latency of standard test-time scaling.
New Capability arxiv | Mar 13
An open foundation model for humanoid robots that achieves high performance using only 30 hours of real-world robot data by pre-training on egocentric human videos.
Open Release arxiv | Mar 13
A unified streaming visual backbone that performs perception, 3D reconstruction, and robotic action simultaneously from a continuous video stream.
New Capability arxiv | Mar 13
Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.
Efficiency Breakthrough arxiv | Mar 13
Demonstrates that the stochasticity in standard regularized model training (such as cross-validation) can serve as a 'free' and effective exploration strategy for contextual bandits.
Paradigm Shift arxiv | Mar 13