SeriesFusion
Science, curated & edited by AI

Scaling Insight

101 papers  ·  Page 1 of 2

What changes when you scale a system up or down. Laws, regimes, and surprises that only appear at larger or smaller orders of magnitude.

Scaling Insight  /  Category lead

Neural collapse is triggered by a predictable 'feature-norm threshold' (fn*) that is invariant to training conditions, serving as a new diagnostic for training progress.

This identifies a concrete, actionable metric for predicting when representational reorganization occurs in deep networks. It lets practitioners monitor training dynamics beyond loss curves, pinpointing the point at which a model transitions from fitting noise to structured feature learning.
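A diagnostic like this reduces to tracking the mean feature norm over training and flagging the first step it crosses the threshold. The sketch below is a minimal illustration of that idea, not the paper's method: the threshold value `FN_STAR`, the function names, and the synthetic trajectory are all hypothetical.

```python
import numpy as np

# Hypothetical threshold value fn*; the paper reports such a threshold is
# invariant to training conditions, but the number here is illustrative only.
FN_STAR = 5.0

def mean_feature_norm(features: np.ndarray) -> float:
    """Mean L2 norm of penultimate-layer features, shape (batch, dim)."""
    return float(np.linalg.norm(features, axis=1).mean())

def crossed_threshold(norm_history: list, fn_star: float = FN_STAR):
    """Return the first step index at which the tracked feature norm
    reaches fn_star (the hypothesized onset of representational
    reorganization), or None if it is never reached."""
    for step, norm in enumerate(norm_history):
        if norm >= fn_star:
            return step
    return None

# Usage: log mean_feature_norm(features) once per step during training,
# then query the crossing point. Synthetic trajectory for illustration:
history = [0.5, 1.2, 2.8, 4.1, 5.3, 6.0]
onset = crossed_threshold(history)  # first step where norm >= FN_STAR
```

In practice one would hook this into the training loop alongside the loss logger, so the crossing step can be compared against loss-curve milestones.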

AI
Gradient-based data valuation (TracIn) outperforms all human-crafted metadata heuristics for curriculum ordering in motion planners.
Apr 2
AI
Demonstrates that LLM judge panels follow power-law discovery curves, where panel size and persona diversity are critical for uncovering edge-case failures.
Apr 2
AI
Establishes a three-dimensional scaling law for RAG-pretraining, modeling the optimal data budget allocation between model parameters, tokens, and retrieval store size.
Apr 2
AI
Simple Self-Distillation (SSD) improves LLM code generation (e.g., Qwen3-30B) by 13% Pass@1 without any external verifiers or teacher models.
Apr 2
AI
Identifies a 'dual-capability bottleneck' where low-rated training data is essential for state tracking while high-rated data is needed for decision quality.
Apr 1
AI
Provides a computationally efficient 'early warning' system for emergent capabilities like grokking and induction head formation using 2-datapoint reduced density matrices.
Apr 1
AI
Identifies 'label leakage' from limited task diversity as the primary bottleneck for relational foundation models, rather than raw data volume.
Apr 1
AI
Discovers that video diffusion models commit to high-level plans in the first few denoising steps, enabling a new inference-time scaling technique called ChEaP.
Apr 1
AI
Scales multi-agent path finding to 1000 agents with near-linear runtime by decoupling geometric planning from execution-time conflict resolution.
Mar 31
AI
Synthetic multi-view generation breaks the performance ceiling of single-view robotic datasets.
Mar 31
AI
Formalizes the 'Observability Gap' to explain why coding agents plateau: humans can only provide feedback on visible outputs, while bugs reside in invisible execution states.
Mar 31
AI
Provides a high-dimensional theoretical foundation for why two-phase optimizers like DiLoCo are mathematically superior to standard SGD in specific noise regimes.
Mar 31
AI
Shows that standard task-completion benchmarks fail to distinguish agent capabilities and proposes 'Working Memory Fidelity' as a more predictive metric.
Mar 31
AI
Mathematical proof that LayerNorm structurally reduces model complexity compared to RMSNorm due to its mean-centering geometry.
Mar 31
AI
Provides empirical evidence and a mechanistic explanation for why LoRA drastically reduces catastrophic forgetting in sequential fine-tuning compared to full fine-tuning.
Mar 31
AI
A controlled study proving that the temporal organization (curriculum) of multimodal data is a first-order variable in balancing reasoning vs. OCR capabilities.
Mar 31
AI
The eigenvalue tail index of a neural network's weight matrices serves as a near-perfect (R^2 = 0.984) diagnostic for label noise in the training data.
Mar 31
AI
Discovers that LLM hidden states undergo geometric 'warping' at digit-count boundaries, mimicking human psychological perception.
Mar 31
AI
This paper establishes the formal information-theoretic limits and conditions under which self-improving AI systems can be safely verified.
Mar 31
AI
HyperP provides the first hyperparameter transfer laws for hypersphere optimization, ensuring stable scaling for models using the Muon optimizer.
Mar 31
AI
Uses the Minimum Description Length principle to predict exactly when neural networks will transition from simple 'spurious' shortcuts to complex features.
Mar 30
AI
A billion-scale time-series benchmark that identifies a 'context-length crossover' where foundation models begin to decisively outperform deep learning baselines.
Mar 30
AI
Challenges the assumption that 'background' pixels are useless in GUI agents and identifies a 'recency effect' for optimal token pruning.
Mar 30
AI
An 800 Hz data glove reveals that human hand dexterity contains critical high-frequency motion energy (>100 Hz) previously invisible to standard sensors.
Mar 30
AI
Provides the first sharp theoretical characterization of why spectral optimizers like Muon drastically outperform SGD in storage capacity and scaling for language models.
Mar 30
AI
Proves that causal representation learning is possible with far fewer environments and unknown intervention targets than previously assumed.
Mar 30
AI
Reveals that synthetic rewriting is a quality multiplier for high-grade data, but fails to fix low-quality source data.
Mar 27
AI
A systematic study reveals that grokking is not an architectural property of Transformers but an interaction between weight decay and optimization stability.
Mar 27
AI
MSRL scales multimodal reward modeling by transferring reasoning capabilities from text to vision-language tasks without requiring new multimodal preference data.
Mar 27
AI
Synthetic Mixed Training allows an 8B model to finally outperform RAG on long-document comprehension by combining synthetic QAs with rewritten documents.
Mar 26
AI
Newer LLM architectures like MoE and SSMs are making 'early-exit' decoding significantly less effective than in previous generations.
Mar 26
AI
Diffusion models can be proven to generalize by capturing manifold geometry long before they achieve density estimation or memorization.
Mar 26
AI
Provides a systematic blueprint for scaling Reinforcement Learning (RL) in LLMs using multi-turn synthetic data generation and difficulty-based curricula.
Mar 26
AI
Identifies a 'critical threshold' in human-AI symbiosis beyond which human capability collapses abruptly and irreversibly due to over-delegation.
Mar 26
AI
Hidden states in LLMs occupy a Riemannian submanifold where tokens are Voronoi regions, revealing a universal 'hourglass' intrinsic dimension profile across all tested models.
Mar 25
AI
The standard 'Chinchilla Approach 2' for fitting scaling laws is systematically biased, potentially leading to millions of dollars in wasted compute at frontier scales.
Mar 25
AI
Reveals that RLVR-driven reasoning improvements in LLMs are the result of highly sparse changes to a tiny fraction of 'critical' token distributions.
Mar 25
AI
The mass of robotic bipeds scales with the square of leg length, rather than the cubic scaling found in biological systems.
Mar 25
AI
A quantitative model that predicts the performance gain of merging independent LLM specialists before committing compute.
Mar 25
AI
Identifies the 'Caterpillar Tree' as the theoretically optimal structure for test-time computation and backtracking in LLMs.
Mar 25
AI
Persistent structural memory in neural networks is fundamentally limited by the instability of jointly-learned coordinate systems.
Mar 25
AI
Theoretical analysis reveals that the efficiency benefits of low-dimensional data structures for diffusion models diminish significantly when the data manifold is non-linear.
Mar 25
AI
Access to conversational memory allows an 8B model to outperform a 235B model on user-specific queries while reducing inference costs by 96%.
Mar 25
AI
Researchers identify a 'selection bottleneck' that mathematically determines when diverse agent teams outperform homogeneous self-consistency teams.
Mar 24
AI
This work formalizes why 'human' mathematics is distinct from the space of all valid deductions using information-theoretic compression measurements on MathLib.
Mar 24
AI
Discovers that language-centric training in Multimodal LLMs actively degrades their internal visual representation quality.
Mar 24
AI
Identifies that in-context reasoning over pretraining knowledge only emerges after specific types of fine-tuning, not from pretraining alone.
Mar 24
AI
Sensitivity to compression in Transformers spans five orders of magnitude, with early-layer MLP up-projections identified as catastrophic failure points.
Mar 24
AI
Context-aware Visual Fine-tuning (CoVFT) allows a 7B MLLM to outperform its 13B counterpart by resolving optimization conflicts in vision encoders.
Mar 24
AI
Introduces 'Mixture of Chapters' to scale Transformer memory to 262K tokens without the quadratic cost of standard attention.
Mar 24